* [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
@ 2021-02-02 13:50 Brendan Jackman
  2021-02-03 17:07 ` Yonghong Song
  2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
  0 siblings, 2 replies; 16+ messages in thread

From: Brendan Jackman @ 2021-02-02 13:50 UTC (permalink / raw)
To: bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Florent Revest, John Fastabend, linux-kernel, Brendan Jackman

When BPF_FETCH is set, atomic instructions load a value from memory
into a register. The current verifier code first checks via
check_mem_access whether we can access the memory, and then checks
via check_reg_arg whether we can write into the register.

For loads, check_reg_arg has the side effect of marking the
register's value as unknown, and check_mem_access has the side effect
of propagating bounds from memory to the register. This currently only
takes effect for stack memory.

Therefore with the current order, bounds information is thrown away,
but by simply reversing the order of check_reg_arg
vs. check_mem_access, we can instead propagate bounds smartly.

A simple test is added with an infinite loop that can only be proved
unreachable if this propagation is present. This is implemented both
with C and directly in test_verifier using assembly.

Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---

Difference from v2->v3 [1]:

* Fixed missing ENABLE_ATOMICS_TESTS check.

Difference from v1->v2:

* Reworked commit message to clarify this only affects stack memory
* Added the Suggested-by
* Added a C-based test.
[1]: https://lore.kernel.org/bpf/CA+i-1C2ZWUbGxWJ8kAxbri9rBboyuMaVj_BBhg+2Zf_Su9BOJA@mail.gmail.com/T/#t kernel/bpf/verifier.c | 32 +++++++++++-------- .../selftests/bpf/prog_tests/atomic_bounds.c | 15 +++++++++ .../selftests/bpf/progs/atomic_bounds.c | 24 ++++++++++++++ .../selftests/bpf/verifier/atomic_bounds.c | 27 ++++++++++++++++ 4 files changed, 84 insertions(+), 14 deletions(-) create mode 100644 tools/testing/selftests/bpf/prog_tests/atomic_bounds.c create mode 100644 tools/testing/selftests/bpf/progs/atomic_bounds.c create mode 100644 tools/testing/selftests/bpf/verifier/atomic_bounds.c diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 972fc38eb62d..5e09632efddb 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3665,9 +3665,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i return -EACCES; } + if (insn->imm & BPF_FETCH) { + if (insn->imm == BPF_CMPXCHG) + load_reg = BPF_REG_0; + else + load_reg = insn->src_reg; + + /* check and record load of old value */ + err = check_reg_arg(env, load_reg, DST_OP); + if (err) + return err; + } else { + /* This instruction accesses a memory location but doesn't + * actually load it into a register. 
+ */ + load_reg = -1; + } + /* check whether we can read the memory */ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, - BPF_SIZE(insn->code), BPF_READ, -1, true); + BPF_SIZE(insn->code), BPF_READ, load_reg, true); if (err) return err; @@ -3677,19 +3694,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i if (err) return err; - if (!(insn->imm & BPF_FETCH)) - return 0; - - if (insn->imm == BPF_CMPXCHG) - load_reg = BPF_REG_0; - else - load_reg = insn->src_reg; - - /* check and record load of old value */ - err = check_reg_arg(env, load_reg, DST_OP); - if (err) - return err; - return 0; } diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c new file mode 100644 index 000000000000..addf127068e4 --- /dev/null +++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <test_progs.h> + +#include "atomic_bounds.skel.h" + +void test_atomic_bounds(void) +{ + struct atomic_bounds *skel; + __u32 duration = 0; + + skel = atomic_bounds__open_and_load(); + if (CHECK(!skel, "skel_load", "couldn't load program\n")) + return; +} diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c new file mode 100644 index 000000000000..e5fff7fc7f8f --- /dev/null +++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c @@ -0,0 +1,24 @@ +// SPDX-License-Identifier: GPL-2.0 +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include <bpf/bpf_tracing.h> +#include <stdbool.h> + +#ifdef ENABLE_ATOMICS_TESTS +bool skip_tests __attribute((__section__(".data"))) = false; +#else +bool skip_tests = true; +#endif + +SEC("fentry/bpf_fentry_test1") +int BPF_PROG(sub, int x) +{ +#ifdef ENABLE_ATOMICS_TESTS + int a = 0; + int b = __sync_fetch_and_add(&a, 1); + /* b is certainly 0 here. Can the verifier tell? 
*/ + while (b) + continue; +#endif + return 0; +} diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c new file mode 100644 index 000000000000..e82183e4914f --- /dev/null +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c @@ -0,0 +1,27 @@ +{ + "BPF_ATOMIC bounds propagation, mem->reg", + .insns = { + /* a = 0; */ + /* + * Note this is implemented with two separate instructions, + * where you might think one would suffice: + * + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), + * + * This is because BPF_ST_MEM doesn't seem to set the stack slot + * type to 0 when storing an immediate. + */ + BPF_MOV64_IMM(BPF_REG_0, 0), + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), + /* b = atomic_fetch_add(&a, 1); */ + BPF_MOV64_IMM(BPF_REG_1, 1), + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8), + /* Verifier should be able to tell that this infinite loop isn't reachable. */ + /* if (b) while (true) continue; */ + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1), + BPF_EXIT_INSN(), + }, + .result = ACCEPT, + .result_unpriv = REJECT, + .errstr_unpriv = "back-edge", +}, base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439 -- 2.30.0.365.g02bc693789-goog ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-02-02 13:50 [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH Brendan Jackman
@ 2021-02-03 17:07 ` Yonghong Song
  2021-02-03 17:37   ` Alexei Starovoitov
  2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
  1 sibling, 1 reply; 16+ messages in thread

From: Yonghong Song @ 2021-02-03 17:07 UTC (permalink / raw)
To: Brendan Jackman, bpf
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Florent Revest, John Fastabend, linux-kernel

On 2/2/21 5:50 AM, Brendan Jackman wrote:
> When BPF_FETCH is set, atomic instructions load a value from memory
> into a register. The current verifier code first checks via
> check_mem_access whether we can access the memory, and then checks
> via check_reg_arg whether we can write into the register.
>
> For loads, check_reg_arg has the side effect of marking the
> register's value as unknown, and check_mem_access has the side effect
> of propagating bounds from memory to the register. This currently only
> takes effect for stack memory.
>
> Therefore with the current order, bounds information is thrown away,
> but by simply reversing the order of check_reg_arg
> vs. check_mem_access, we can instead propagate bounds smartly.
>
> A simple test is added with an infinite loop that can only be proved
> unreachable if this propagation is present. This is implemented both
> with C and directly in test_verifier using assembly.
>
> Suggested-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Ack with a nit below.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
>
> Difference from v2->v3 [1]:
>
> * Fixed missing ENABLE_ATOMICS_TESTS check.
>
> Difference from v1->v2:
>
> * Reworked commit message to clarify this only affects stack memory
> * Added the Suggested-by
> * Added a C-based test.
> > [1]: https://lore.kernel.org/bpf/CA+i-1C2ZWUbGxWJ8kAxbri9rBboyuMaVj_BBhg+2Zf_Su9BOJA@mail.gmail.com/T/#t > > kernel/bpf/verifier.c | 32 +++++++++++-------- > .../selftests/bpf/prog_tests/atomic_bounds.c | 15 +++++++++ > .../selftests/bpf/progs/atomic_bounds.c | 24 ++++++++++++++ > .../selftests/bpf/verifier/atomic_bounds.c | 27 ++++++++++++++++ > 4 files changed, 84 insertions(+), 14 deletions(-) > create mode 100644 tools/testing/selftests/bpf/prog_tests/atomic_bounds.c > create mode 100644 tools/testing/selftests/bpf/progs/atomic_bounds.c > create mode 100644 tools/testing/selftests/bpf/verifier/atomic_bounds.c > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 972fc38eb62d..5e09632efddb 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -3665,9 +3665,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i > return -EACCES; > } > > + if (insn->imm & BPF_FETCH) { > + if (insn->imm == BPF_CMPXCHG) > + load_reg = BPF_REG_0; > + else > + load_reg = insn->src_reg; > + > + /* check and record load of old value */ > + err = check_reg_arg(env, load_reg, DST_OP); > + if (err) > + return err; > + } else { > + /* This instruction accesses a memory location but doesn't > + * actually load it into a register. 
> + */ > + load_reg = -1; > + } > + > /* check whether we can read the memory */ > err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off, > - BPF_SIZE(insn->code), BPF_READ, -1, true); > + BPF_SIZE(insn->code), BPF_READ, load_reg, true); > if (err) > return err; > > @@ -3677,19 +3694,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i > if (err) > return err; > > - if (!(insn->imm & BPF_FETCH)) > - return 0; > - > - if (insn->imm == BPF_CMPXCHG) > - load_reg = BPF_REG_0; > - else > - load_reg = insn->src_reg; > - > - /* check and record load of old value */ > - err = check_reg_arg(env, load_reg, DST_OP); > - if (err) > - return err; > - > return 0; > } > > diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c > new file mode 100644 > index 000000000000..addf127068e4 > --- /dev/null > +++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c > @@ -0,0 +1,15 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +#include <test_progs.h> > + > +#include "atomic_bounds.skel.h" > + > +void test_atomic_bounds(void) > +{ > + struct atomic_bounds *skel; > + __u32 duration = 0; > + > + skel = atomic_bounds__open_and_load(); > + if (CHECK(!skel, "skel_load", "couldn't load program\n")) > + return; You are missing atomic_bounds__destroy(skel); here. 
> +} > diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c > new file mode 100644 > index 000000000000..e5fff7fc7f8f > --- /dev/null > +++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c > @@ -0,0 +1,24 @@ > +// SPDX-License-Identifier: GPL-2.0 > +#include <linux/bpf.h> > +#include <bpf/bpf_helpers.h> > +#include <bpf/bpf_tracing.h> > +#include <stdbool.h> > + > +#ifdef ENABLE_ATOMICS_TESTS > +bool skip_tests __attribute((__section__(".data"))) = false; > +#else > +bool skip_tests = true; > +#endif > + > +SEC("fentry/bpf_fentry_test1") > +int BPF_PROG(sub, int x) > +{ > +#ifdef ENABLE_ATOMICS_TESTS > + int a = 0; > + int b = __sync_fetch_and_add(&a, 1); > + /* b is certainly 0 here. Can the verifier tell? */ > + while (b) > + continue; > +#endif > + return 0; > +} [...] ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-02-03 17:07 ` Yonghong Song @ 2021-02-03 17:37 ` Alexei Starovoitov 0 siblings, 0 replies; 16+ messages in thread From: Alexei Starovoitov @ 2021-02-03 17:37 UTC (permalink / raw) To: Yonghong Song Cc: Brendan Jackman, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML On Wed, Feb 3, 2021 at 9:07 AM Yonghong Song <yhs@fb.com> wrote: > > > > On 2/2/21 5:50 AM, Brendan Jackman wrote: > > When BPF_FETCH is set, atomic instructions load a value from memory > > into a register. The current verifier code first checks via > > check_mem_access whether we can access the memory, and then checks > > via check_reg_arg whether we can write into the register. > > > > For loads, check_reg_arg has the side-effect of marking the > > register's value as unkonwn, and check_mem_access has the side effect > > of propagating bounds from memory to the register. This currently only > > takes effect for stack memory. > > > > Therefore with the current order, bounds information is thrown away, > > but by simply reversing the order of check_reg_arg > > vs. check_mem_access, we can instead propagate bounds smartly. > > > > A simple test is added with an infinite loop that can only be proved > > unreachable if this propagation is present. This is implemented both > > with C and directly in test_verifier using assembly. > > > > Suggested-by: John Fastabend <john.fastabend@gmail.com> > > Signed-off-by: Brendan Jackman <jackmanb@google.com> > > Ack with a nit below. Sorry it was already applied yesterday. patchbot just didn't send auto-reply. ^ permalink raw reply [flat|nested] 16+ messages in thread
* [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-02-02 13:50 [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH Brendan Jackman 2021-02-03 17:07 ` Yonghong Song @ 2021-06-27 15:34 ` Jiri Olsa 2021-06-28 9:21 ` Brendan Jackman 1 sibling, 1 reply; 16+ messages in thread From: Jiri Olsa @ 2021-06-27 15:34 UTC (permalink / raw) To: Brendan Jackman Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, linux-kernel, Naveen N. Rao, Sandipan Das On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: SNIP > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > new file mode 100644 > index 000000000000..e82183e4914f > --- /dev/null > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > @@ -0,0 +1,27 @@ > +{ > + "BPF_ATOMIC bounds propagation, mem->reg", > + .insns = { > + /* a = 0; */ > + /* > + * Note this is implemented with two separate instructions, > + * where you might think one would suffice: > + * > + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), > + * > + * This is because BPF_ST_MEM doesn't seem to set the stack slot > + * type to 0 when storing an immediate. > + */ > + BPF_MOV64_IMM(BPF_REG_0, 0), > + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), > + /* b = atomic_fetch_add(&a, 1); */ > + BPF_MOV64_IMM(BPF_REG_1, 1), > + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8), > + /* Verifier should be able to tell that this infinite loop isn't reachable. 
*/ > + /* if (b) while (true) continue; */ > + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1), > + BPF_EXIT_INSN(), > + }, > + .result = ACCEPT, > + .result_unpriv = REJECT, > + .errstr_unpriv = "back-edge", > +}, > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439 > -- > 2.30.0.365.g02bc693789-goog > hi, I tracked soft lock up on powerpc to this test: [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25 #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP #25/p BPF_ATOMIC bounds propagation, mem->reg Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ... kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055] Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ... kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055] please check the console output below.. it looks like the verifier allowed the loop to happen for some reason on powerpc.. any idea? I'm on latest bpf-next/master, I can send the config if needed thanks, jirka --- ibm-p9z-07-lp1 login: [ 184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! 
[test_verifier:1055] [ 184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [ 184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G E 5.13.0-rc3+ #3 [ 184.108728] NIP: c00800000131314c LR: c000000000c56918 CTR: c008000001313118 [ 184.108733] REGS: c0000000119ef820 TRAP: 0900 Tainted: G E (5.13.0-rc3+) [ 184.108739] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040003 [ 184.108752] CFAR: c008000001313150 IRQMASK: 0 [ 184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001 [ 184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008 [ 184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937 [ 184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000 [ 184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8 [ 184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048 [ 184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638 [ 184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0 [ 184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8 [ 184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470 [ 184.108826] Call Trace: [ 184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable) [ 184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470 [ 184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660 [ 184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60 [ 184.108854] [c0000000119efd90] 
[c00000000035e810] sys_bpf+0x30/0x40 [ 184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280 [ 184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268 [ 184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24 [ 184.108878] NIP: 00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000 [ 184.108883] REGS: c0000000119efe80 TRAP: 3000 Tainted: G E (5.13.0-rc3+) [ 184.108887] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000848 XER: 00000000 [ 184.108903] IRQMASK: 0 [ 184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a [ 184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008 [ 184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000 [ 184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000 [ 184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 [ 184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0 [ 184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710 [ 184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24 [ 184.108964] LR [0000000000000000] 0x0 [ 184.108967] --- interrupt: 3000 [ 184.108970] Instruction dump: [ 184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 [ 184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068 ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa @ 2021-06-28 9:21 ` Brendan Jackman 2021-06-29 14:10 ` Jiri Olsa 0 siblings, 1 reply; 16+ messages in thread From: Brendan Jackman @ 2021-06-28 9:21 UTC (permalink / raw) To: Jiri Olsa Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote: > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: > > SNIP > > > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > > new file mode 100644 > > index 000000000000..e82183e4914f > > --- /dev/null > > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > > @@ -0,0 +1,27 @@ > > +{ > > + "BPF_ATOMIC bounds propagation, mem->reg", > > + .insns = { > > + /* a = 0; */ > > + /* > > + * Note this is implemented with two separate instructions, > > + * where you might think one would suffice: > > + * > > + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), > > + * > > + * This is because BPF_ST_MEM doesn't seem to set the stack slot > > + * type to 0 when storing an immediate. > > + */ > > + BPF_MOV64_IMM(BPF_REG_0, 0), > > + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), > > + /* b = atomic_fetch_add(&a, 1); */ > > + BPF_MOV64_IMM(BPF_REG_1, 1), > > + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8), > > + /* Verifier should be able to tell that this infinite loop isn't reachable. 
*/ > > + /* if (b) while (true) continue; */ > > + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1), > > + BPF_EXIT_INSN(), > > + }, > > + .result = ACCEPT, > > + .result_unpriv = REJECT, > > + .errstr_unpriv = "back-edge", > > +}, > > > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439 > > -- > > 2.30.0.365.g02bc693789-goog > > > > hi, > I tracked soft lock up on powerpc to this test: > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25 > #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP > #25/p BPF_ATOMIC bounds propagation, mem->reg > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ... > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055] > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ... > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055] > > please check the console output below.. it looks like the verifier > allowed the loop to happen for some reason on powerpc.. any idea? > > I'm on latest bpf-next/master, I can send the config if needed > > thanks, > jirka > > > --- > ibm-p9z-07-lp1 login: [ 184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! 
[test_verifier:1055] > [ 184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) > [ 184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G E 5.13.0-rc3+ #3 > [ 184.108728] NIP: c00800000131314c LR: c000000000c56918 CTR: c008000001313118 > [ 184.108733] REGS: c0000000119ef820 TRAP: 0900 Tainted: G E (5.13.0-rc3+) > [ 184.108739] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040003 > [ 184.108752] CFAR: c008000001313150 IRQMASK: 0 > [ 184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001 > [ 184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008 > [ 184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937 > [ 184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000 > [ 184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8 > [ 184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048 > [ 184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638 > [ 184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0 > [ 184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8 > [ 184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470 > [ 184.108826] Call Trace: > [ 184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable) > [ 184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470 > [ 184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660 > [ 184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60 > [ 
184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40 > [ 184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280 > [ 184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268 > [ 184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24 > [ 184.108878] NIP: 00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000 > [ 184.108883] REGS: c0000000119efe80 TRAP: 3000 Tainted: G E (5.13.0-rc3+) > [ 184.108887] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000848 XER: 00000000 > [ 184.108903] IRQMASK: 0 > [ 184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a > [ 184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008 > [ 184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000 > [ 184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000 > [ 184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > [ 184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0 > [ 184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710 > [ 184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24 > [ 184.108964] LR [0000000000000000] 0x0 > [ 184.108967] --- interrupt: 3000 > [ 184.108970] Instruction dump: > [ 184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 > [ 184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068 Hmm, is the test prog from atomic_bounds.c getting JITed there (my dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d) unsupported\n' in dmesg instead. 
I wonder if I missed something in commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other atomics in .imm). Any idea if this test was ever passing on PowerPC? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-28 9:21 ` Brendan Jackman @ 2021-06-29 14:10 ` Jiri Olsa 2021-06-29 16:04 ` Jiri Olsa 0 siblings, 1 reply; 16+ messages in thread From: Jiri Olsa @ 2021-06-29 14:10 UTC (permalink / raw) To: Brendan Jackman Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote: > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: > > > > SNIP > > > > > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > > > new file mode 100644 > > > index 000000000000..e82183e4914f > > > --- /dev/null > > > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > > > @@ -0,0 +1,27 @@ > > > +{ > > > + "BPF_ATOMIC bounds propagation, mem->reg", > > > + .insns = { > > > + /* a = 0; */ > > > + /* > > > + * Note this is implemented with two separate instructions, > > > + * where you might think one would suffice: > > > + * > > > + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), > > > + * > > > + * This is because BPF_ST_MEM doesn't seem to set the stack slot > > > + * type to 0 when storing an immediate. > > > + */ > > > + BPF_MOV64_IMM(BPF_REG_0, 0), > > > + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), > > > + /* b = atomic_fetch_add(&a, 1); */ > > > + BPF_MOV64_IMM(BPF_REG_1, 1), > > > + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8), > > > + /* Verifier should be able to tell that this infinite loop isn't reachable. 
*/ > > > + /* if (b) while (true) continue; */ > > > + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1), > > > + BPF_EXIT_INSN(), > > > + }, > > > + .result = ACCEPT, > > > + .result_unpriv = REJECT, > > > + .errstr_unpriv = "back-edge", > > > +}, > > > > > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439 > > > -- > > > 2.30.0.365.g02bc693789-goog > > > > > > > hi, > > I tracked soft lock up on powerpc to this test: > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25 > > #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP > > #25/p BPF_ATOMIC bounds propagation, mem->reg > > > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ... > > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055] > > > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ... > > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055] > > > > please check the console output below.. it looks like the verifier > > allowed the loop to happen for some reason on powerpc.. any idea? > > > > I'm on latest bpf-next/master, I can send the config if needed > > > > thanks, > > jirka > > > > > > --- > > ibm-p9z-07-lp1 login: [ 184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! 
[test_verifier:1055] > > [ 184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) > > [ 184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G E 5.13.0-rc3+ #3 > > [ 184.108728] NIP: c00800000131314c LR: c000000000c56918 CTR: c008000001313118 > > [ 184.108733] REGS: c0000000119ef820 TRAP: 0900 Tainted: G E (5.13.0-rc3+) > > [ 184.108739] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040003 > > [ 184.108752] CFAR: c008000001313150 IRQMASK: 0 > > [ 184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001 > > [ 184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008 > > [ 184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937 > > [ 184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000 > > [ 184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8 > > [ 184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048 > > [ 184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638 > > [ 184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0 > > [ 184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8 > > [ 184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470 > > [ 184.108826] Call Trace: > > [ 184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable) > > [ 184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470 > > [ 184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660 > > [ 184.108848] [c0000000119efcb0] 
[c00000000035de6c] __sys_bpf+0x46c/0xd60 > > [ 184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40 > > [ 184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280 > > [ 184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268 > > [ 184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24 > > [ 184.108878] NIP: 00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000 > > [ 184.108883] REGS: c0000000119efe80 TRAP: 3000 Tainted: G E (5.13.0-rc3+) > > [ 184.108887] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000848 XER: 00000000 > > [ 184.108903] IRQMASK: 0 > > [ 184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a > > [ 184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008 > > [ 184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000 > > [ 184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000 > > [ 184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > [ 184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > [ 184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0 > > [ 184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710 > > [ 184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24 > > [ 184.108964] LR [0000000000000000] 0x0 > > [ 184.108967] --- interrupt: 3000 > > [ 184.108970] Instruction dump: > > [ 184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 > > [ 184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068 > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d) > unsupported\n' in dmesg instead. 
I wonder if I missed something in > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other > atomics in .imm). Any idea if this test was ever passing on PowerPC? > hum, I guess not.. will check jirka ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-29 14:10 ` Jiri Olsa @ 2021-06-29 16:04 ` Jiri Olsa 2021-06-29 16:25 ` Brendan Jackman 0 siblings, 1 reply; 16+ messages in thread From: Jiri Olsa @ 2021-06-29 16:04 UTC (permalink / raw) To: Brendan Jackman Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: > > > > > > SNIP > > > > > > > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > > > > new file mode 100644 > > > > index 000000000000..e82183e4914f > > > > --- /dev/null > > > > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c > > > > @@ -0,0 +1,27 @@ > > > > +{ > > > > + "BPF_ATOMIC bounds propagation, mem->reg", > > > > + .insns = { > > > > + /* a = 0; */ > > > > + /* > > > > + * Note this is implemented with two separate instructions, > > > > + * where you might think one would suffice: > > > > + * > > > > + * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0), > > > > + * > > > > + * This is because BPF_ST_MEM doesn't seem to set the stack slot > > > > + * type to 0 when storing an immediate. > > > > + */ > > > > + BPF_MOV64_IMM(BPF_REG_0, 0), > > > > + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8), > > > > + /* b = atomic_fetch_add(&a, 1); */ > > > > + BPF_MOV64_IMM(BPF_REG_1, 1), > > > > + BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8), > > > > + /* Verifier should be able to tell that this infinite loop isn't reachable. 
*/ > > > > + /* if (b) while (true) continue; */ > > > > + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1), > > > > + BPF_EXIT_INSN(), > > > > + }, > > > > + .result = ACCEPT, > > > > + .result_unpriv = REJECT, > > > > + .errstr_unpriv = "back-edge", > > > > +}, > > > > > > > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439 > > > > -- > > > > 2.30.0.365.g02bc693789-goog > > > > > > > > > > hi, > > > I tracked soft lock up on powerpc to this test: > > > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25 > > > #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP > > > #25/p BPF_ATOMIC bounds propagation, mem->reg > > > > > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ... > > > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055] > > > > > > Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ... > > > kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055] > > > > > > please check the console output below.. it looks like the verifier > > > allowed the loop to happen for some reason on powerpc.. any idea? > > > > > > I'm on latest bpf-next/master, I can send the config if needed > > > > > > thanks, > > > jirka > > > > > > > > > --- > > > ibm-p9z-07-lp1 login: [ 184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! 
[test_verifier:1055] > > > [ 184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) > > > [ 184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G E 5.13.0-rc3+ #3 > > > [ 184.108728] NIP: c00800000131314c LR: c000000000c56918 CTR: c008000001313118 > > > [ 184.108733] REGS: c0000000119ef820 TRAP: 0900 Tainted: G E (5.13.0-rc3+) > > > [ 184.108739] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040003 > > > [ 184.108752] CFAR: c008000001313150 IRQMASK: 0 > > > [ 184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001 > > > [ 184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008 > > > [ 184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937 > > > [ 184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000 > > > [ 184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8 > > > [ 184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048 > > > [ 184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638 > > > [ 184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0 > > > [ 184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8 > > > [ 184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470 > > > [ 184.108826] Call Trace: > > > [ 184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable) > > > [ 184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470 > > > [ 184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660 > > 
> [ 184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60 > > > [ 184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40 > > > [ 184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280 > > > [ 184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268 > > > [ 184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24 > > > [ 184.108878] NIP: 00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000 > > > [ 184.108883] REGS: c0000000119efe80 TRAP: 3000 Tainted: G E (5.13.0-rc3+) > > > [ 184.108887] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 28000848 XER: 00000000 > > > [ 184.108903] IRQMASK: 0 > > > [ 184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a > > > [ 184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008 > > > [ 184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000 > > > [ 184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000 > > > [ 184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > [ 184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 > > > [ 184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0 > > > [ 184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710 > > > [ 184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24 > > > [ 184.108964] LR [0000000000000000] 0x0 > > > [ 184.108967] --- interrupt: 3000 > > > [ 184.108970] Instruction dump: > > > [ 184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 > > > [ 184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068 > > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? 
That > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d) > > unsupported\n' in dmesg instead. I wonder if I missed something in > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other I see that for all the other atomics tests: [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21 #21/p BPF_ATOMIC_AND without fetch FAIL Failed to load prog 'Unknown error 524'! verification time 32 usec stack depth 8 processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 Summary: 0 PASSED, 0 SKIPPED, 2 FAILED console: [ 51.850952] eBPF filter atomic op code db (@2) unsupported [ 51.851134] eBPF filter atomic op code db (@2) unsupported [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22 #22/u BPF_ATOMIC_AND with fetch FAIL Failed to load prog 'Unknown error 524'! verification time 38 usec stack depth 8 processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 #22/p BPF_ATOMIC_AND with fetch FAIL Failed to load prog 'Unknown error 524'! verification time 26 usec stack depth 8 processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 console: [ 223.231420] eBPF filter atomic op code db (@3) unsupported [ 223.231596] eBPF filter atomic op code db (@3) unsupported ... but no such console output for: [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24 #24/u BPF_ATOMIC bounds propagation, mem->reg OK > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > hum, I guess not.. will check nope, it locks up the same: ibm-p9z-07-lp1 login: [ 88.070644] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! 
[test_verifier:950] [ 88.070674] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [ 88.070714] CPU: 4 PID: 950 Comm: test_verifier Tainted: G E 5.11.0-rc4+ #4 [ 88.070721] NIP: c0080000017daad8 LR: c000000000c26ea8 CTR: c0080000017daaa0 [ 88.070726] REGS: c000000011407870 TRAP: 0900 Tainted: G E (5.11.0-rc4+) [ 88.070732] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040169 [ 88.070745] CFAR: c0080000017daad8 IRQMASK: 0 [ 88.070745] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001 [ 88.070745] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37 [ 88.070745] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08 [ 88.070745] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000 [ 88.070745] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000 [ 88.070745] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238 [ 88.070745] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300 [ 88.070745] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40 [ 88.070807] NIP [c0080000017daad8] bpf_prog_a2eb9104e5e8a5bf+0x38/0x5560 [ 88.070815] LR [c000000000c26ea8] bpf_test_run+0x288/0x470 [ 88.070823] Call Trace: [ 88.070825] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable) [ 88.070832] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470 [ 88.070840] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660 [ 88.070846] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30 [ 88.070853] [c000000011407db0] [c000000000037a14] 
system_call_exception+0x134/0x230 [ 88.070861] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c [ 88.070869] Instruction dump: [ 88.070873] f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 7d4048a8 [ 88.070886] 7d4a1a14 7d4049ad 4082fff4 28230000 <4082fffc> 60000000 ebe10068 38210070 ibm-p9z-07-lp1 login: [ 121.762511] rcu: INFO: rcu_sched self-detected stall on CPU [ 121.762536] rcu: 4-....: (5999 ticks this GP) idle=9ee/1/0x4000000000000002 softirq=2299/2299 fqs=2998 [ 121.762546] (t=6000 jiffies g=3681 q=798) [ 121.762551] NMI backtrace for cpu 4 [ 121.762555] CPU: 4 PID: 950 Comm: test_verifier Tainted: G EL 5.11.0-rc4+ #4 [ 121.762562] Call Trace: [ 121.762565] [c0000000114072b0] [c0000000007cfe9c] dump_stack+0xc0/0x104 (unreliable) [ 121.762575] [c000000011407300] [c0000000007dd028] nmi_cpu_backtrace+0xe8/0x130 [ 121.762582] [c000000011407370] [c0000000007dd21c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0 [ 121.762589] [c000000011407410] [c0000000000718a8] arch_trigger_cpumask_backtrace+0x28/0x40 [ 121.762596] [c000000011407430] [c000000000215740] rcu_dump_cpu_stacks+0x10c/0x168 [ 121.762604] [c000000011407480] [c00000000020b080] print_cpu_stall+0x1d0/0x2a0 [ 121.762611] [c000000011407530] [c00000000020e4d4] check_cpu_stall+0x174/0x340 [ 121.762618] [c000000011407560] [c000000000213b28] rcu_sched_clock_irq+0x98/0x3f0 [ 121.762625] [c0000000114075a0] [c00000000022b974] update_process_times+0xc4/0x150 [ 121.762631] [c0000000114075e0] [c000000000245cec] tick_sched_handle+0x3c/0xd0 [ 121.762638] [c000000011407610] [c000000000246408] tick_sched_timer+0x68/0xe0 [ 121.762645] [c000000011407650] [c00000000022ca18] __hrtimer_run_queues+0x1c8/0x430 [ 121.762652] [c0000000114076f0] [c00000000022ddd4] hrtimer_interrupt+0x124/0x300 [ 121.762658] [c0000000114077a0] [c0000000000286a4] timer_interrupt+0x104/0x290 [ 121.762665] [c000000011407800] [c000000000009978] decrementer_common_virt+0x1c8/0x1d0 [ 121.762673] --- interrupt: 900 at 
bpf_prog_a2eb9104e5e8a5bf+0x34/0x5560 [ 121.762680] NIP: c0080000017daad4 LR: c000000000c26ea8 CTR: c0080000017daaa0 [ 121.762684] REGS: c000000011407870 TRAP: 0900 Tainted: G EL (5.11.0-rc4+) [ 121.762689] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040169 [ 121.762702] CFAR: c0080000017daad8 IRQMASK: 0 [ 121.762702] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001 [ 121.762702] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37 [ 121.762702] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08 [ 121.762702] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000 [ 121.762702] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000 [ 121.762702] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238 [ 121.762702] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300 [ 121.762702] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40 [ 121.762765] NIP [c0080000017daad4] bpf_prog_a2eb9104e5e8a5bf+0x34/0x5560 [ 121.762770] LR [c000000000c26ea8] bpf_test_run+0x288/0x470 [ 121.762777] --- interrupt: 900 [ 121.762779] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable) [ 121.762786] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470 [ 121.762793] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660 [ 121.762800] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30 [ 121.762807] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230 [ 121.762814] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c [ 148.073962] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! 
[test_verifier:950] [ 148.073990] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E) [ 148.074029] CPU: 4 PID: 950 Comm: test_verifier Tainted: G EL 5.11.0-rc4+ #4 [ 148.074036] NIP: c0080000017daad8 LR: c000000000c26ea8 CTR: c0080000017daaa0 [ 148.074041] REGS: c000000011407870 TRAP: 0900 Tainted: G EL (5.11.0-rc4+) [ 148.074047] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44222840 XER: 20040169 [ 148.074060] CFAR: c0080000017daad8 IRQMASK: 0 [ 148.074060] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001 [ 148.074060] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37 [ 148.074060] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08 [ 148.074060] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000 [ 148.074060] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000 [ 148.074060] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238 [ 148.074060] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300 [ 148.074060] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40 [ 148.074122] NIP [c0080000017daad8] bpf_prog_a2eb9104e5e8a5bf+0x38/0x5560 [ 148.074129] LR [c000000000c26ea8] bpf_test_run+0x288/0x470 [ 148.074137] Call Trace: [ 148.074139] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable) [ 148.074147] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470 [ 148.074154] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660 [ 148.074160] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30 [ 148.074168] [c000000011407db0] 
[c000000000037a14] system_call_exception+0x134/0x230 [ 148.074175] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c [ 148.074183] Instruction dump: [ 148.074188] f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 7d4048a8 [ 148.074199] 7d4a1a14 7d4049ad 4082fff4 28230000 <4082fffc> 60000000 ebe10068 38210070 jirka ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-29 16:04 ` Jiri Olsa @ 2021-06-29 16:25 ` Brendan Jackman 2021-06-29 16:41 ` Jiri Olsa 0 siblings, 1 reply; 16+ messages in thread From: Brendan Jackman @ 2021-06-29 16:25 UTC (permalink / raw) To: Jiri Olsa Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: [snip] > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d) > > > unsupported\n' in dmesg instead. I wonder if I missed something in > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other > > I see that for all the other atomics tests: > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21 > #21/p BPF_ATOMIC_AND without fetch FAIL > Failed to load prog 'Unknown error 524'! > verification time 32 usec > stack depth 8 > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED Hm that's also not good - failure to JIT shouldn't mean failure to load. Are there other test_verifier failures or is it just the atomics ones? > console: > > [ 51.850952] eBPF filter atomic op code db (@2) unsupported > [ 51.851134] eBPF filter atomic op code db (@2) unsupported > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22 > #22/u BPF_ATOMIC_AND with fetch FAIL > Failed to load prog 'Unknown error 524'! 
> verification time 38 usec > stack depth 8 > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > #22/p BPF_ATOMIC_AND with fetch FAIL > Failed to load prog 'Unknown error 524'! > verification time 26 usec > stack depth 8 > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > console: > [ 223.231420] eBPF filter atomic op code db (@3) unsupported > [ 223.231596] eBPF filter atomic op code db (@3) unsupported > > ... > > > but no such console output for: > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24 > #24/u BPF_ATOMIC bounds propagation, mem->reg OK > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > hum, I guess not.. will check > > nope, it locks up the same: Do you mean it locks up at commit 91c960b0056 too? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-29 16:25 ` Brendan Jackman @ 2021-06-29 16:41 ` Jiri Olsa 2021-06-29 21:08 ` Jiri Olsa 0 siblings, 1 reply; 16+ messages in thread From: Jiri Olsa @ 2021-06-29 16:41 UTC (permalink / raw) To: Brendan Jackman Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: > [snip] > > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my > > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That > > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d) > > > > unsupported\n' in dmesg instead. I wonder if I missed something in > > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other > > > > I see that for all the other atomics tests: > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21 > > #21/p BPF_ATOMIC_AND without fetch FAIL > > Failed to load prog 'Unknown error 524'! > > verification time 32 usec > > stack depth 8 > > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED > > Hm that's also not good - failure to JIT shouldn't mean failure to > load. Are there other test_verifier failures or is it just the atomics > ones? 
I have CONFIG_BPF_JIT_ALWAYS_ON=y so I think that's fine > > > console: > > > > [ 51.850952] eBPF filter atomic op code db (@2) unsupported > > [ 51.851134] eBPF filter atomic op code db (@2) unsupported > > > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22 > > #22/u BPF_ATOMIC_AND with fetch FAIL > > Failed to load prog 'Unknown error 524'! > > verification time 38 usec > > stack depth 8 > > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > #22/p BPF_ATOMIC_AND with fetch FAIL > > Failed to load prog 'Unknown error 524'! > > verification time 26 usec > > stack depth 8 > > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > > > console: > > [ 223.231420] eBPF filter atomic op code db (@3) unsupported > > [ 223.231596] eBPF filter atomic op code db (@3) unsupported > > > > ... > > > > > > but no such console output for: > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24 > > #24/u BPF_ATOMIC bounds propagation, mem->reg OK > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > > > > hum, I guess not.. will check > > > > nope, it locks up the same: > > Do you mean it locks up at commit 91c960b0056 too? > I tried this one: 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH I will check also 91c960b0056, but I think it's the new test issue jirka ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-29 16:41 ` Jiri Olsa @ 2021-06-29 21:08 ` Jiri Olsa 2021-06-30 10:34 ` Brendan Jackman 0 siblings, 1 reply; 16+ messages in thread From: Jiri Olsa @ 2021-06-29 21:08 UTC (permalink / raw) To: Brendan Jackman Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote: > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > > > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote: > > [snip] > > > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my > > > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That > > > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d) > > > > > unsupported\n' in dmesg instead. I wonder if I missed something in > > > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other > > > > > > I see that for all the other atomics tests: > > > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21 > > > #21/p BPF_ATOMIC_AND without fetch FAIL > > > Failed to load prog 'Unknown error 524'! > > > verification time 32 usec > > > stack depth 8 > > > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED > > > > Hm that's also not good - failure to JIT shouldn't mean failure to > > load. Are there other test_verifier failures or is it just the atomics > > ones? 
> > I have CONFIG_BPF_JIT_ALWAYS_ON=y so I think that's fine > > > > > > console: > > > > > > [ 51.850952] eBPF filter atomic op code db (@2) unsupported > > > [ 51.851134] eBPF filter atomic op code db (@2) unsupported > > > > > > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22 > > > #22/u BPF_ATOMIC_AND with fetch FAIL > > > Failed to load prog 'Unknown error 524'! > > > verification time 38 usec > > > stack depth 8 > > > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > > #22/p BPF_ATOMIC_AND with fetch FAIL > > > Failed to load prog 'Unknown error 524'! > > > verification time 26 usec > > > stack depth 8 > > > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1 > > > > > > console: > > > [ 223.231420] eBPF filter atomic op code db (@3) unsupported > > > [ 223.231596] eBPF filter atomic op code db (@3) unsupported > > > > > > ... > > > > > > > > > but no such console output for: > > > > > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24 > > > #24/u BPF_ATOMIC bounds propagation, mem->reg OK > > > > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > > > > > > > hum, I guess not.. will check > > > > > > nope, it locks up the same: > > > > Do you mean it locks up at commit 91c960b0056 too? > > > > I tried this one: > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH > > I will check also 91c960b0056, but I think it's the new test issue for 91c960b0056 in HEAD I'm getting just 2 fails: #1097/p xadd/w check whether src/dst got mangled, 1 FAIL Failed to load prog 'Unknown error 524'! verification time 25 usec stack depth 8 processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0 #1098/p xadd/w check whether src/dst got mangled, 2 FAIL Failed to load prog 'Unknown error 524'!
verification time 30 usec stack depth 8 processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0 with console output: [ 289.499341] eBPF filter atomic op code db (@4) unsupported [ 289.499510] eBPF filter atomic op code c3 (@4) unsupported no lock up jirka ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-06-29 21:08 ` Jiri Olsa @ 2021-06-30 10:34 ` Brendan Jackman [not found] ` <YNxmwZGtnqiXGnF0@krava> 0 siblings, 1 reply; 16+ messages in thread From: Brendan Jackman @ 2021-06-30 10:34 UTC (permalink / raw) To: Jiri Olsa Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao, Sandipan Das On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote: > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote: > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > > > > > > > > > > hum, I guess not.. will check > > > > > > > > nope, it locks up the same: > > > > > > Do you mean it locks up at commit 91c960b0056 too? Sorry I was being stupid here - the test didn't exist at this commit > > I tried this one: > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH > > > > I will check also 91c960b0056, but I think it's the new test issue So yeah hard to say whether this was broken on PowerPC all along. How hard is it for me to get set up to reproduce the failure? Is there a rootfs I can download, and some instructions for running a PowerPC QEMU VM? If so if you can also share your config and I'll take a look. If it's not as simple as that, I'll stare at the code for a while and see if anything jumps out. ^ permalink raw reply [flat|nested] 16+ messages in thread
[parent not found: <YNxmwZGtnqiXGnF0@krava>]
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH [not found] ` <YNxmwZGtnqiXGnF0@krava> @ 2021-07-01 8:18 ` Brendan Jackman 2021-07-01 10:16 ` Jiri Olsa 2021-07-01 11:02 ` Naveen N. Rao 0 siblings, 2 replies; 16+ messages in thread From: Brendan Jackman @ 2021-07-01 8:18 UTC (permalink / raw) To: Jiri Olsa, Sandipan Das, Naveen N. Rao Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote: > > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote: > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote: > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > > > > > > > > > > > > > > > > hum, I guess not.. will check > > > > > > > > > > > > nope, it locks up the same: > > > > > > > > > > Do you mean it locks up at commit 91c960b0056 too? > > > > Sorry I was being stupid here - the test didn't exist at this commit > > > > > > I tried this one: > > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH > > > > > > > > I will check also 91c960b0056, but I think it's the new test issue > > > > So yeah hard to say whether this was broken on PowerPC all along. How > > hard is it for me to get set up to reproduce the failure? Is there a > > rootfs I can download, and some instructions for running a PowerPC > > QEMU VM? If so if you can also share your config and I'll take a look. 
> > > > If it's not as simple as that, I'll stare at the code for a while and > > see if anything jumps out. > > > > I have latest fedora ppc server and compile/install latest bpf-next tree > I think it will be reproduced also on vm, I attached my config OK, getting set up to boot a PowerPC QEMU isn't practical here unless someone's got commands I can copy-paste (suspect it will need .config hacking too). Looks like you need to build a proper bootloader, and boot an installer disk. Looked at the code for a bit but nothing jumped out. It seems like the verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't detect an infinite loop, but then we lose the BPF_FETCH flag somewhere between do_check in verifier.c and bpf_jit_build_body in bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter atomic op code %02x (@%d) unsupported", and would also explain the lockup because a normal atomic add without fetch would leave BPF R1 unchanged. We should be able to confirm that theory by disassembling the JITted code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to 2... at least for PowerPC 32-bit... maybe you could paste those lines into the 64-bit version too? Here's some notes I made for disassembling the hexdump on x86, I guess you'd just need to change the objdump flags: -- - Enable console JIT output: ```shell echo 2 > /proc/sys/net/core/bpf_jit_enable ``` - Load & run the program of interest. - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a short program looks like. This includes a line of context - don't paste the `flen=` line. 
``` [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 from=test_verifier pid=258 [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00 [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48 [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02 [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa ``` - This incantation will split out and decode the hex, then disassemble the result: ```shell cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b binary -m i386:x86-64 /tmp/obj ``` -- Sandipan, Naveen, do you know of anything in the PowerPC code that might be leading us to drop the BPF_FETCH flag from the atomic instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-07-01 8:18 ` Brendan Jackman @ 2021-07-01 10:16 ` Jiri Olsa 2021-07-01 11:02 ` Naveen N. Rao 1 sibling, 0 replies; 16+ messages in thread From: Jiri Olsa @ 2021-07-01 10:16 UTC (permalink / raw) To: Brendan Jackman Cc: Sandipan Das, Naveen N. Rao, bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML On Thu, Jul 01, 2021 at 10:18:39AM +0200, Brendan Jackman wrote: > On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote: > > > > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote: > > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote: > > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: > > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > > > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > > > > > > > > > > > > > > > > > > > hum, I guess not.. will check > > > > > > > > > > > > > > nope, it locks up the same: > > > > > > > > > > > > Do you mean it locks up at commit 91c960b0056 too? > > > > > > Sorry I was being stupid here - the test didn't exist at this commit > > > > > > > > I tried this one: > > > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH > > > > > > > > > > I will check also 91c960b0056, but I think it's the new test issue > > > > > > So yeah hard to say whether this was broken on PowerPC all along. How > > > hard is it for me to get set up to reproduce the failure? Is there a > > > rootfs I can download, and some instructions for running a PowerPC > > > QEMU VM? 
If so if you can also share your config and I'll take a look. > > > > > > If it's not as simple as that, I'll stare at the code for a while and > > > see if anything jumps out. > > > > > > > I have latest fedora ppc server and compile/install latest bpf-next tree > > I think it will be reproduced also on vm, I attached my config > > OK, getting set up to boot a PowerPC QEMU isn't practical here unless > someone's got commands I can copy-paste (suspect it will need .config > hacking too). Looks like you need to build a proper bootloader, and > boot an installer disk. > > Looked at the code for a bit but nothing jumped out. It seems like the > verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't > detect an infinite loop, but then we lose the BPF_FETCH flag somewhere > between do_check in verifier.c and bpf_jit_build_body in > bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter > atomic op code %02x (@%d) unsupported", and would also explain the > lockup because a normal atomic add without fetch would leave BPF R1 > unchanged. > > We should be able to confirm that theory by disassembling the JITted > code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to > 2... at least for PowerPC 32-bit... maybe you could paste those lines > into the 64-bit version too? Here's some notes I made for > disassembling the hexdump on x86, I guess you'd just need to change > the objdump flags: > > -- > > - Enable console JIT output: > ```shell > echo 2 > /proc/sys/net/core/bpf_jit_enable > ``` > - Load & run the program of interest. > - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a > short program looks like. This includes a line of context - don't paste the > `flen=` line. 
> ``` > [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 > from=test_verifier pid=258 > [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00 > [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48 > [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02 > [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa > ``` > - This incantation will split out and decode the hex, then disassemble the > result: > ```shell > cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b > binary -m i386:x86-64 /tmp/obj > ``` that's where I decided to write to list and ask for help before googling ppc assembly ;-) I changed the test_verifier to stop before executing the test so I can dump the program via bpftool: [root@ibm-p9z-07-lp1 bpf-next]# bpftool prog dump xlated id 48 0: (b7) r0 = 0 1: (7b) *(u64 *)(r10 -8) = r0 2: (b7) r1 = 1 3: (db) r1 = atomic64_fetch_add((u64 *)(r10 -8), r1) 4: (55) if r1 != 0x0 goto pc-1 5: (95) exit [root@ibm-p9z-07-lp1 bpf-next]# bpftool prog dump jited id 48 bpf_prog_a2eb9104e5e8a5bf: 0: nop 4: nop 8: stdu r1,-112(r1) c: std r31,104(r1) 10: addi r31,r1,48 14: li r8,0 18: std r8,-8(r31) 1c: li r3,1 20: addi r9,r31,-8 24: ldarx r10,0,r9 28: add r10,r10,r3 2c: stdcx. r10,0,r9 30: bne 0x0000000000000024 34: cmpldi r3,0 38: bne 0x0000000000000034 3c: nop 40: ld r31,104(r1) 44: addi r1,r1,112 48: mr r3,r8 4c: blr I wanted to also do it through bpf_jit_enable and bpf_jit_dump, but I need to check the setup, because I can't set bpf_jit_enable to 2 at the moment.. might take some time [root@ibm-p9z-07-lp1 bpf-next]# echo 2 > /proc/sys/net/core/bpf_jit_enable -bash: echo: write error: Invalid argument jirka > > -- > > Sandipan, Naveen, do you know of anything in the PowerPC code that > might be leading us to drop the BPF_FETCH flag from the atomic > instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c? 
^ permalink raw reply	[flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-07-01 8:18 ` Brendan Jackman 2021-07-01 10:16 ` Jiri Olsa @ 2021-07-01 11:02 ` Naveen N. Rao 2021-07-01 14:15 ` Jiri Olsa 1 sibling, 1 reply; 16+ messages in thread From: Naveen N. Rao @ 2021-07-01 11:02 UTC (permalink / raw) To: Brendan Jackman, Jiri Olsa, Sandipan Das Cc: Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, John Fastabend, KP Singh, LKML, Florent Revest Hi Brendan, Hi Jiri, Brendan Jackman wrote: > On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote: >> >> On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote: >> > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote: >> > > >> > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote: >> > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: >> > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: >> > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: >> > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: >> > >> > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? >> > > > > > > > >> > > > > > > >> > > > > > > hum, I guess not.. will check >> > > > > > >> > > > > > nope, it locks up the same: >> > > > > >> > > > > Do you mean it locks up at commit 91c960b0056 too? >> > >> > Sorry I was being stupid here - the test didn't exist at this commit >> > >> > > > I tried this one: >> > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH >> > > > >> > > > I will check also 91c960b0056, but I think it's the new test issue >> > >> > So yeah hard to say whether this was broken on PowerPC all along. How >> > hard is it for me to get set up to reproduce the failure? Is there a >> > rootfs I can download, and some instructions for running a PowerPC >> > QEMU VM? 
If so if you can also share your config and I'll take a look. >> > >> > If it's not as simple as that, I'll stare at the code for a while and >> > see if anything jumps out. >> > >> >> I have latest fedora ppc server and compile/install latest bpf-next tree >> I think it will be reproduced also on vm, I attached my config > > OK, getting set up to boot a PowerPC QEMU isn't practical here unless > someone's got commands I can copy-paste (suspect it will need .config > hacking too). Looks like you need to build a proper bootloader, and > boot an installer disk. There are some notes put up here, though we can do better: https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu If you are familiar with ubuntu/fedora cloud images (and cloud-init), you should be able to grab one of the ppc64le images and boot it in qemu: https://cloud-images.ubuntu.com/releases/hirsute/release/ https://alt.fedoraproject.org/alt/ > > Looked at the code for a bit but nothing jumped out. It seems like the > verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't > detect an infinite loop, but then we lose the BPF_FETCH flag somewhere > between do_check in verifier.c and bpf_jit_build_body in > bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter > atomic op code %02x (@%d) unsupported", and would also explain the > lockup because a normal atomic add without fetch would leave BPF R1 > unchanged. > > We should be able to confirm that theory by disassembling the JITted > code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to > 2... at least for PowerPC 32-bit... maybe you could paste those lines > into the 64-bit version too? Here's some notes I made for > disassembling the hexdump on x86, I guess you'd just need to change > the objdump flags: > > -- > > - Enable console JIT output: > ```shell > echo 2 > /proc/sys/net/core/bpf_jit_enable > ``` > - Load & run the program of interest. > - Copy the hex code from the kernel console to `/tmp/jit.txt`. 
Here's what a > short program looks like. This includes a line of context - don't paste the > `flen=` line. > ``` > [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 > from=test_verifier pid=258 > [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00 > [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48 > [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02 > [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa > ``` > - This incantation will split out and decode the hex, then disassemble the > result: > ```shell > cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b > binary -m i386:x86-64 /tmp/obj > ``` > > -- > > Sandipan, Naveen, do you know of anything in the PowerPC code that > might be leading us to drop the BPF_FETCH flag from the atomic > instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c? Yes, I think I just found the issue. We aren't looking at the correct BPF instruction when checking the IMM value. --- a/arch/powerpc/net/bpf_jit_comp64.c +++ b/arch/powerpc/net/bpf_jit_comp64.c @@ -673,7 +673,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * * BPF_STX ATOMIC (atomic ops) */ case BPF_STX | BPF_ATOMIC | BPF_W: - if (insn->imm != BPF_ADD) { + if (insn[i].imm != BPF_ADD) { pr_err_ratelimited( "eBPF filter atomic op code %02x (@%d) unsupported\n", code, i); @@ -695,7 +695,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * PPC_BCC_SHORT(COND_NE, tmp_idx); break; case BPF_STX | BPF_ATOMIC | BPF_DW: - if (insn->imm != BPF_ADD) { + if (insn[i].imm != BPF_ADD) { pr_err_ratelimited( "eBPF filter atomic op code %02x (@%d) unsupported\n", code, i); Thanks, Naveen ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-07-01 11:02 ` Naveen N. Rao @ 2021-07-01 14:15 ` Jiri Olsa 2021-07-02 9:44 ` Brendan Jackman 0 siblings, 1 reply; 16+ messages in thread From: Jiri Olsa @ 2021-07-01 14:15 UTC (permalink / raw) To: Naveen N. Rao Cc: Brendan Jackman, Sandipan Das, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, John Fastabend, KP Singh, LKML, Florent Revest On Thu, Jul 01, 2021 at 04:32:03PM +0530, Naveen N. Rao wrote: > Hi Brendan, Hi Jiri, > > > Brendan Jackman wrote: > > On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote: > > > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote: > > > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote: > > > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote: > > > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote: > > > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote: > > > > > > > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC? > > > > > > > > > > > > > > > > > > > > > > > > > > > > hum, I guess not.. will check > > > > > > > > > > > > > > > > nope, it locks up the same: > > > > > > > > > > > > > > Do you mean it locks up at commit 91c960b0056 too? > > > > > > > > Sorry I was being stupid here - the test didn't exist at this commit > > > > > > > > > > I tried this one: > > > > > > 37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH > > > > > > > > > > > > I will check also 91c960b0056, but I think it's the new test issue > > > > > > > > So yeah hard to say whether this was broken on PowerPC all along. How > > > > hard is it for me to get set up to reproduce the failure? 
Is there a > > > > rootfs I can download, and some instructions for running a PowerPC > > > > QEMU VM? If so if you can also share your config and I'll take a look. > > > > > > > > If it's not as simple as that, I'll stare at the code for a while and > > > > see if anything jumps out. > > > > > > > > > > I have latest fedora ppc server and compile/install latest bpf-next tree > > > I think it will be reproduced also on vm, I attached my config > > > > OK, getting set up to boot a PowerPC QEMU isn't practical here unless > > someone's got commands I can copy-paste (suspect it will need .config > > hacking too). Looks like you need to build a proper bootloader, and > > boot an installer disk. > > There are some notes put up here, though we can do better: > https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu > > If you are familiar with ubuntu/fedora cloud images (and cloud-init), you > should be able to grab one of the ppc64le images and boot it in qemu: > https://cloud-images.ubuntu.com/releases/hirsute/release/ > https://alt.fedoraproject.org/alt/ > > > > > Looked at the code for a bit but nothing jumped out. It seems like the > > verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't > > detect an infinite loop, but then we lose the BPF_FETCH flag somewhere > > between do_check in verifier.c and bpf_jit_build_body in > > bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter > > atomic op code %02x (@%d) unsupported", and would also explain the > > lockup because a normal atomic add without fetch would leave BPF R1 > > unchanged. > > > > We should be able to confirm that theory by disassembling the JITted > > code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to > > 2... at least for PowerPC 32-bit... maybe you could paste those lines > > into the 64-bit version too? 
Here's some notes I made for > > disassembling the hexdump on x86, I guess you'd just need to change > > the objdump flags: > > > > -- > > > > - Enable console JIT output: > > ```shell > > echo 2 > /proc/sys/net/core/bpf_jit_enable > > ``` > > - Load & run the program of interest. > > - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a > > short program looks like. This includes a line of context - don't paste the > > `flen=` line. > > ``` > > [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390 > > from=test_verifier pid=258 > > [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00 > > [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48 > > [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02 > > [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa > > ``` > > - This incantation will split out and decode the hex, then disassemble the > > result: > > ```shell > > cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b > > binary -m i386:x86-64 /tmp/obj > > ``` > > > > -- > > > > Sandipan, Naveen, do you know of anything in the PowerPC code that > > might be leading us to drop the BPF_FETCH flag from the atomic > > instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c? > > Yes, I think I just found the issue. We aren't looking at the correct BPF > instruction when checking the IMM value. great, nice catch! :-) that fixes it for me.. 
Tested-by: Jiri Olsa <jolsa@redhat.com> thanks, jirka > > > --- a/arch/powerpc/net/bpf_jit_comp64.c > +++ b/arch/powerpc/net/bpf_jit_comp64.c > @@ -673,7 +673,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * > * BPF_STX ATOMIC (atomic ops) > */ > case BPF_STX | BPF_ATOMIC | BPF_W: > - if (insn->imm != BPF_ADD) { > + if (insn[i].imm != BPF_ADD) { > pr_err_ratelimited( > "eBPF filter atomic op code %02x (@%d) unsupported\n", > code, i); > @@ -695,7 +695,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context * > PPC_BCC_SHORT(COND_NE, tmp_idx); > break; > case BPF_STX | BPF_ATOMIC | BPF_DW: > - if (insn->imm != BPF_ADD) { > + if (insn[i].imm != BPF_ADD) { > pr_err_ratelimited( > "eBPF filter atomic op code %02x (@%d) unsupported\n", > code, i); > > > > Thanks, > Naveen > ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH 2021-07-01 14:15 ` Jiri Olsa @ 2021-07-02 9:44 ` Brendan Jackman 0 siblings, 0 replies; 16+ messages in thread From: Brendan Jackman @ 2021-07-02 9:44 UTC (permalink / raw) To: Jiri Olsa Cc: Naveen N. Rao, Sandipan Das, Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann, John Fastabend, KP Singh, LKML, Florent Revest On Thu, 1 Jul 2021 at 16:15, Jiri Olsa <jolsa@redhat.com> wrote: > On Thu, Jul 01, 2021 at 04:32:03PM +0530, Naveen N. Rao wrote: > > Yes, I think I just found the issue. We aren't looking at the correct BPF > > instruction when checking the IMM value. > > great, nice catch! :-) that fixes it for me.. Ahh, nice. +1 thanks for taking a look! ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2021-07-02  9:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-02 13:50 [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH Brendan Jackman
2021-02-03 17:07 ` Yonghong Song
2021-02-03 17:37 ` Alexei Starovoitov
2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
2021-06-28  9:21 ` Brendan Jackman
2021-06-29 14:10 ` Jiri Olsa
2021-06-29 16:04 ` Jiri Olsa
2021-06-29 16:25 ` Brendan Jackman
2021-06-29 16:41 ` Jiri Olsa
2021-06-29 21:08 ` Jiri Olsa
2021-06-30 10:34 ` Brendan Jackman
[not found] ` <YNxmwZGtnqiXGnF0@krava>
2021-07-01  8:18 ` Brendan Jackman
2021-07-01 10:16 ` Jiri Olsa
2021-07-01 11:02 ` Naveen N. Rao
2021-07-01 14:15 ` Jiri Olsa
2021-07-02  9:44 ` Brendan Jackman