bpf.vger.kernel.org archive mirror
* [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
@ 2021-02-02 13:50 Brendan Jackman
  2021-02-03 17:07 ` Yonghong Song
  2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
  0 siblings, 2 replies; 16+ messages in thread
From: Brendan Jackman @ 2021-02-02 13:50 UTC (permalink / raw)
  To: bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Florent Revest, John Fastabend, linux-kernel, Brendan Jackman

When BPF_FETCH is set, atomic instructions load a value from memory
into a register. The current verifier code first checks via
check_mem_access whether we can access the memory, and then checks
via check_reg_arg whether we can write into the register.

For loads, check_reg_arg has the side-effect of marking the
register's value as unknown, and check_mem_access has the side effect
of propagating bounds from memory to the register. This currently only
takes effect for stack memory.

Therefore with the current order, bounds information is thrown away,
but by simply reversing the order of check_reg_arg
vs. check_mem_access, we can instead propagate bounds smartly.

A simple test is added with an infinite loop that can only be proved
unreachable if this propagation is present. This is implemented both
with C and directly in test_verifier using assembly.

Suggested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Brendan Jackman <jackmanb@google.com>
---

Difference from v2->v3 [1]:

 * Fixed missing ENABLE_ATOMICS_TESTS check.

Difference from v1->v2:

 * Reworked commit message to clarify this only affects stack memory
 * Added the Suggested-by
 * Added a C-based test.

[1]: https://lore.kernel.org/bpf/CA+i-1C2ZWUbGxWJ8kAxbri9rBboyuMaVj_BBhg+2Zf_Su9BOJA@mail.gmail.com/T/#t

 kernel/bpf/verifier.c                         | 32 +++++++++++--------
 .../selftests/bpf/prog_tests/atomic_bounds.c  | 15 +++++++++
 .../selftests/bpf/progs/atomic_bounds.c       | 24 ++++++++++++++
 .../selftests/bpf/verifier/atomic_bounds.c    | 27 ++++++++++++++++
 4 files changed, 84 insertions(+), 14 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
 create mode 100644 tools/testing/selftests/bpf/progs/atomic_bounds.c
 create mode 100644 tools/testing/selftests/bpf/verifier/atomic_bounds.c

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 972fc38eb62d..5e09632efddb 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3665,9 +3665,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 		return -EACCES;
 	}

+	if (insn->imm & BPF_FETCH) {
+		if (insn->imm == BPF_CMPXCHG)
+			load_reg = BPF_REG_0;
+		else
+			load_reg = insn->src_reg;
+
+		/* check and record load of old value */
+		err = check_reg_arg(env, load_reg, DST_OP);
+		if (err)
+			return err;
+	} else {
+		/* This instruction accesses a memory location but doesn't
+		 * actually load it into a register.
+		 */
+		load_reg = -1;
+	}
+
 	/* check whether we can read the memory */
 	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
-			       BPF_SIZE(insn->code), BPF_READ, -1, true);
+			       BPF_SIZE(insn->code), BPF_READ, load_reg, true);
 	if (err)
 		return err;

@@ -3677,19 +3694,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
 	if (err)
 		return err;

-	if (!(insn->imm & BPF_FETCH))
-		return 0;
-
-	if (insn->imm == BPF_CMPXCHG)
-		load_reg = BPF_REG_0;
-	else
-		load_reg = insn->src_reg;
-
-	/* check and record load of old value */
-	err = check_reg_arg(env, load_reg, DST_OP);
-	if (err)
-		return err;
-
 	return 0;
 }

diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
new file mode 100644
index 000000000000..addf127068e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include "atomic_bounds.skel.h"
+
+void test_atomic_bounds(void)
+{
+	struct atomic_bounds *skel;
+	__u32 duration = 0;
+
+	skel = atomic_bounds__open_and_load();
+	if (CHECK(!skel, "skel_load", "couldn't load program\n"))
+		return;
+}
diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c
new file mode 100644
index 000000000000..e5fff7fc7f8f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+
+#ifdef ENABLE_ATOMICS_TESTS
+bool skip_tests __attribute((__section__(".data"))) = false;
+#else
+bool skip_tests = true;
+#endif
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(sub, int x)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+	int a = 0;
+	int b = __sync_fetch_and_add(&a, 1);
+	/* b is certainly 0 here. Can the verifier tell? */
+	while (b)
+		continue;
+#endif
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
new file mode 100644
index 000000000000..e82183e4914f
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
@@ -0,0 +1,27 @@
+{
+	"BPF_ATOMIC bounds propagation, mem->reg",
+	.insns = {
+		/* a = 0; */
+		/*
+		 * Note this is implemented with two separate instructions,
+		 * where you might think one would suffice:
+		 *
+		 * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+		 *
+		 * This is because BPF_ST_MEM doesn't seem to set the stack slot
+		 * type to 0 when storing an immediate.
+		 */
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+		/* b = atomic_fetch_add(&a, 1); */
+		BPF_MOV64_IMM(BPF_REG_1, 1),
+		BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
+		/* Verifier should be able to tell that this infinite loop isn't reachable. */
+		/* if (b) while (true) continue; */
+		BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.result_unpriv = REJECT,
+	.errstr_unpriv = "back-edge",
+},

base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
--
2.30.0.365.g02bc693789-goog


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-02-02 13:50 [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH Brendan Jackman
@ 2021-02-03 17:07 ` Yonghong Song
  2021-02-03 17:37   ` Alexei Starovoitov
  2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
  1 sibling, 1 reply; 16+ messages in thread
From: Yonghong Song @ 2021-02-03 17:07 UTC (permalink / raw)
  To: Brendan Jackman, bpf
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, KP Singh,
	Florent Revest, John Fastabend, linux-kernel



On 2/2/21 5:50 AM, Brendan Jackman wrote:
> When BPF_FETCH is set, atomic instructions load a value from memory
> into a register. The current verifier code first checks via
> check_mem_access whether we can access the memory, and then checks
> via check_reg_arg whether we can write into the register.
> 
> For loads, check_reg_arg has the side-effect of marking the
> register's value as unknown, and check_mem_access has the side effect
> of propagating bounds from memory to the register. This currently only
> takes effect for stack memory.
> 
> Therefore with the current order, bounds information is thrown away,
> but by simply reversing the order of check_reg_arg
> vs. check_mem_access, we can instead propagate bounds smartly.
> 
> A simple test is added with an infinite loop that can only be proved
> unreachable if this propagation is present. This is implemented both
> with C and directly in test_verifier using assembly.
> 
> Suggested-by: John Fastabend <john.fastabend@gmail.com>
> Signed-off-by: Brendan Jackman <jackmanb@google.com>

Ack with a nit below.

Acked-by: Yonghong Song <yhs@fb.com>

> ---
> 
> Difference from v2->v3 [1]:
> 
>   * Fixed missing ENABLE_ATOMICS_TESTS check.
> 
> Difference from v1->v2:
> 
>   * Reworked commit message to clarify this only affects stack memory
>   * Added the Suggested-by
>   * Added a C-based test.
> 
> [1]: https://lore.kernel.org/bpf/CA+i-1C2ZWUbGxWJ8kAxbri9rBboyuMaVj_BBhg+2Zf_Su9BOJA@mail.gmail.com/T/#t
> 
>   kernel/bpf/verifier.c                         | 32 +++++++++++--------
>   .../selftests/bpf/prog_tests/atomic_bounds.c  | 15 +++++++++
>   .../selftests/bpf/progs/atomic_bounds.c       | 24 ++++++++++++++
>   .../selftests/bpf/verifier/atomic_bounds.c    | 27 ++++++++++++++++
>   4 files changed, 84 insertions(+), 14 deletions(-)
>   create mode 100644 tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
>   create mode 100644 tools/testing/selftests/bpf/progs/atomic_bounds.c
>   create mode 100644 tools/testing/selftests/bpf/verifier/atomic_bounds.c
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 972fc38eb62d..5e09632efddb 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3665,9 +3665,26 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
>   		return -EACCES;
>   	}
> 
> +	if (insn->imm & BPF_FETCH) {
> +		if (insn->imm == BPF_CMPXCHG)
> +			load_reg = BPF_REG_0;
> +		else
> +			load_reg = insn->src_reg;
> +
> +		/* check and record load of old value */
> +		err = check_reg_arg(env, load_reg, DST_OP);
> +		if (err)
> +			return err;
> +	} else {
> +		/* This instruction accesses a memory location but doesn't
> +		 * actually load it into a register.
> +		 */
> +		load_reg = -1;
> +	}
> +
>   	/* check whether we can read the memory */
>   	err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
> -			       BPF_SIZE(insn->code), BPF_READ, -1, true);
> +			       BPF_SIZE(insn->code), BPF_READ, load_reg, true);
>   	if (err)
>   		return err;
> 
> @@ -3677,19 +3694,6 @@ static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_i
>   	if (err)
>   		return err;
> 
> -	if (!(insn->imm & BPF_FETCH))
> -		return 0;
> -
> -	if (insn->imm == BPF_CMPXCHG)
> -		load_reg = BPF_REG_0;
> -	else
> -		load_reg = insn->src_reg;
> -
> -	/* check and record load of old value */
> -	err = check_reg_arg(env, load_reg, DST_OP);
> -	if (err)
> -		return err;
> -
>   	return 0;
>   }
> 
> diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
> new file mode 100644
> index 000000000000..addf127068e4
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
> @@ -0,0 +1,15 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include <test_progs.h>
> +
> +#include "atomic_bounds.skel.h"
> +
> +void test_atomic_bounds(void)
> +{
> +	struct atomic_bounds *skel;
> +	__u32 duration = 0;
> +
> +	skel = atomic_bounds__open_and_load();
> +	if (CHECK(!skel, "skel_load", "couldn't load program\n"))
> +		return;

You are missing
	atomic_bounds__destroy(skel);
here.

> +}
> diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c
> new file mode 100644
> index 000000000000..e5fff7fc7f8f
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c
> @@ -0,0 +1,24 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +#include <bpf/bpf_tracing.h>
> +#include <stdbool.h>
> +
> +#ifdef ENABLE_ATOMICS_TESTS
> +bool skip_tests __attribute((__section__(".data"))) = false;
> +#else
> +bool skip_tests = true;
> +#endif
> +
> +SEC("fentry/bpf_fentry_test1")
> +int BPF_PROG(sub, int x)
> +{
> +#ifdef ENABLE_ATOMICS_TESTS
> +	int a = 0;
> +	int b = __sync_fetch_and_add(&a, 1);
> +	/* b is certainly 0 here. Can the verifier tell? */
> +	while (b)
> +		continue;
> +#endif
> +	return 0;
> +}
[...]


* Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-02-03 17:07 ` Yonghong Song
@ 2021-02-03 17:37   ` Alexei Starovoitov
  0 siblings, 0 replies; 16+ messages in thread
From: Alexei Starovoitov @ 2021-02-03 17:37 UTC (permalink / raw)
  To: Yonghong Song
  Cc: Brendan Jackman, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, KP Singh, Florent Revest, John Fastabend, LKML

On Wed, Feb 3, 2021 at 9:07 AM Yonghong Song <yhs@fb.com> wrote:
>
>
>
> On 2/2/21 5:50 AM, Brendan Jackman wrote:
> > When BPF_FETCH is set, atomic instructions load a value from memory
> > into a register. The current verifier code first checks via
> > check_mem_access whether we can access the memory, and then checks
> > via check_reg_arg whether we can write into the register.
> >
> > For loads, check_reg_arg has the side-effect of marking the
> > register's value as unknown, and check_mem_access has the side effect
> > of propagating bounds from memory to the register. This currently only
> > takes effect for stack memory.
> >
> > Therefore with the current order, bounds information is thrown away,
> > but by simply reversing the order of check_reg_arg
> > vs. check_mem_access, we can instead propagate bounds smartly.
> >
> > A simple test is added with an infinite loop that can only be proved
> > unreachable if this propagation is present. This is implemented both
> > with C and directly in test_verifier using assembly.
> >
> > Suggested-by: John Fastabend <john.fastabend@gmail.com>
> > Signed-off-by: Brendan Jackman <jackmanb@google.com>
>
> Ack with a nit below.

Sorry it was already applied yesterday.
patchbot just didn't send auto-reply.


* [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-02-02 13:50 [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH Brendan Jackman
  2021-02-03 17:07 ` Yonghong Song
@ 2021-06-27 15:34 ` Jiri Olsa
  2021-06-28  9:21   ` Brendan Jackman
  1 sibling, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2021-06-27 15:34 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, linux-kernel,
	Naveen N. Rao, Sandipan Das

On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:

SNIP

> diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> new file mode 100644
> index 000000000000..e82183e4914f
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> @@ -0,0 +1,27 @@
> +{
> +	"BPF_ATOMIC bounds propagation, mem->reg",
> +	.insns = {
> +		/* a = 0; */
> +		/*
> +		 * Note this is implemented with two separate instructions,
> +		 * where you might think one would suffice:
> +		 *
> +		 * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> +		 *
> +		 * This is because BPF_ST_MEM doesn't seem to set the stack slot
> +		 * type to 0 when storing an immediate.
> +		 */
> +		BPF_MOV64_IMM(BPF_REG_0, 0),
> +		BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> +		/* b = atomic_fetch_add(&a, 1); */
> +		BPF_MOV64_IMM(BPF_REG_1, 1),
> +		BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> +		/* Verifier should be able to tell that this infinite loop isn't reachable. */
> +		/* if (b) while (true) continue; */
> +		BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
> +		BPF_EXIT_INSN(),
> +	},
> +	.result = ACCEPT,
> +	.result_unpriv = REJECT,
> +	.errstr_unpriv = "back-edge",
> +},
> 
> base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> --
> 2.30.0.365.g02bc693789-goog
> 

hi,
I tracked soft lock up on powerpc to this test:

	[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
	#25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
	#25/p BPF_ATOMIC bounds propagation, mem->reg 

	Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
	 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]

	Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
	 kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]

please check the console output below.. it looks like the verifier
allowed the loop to happen for some reason on powerpc.. any idea?

I'm on latest bpf-next/master, I can send the config if needed

thanks,
jirka


---
ibm-p9z-07-lp1 login: [  184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
[  184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[  184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G            E     5.13.0-rc3+ #3
[  184.108728] NIP:  c00800000131314c LR: c000000000c56918 CTR: c008000001313118
[  184.108733] REGS: c0000000119ef820 TRAP: 0900   Tainted: G            E      (5.13.0-rc3+)
[  184.108739] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040003
[  184.108752] CFAR: c008000001313150 IRQMASK: 0 
[  184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001 
[  184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008 
[  184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937 
[  184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000 
[  184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8 
[  184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048 
[  184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638 
[  184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0 
[  184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8
[  184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470
[  184.108826] Call Trace:
[  184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable)
[  184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470
[  184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660
[  184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60
[  184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40
[  184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280
[  184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268
[  184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24
[  184.108878] NIP:  00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000
[  184.108883] REGS: c0000000119efe80 TRAP: 3000   Tainted: G            E      (5.13.0-rc3+)
[  184.108887] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28000848  XER: 00000000
[  184.108903] IRQMASK: 0 
[  184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a 
[  184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008 
[  184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000 
[  184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000 
[  184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0 
[  184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710 
[  184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24
[  184.108964] LR [0000000000000000] 0x0
[  184.108967] --- interrupt: 3000
[  184.108970] Instruction dump:
[  184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 
[  184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068 



* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
@ 2021-06-28  9:21   ` Brendan Jackman
  2021-06-29 14:10     ` Jiri Olsa
  0 siblings, 1 reply; 16+ messages in thread
From: Brendan Jackman @ 2021-06-28  9:21 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
>
> SNIP
>
> > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > new file mode 100644
> > index 000000000000..e82183e4914f
> > --- /dev/null
> > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > @@ -0,0 +1,27 @@
> > +{
> > +     "BPF_ATOMIC bounds propagation, mem->reg",
> > +     .insns = {
> > +             /* a = 0; */
> > +             /*
> > +              * Note this is implemented with two separate instructions,
> > +              * where you might think one would suffice:
> > +              *
> > +              * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > +              *
> > +              * This is because BPF_ST_MEM doesn't seem to set the stack slot
> > +              * type to 0 when storing an immediate.
> > +              */
> > +             BPF_MOV64_IMM(BPF_REG_0, 0),
> > +             BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> > +             /* b = atomic_fetch_add(&a, 1); */
> > +             BPF_MOV64_IMM(BPF_REG_1, 1),
> > +             BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> > +             /* Verifier should be able to tell that this infinite loop isn't reachable. */
> > +             /* if (b) while (true) continue; */
> > +             BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
> > +             BPF_EXIT_INSN(),
> > +     },
> > +     .result = ACCEPT,
> > +     .result_unpriv = REJECT,
> > +     .errstr_unpriv = "back-edge",
> > +},
> >
> > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> > --
> > 2.30.0.365.g02bc693789-goog
> >
>
> hi,
> I tracked soft lock up on powerpc to this test:
>
>         [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
>         #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
>         #25/p BPF_ATOMIC bounds propagation, mem->reg
>
>         Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
>          kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
>
>         Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
>          kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]
>
> please check the console output below.. it looks like the verifier
> allowed the loop to happen for some reason on powerpc.. any idea?
>
> I'm on latest bpf-next/master, I can send the config if needed
>
> thanks,
> jirka
>
>
> ---
> ibm-p9z-07-lp1 login: [  184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> [  184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> [  184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G            E     5.13.0-rc3+ #3
> [  184.108728] NIP:  c00800000131314c LR: c000000000c56918 CTR: c008000001313118
> [  184.108733] REGS: c0000000119ef820 TRAP: 0900   Tainted: G            E      (5.13.0-rc3+)
> [  184.108739] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040003
> [  184.108752] CFAR: c008000001313150 IRQMASK: 0
> [  184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001
> [  184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008
> [  184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937
> [  184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000
> [  184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8
> [  184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048
> [  184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638
> [  184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0
> [  184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8
> [  184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470
> [  184.108826] Call Trace:
> [  184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable)
> [  184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470
> [  184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660
> [  184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60
> [  184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40
> [  184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280
> [  184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268
> [  184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24
> [  184.108878] NIP:  00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000
> [  184.108883] REGS: c0000000119efe80 TRAP: 3000   Tainted: G            E      (5.13.0-rc3+)
> [  184.108887] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28000848  XER: 00000000
> [  184.108903] IRQMASK: 0
> [  184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a
> [  184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008
> [  184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000
> [  184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000
> [  184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0
> [  184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710
> [  184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24
> [  184.108964] LR [0000000000000000] 0x0
> [  184.108967] --- interrupt: 3000
> [  184.108970] Instruction dump:
> [  184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8
> [  184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068

Hmm, is the test prog from atomic_bounds.c getting JITed there (my
dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
unsupported\n' in dmesg instead. I wonder if I missed something in
commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
atomics in .imm). Any idea if this test was ever passing on PowerPC?


* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-28  9:21   ` Brendan Jackman
@ 2021-06-29 14:10     ` Jiri Olsa
  2021-06-29 16:04       ` Jiri Olsa
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2021-06-29 14:10 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> >
> > SNIP
> >
> > > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > > new file mode 100644
> > > index 000000000000..e82183e4914f
> > > --- /dev/null
> > > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > > @@ -0,0 +1,27 @@
> > > +{
> > > +     "BPF_ATOMIC bounds propagation, mem->reg",
> > > +     .insns = {
> > > +             /* a = 0; */
> > > +             /*
> > > +              * Note this is implemented with two separate instructions,
> > > +              * where you might think one would suffice:
> > > +              *
> > > +              * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > > +              *
> > > +              * This is because BPF_ST_MEM doesn't seem to set the stack slot
> > > +              * type to 0 when storing an immediate.
> > > +              */
> > > +             BPF_MOV64_IMM(BPF_REG_0, 0),
> > > +             BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> > > +             /* b = atomic_fetch_add(&a, 1); */
> > > +             BPF_MOV64_IMM(BPF_REG_1, 1),
> > > +             BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> > > +             /* Verifier should be able to tell that this infinite loop isn't reachable. */
> > > +             /* if (b) while (true) continue; */
> > > +             BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
> > > +             BPF_EXIT_INSN(),
> > > +     },
> > > +     .result = ACCEPT,
> > > +     .result_unpriv = REJECT,
> > > +     .errstr_unpriv = "back-edge",
> > > +},
> > >
> > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> > > --
> > > 2.30.0.365.g02bc693789-goog
> > >
> >
> > hi,
> > I tracked soft lock up on powerpc to this test:
> >
> >         [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
> >         #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
> >         #25/p BPF_ATOMIC bounds propagation, mem->reg
> >
> >         Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
> >          kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> >
> >         Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
> >          kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]
> >
> > please check the console output below.. it looks like the verifier
> > allowed the loop to happen for some reason on powerpc.. any idea?
> >
> > I'm on latest bpf-next/master, I can send the config if needed
> >
> > thanks,
> > jirka
> >
> >
> > ---
> > ibm-p9z-07-lp1 login: [  184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> > [  184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > [  184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G            E     5.13.0-rc3+ #3
> > [  184.108728] NIP:  c00800000131314c LR: c000000000c56918 CTR: c008000001313118
> > [  184.108733] REGS: c0000000119ef820 TRAP: 0900   Tainted: G            E      (5.13.0-rc3+)
> > [  184.108739] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040003
> > [  184.108752] CFAR: c008000001313150 IRQMASK: 0
> > [  184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001
> > [  184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008
> > [  184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937
> > [  184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000
> > [  184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8
> > [  184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048
> > [  184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638
> > [  184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0
> > [  184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8
> > [  184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470
> > [  184.108826] Call Trace:
> > [  184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable)
> > [  184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470
> > [  184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660
> > [  184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60
> > [  184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40
> > [  184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280
> > [  184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268
> > [  184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24
> > [  184.108878] NIP:  00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000
> > [  184.108883] REGS: c0000000119efe80 TRAP: 3000   Tainted: G            E      (5.13.0-rc3+)
> > [  184.108887] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28000848  XER: 00000000
> > [  184.108903] IRQMASK: 0
> > [  184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a
> > [  184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008
> > [  184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000
> > [  184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000
> > [  184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [  184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > [  184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0
> > [  184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710
> > [  184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24
> > [  184.108964] LR [0000000000000000] 0x0
> > [  184.108967] --- interrupt: 3000
> > [  184.108970] Instruction dump:
> > [  184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8
> > [  184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068
> 
> Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> unsupported\n' in dmesg instead. I wonder if I missed something in
> commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
> atomics in .imm). Any idea if this test was ever passing on PowerPC?
> 

hum, I guess not.. will check

jirka


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-29 14:10     ` Jiri Olsa
@ 2021-06-29 16:04       ` Jiri Olsa
  2021-06-29 16:25         ` Brendan Jackman
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2021-06-29 16:04 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> > >
> > > SNIP
> > >
> > > > diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > > > new file mode 100644
> > > > index 000000000000..e82183e4914f
> > > > --- /dev/null
> > > > +++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
> > > > @@ -0,0 +1,27 @@
> > > > +{
> > > > +     "BPF_ATOMIC bounds propagation, mem->reg",
> > > > +     .insns = {
> > > > +             /* a = 0; */
> > > > +             /*
> > > > +              * Note this is implemented with two separate instructions,
> > > > +              * where you might think one would suffice:
> > > > +              *
> > > > +              * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
> > > > +              *
> > > > +              * This is because BPF_ST_MEM doesn't seem to set the stack slot
> > > > +              * type to 0 when storing an immediate.
> > > > +              */
> > > > +             BPF_MOV64_IMM(BPF_REG_0, 0),
> > > > +             BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
> > > > +             /* b = atomic_fetch_add(&a, 1); */
> > > > +             BPF_MOV64_IMM(BPF_REG_1, 1),
> > > > +             BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
> > > > +             /* Verifier should be able to tell that this infinite loop isn't reachable. */
> > > > +             /* if (b) while (true) continue; */
> > > > +             BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
> > > > +             BPF_EXIT_INSN(),
> > > > +     },
> > > > +     .result = ACCEPT,
> > > > +     .result_unpriv = REJECT,
> > > > +     .errstr_unpriv = "back-edge",
> > > > +},
> > > >
> > > > base-commit: 61ca36c8c4eb3bae35a285b1ae18c514cde65439
> > > > --
> > > > 2.30.0.365.g02bc693789-goog
> > > >
> > >
> > > hi,
> > > I tracked soft lock up on powerpc to this test:
> > >
> > >         [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 25
> > >         #25/u BPF_ATOMIC bounds propagation, mem->reg SKIP
> > >         #25/p BPF_ATOMIC bounds propagation, mem->reg
> > >
> > >         Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:24:34 ...
> > >          kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> > >
> > >         Message from syslogd@ibm-p9z-07-lp1 at Jun 27 11:25:04 ...
> > >          kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [test_verifier:1055]
> > >
> > > please check the console output below.. it looks like the verifier
> > > allowed the loop to happen for some reason on powerpc.. any idea?
> > >
> > > I'm on latest bpf-next/master, I can send the config if needed
> > >
> > > thanks,
> > > jirka
> > >
> > >
> > > ---
> > > ibm-p9z-07-lp1 login: [  184.108655] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [test_verifier:1055]
> > > [  184.108679] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
> > > [  184.108722] CPU: 2 PID: 1055 Comm: test_verifier Tainted: G            E     5.13.0-rc3+ #3
> > > [  184.108728] NIP:  c00800000131314c LR: c000000000c56918 CTR: c008000001313118
> > > [  184.108733] REGS: c0000000119ef820 TRAP: 0900   Tainted: G            E      (5.13.0-rc3+)
> > > [  184.108739] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040003
> > > [  184.108752] CFAR: c008000001313150 IRQMASK: 0
> > > [  184.108752] GPR00: c000000000c5671c c0000000119efac0 c000000002a08400 0000000000000001
> > > [  184.108752] GPR04: c0080000010c0048 ffffffffffffffff 0000000001f3f8ec 0000000000000008
> > > [  184.108752] GPR08: 0000000000000000 c0000000119efae8 0000000000000001 49adb8fcb8417937
> > > [  184.108752] GPR12: c008000001313118 c00000001ecae400 0000000000000000 0000000000000000
> > > [  184.108752] GPR16: 0000000000000000 0000000000000000 0000000000000000 c0000000021cf6f8
> > > [  184.108752] GPR20: 0000000000000000 c0000000119efc34 c0000000119efc30 c0080000010c0048
> > > [  184.108752] GPR24: c00000000a1dc100 0000000000000001 c000000011fadc80 c0000000021cf638
> > > [  184.108752] GPR28: c0080000010c0000 0000000000000001 c0000000021cf638 c0000000119efaf0
> > > [  184.108812] NIP [c00800000131314c] bpf_prog_a2eb9104e5e8a5bf+0x34/0xcee8
> > > [  184.108819] LR [c000000000c56918] bpf_test_run+0x2f8/0x470
> > > [  184.108826] Call Trace:
> > > [  184.108828] [c0000000119efac0] [c0000000119efb30] 0xc0000000119efb30 (unreliable)
> > > [  184.108835] [c0000000119efb30] [c000000000c5671c] bpf_test_run+0xfc/0x470
> > > [  184.108841] [c0000000119efc10] [c000000000c57b6c] bpf_prog_test_run_skb+0x38c/0x660
> > > [  184.108848] [c0000000119efcb0] [c00000000035de6c] __sys_bpf+0x46c/0xd60
> > > [  184.108854] [c0000000119efd90] [c00000000035e810] sys_bpf+0x30/0x40
> > > [  184.108859] [c0000000119efdb0] [c00000000002ea34] system_call_exception+0x144/0x280
> > > [  184.108866] [c0000000119efe10] [c00000000000c570] system_call_vectored_common+0xf0/0x268
> > > [  184.108874] --- interrupt: 3000 at 0x7fff8bb3ef24
> > > [  184.108878] NIP:  00007fff8bb3ef24 LR: 0000000000000000 CTR: 0000000000000000
> > > [  184.108883] REGS: c0000000119efe80 TRAP: 3000   Tainted: G            E      (5.13.0-rc3+)
> > > [  184.108887] MSR:  800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 28000848  XER: 00000000
> > > [  184.108903] IRQMASK: 0
> > > [  184.108903] GPR00: 0000000000000169 00007fffe4577710 00007fff8bc27200 000000000000000a
> > > [  184.108903] GPR04: 00007fffe45777b8 0000000000000080 0000000000000001 0000000000000008
> > > [  184.108903] GPR08: 000000000000000a 0000000000000000 0000000000000000 0000000000000000
> > > [  184.108903] GPR12: 0000000000000000 00007fff8be1c400 0000000000000000 0000000000000000
> > > [  184.108903] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [  184.108903] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> > > [  184.108903] GPR24: 0000000000000000 0000000000000000 0000000000000000 000000001000d1d0
> > > [  184.108903] GPR28: 0000000000000002 00007fffe4578128 00007fffe45782c0 00007fffe4577710
> > > [  184.108960] NIP [00007fff8bb3ef24] 0x7fff8bb3ef24
> > > [  184.108964] LR [0000000000000000] 0x0
> > > [  184.108967] --- interrupt: 3000
> > > [  184.108970] Instruction dump:
> > > [  184.108974] 60000000 f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8
> > > [  184.108985] 7d4048a8 7d4a1a14 7d4049ad 4082fff4 <28230000> 4082fffc 60000000 ebe10068
> > 
> > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > unsupported\n' in dmesg instead. I wonder if I missed something in
> > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other

I see that for all the other atomics tests:

[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
#21/p BPF_ATOMIC_AND without fetch FAIL
Failed to load prog 'Unknown error 524'!
verification time 32 usec
stack depth 8
processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Summary: 0 PASSED, 0 SKIPPED, 2 FAILED

console:

	[   51.850952] eBPF filter atomic op code db (@2) unsupported
	[   51.851134] eBPF filter atomic op code db (@2) unsupported


[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22
#22/u BPF_ATOMIC_AND with fetch FAIL
Failed to load prog 'Unknown error 524'!
verification time 38 usec
stack depth 8
processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
#22/p BPF_ATOMIC_AND with fetch FAIL
Failed to load prog 'Unknown error 524'!
verification time 26 usec
stack depth 8
processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1

console:
	[  223.231420] eBPF filter atomic op code db (@3) unsupported
	[  223.231596] eBPF filter atomic op code db (@3) unsupported

...


but no such console output for:

[root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24
#24/u BPF_ATOMIC bounds propagation, mem->reg OK


> > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > 
> 
> hum, I guess not.. will check

nope, it locks up the same:

ibm-p9z-07-lp1 login: [   88.070644] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [test_verifier:950]
[   88.070674] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[   88.070714] CPU: 4 PID: 950 Comm: test_verifier Tainted: G            E     5.11.0-rc4+ #4
[   88.070721] NIP:  c0080000017daad8 LR: c000000000c26ea8 CTR: c0080000017daaa0
[   88.070726] REGS: c000000011407870 TRAP: 0900   Tainted: G            E      (5.11.0-rc4+)
[   88.070732] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040169
[   88.070745] CFAR: c0080000017daad8 IRQMASK: 0 
[   88.070745] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001 
[   88.070745] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37 
[   88.070745] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08 
[   88.070745] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000 
[   88.070745] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000 
[   88.070745] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238 
[   88.070745] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300 
[   88.070745] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40 
[   88.070807] NIP [c0080000017daad8] bpf_prog_a2eb9104e5e8a5bf+0x38/0x5560
[   88.070815] LR [c000000000c26ea8] bpf_test_run+0x288/0x470
[   88.070823] Call Trace:
[   88.070825] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable)
[   88.070832] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470
[   88.070840] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660
[   88.070846] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30
[   88.070853] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230
[   88.070861] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c
[   88.070869] Instruction dump:
[   88.070873] f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 7d4048a8 
[   88.070886] 7d4a1a14 7d4049ad 4082fff4 28230000 <4082fffc> 60000000 ebe10068 38210070 

ibm-p9z-07-lp1 login: [  121.762511] rcu: INFO: rcu_sched self-detected stall on CPU
[  121.762536] rcu:     4-....: (5999 ticks this GP) idle=9ee/1/0x4000000000000002 softirq=2299/2299 fqs=2998 
[  121.762546]  (t=6000 jiffies g=3681 q=798)
[  121.762551] NMI backtrace for cpu 4
[  121.762555] CPU: 4 PID: 950 Comm: test_verifier Tainted: G            EL    5.11.0-rc4+ #4
[  121.762562] Call Trace:
[  121.762565] [c0000000114072b0] [c0000000007cfe9c] dump_stack+0xc0/0x104 (unreliable)
[  121.762575] [c000000011407300] [c0000000007dd028] nmi_cpu_backtrace+0xe8/0x130
[  121.762582] [c000000011407370] [c0000000007dd21c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
[  121.762589] [c000000011407410] [c0000000000718a8] arch_trigger_cpumask_backtrace+0x28/0x40
[  121.762596] [c000000011407430] [c000000000215740] rcu_dump_cpu_stacks+0x10c/0x168
[  121.762604] [c000000011407480] [c00000000020b080] print_cpu_stall+0x1d0/0x2a0
[  121.762611] [c000000011407530] [c00000000020e4d4] check_cpu_stall+0x174/0x340
[  121.762618] [c000000011407560] [c000000000213b28] rcu_sched_clock_irq+0x98/0x3f0
[  121.762625] [c0000000114075a0] [c00000000022b974] update_process_times+0xc4/0x150
[  121.762631] [c0000000114075e0] [c000000000245cec] tick_sched_handle+0x3c/0xd0
[  121.762638] [c000000011407610] [c000000000246408] tick_sched_timer+0x68/0xe0
[  121.762645] [c000000011407650] [c00000000022ca18] __hrtimer_run_queues+0x1c8/0x430
[  121.762652] [c0000000114076f0] [c00000000022ddd4] hrtimer_interrupt+0x124/0x300
[  121.762658] [c0000000114077a0] [c0000000000286a4] timer_interrupt+0x104/0x290
[  121.762665] [c000000011407800] [c000000000009978] decrementer_common_virt+0x1c8/0x1d0
[  121.762673] --- interrupt: 900 at bpf_prog_a2eb9104e5e8a5bf+0x34/0x5560
[  121.762680] NIP:  c0080000017daad4 LR: c000000000c26ea8 CTR: c0080000017daaa0
[  121.762684] REGS: c000000011407870 TRAP: 0900   Tainted: G            EL     (5.11.0-rc4+)
[  121.762689] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040169
[  121.762702] CFAR: c0080000017daad8 IRQMASK: 0 
[  121.762702] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001 
[  121.762702] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37 
[  121.762702] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08 
[  121.762702] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000 
[  121.762702] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000 
[  121.762702] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238 
[  121.762702] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300 
[  121.762702] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40 
[  121.762765] NIP [c0080000017daad4] bpf_prog_a2eb9104e5e8a5bf+0x34/0x5560
[  121.762770] LR [c000000000c26ea8] bpf_test_run+0x288/0x470
[  121.762777] --- interrupt: 900
[  121.762779] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable)
[  121.762786] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470
[  121.762793] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660
[  121.762800] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30
[  121.762807] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230
[  121.762814] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c
[  148.073962] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [test_verifier:950]
[  148.073990] Modules linked in: snd_seq_dummy(E) snd_hrtimer(E) snd_seq(E) snd_seq_device(E) snd_timer(E) snd(E) soundcore(E) bonding(E) tls(E) rfkill(E) pseries_rng(E) drm(E) fuse(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) t10_pi(E) ibmvscsi(E) ibmveth(E) scsi_transport_srp(E) vmx_crypto(E) dm_mirror(E) dm_region_hash(E) dm_log(E) dm_mod(E)
[  148.074029] CPU: 4 PID: 950 Comm: test_verifier Tainted: G            EL    5.11.0-rc4+ #4
[  148.074036] NIP:  c0080000017daad8 LR: c000000000c26ea8 CTR: c0080000017daaa0
[  148.074041] REGS: c000000011407870 TRAP: 0900   Tainted: G            EL     (5.11.0-rc4+)
[  148.074047] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 44222840  XER: 20040169
[  148.074060] CFAR: c0080000017daad8 IRQMASK: 0 
[  148.074060] GPR00: c000000000c26d1c c000000011407b10 c000000002011600 0000000000000001 
[  148.074060] GPR04: c0080000016d0038 ffffffffffffffff 0000000001f3f8ec 0017b1edefe5ad37 
[  148.074060] GPR08: 0000000000000000 c000000011407b38 0000000000000001 f5d4bac969e94f08 
[  148.074060] GPR12: c0080000017daaa0 c00000001ecab400 0000000000000000 0000000000000000 
[  148.074060] GPR16: 000000001003a348 000000001003a3e8 0000000000000000 0000000000000000 
[  148.074060] GPR20: c000000011407c64 c0080000016d0038 c000000011407c60 c000000001734238 
[  148.074060] GPR24: c000000012ce8400 0000000000000001 c000000001734230 c0000000113e9300 
[  148.074060] GPR28: 0000000e5b65166b 0000000000000001 c0080000016d0000 c000000011407b40 
[  148.074122] NIP [c0080000017daad8] bpf_prog_a2eb9104e5e8a5bf+0x38/0x5560
[  148.074129] LR [c000000000c26ea8] bpf_test_run+0x288/0x470
[  148.074137] Call Trace:
[  148.074139] [c000000011407b10] [c000000011407b80] 0xc000000011407b80 (unreliable)
[  148.074147] [c000000011407b80] [c000000000c26d1c] bpf_test_run+0xfc/0x470
[  148.074154] [c000000011407c40] [c000000000c2823c] bpf_prog_test_run_skb+0x38c/0x660
[  148.074160] [c000000011407ce0] [c00000000035896c] __do_sys_bpf+0x4cc/0xf30
[  148.074168] [c000000011407db0] [c000000000037a14] system_call_exception+0x134/0x230
[  148.074175] [c000000011407e10] [c00000000000c874] system_call_vectored_common+0xf4/0x26c
[  148.074183] Instruction dump:
[  148.074188] f821ff91 fbe10068 3be10030 39000000 f91ffff8 38600001 393ffff8 7d4048a8 
[  148.074199] 7d4a1a14 7d4049ad 4082fff4 28230000 <4082fffc> 60000000 ebe10068 38210070 

jirka



* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-29 16:04       ` Jiri Olsa
@ 2021-06-29 16:25         ` Brendan Jackman
  2021-06-29 16:41           ` Jiri Olsa
  0 siblings, 1 reply; 16+ messages in thread
From: Brendan Jackman @ 2021-06-29 16:25 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
[snip]
> > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > > unsupported\n' in dmesg instead. I wonder if I missed something in
> > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
>
> I see that for all the other atomics tests:
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
> #21/p BPF_ATOMIC_AND without fetch FAIL
> Failed to load prog 'Unknown error 524'!
> verification time 32 usec
> stack depth 8
> processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> Summary: 0 PASSED, 0 SKIPPED, 2 FAILED

Hm that's also not good - failure to JIT shouldn't mean failure to
load. Are there other test_verifier failures or is it just the atomics
ones?

> console:
>
>         [   51.850952] eBPF filter atomic op code db (@2) unsupported
>         [   51.851134] eBPF filter atomic op code db (@2) unsupported
>
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22
> #22/u BPF_ATOMIC_AND with fetch FAIL
> Failed to load prog 'Unknown error 524'!
> verification time 38 usec
> stack depth 8
> processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> #22/p BPF_ATOMIC_AND with fetch FAIL
> Failed to load prog 'Unknown error 524'!
> verification time 26 usec
> stack depth 8
> processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
>
> console:
>         [  223.231420] eBPF filter atomic op code db (@3) unsupported
>         [  223.231596] eBPF filter atomic op code db (@3) unsupported
>
> ...
>
>
> but no such console output for:
>
> [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24
> #24/u BPF_ATOMIC bounds propagation, mem->reg OK
>
>
> > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > >
> >
> > hum, I guess not.. will check
>
> nope, it locks up the same:

Do you mean it locks up at commit 91c960b0056 too?


* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-29 16:25         ` Brendan Jackman
@ 2021-06-29 16:41           ` Jiri Olsa
  2021-06-29 21:08             ` Jiri Olsa
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2021-06-29 16:41 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > >
> > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> [snip]
> > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > > > unsupported\n' in dmesg instead. I wonder if I missed something in
> > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
> >
> > I see that for all the other atomics tests:
> >
> > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
> > #21/p BPF_ATOMIC_AND without fetch FAIL
> > Failed to load prog 'Unknown error 524'!
> > verification time 32 usec
> > stack depth 8
> > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
> 
> Hm that's also not good - failure to JIT shouldn't mean failure to
> load. Are there other test_verifier failures or is it just the atomics
> ones?

I have CONFIG_BPF_JIT_ALWAYS_ON=y so I think that's fine

> 
> > console:
> >
> >         [   51.850952] eBPF filter atomic op code db (@2) unsupported
> >         [   51.851134] eBPF filter atomic op code db (@2) unsupported
> >
> >
> > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22
> > #22/u BPF_ATOMIC_AND with fetch FAIL
> > Failed to load prog 'Unknown error 524'!
> > verification time 38 usec
> > stack depth 8
> > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > #22/p BPF_ATOMIC_AND with fetch FAIL
> > Failed to load prog 'Unknown error 524'!
> > verification time 26 usec
> > stack depth 8
> > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> >
> > console:
> >         [  223.231420] eBPF filter atomic op code db (@3) unsupported
> >         [  223.231596] eBPF filter atomic op code db (@3) unsupported
> >
> > ...
> >
> >
> > but no such console output for:
> >
> > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24
> > #24/u BPF_ATOMIC bounds propagation, mem->reg OK
> >
> >
> > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > >
> > >
> > > hum, I guess not.. will check
> >
> > nope, it locks up the same:
> 
> Do you mean it locks up at commit 91c960b0056 too?
> 

I tried this one:
  37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH

I will check also 91c960b0056, but I think it's the new test issue

jirka



* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-29 16:41           ` Jiri Olsa
@ 2021-06-29 21:08             ` Jiri Olsa
  2021-06-30 10:34               ` Brendan Jackman
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2021-06-29 21:08 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > > > On Sun, 27 Jun 2021 at 17:34, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 02, 2021 at 01:50:02PM +0000, Brendan Jackman wrote:
> > [snip]
> > > > > Hmm, is the test prog from atomic_bounds.c getting JITed there (my
> > > > > dumb guess at what '0xc0000000119efb30 (unreliable)' means)? That
> > > > > shouldn't happen - should get 'eBPF filter atomic op code %02x (@%d)
> > > > > unsupported\n' in dmesg instead. I wonder if I missed something in
> > > > > commit 91c960b0056 (bpf: Rename BPF_XADD and prepare to encode other
> > >
> > > I see that for all the other atomics tests:
> > >
> > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 21
> > > #21/p BPF_ATOMIC_AND without fetch FAIL
> > > Failed to load prog 'Unknown error 524'!
> > > verification time 32 usec
> > > stack depth 8
> > > processed 10 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > > Summary: 0 PASSED, 0 SKIPPED, 2 FAILED
> > 
> > Hm that's also not good - failure to JIT shouldn't mean failure to
> > load. Are there other test_verifier failures or is it just the atomics
> > ones?
> 
> I have CONFIG_BPF_JIT_ALWAYS_ON=y so I think that's fine
> 
> > 
> > > console:
> > >
> > >         [   51.850952] eBPF filter atomic op code db (@2) unsupported
> > >         [   51.851134] eBPF filter atomic op code db (@2) unsupported
> > >
> > >
> > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 22
> > > #22/u BPF_ATOMIC_AND with fetch FAIL
> > > Failed to load prog 'Unknown error 524'!
> > > verification time 38 usec
> > > stack depth 8
> > > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > > #22/p BPF_ATOMIC_AND with fetch FAIL
> > > Failed to load prog 'Unknown error 524'!
> > > verification time 26 usec
> > > stack depth 8
> > > processed 14 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
> > >
> > > console:
> > >         [  223.231420] eBPF filter atomic op code db (@3) unsupported
> > >         [  223.231596] eBPF filter atomic op code db (@3) unsupported
> > >
> > > ...
> > >
> > >
> > > but no such console output for:
> > >
> > > [root@ibm-p9z-07-lp1 bpf]# ./test_verifier 24
> > > #24/u BPF_ATOMIC bounds propagation, mem->reg OK
> > >
> > >
> > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > >
> > > >
> > > > hum, I guess not.. will check
> > >
> > > nope, it locks up the same:
> > 
> > Do you mean it locks up at commit 91c960b0056 too?
> > 
> 
> I tried this one:
>   37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> 
> I will check also 91c960b0056, but I think it's the new test issue

with 91c960b0056 as HEAD I'm getting just 2 fails:

	#1097/p xadd/w check whether src/dst got mangled, 1 FAIL
	Failed to load prog 'Unknown error 524'!
	verification time 25 usec
	stack depth 8
	processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0
	#1098/p xadd/w check whether src/dst got mangled, 2 FAIL
	Failed to load prog 'Unknown error 524'!
	verification time 30 usec
	stack depth 8
	processed 12 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 0

with console output:

	[  289.499341] eBPF filter atomic op code db (@4) unsupported
	[  289.499510] eBPF filter atomic op code c3 (@4) unsupported

no lock up

jirka



* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-06-29 21:08             ` Jiri Olsa
@ 2021-06-30 10:34               ` Brendan Jackman
       [not found]                 ` <YNxmwZGtnqiXGnF0@krava>
  0 siblings, 1 reply; 16+ messages in thread
From: Brendan Jackman @ 2021-06-30 10:34 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML, Naveen N. Rao,
	Sandipan Das

On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:

> > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > >
> > > > >
> > > > > hum, I guess not.. will check
> > > >
> > > > nope, it locks up the same:
> > >
> > > Do you mean it locks up at commit 91c960b0056 too?

Sorry I was being stupid here - the test didn't exist at this commit

> > I tried this one:
> >   37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> >
> > I will check also 91c960b0056, but I think it's the new test issue

So yeah hard to say whether this was broken on PowerPC all along. How
hard is it for me to get set up to reproduce the failure? Is there a
rootfs I can download, and some instructions for running a PowerPC
QEMU VM? If so, please also share your config and I'll take a look.

If it's not as simple as that, I'll stare at the code for a while and
see if anything jumps out.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
       [not found]                 ` <YNxmwZGtnqiXGnF0@krava>
@ 2021-07-01  8:18                   ` Brendan Jackman
  2021-07-01 10:16                     ` Jiri Olsa
  2021-07-01 11:02                     ` Naveen N. Rao
  0 siblings, 2 replies; 16+ messages in thread
From: Brendan Jackman @ 2021-07-01  8:18 UTC (permalink / raw)
  To: Jiri Olsa, Sandipan Das, Naveen N. Rao
  Cc: bpf, Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	KP Singh, Florent Revest, John Fastabend, LKML

On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> >
> > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > > >
> > > > > > >
> > > > > > > hum, I guess not.. will check
> > > > > >
> > > > > > nope, it locks up the same:
> > > > >
> > > > > Do you mean it locks up at commit 91c960b0056 too?
> >
> > Sorry I was being stupid here - the test didn't exist at this commit
> >
> > > > I tried this one:
> > > >   37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > > >
> > > > I will check also 91c960b0056, but I think it's the new test issue
> >
> > So yeah hard to say whether this was broken on PowerPC all along. How
> > hard is it for me to get set up to reproduce the failure? Is there a
> > rootfs I can download, and some instructions for running a PowerPC
> > QEMU VM? If so if you can also share your config and I'll take a look.
> >
> > If it's not as simple as that, I'll stare at the code for a while and
> > see if anything jumps out.
> >
>
> I have latest fedora ppc server and compile/install latest bpf-next tree
> I think it will be reproduced also on vm, I attached my config

OK, getting set up to boot a PowerPC QEMU isn't practical here unless
someone's got commands I can copy-paste (suspect it will need .config
hacking too). Looks like you need to build a proper bootloader, and
boot an installer disk.

Looked at the code for a bit but nothing jumped out. It seems like the
verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
between do_check in verifier.c and bpf_jit_build_body in
bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
atomic op code %02x (@%d) unsupported", and would also explain the
lockup because a normal atomic add without fetch would leave BPF R1
unchanged.

We should be able to confirm that theory by disassembling the JITted
code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
2... at least for PowerPC 32-bit... maybe you could paste those lines
into the 64-bit version too? Here's some notes I made for
disassembling the hexdump on x86, I guess you'd just need to change
the objdump flags:

-- 

- Enable console JIT output:
```shell
echo 2 > /proc/sys/net/core/bpf_jit_enable
```
- Load & run the program of interest.
- Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
short program looks like. This includes a line of context - don't paste the
`flen=` line.
```
[ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390
from=test_verifier pid=258
[ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
[ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
[ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
[ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
```
- This incantation will split out and decode the hex, then disassemble the
result:
```shell
cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && \
    objdump -D -b binary -m i386:x86-64 /tmp/obj
```

--
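For a ppc64le target the last step would presumably just swap the disassembler target. An untested sketch — the `powerpc:common64` BFD machine name and a binutils objdump built with PowerPC support (e.g. a powerpc64le cross package) are assumptions on my end:

```shell
# Hypothetical ppc64le variant of the x86 recipe above: same cut/xxd
# extraction of the hex dump, different objdump target and endianness.
cut -d: -f2- /tmp/jit.txt | xxd -r >/tmp/obj && \
    objdump -D -b binary -m powerpc:common64 -EL /tmp/obj
```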

Sandipan, Naveen, do you know of anything in the PowerPC code that
might be leading us to drop the BPF_FETCH flag from the atomic
instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-07-01  8:18                   ` Brendan Jackman
@ 2021-07-01 10:16                     ` Jiri Olsa
  2021-07-01 11:02                     ` Naveen N. Rao
  1 sibling, 0 replies; 16+ messages in thread
From: Jiri Olsa @ 2021-07-01 10:16 UTC (permalink / raw)
  To: Brendan Jackman
  Cc: Sandipan Das, Naveen N. Rao, bpf, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, KP Singh, Florent Revest,
	John Fastabend, LKML

On Thu, Jul 01, 2021 at 10:18:39AM +0200, Brendan Jackman wrote:
> On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > >
> > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > > > >
> > > > > > > >
> > > > > > > > hum, I guess not.. will check
> > > > > > >
> > > > > > > nope, it locks up the same:
> > > > > >
> > > > > > Do you mean it locks up at commit 91c960b0056 too?
> > >
> > > Sorry I was being stupid here - the test didn't exist at this commit
> > >
> > > > > I tried this one:
> > > > >   37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > > > >
> > > > > I will check also 91c960b0056, but I think it's the new test issue
> > >
> > > So yeah hard to say whether this was broken on PowerPC all along. How
> > > hard is it for me to get set up to reproduce the failure? Is there a
> > > rootfs I can download, and some instructions for running a PowerPC
> > > QEMU VM? If so if you can also share your config and I'll take a look.
> > >
> > > If it's not as simple as that, I'll stare at the code for a while and
> > > see if anything jumps out.
> > >
> >
> > I have latest fedora ppc server and compile/install latest bpf-next tree
> > I think it will be reproduced also on vm, I attached my config
> 
> OK, getting set up to boot a PowerPC QEMU isn't practical here unless
> someone's got commands I can copy-paste (suspect it will need .config
> hacking too). Looks like you need to build a proper bootloader, and
> boot an installer disk.
> 
> Looked at the code for a bit but nothing jumped out. It seems like the
> verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
> detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
> between do_check in verifier.c and bpf_jit_build_body in
> bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
> atomic op code %02x (@%d) unsupported", and would also explain the
> lockup because a normal atomic add without fetch would leave BPF R1
> unchanged.
> 
> We should be able to confirm that theory by disassembling the JITted
> code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
> 2... at least for PowerPC 32-bit... maybe you could paste those lines
> into the 64-bit version too? Here's some notes I made for
> disassembling the hexdump on x86, I guess you'd just need to change
> the objdump flags:
> 
> -- 
> 
> - Enable console JIT output:
> ```shell
> echo 2 > /proc/sys/net/core/bpf_jit_enable
> ```
> - Load & run the program of interest.
> - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
> short program looks like. This includes a line of context - don't paste the
> `flen=` line.
> ```
> [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390
> from=test_verifier pid=258
> [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
> [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
> [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
> [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
> ```
> - This incantation will split out and decode the hex, then disassemble the
> result:
> ```shell
> cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b
> binary -m i386:x86-64 /tmp/obj
> ```

that's where I decided to write to list and ask for help before
googling ppc assembly ;-) I decided to write to the list and ask for help

I changed the test_verifier to stop before executing the test
so I can dump the program via bpftool:

	[root@ibm-p9z-07-lp1 bpf-next]# bpftool prog dump xlated id 48
	   0: (b7) r0 = 0
	   1: (7b) *(u64 *)(r10 -8) = r0
	   2: (b7) r1 = 1
	   3: (db) r1 = atomic64_fetch_add((u64 *)(r10 -8), r1)
	   4: (55) if r1 != 0x0 goto pc-1
	   5: (95) exit

	[root@ibm-p9z-07-lp1 bpf-next]# bpftool prog dump jited id 48
	bpf_prog_a2eb9104e5e8a5bf:
	   0:   nop
	   4:   nop
	   8:   stdu    r1,-112(r1)
	   c:   std     r31,104(r1)
	  10:   addi    r31,r1,48
	  14:   li      r8,0
	  18:   std     r8,-8(r31)
	  1c:   li      r3,1
	  20:   addi    r9,r31,-8
	  24:   ldarx   r10,0,r9
	  28:   add     r10,r10,r3
	  2c:   stdcx.  r10,0,r9
	  30:   bne     0x0000000000000024
	  34:   cmpldi  r3,0
	  38:   bne     0x0000000000000034
	  3c:   nop
	  40:   ld      r31,104(r1)
	  44:   addi    r1,r1,112
	  48:   mr      r3,r8
	  4c:   blr

I wanted to also do it through bpf_jit_enable and bpf_jit_dump, but I need to check
the setup, because I can't set bpf_jit_enable to 2 at the moment.. might take some time

	[root@ibm-p9z-07-lp1 bpf-next]# echo 2 > /proc/sys/net/core/bpf_jit_enable 
	-bash: echo: write error: Invalid argument
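
One guess at that write error (an assumption, worth checking against your config): kernels built with CONFIG_BPF_JIT_ALWAYS_ON=y clamp the bpf_jit_enable sysctl range to 1..1, so writing 0 or 2 fails with EINVAL. Something like this would confirm it:

```shell
# Hypothetical check: with CONFIG_BPF_JIT_ALWAYS_ON=y the only value
# the sysctl accepts is 1, so "echo 2" returns EINVAL by design.
if grep -q '^CONFIG_BPF_JIT_ALWAYS_ON=y' "/boot/config-$(uname -r)"; then
    echo "JIT is always-on: bpf_jit_enable accepts only 1 on this kernel"
fi
```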

jirka

> 
> --
> 
> Sandipan, Naveen, do you know of anything in the PowerPC code that
> might be leading us to drop the BPF_FETCH flag from the atomic
> instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-07-01  8:18                   ` Brendan Jackman
  2021-07-01 10:16                     ` Jiri Olsa
@ 2021-07-01 11:02                     ` Naveen N. Rao
  2021-07-01 14:15                       ` Jiri Olsa
  1 sibling, 1 reply; 16+ messages in thread
From: Naveen N. Rao @ 2021-07-01 11:02 UTC (permalink / raw)
  To: Brendan Jackman, Jiri Olsa, Sandipan Das
  Cc: Andrii Nakryiko, Alexei Starovoitov, bpf, Daniel Borkmann,
	John Fastabend, KP Singh, LKML, Florent Revest

Hi Brendan, Hi Jiri,


Brendan Jackman wrote:
> On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote:
>>
>> On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
>> > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote:
>> > >
>> > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
>> > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
>> > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
>> > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
>> > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
>> >
>> > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
>> > > > > > > >
>> > > > > > >
>> > > > > > > hum, I guess not.. will check
>> > > > > >
>> > > > > > nope, it locks up the same:
>> > > > >
>> > > > > Do you mean it locks up at commit 91c960b0056 too?
>> >
>> > Sorry I was being stupid here - the test didn't exist at this commit
>> >
>> > > > I tried this one:
>> > > >   37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
>> > > >
>> > > > I will check also 91c960b0056, but I think it's the new test issue
>> >
>> > So yeah hard to say whether this was broken on PowerPC all along. How
>> > hard is it for me to get set up to reproduce the failure? Is there a
>> > rootfs I can download, and some instructions for running a PowerPC
>> > QEMU VM? If so if you can also share your config and I'll take a look.
>> >
>> > If it's not as simple as that, I'll stare at the code for a while and
>> > see if anything jumps out.
>> >
>>
>> I have latest fedora ppc server and compile/install latest bpf-next tree
>> I think it will be reproduced also on vm, I attached my config
> 
> OK, getting set up to boot a PowerPC QEMU isn't practical here unless
> someone's got commands I can copy-paste (suspect it will need .config
> hacking too). Looks like you need to build a proper bootloader, and
> boot an installer disk.

There are some notes put up here, though we can do better:
https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu

If you are familiar with ubuntu/fedora cloud images (and cloud-init), 
you should be able to grab one of the ppc64le images and boot it in 
qemu:
https://cloud-images.ubuntu.com/releases/hirsute/release/
https://alt.fedoraproject.org/alt/

> 
> Looked at the code for a bit but nothing jumped out. It seems like the
> verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
> detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
> between do_check in verifier.c and bpf_jit_build_body in
> bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
> atomic op code %02x (@%d) unsupported", and would also explain the
> lockup because a normal atomic add without fetch would leave BPF R1
> unchanged.
> 
> We should be able to confirm that theory by disassembling the JITted
> code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
> 2... at least for PowerPC 32-bit... maybe you could paste those lines
> into the 64-bit version too? Here's some notes I made for
> disassembling the hexdump on x86, I guess you'd just need to change
> the objdump flags:
> 
> -- 
> 
> - Enable console JIT output:
> ```shell
> echo 2 > /proc/sys/net/core/bpf_jit_enable
> ```
> - Load & run the program of interest.
> - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
> short program looks like. This includes a line of context - don't paste the
> `flen=` line.
> ```
> [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390
> from=test_verifier pid=258
> [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
> [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
> [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
> [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
> ```
> - This incantation will split out and decode the hex, then disassemble the
> result:
> ```shell
> cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b
> binary -m i386:x86-64 /tmp/obj
> ```
> 
> --
> 
> Sandipan, Naveen, do you know of anything in the PowerPC code that
> might be leading us to drop the BPF_FETCH flag from the atomic
> instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?

Yes, I think I just found the issue. We aren't looking at the correct 
BPF instruction when checking the IMM value.


--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -673,7 +673,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
                 * BPF_STX ATOMIC (atomic ops)
                 */
                case BPF_STX | BPF_ATOMIC | BPF_W:
-                       if (insn->imm != BPF_ADD) {
+                       if (insn[i].imm != BPF_ADD) {
                                pr_err_ratelimited(
                                        "eBPF filter atomic op code %02x (@%d) unsupported\n",
                                        code, i);
@@ -695,7 +695,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
                        PPC_BCC_SHORT(COND_NE, tmp_idx);
                        break;
                case BPF_STX | BPF_ATOMIC | BPF_DW:
-                       if (insn->imm != BPF_ADD) {
+                       if (insn[i].imm != BPF_ADD) {
                                pr_err_ratelimited(
                                        "eBPF filter atomic op code %02x (@%d) unsupported\n",
                                        code, i);



Thanks,
Naveen


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-07-01 11:02                     ` Naveen N. Rao
@ 2021-07-01 14:15                       ` Jiri Olsa
  2021-07-02  9:44                         ` Brendan Jackman
  0 siblings, 1 reply; 16+ messages in thread
From: Jiri Olsa @ 2021-07-01 14:15 UTC (permalink / raw)
  To: Naveen N. Rao
  Cc: Brendan Jackman, Sandipan Das, Andrii Nakryiko,
	Alexei Starovoitov, bpf, Daniel Borkmann, John Fastabend,
	KP Singh, LKML, Florent Revest

On Thu, Jul 01, 2021 at 04:32:03PM +0530, Naveen N. Rao wrote:
> Hi Brendan, Hi Jiri,
> 
> 
> Brendan Jackman wrote:
> > On Wed, 30 Jun 2021 at 14:42, Jiri Olsa <jolsa@redhat.com> wrote:
> > > 
> > > On Wed, Jun 30, 2021 at 12:34:58PM +0200, Brendan Jackman wrote:
> > > > On Tue, 29 Jun 2021 at 23:09, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > >
> > > > > On Tue, Jun 29, 2021 at 06:41:24PM +0200, Jiri Olsa wrote:
> > > > > > On Tue, Jun 29, 2021 at 06:25:33PM +0200, Brendan Jackman wrote:
> > > > > > > On Tue, 29 Jun 2021 at 18:04, Jiri Olsa <jolsa@redhat.com> wrote:
> > > > > > > > On Tue, Jun 29, 2021 at 04:10:12PM +0200, Jiri Olsa wrote:
> > > > > > > > > On Mon, Jun 28, 2021 at 11:21:42AM +0200, Brendan Jackman wrote:
> > > >
> > > > > > > > > > atomics in .imm). Any idea if this test was ever passing on PowerPC?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > hum, I guess not.. will check
> > > > > > > >
> > > > > > > > nope, it locks up the same:
> > > > > > >
> > > > > > > Do you mean it locks up at commit 91c960b0056 too?
> > > >
> > > > Sorry I was being stupid here - the test didn't exist at this commit
> > > >
> > > > > > I tried this one:
> > > > > >   37086bfdc737 bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
> > > > > >
> > > > > > I will check also 91c960b0056, but I think it's the new test issue
> > > >
> > > > So yeah hard to say whether this was broken on PowerPC all along. How
> > > > hard is it for me to get set up to reproduce the failure? Is there a
> > > > rootfs I can download, and some instructions for running a PowerPC
> > > > QEMU VM? If so if you can also share your config and I'll take a look.
> > > >
> > > > If it's not as simple as that, I'll stare at the code for a while and
> > > > see if anything jumps out.
> > > >
> > > 
> > > I have latest fedora ppc server and compile/install latest bpf-next tree
> > > I think it will be reproduced also on vm, I attached my config
> > 
> > OK, getting set up to boot a PowerPC QEMU isn't practical here unless
> > someone's got commands I can copy-paste (suspect it will need .config
> > hacking too). Looks like you need to build a proper bootloader, and
> > boot an installer disk.
> 
> There are some notes put up here, though we can do better:
> https://github.com/linuxppc/wiki/wiki/Booting-with-Qemu
> 
> If you are familiar with ubuntu/fedora cloud images (and cloud-init), you
> should be able to grab one of the ppc64le images and boot it in qemu:
> https://cloud-images.ubuntu.com/releases/hirsute/release/
> https://alt.fedoraproject.org/alt/
> 
> > 
> > Looked at the code for a bit but nothing jumped out. It seems like the
> > verifier is seeing a BPF_ADD | BPF_FETCH, which means it doesn't
> > detect an infinite loop, but then we lose the BPF_FETCH flag somewhere
> > between do_check in verifier.c and bpf_jit_build_body in
> > bpf_jit_comp64.c. That would explain why we don't get the "eBPF filter
> > atomic op code %02x (@%d) unsupported", and would also explain the
> > lockup because a normal atomic add without fetch would leave BPF R1
> > unchanged.
> > 
> > We should be able to confirm that theory by disassembling the JITted
> > code that gets hexdumped by bpf_jit_dump when bpf_jit_enable is set to
> > 2... at least for PowerPC 32-bit... maybe you could paste those lines
> > into the 64-bit version too? Here's some notes I made for
> > disassembling the hexdump on x86, I guess you'd just need to change
> > the objdump flags:
> > 
> > -- 
> > 
> > - Enable console JIT output:
> > ```shell
> > echo 2 > /proc/sys/net/core/bpf_jit_enable
> > ```
> > - Load & run the program of interest.
> > - Copy the hex code from the kernel console to `/tmp/jit.txt`. Here's what a
> > short program looks like. This includes a line of context - don't paste the
> > `flen=` line.
> > ```
> > [ 79.381020] flen=8 proglen=54 pass=4 image=000000001af6f390
> > from=test_verifier pid=258
> > [ 79.389568] JIT code: 00000000: 0f 1f 44 00 00 66 90 55 48 89 e5 48 81 ec 08 00
> > [ 79.397411] JIT code: 00000010: 00 00 48 c7 45 f8 64 00 00 00 bf 04 00 00 00 48
> > [ 79.405965] JIT code: 00000020: f7 df f0 48 29 7d f8 8b 45 f8 48 83 f8 60 74 02
> > [ 79.414719] JIT code: 00000030: c9 c3 31 c0 eb fa
> > ```
> > - This incantation will split out and decode the hex, then disassemble the
> > result:
> > ```shell
> > cat /tmp/jit.txt | cut -d: -f2- | xxd -r >/tmp/obj && objdump -D -b
> > binary -m i386:x86-64 /tmp/obj
> > ```
> > 
> > --
> > 
> > Sandipan, Naveen, do you know of anything in the PowerPC code that
> > might be leading us to drop the BPF_FETCH flag from the atomic
> > instruction in tools/testing/selftests/bpf/verifier/atomic_bounds.c?
> 
> Yes, I think I just found the issue. We aren't looking at the correct BPF
> instruction when checking the IMM value.

great, nice catch! :-) that fixes it for me.. 

Tested-by: Jiri Olsa <jolsa@redhat.com>

thanks,
jirka

> 
> 
> --- a/arch/powerpc/net/bpf_jit_comp64.c
> +++ b/arch/powerpc/net/bpf_jit_comp64.c
> @@ -673,7 +673,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>                 * BPF_STX ATOMIC (atomic ops)
>                 */
>                case BPF_STX | BPF_ATOMIC | BPF_W:
> -                       if (insn->imm != BPF_ADD) {
> +                       if (insn[i].imm != BPF_ADD) {
>                                pr_err_ratelimited(
>                                        "eBPF filter atomic op code %02x (@%d) unsupported\n",
>                                        code, i);
> @@ -695,7 +695,7 @@ int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, struct codegen_context *
>                        PPC_BCC_SHORT(COND_NE, tmp_idx);
>                        break;
>                case BPF_STX | BPF_ATOMIC | BPF_DW:
> -                       if (insn->imm != BPF_ADD) {
> +                       if (insn[i].imm != BPF_ADD) {
>                                pr_err_ratelimited(
>                                        "eBPF filter atomic op code %02x (@%d) unsupported\n",
>                                        code, i);
> 
> 
> 
> Thanks,
> Naveen
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG soft lockup] Re: [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH
  2021-07-01 14:15                       ` Jiri Olsa
@ 2021-07-02  9:44                         ` Brendan Jackman
  0 siblings, 0 replies; 16+ messages in thread
From: Brendan Jackman @ 2021-07-02  9:44 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Naveen N. Rao, Sandipan Das, Andrii Nakryiko, Alexei Starovoitov,
	bpf, Daniel Borkmann, John Fastabend, KP Singh, LKML,
	Florent Revest

On Thu, 1 Jul 2021 at 16:15, Jiri Olsa <jolsa@redhat.com> wrote:
> On Thu, Jul 01, 2021 at 04:32:03PM +0530, Naveen N. Rao wrote:
> > Yes, I think I just found the issue. We aren't looking at the correct BPF
> > instruction when checking the IMM value.
>
> great, nice catch! :-) that fixes it for me..

Ahh, nice. +1 thanks for taking a look!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2021-07-02  9:44 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-02 13:50 [PATCH bpf-next v3] bpf: Propagate stack bounds to registers in atomics w/ BPF_FETCH Brendan Jackman
2021-02-03 17:07 ` Yonghong Song
2021-02-03 17:37   ` Alexei Starovoitov
2021-06-27 15:34 ` [BUG soft lockup] " Jiri Olsa
2021-06-28  9:21   ` Brendan Jackman
2021-06-29 14:10     ` Jiri Olsa
2021-06-29 16:04       ` Jiri Olsa
2021-06-29 16:25         ` Brendan Jackman
2021-06-29 16:41           ` Jiri Olsa
2021-06-29 21:08             ` Jiri Olsa
2021-06-30 10:34               ` Brendan Jackman
     [not found]                 ` <YNxmwZGtnqiXGnF0@krava>
2021-07-01  8:18                   ` Brendan Jackman
2021-07-01 10:16                     ` Jiri Olsa
2021-07-01 11:02                     ` Naveen N. Rao
2021-07-01 14:15                       ` Jiri Olsa
2021-07-02  9:44                         ` Brendan Jackman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).