bpf.vger.kernel.org archive mirror
* [PATCH bpf-next 0/1] arm64: Add BPF exception tables
@ 2020-07-28 15:21 Jean-Philippe Brucker
  2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
  2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
  0 siblings, 2 replies; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-28 15:21 UTC (permalink / raw)
  To: linux-arm-kernel, bpf
  Cc: catalin.marinas, will, daniel, ast, zlim.lnx, kafai,
	songliubraving, yhs, andriin, john.fastabend, kpsingh,
	Jean-Philippe Brucker

The following patch adds support for BPF_PROBE_MEM on arm64. The
implementation is simple but I wanted to give a bit of background first.
If you're familiar with recent BPF development you can skip to the patch
(or fact-check the following blurb).

BPF programs used for tracing can inspect any of the traced function's
arguments and follow pointers in struct members. Traditionally the BPF
program would get a struct pt_regs as argument and cast the register
values to the appropriate struct pointer. The BPF verifier would mandate
that any memory access uses the bpf_probe_read() helper, to suppress
page faults (see samples/bpf/tracex1_kern.c).
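A minimal sketch of that traditional pattern, in the spirit of tracex1_kern.c (not a verbatim copy; the attach point and fields are illustrative):

```c
#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <uapi/linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

SEC("kprobe/__netif_receive_skb_core")
int bpf_prog1(struct pt_regs *ctx)
{
	char devname[IFNAMSIZ] = {};
	struct net_device *dev;
	struct sk_buff *skb;

	/* Cast the first function argument to the expected pointer type */
	skb = (struct sk_buff *)PT_REGS_PARM1(ctx);

	/* Every dereference must go through bpf_probe_read(), which
	 * suppresses page faults instead of letting the access oops */
	bpf_probe_read(&dev, sizeof(dev), &skb->dev);
	bpf_probe_read(devname, sizeof(devname), dev->name);

	return 0;
}

char _license[] SEC("license") = "GPL";
```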

With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF),
the verifier can now check the type of any access performed by a BPF
program. It rejects, for example, programs that cast to a different
structure and perform out-of-bounds accesses, programs that attempt to
dereference something that isn't a pointer, and programs that
dereference a pointer that hasn't gone through a NULL check.

As this makes tracing programs safer, the verifier now allows loading
programs that access struct members without bpf_probe_read(). It is
however still possible to trigger page faults. For example, in the
following program, with which I've tested this patch, the verifier does
not mandate a NULL check for the second-level pointer:

/*
 * From tools/testing/selftests/bpf/progs/bpf_iter_task.c
 * dump_task() is called for each task.
 */
SEC("iter/task")
int dump_task(struct bpf_iter__task *ctx)
{
	struct seq_file *seq = ctx->meta->seq;
	struct task_struct *task = ctx->task;

	/* Program would be rejected without this check */
	if (task == NULL)
		return 0;

	/*
	 * However the verifier does not currently mandate
	 * checking task->mm, and the following faults for kernel
	 * threads.
	 */
	BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm);
	return 0;
}

Even if it checked this case, the verifier couldn't guarantee that all
accesses are safe since kernel structures could in theory contain
garbage or error pointers. So to allow fast access without
bpf_probe_read(), a JIT implementation must support BPF exception
tables. For each access to a BTF pointer, the JIT generates an entry
into an exception table appended to the BPF program. If the access
faults at runtime, the handler skips the faulting instruction. The
example above will display vm=0 for kernel threads.

See also
* The original implementation on x86
  https://lore.kernel.org/bpf/20191016032505.2089704-1-ast@kernel.org/
* The s390 implementation
  https://lore.kernel.org/bpf/20200715233301.933201-1-iii@linux.ibm.com/

Jean-Philippe Brucker (1):
  arm64: bpf: Add BPF exception tables

 arch/arm64/include/asm/extable.h |  3 ++
 arch/arm64/mm/extable.c          | 11 ++--
 arch/arm64/net/bpf_jit_comp.c    | 93 +++++++++++++++++++++++++++++---
 3 files changed, 98 insertions(+), 9 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-28 15:21 [PATCH bpf-next 0/1] arm64: Add BPF exception tables Jean-Philippe Brucker
@ 2020-07-28 15:21 ` Jean-Philippe Brucker
  2020-07-29 17:28   ` Song Liu
  2020-07-30 12:28   ` Qian Cai
  2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
  1 sibling, 2 replies; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-28 15:21 UTC (permalink / raw)
  To: linux-arm-kernel, bpf
  Cc: catalin.marinas, will, daniel, ast, zlim.lnx, kafai,
	songliubraving, yhs, andriin, john.fastabend, kpsingh,
	Jean-Philippe Brucker

When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
this flag it falls back to the interpreter.

Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.

To keep the compact exception table entry format, inspect the pc in
fixup_exception(). A more generic solution would add a "handler" field
to the table entry, like on x86 and s390.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
Note: the extable is aligned on 32 bits. Given that extable entries have
32-bit members I figured we don't need to align it to 64 bits.
---
 arch/arm64/include/asm/extable.h |  3 ++
 arch/arm64/mm/extable.c          | 11 ++--
 arch/arm64/net/bpf_jit_comp.c    | 93 +++++++++++++++++++++++++++++---
 3 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index 56a4f68b262e..bcee40df1586 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -22,5 +22,8 @@ struct exception_table_entry
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs);
+
 extern int fixup_exception(struct pt_regs *regs);
 #endif
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 81e694af5f8c..1f42991cacdd 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -11,8 +11,13 @@ int fixup_exception(struct pt_regs *regs)
 	const struct exception_table_entry *fixup;
 
 	fixup = search_exception_tables(instruction_pointer(regs));
-	if (fixup)
-		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
+	if (!fixup)
+		return 0;
 
-	return fixup != NULL;
+	if (regs->pc >= BPF_JIT_REGION_START &&
+	    regs->pc < BPF_JIT_REGION_END)
+		return arm64_bpf_fixup_exception(fixup, regs);
+
+	regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
+	return 1;
 }
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 3cb25b43b368..f8912e45be7a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -7,6 +7,7 @@
 
 #define pr_fmt(fmt) "bpf_jit: " fmt
 
+#include <linux/bitfield.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/printk.h>
@@ -56,6 +57,7 @@ struct jit_ctx {
 	int idx;
 	int epilogue_offset;
 	int *offset;
+	int exentry_idx;
 	__le32 *image;
 	u32 stack_size;
 };
@@ -351,6 +353,67 @@ static void build_epilogue(struct jit_ctx *ctx)
 	emit(A64_RET(A64_LR), ctx);
 }
 
+#define BPF_FIXUP_OFFSET_MASK	GENMASK(26, 0)
+#define BPF_FIXUP_REG_MASK	GENMASK(31, 27)
+
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs)
+{
+	off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
+	int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
+
+	regs->regs[dst_reg] = 0;
+	regs->pc = (unsigned long)&ex->fixup - offset;
+	return 1;
+}
+
+/* For accesses to BTF pointers, add an entry to the exception table */
+static int add_exception_handler(const struct bpf_insn *insn,
+				 struct jit_ctx *ctx,
+				 int dst_reg)
+{
+	off_t offset;
+	unsigned long pc;
+	struct exception_table_entry *ex;
+
+	if (!ctx->image)
+		/* First pass */
+		return 0;
+
+	if (BPF_MODE(insn->code) != BPF_PROBE_MEM)
+		return 0;
+
+	if (!ctx->prog->aux->extable ||
+	    WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
+		return -EINVAL;
+
+	ex = &ctx->prog->aux->extable[ctx->exentry_idx];
+	pc = (unsigned long)&ctx->image[ctx->idx - 1];
+
+	offset = pc - (long)&ex->insn;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->insn = offset;
+
+	/*
+	 * Since the extable follows the program, the fixup offset is always
+	 * negative and limited to BPF_JIT_REGION_SIZE. Store a positive value
+	 * to keep things simple, and put the destination register in the upper
+	 * bits. We don't need to worry about buildtime or runtime sort
+	 * modifying the upper bits because the table is already sorted, and
+	 * isn't part of the main exception table.
+	 */
+	offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
+	if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, offset))
+		return -ERANGE;
+
+	ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, offset) |
+		    FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
+
+	ctx->exentry_idx++;
+	return 0;
+}
+
 /* JITs an eBPF instruction.
  * Returns:
  * 0  - successfully JITed an 8-byte eBPF instruction.
@@ -375,6 +438,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	u8 jmp_cond, reg;
 	s32 jmp_offset;
 	u32 a64_insn;
+	int ret;
 
 #define check_imm(bits, imm) do {				\
 	if ((((imm) > 0) && ((imm) >> (bits))) ||		\
@@ -694,7 +758,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		const u8 r0 = bpf2a64[BPF_REG_0];
 		bool func_addr_fixed;
 		u64 func_addr;
-		int ret;
 
 		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
 					    &func_addr, &func_addr_fixed);
@@ -738,6 +801,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_LDX | BPF_MEM | BPF_H:
 	case BPF_LDX | BPF_MEM | BPF_B:
 	case BPF_LDX | BPF_MEM | BPF_DW:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_W:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_H:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_B:
 		emit_a64_mov_i(1, tmp, off, ctx);
 		switch (BPF_SIZE(code)) {
 		case BPF_W:
@@ -753,6 +820,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			emit(A64_LDR64(dst, src, tmp), ctx);
 			break;
 		}
+
+		ret = add_exception_handler(insn, ctx, dst);
+		if (ret)
+			return ret;
 		break;
 
 	/* ST: *(size *)(dst + off) = imm */
@@ -868,6 +939,9 @@ static int validate_code(struct jit_ctx *ctx)
 			return -1;
 	}
 
+	if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries))
+		return -1;
+
 	return 0;
 }
 
@@ -884,6 +958,7 @@ struct arm64_jit_data {
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
+	int image_size, prog_size, extable_size;
 	struct bpf_prog *tmp, *orig_prog = prog;
 	struct bpf_binary_header *header;
 	struct arm64_jit_data *jit_data;
@@ -891,7 +966,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	bool tmp_blinded = false;
 	bool extra_pass = false;
 	struct jit_ctx ctx;
-	int image_size;
 	u8 *image_ptr;
 
 	if (!prog->jit_requested)
@@ -922,7 +996,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		image_ptr = jit_data->image;
 		header = jit_data->header;
 		extra_pass = true;
-		image_size = sizeof(u32) * ctx.idx;
+		prog_size = sizeof(u32) * ctx.idx;
 		goto skip_init_ctx;
 	}
 	memset(&ctx, 0, sizeof(ctx));
@@ -950,8 +1024,12 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	ctx.epilogue_offset = ctx.idx;
 	build_epilogue(&ctx);
 
+	extable_size = prog->aux->num_exentries *
+		sizeof(struct exception_table_entry);
+
 	/* Now we know the actual image size. */
-	image_size = sizeof(u32) * ctx.idx;
+	prog_size = sizeof(u32) * ctx.idx;
+	image_size = prog_size + extable_size;
 	header = bpf_jit_binary_alloc(image_size, &image_ptr,
 				      sizeof(u32), jit_fill_hole);
 	if (header == NULL) {
@@ -962,8 +1040,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	/* 2. Now, the actual pass. */
 
 	ctx.image = (__le32 *)image_ptr;
+	if (extable_size)
+		prog->aux->extable = (void *)image_ptr + prog_size;
 skip_init_ctx:
 	ctx.idx = 0;
+	ctx.exentry_idx = 0;
 
 	build_prologue(&ctx, was_classic);
 
@@ -984,7 +1065,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* And we're done. */
 	if (bpf_jit_enable > 1)
-		bpf_jit_dump(prog->len, image_size, 2, ctx.image);
+		bpf_jit_dump(prog->len, prog_size, 2, ctx.image);
 
 	bpf_flush_icache(header, ctx.image + ctx.idx);
 
@@ -1005,7 +1086,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	}
 	prog->bpf_func = (void *)ctx.image;
 	prog->jited = 1;
-	prog->jited_len = image_size;
+	prog->jited_len = prog_size;
 
 	if (!prog->is_func || extra_pass) {
 		bpf_prog_fill_jited_linfo(prog, ctx.offset);
-- 
2.27.0



* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
@ 2020-07-29 17:28   ` Song Liu
  2020-07-29 21:29     ` Daniel Borkmann
  2020-07-30 12:28   ` Qian Cai
  1 sibling, 1 reply; 19+ messages in thread
From: Song Liu @ 2020-07-29 17:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, bpf, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, zlim.lnx, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend,
	KP Singh

On Tue, Jul 28, 2020 at 8:37 AM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> When a tracing BPF program attempts to read memory without using the
> bpf_probe_read() helper, the verifier marks the load instruction with
> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
> this flag it falls back to the interpreter.
>
> Add support for BPF_PROBE_MEM, by appending an exception table to the
> BPF program. If the load instruction causes a data abort, the fixup
> infrastructure finds the exception table and fixes up the fault, by
> clearing the destination register and jumping over the faulting
> instruction.
>
> To keep the compact exception table entry format, inspect the pc in
> fixup_exception(). A more generic solution would add a "handler" field
> to the table entry, like on x86 and s390.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

This patch looks good to me.

Acked-by: Song Liu <songliubraving@fb.com>

Is it possible to add a selftest for this? I thought about this a
little bit, but didn't get a good idea.

Thanks,
Song


* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-29 17:28   ` Song Liu
@ 2020-07-29 21:29     ` Daniel Borkmann
  2020-07-30  8:28       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2020-07-29 21:29 UTC (permalink / raw)
  To: Song Liu, Jean-Philippe Brucker
  Cc: linux-arm-kernel, bpf, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, zlim.lnx, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh

On 7/29/20 7:28 PM, Song Liu wrote:
> On Tue, Jul 28, 2020 at 8:37 AM Jean-Philippe Brucker
> <jean-philippe@linaro.org> wrote:
>>
>> When a tracing BPF program attempts to read memory without using the
>> bpf_probe_read() helper, the verifier marks the load instruction with
>> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
>> this flag it falls back to the interpreter.
>>
>> Add support for BPF_PROBE_MEM, by appending an exception table to the
>> BPF program. If the load instruction causes a data abort, the fixup
>> infrastructure finds the exception table and fixes up the fault, by
>> clearing the destination register and jumping over the faulting
>> instruction.
>>
>> To keep the compact exception table entry format, inspect the pc in
>> fixup_exception(). A more generic solution would add a "handler" field
>> to the table entry, like on x86 and s390.
>>
>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 
> This patch looks good to me.
> 
> Acked-by: Song Liu <songliubraving@fb.com>

+1, applied, thanks a lot!

> Is it possible to add a selftest for this? I thought about this a
> little bit, but didn't get a good idea.

Why not add a test_verifier.c test case which calls into bpf_get_current_task()
to fetch a pointer to current and then read out some field via BPF_PROBE_MEM, which
should then succeed on x86/s390x/arm64 but be skipped on the other archs? Jean-Philippe,
could you look into following up with such test case(s)?

Thanks,
Daniel


* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-29 21:29     ` Daniel Borkmann
@ 2020-07-30  8:28       ` Jean-Philippe Brucker
  0 siblings, 0 replies; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-30  8:28 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Song Liu, linux-arm-kernel, bpf, Catalin Marinas, Will Deacon,
	Alexei Starovoitov, zlim.lnx, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh

On Wed, Jul 29, 2020 at 11:29:43PM +0200, Daniel Borkmann wrote:
> On 7/29/20 7:28 PM, Song Liu wrote:
> > On Tue, Jul 28, 2020 at 8:37 AM Jean-Philippe Brucker
> > <jean-philippe@linaro.org> wrote:
> > > 
> > > When a tracing BPF program attempts to read memory without using the
> > > bpf_probe_read() helper, the verifier marks the load instruction with
> > > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
> > > this flag it falls back to the interpreter.
> > > 
> > > Add support for BPF_PROBE_MEM, by appending an exception table to the
> > > BPF program. If the load instruction causes a data abort, the fixup
> > > infrastructure finds the exception table and fixes up the fault, by
> > > clearing the destination register and jumping over the faulting
> > > instruction.
> > > 
> > > To keep the compact exception table entry format, inspect the pc in
> > > fixup_exception(). A more generic solution would add a "handler" field
> > > to the table entry, like on x86 and s390.
> > > 
> > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > 
> > This patch looks good to me.
> > 
> > Acked-by: Song Liu <songliubraving@fb.com>
> 
> +1, applied, thanks a lot!
> 
> > Is it possible to add a selftest for this? I thought about this a
> > little bit, but didn't get a good idea.
> 
> Why not add a test_verifier.c test case which calls into bpf_get_current_task()
> to fetch a pointer to current and then read out some field via BPF_PROBE_MEM, which
> should then succeed on x86/s390x/arm64 but be skipped on the other archs? Jean-Philippe,
> could you look into following up with such test case(s)?

Sure, I'll take a look. Ilya also added a selftest to trigger exceptions
in https://lore.kernel.org/bpf/20200715233301.933201-5-iii@linux.ibm.com/
It's useful, but I think it relies on the verifier not mandating NULL
checks for next-level pointers (they are ptr_ instead of ptr_or_null_),
which might change in the future. So I'm wondering if we can deliberately
access an invalid pointer with the help of bpf_test_run, and check that
the result is zero.

Thanks,
Jean


* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
  2020-07-29 17:28   ` Song Liu
@ 2020-07-30 12:28   ` Qian Cai
  2020-07-30 14:22     ` Jean-Philippe Brucker
  1 sibling, 1 reply; 19+ messages in thread
From: Qian Cai @ 2020-07-30 12:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, bpf, songliubraving, andriin, daniel,
	catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs,
	will, kafai, sfr, linux-next, linux-kernel

On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote:
> When a tracing BPF program attempts to read memory without using the
> bpf_probe_read() helper, the verifier marks the load instruction with
> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
> this flag it falls back to the interpreter.
> 
> Add support for BPF_PROBE_MEM, by appending an exception table to the
> BPF program. If the load instruction causes a data abort, the fixup
> infrastructure finds the exception table and fixes up the fault, by
> clearing the destination register and jumping over the faulting
> instruction.
> 
> To keep the compact exception table entry format, inspect the pc in
> fixup_exception(). A more generic solution would add a "handler" field
> to the table entry, like on x86 and s390.
> 
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

This will fail to compile on arm64,

https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config

arch/arm64/mm/extable.o: In function `fixup_exception':
arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception'

> ---
> Note: the extable is aligned on 32 bits. Given that extable entries have
> 32-bit members I figured we don't need to align it to 64 bits.
> ---
>  arch/arm64/include/asm/extable.h |  3 ++
>  arch/arm64/mm/extable.c          | 11 ++--
>  arch/arm64/net/bpf_jit_comp.c    | 93 +++++++++++++++++++++++++++++---
>  3 files changed, 98 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
> index 56a4f68b262e..bcee40df1586 100644
> --- a/arch/arm64/include/asm/extable.h
> +++ b/arch/arm64/include/asm/extable.h
> @@ -22,5 +22,8 @@ struct exception_table_entry
>  
>  #define ARCH_HAS_RELATIVE_EXTABLE
>  
> +int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
> +			      struct pt_regs *regs);
> +
>  extern int fixup_exception(struct pt_regs *regs);
>  #endif
> diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
> index 81e694af5f8c..1f42991cacdd 100644
> --- a/arch/arm64/mm/extable.c
> +++ b/arch/arm64/mm/extable.c
> @@ -11,8 +11,13 @@ int fixup_exception(struct pt_regs *regs)
>  	const struct exception_table_entry *fixup;
>  
>  	fixup = search_exception_tables(instruction_pointer(regs));
> -	if (fixup)
> -		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
> +	if (!fixup)
> +		return 0;
>  
> -	return fixup != NULL;
> +	if (regs->pc >= BPF_JIT_REGION_START &&
> +	    regs->pc < BPF_JIT_REGION_END)
> +		return arm64_bpf_fixup_exception(fixup, regs);
> +
> +	regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
> +	return 1;
>  }
> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
> index 3cb25b43b368..f8912e45be7a 100644
> --- a/arch/arm64/net/bpf_jit_comp.c
> +++ b/arch/arm64/net/bpf_jit_comp.c
> @@ -7,6 +7,7 @@
>  
>  #define pr_fmt(fmt) "bpf_jit: " fmt
>  
> +#include <linux/bitfield.h>
>  #include <linux/bpf.h>
>  #include <linux/filter.h>
>  #include <linux/printk.h>
> @@ -56,6 +57,7 @@ struct jit_ctx {
>  	int idx;
>  	int epilogue_offset;
>  	int *offset;
> +	int exentry_idx;
>  	__le32 *image;
>  	u32 stack_size;
>  };
> @@ -351,6 +353,67 @@ static void build_epilogue(struct jit_ctx *ctx)
>  	emit(A64_RET(A64_LR), ctx);
>  }
>  
> +#define BPF_FIXUP_OFFSET_MASK	GENMASK(26, 0)
> +#define BPF_FIXUP_REG_MASK	GENMASK(31, 27)
> +
> +int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
> +			      struct pt_regs *regs)
> +{
> +	off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
> +	int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
> +
> +	regs->regs[dst_reg] = 0;
> +	regs->pc = (unsigned long)&ex->fixup - offset;
> +	return 1;
> +}
> +
[]


* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-30 12:28   ` Qian Cai
@ 2020-07-30 14:22     ` Jean-Philippe Brucker
  2020-07-30 19:47       ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-30 14:22 UTC (permalink / raw)
  To: Qian Cai
  Cc: linux-arm-kernel, bpf, songliubraving, andriin, daniel,
	catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs,
	will, kafai, sfr, linux-next, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote:
> On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote:
> > When a tracing BPF program attempts to read memory without using the
> > bpf_probe_read() helper, the verifier marks the load instruction with
> > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
> > this flag it falls back to the interpreter.
> > 
> > Add support for BPF_PROBE_MEM, by appending an exception table to the
> > BPF program. If the load instruction causes a data abort, the fixup
> > infrastructure finds the exception table and fixes up the fault, by
> > clearing the destination register and jumping over the faulting
> > instruction.
> > 
> > To keep the compact exception table entry format, inspect the pc in
> > fixup_exception(). A more generic solution would add a "handler" field
> > to the table entry, like on x86 and s390.
> > 
> > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 
> This will fail to compile on arm64,
> 
> https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
> 
> arch/arm64/mm/extable.o: In function `fixup_exception':
> arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception'

Thanks for the report, I attached a fix. Daniel, can I squash it and
resend as v2 or is it too late?

I'd be more confident if my patches sat a little longer on the list so
arm64 folks have a chance to review them. This isn't my first silly
mistake...

Thanks,
Jean

[-- Attachment #2: 0001-arm64-bpf-Fix-build-for-CONFIG_BPF_JIT.patch --]
[-- Type: text/plain, Size: 1688 bytes --]

From 17d0f041b57903cb2657dde15559cd1923498337 Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
Date: Thu, 30 Jul 2020 14:45:44 +0200
Subject: [PATCH] arm64: bpf: Fix build for !CONFIG_BPF_JIT

Add a stub for arm64_bpf_fixup_exception() when CONFIG_BPF_JIT isn't
enabled, and avoid the fixup in this case.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/extable.h | 9 +++++++++
 arch/arm64/mm/extable.c          | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index bcee40df1586..840a35ed92ec 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -22,8 +22,17 @@ struct exception_table_entry
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+#ifdef CONFIG_BPF_JIT
 int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
 			      struct pt_regs *regs);
+#else /* !CONFIG_BPF_JIT */
+static inline
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs)
+{
+	return 0;
+}
+#endif /* !CONFIG_BPF_JIT */
 
 extern int fixup_exception(struct pt_regs *regs);
 #endif
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 1f42991cacdd..eee1732ab6cd 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -14,7 +14,8 @@ int fixup_exception(struct pt_regs *regs)
 	if (!fixup)
 		return 0;
 
-	if (regs->pc >= BPF_JIT_REGION_START &&
+	if (IS_ENABLED(CONFIG_BPF_JIT) &&
+	    regs->pc >= BPF_JIT_REGION_START &&
 	    regs->pc < BPF_JIT_REGION_END)
 		return arm64_bpf_fixup_exception(fixup, regs);
 
-- 
2.27.0



* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-30 14:22     ` Jean-Philippe Brucker
@ 2020-07-30 19:47       ` Daniel Borkmann
  2020-07-30 21:14         ` Jean-Philippe Brucker
  0 siblings, 1 reply; 19+ messages in thread
From: Daniel Borkmann @ 2020-07-30 19:47 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Qian Cai
  Cc: linux-arm-kernel, bpf, songliubraving, andriin, catalin.marinas,
	john.fastabend, ast, zlim.lnx, kpsingh, yhs, will, kafai, sfr,
	linux-next, linux-kernel

On 7/30/20 4:22 PM, Jean-Philippe Brucker wrote:
> On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote:
>> On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote:
>>> When a tracing BPF program attempts to read memory without using the
>>> bpf_probe_read() helper, the verifier marks the load instruction with
>>> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
>>> this flag it falls back to the interpreter.
>>>
>>> Add support for BPF_PROBE_MEM, by appending an exception table to the
>>> BPF program. If the load instruction causes a data abort, the fixup
>>> infrastructure finds the exception table and fixes up the fault, by
>>> clearing the destination register and jumping over the faulting
>>> instruction.
>>>
>>> To keep the compact exception table entry format, inspect the pc in
>>> fixup_exception(). A more generic solution would add a "handler" field
>>> to the table entry, like on x86 and s390.
>>>
>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>
>> This will fail to compile on arm64,
>>
>> https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
>>
>> arch/arm64/mm/extable.o: In function `fixup_exception':
>> arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception'
> 
> Thanks for the report, I attached a fix. Daniel, can I squash it and
> resend as v2 or is it too late?

If you want I can squash your attached snippet into the original patch of
yours. If you want to send a v2 that is fine as well of course. Let me know.

Thanks,
Daniel


* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-30 19:47       ` Daniel Borkmann
@ 2020-07-30 21:14         ` Jean-Philippe Brucker
  2020-07-30 22:45           ` Daniel Borkmann
  0 siblings, 1 reply; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-30 21:14 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Qian Cai, linux-arm-kernel, bpf, songliubraving, andriin,
	catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs,
	will, kafai, sfr, linux-next, linux-kernel

On Thu, Jul 30, 2020 at 09:47:39PM +0200, Daniel Borkmann wrote:
> On 7/30/20 4:22 PM, Jean-Philippe Brucker wrote:
> > On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote:
> > > On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote:
> > > > When a tracing BPF program attempts to read memory without using the
> > > > bpf_probe_read() helper, the verifier marks the load instruction with
> > > > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
> > > > this flag it falls back to the interpreter.
> > > > 
> > > > Add support for BPF_PROBE_MEM, by appending an exception table to the
> > > > BPF program. If the load instruction causes a data abort, the fixup
> > > > infrastructure finds the exception table and fixes up the fault, by
> > > > clearing the destination register and jumping over the faulting
> > > > instruction.
> > > > 
> > > > To keep the compact exception table entry format, inspect the pc in
> > > > fixup_exception(). A more generic solution would add a "handler" field
> > > > to the table entry, like on x86 and s390.
> > > > 
> > > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > 
> > > This will fail to compile on arm64,
> > > 
> > > https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
> > > 
> > > arch/arm64/mm/extable.o: In function `fixup_exception':
> > > arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception'
> > 
> > Thanks for the report, I attached a fix. Daniel, can I squash it and
> > resend as v2 or is it too late?
> 
> If you want I can squash your attached snippet into the original patch of
> yours. If you want to send a v2 that is fine as well of course. Let me know.

Yes please squash it into the original patch, sorry for the mess

Thanks,
Jean


* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-30 21:14         ` Jean-Philippe Brucker
@ 2020-07-30 22:45           ` Daniel Borkmann
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Borkmann @ 2020-07-30 22:45 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Qian Cai, linux-arm-kernel, bpf, songliubraving, andriin,
	catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs,
	will, kafai, sfr, linux-next, linux-kernel

On 7/30/20 11:14 PM, Jean-Philippe Brucker wrote:
> On Thu, Jul 30, 2020 at 09:47:39PM +0200, Daniel Borkmann wrote:
>> On 7/30/20 4:22 PM, Jean-Philippe Brucker wrote:
>>> On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote:
>>>> On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote:
>>>>> When a tracing BPF program attempts to read memory without using the
>>>>> bpf_probe_read() helper, the verifier marks the load instruction with
>>>>> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
>>>>> this flag it falls back to the interpreter.
>>>>>
>>>>> Add support for BPF_PROBE_MEM, by appending an exception table to the
>>>>> BPF program. If the load instruction causes a data abort, the fixup
>>>>> infrastructure finds the exception table and fixes up the fault, by
>>>>> clearing the destination register and jumping over the faulting
>>>>> instruction.
>>>>>
>>>>> To keep the compact exception table entry format, inspect the pc in
>>>>> fixup_exception(). A more generic solution would add a "handler" field
>>>>> to the table entry, like on x86 and s390.
>>>>>
>>>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
>>>>
>>>> This will fail to compile on arm64,
>>>>
>>>> https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config
>>>>
>>>> arch/arm64/mm/extable.o: In function `fixup_exception':
>>>> arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception'
>>>
>>> Thanks for the report, I attached a fix. Daniel, can I squash it and
>>> resend as v2 or is it too late?
>>
>> If you want I can squash your attached snippet into the original patch of
>> yours. If you want to send a v2 that is fine as well of course. Let me know.
> 
> Yes please squash it into the original patch, sorry for the mess

Done, thanks!

* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables
  2020-07-28 15:21 [PATCH bpf-next 0/1] arm64: Add BPF exception tables Jean-Philippe Brucker
  2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
@ 2021-06-09 12:04 ` Ravi Bangoria
  2021-06-11  0:12   ` Alexei Starovoitov
  1 sibling, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2021-06-09 12:04 UTC (permalink / raw)
  To: daniel, ast
  Cc: catalin.marinas, will, zlim.lnx, kafai, songliubraving, yhs,
	andriin, john.fastabend, kpsingh, Jean-Philippe Brucker,
	linux-arm-kernel, bpf, Naveen N. Rao

Hi Alexei,

On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote:
> The following patch adds support for BPF_PROBE_MEM on arm64. The
> implementation is simple but I wanted to give a bit of background first.
> If you're familiar with recent BPF development you can skip to the patch
> (or fact-check the following blurb).
> 
> BPF programs used for tracing can inspect any of the traced function's
> arguments and follow pointers in struct members. Traditionally the BPF
> program would get a struct pt_regs as argument and cast the register
> values to the appropriate struct pointer. The BPF verifier would mandate
> that any memory access uses the bpf_probe_read() helper, to suppress
> page faults (see samples/bpf/tracex1_kern.c).
> 
> With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF),
> the verifier can now check the type of any access performed by a BPF
> program. It rejects for example programs that cast to a different
> structure and perform out-of-bounds accesses, or programs that attempt
> to dereference something that isn't a pointer, or that hasn't gone
> through a NULL check.
> 
> As this makes tracing programs safer, the verifier now allows loading
> programs that access struct members without bpf_probe_read(). It is
> however still possible to trigger page faults. For example in the
> following example with which I've tested this patch, the verifier does
> not mandate a NULL check for the second-level pointer:
> 
> /*
>   * From tools/testing/selftests/bpf/progs/bpf_iter_task.c
>   * dump_task() is called for each task.
>   */
> SEC("iter/task")
> int dump_task(struct bpf_iter__task *ctx)
> {
> 	struct seq_file *seq = ctx->meta->seq;
> 	struct task_struct *task = ctx->task;
> 
> 	/* Program would be rejected without this check */
> 	if (task == NULL)
> 		return 0;
> 
> 	/*
> 	 * However the verifier does not currently mandate
> 	 * checking task->mm, and the following faults for kernel
> 	 * threads.
> 	 */
> 	BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm);
> 	return 0;
> }
> 
> Even if it checked this case, the verifier couldn't guarantee that all
> accesses are safe since kernel structures could in theory contain
> garbage or error pointers. So to allow fast access without
> bpf_probe_read(), a JIT implementation must support BPF exception
> tables. For each access to a BTF pointer, the JIT generates an entry
> into an exception table appended to the BPF program. If the access
> faults at runtime, the handler skips the faulting instruction. The
> example above will display vm=0 for kernel threads.

I'm trying the example above (task->mm->total_vm) on an x86 machine
with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit
explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*,
and I'm seeing the app get killed, with an error in dmesg.

   $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task
   $ sudo cat /sys/fs/bpf/task
   Killed

   $ dmesg
   [  188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8
   [  188.810030] #PF: supervisor read access in kernel mode
   [  188.810034] #PF: error_code(0x0000) - not-present page

IIUC, this should be handled by the BPF exception table rather than
killing the app. Am I missing anything?

Ravi

* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables
  2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
@ 2021-06-11  0:12   ` Alexei Starovoitov
  2021-06-17  6:58     ` Ravi Bangoria
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2021-06-11  0:12 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas,
	Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao

On Wed, Jun 9, 2021 at 5:05 AM Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
>
> Hi Alexei,
>
> On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote:
> > The following patch adds support for BPF_PROBE_MEM on arm64. The
> > implementation is simple but I wanted to give a bit of background first.
> > If you're familiar with recent BPF development you can skip to the patch
> > (or fact-check the following blurb).
> >
> > BPF programs used for tracing can inspect any of the traced function's
> > arguments and follow pointers in struct members. Traditionally the BPF
> > program would get a struct pt_regs as argument and cast the register
> > values to the appropriate struct pointer. The BPF verifier would mandate
> > that any memory access uses the bpf_probe_read() helper, to suppress
> > page faults (see samples/bpf/tracex1_kern.c).
> >
> > With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF),
> > the verifier can now check the type of any access performed by a BPF
> > program. It rejects for example programs that cast to a different
> > structure and perform out-of-bounds accesses, or programs that attempt
> > to dereference something that isn't a pointer, or that hasn't gone
> > through a NULL check.
> >
> > As this makes tracing programs safer, the verifier now allows loading
> > programs that access struct members without bpf_probe_read(). It is
> > however still possible to trigger page faults. For example in the
> > following example with which I've tested this patch, the verifier does
> > not mandate a NULL check for the second-level pointer:
> >
> > /*
> >   * From tools/testing/selftests/bpf/progs/bpf_iter_task.c
> >   * dump_task() is called for each task.
> >   */
> > SEC("iter/task")
> > int dump_task(struct bpf_iter__task *ctx)
> > {
> >       struct seq_file *seq = ctx->meta->seq;
> >       struct task_struct *task = ctx->task;
> >
> >       /* Program would be rejected without this check */
> >       if (task == NULL)
> >               return 0;
> >
> >       /*
> >        * However the verifier does not currently mandate
> >        * checking task->mm, and the following faults for kernel
> >        * threads.
> >        */
> >       BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm);
> >       return 0;
> > }
> >
> > Even if it checked this case, the verifier couldn't guarantee that all
> > accesses are safe since kernel structures could in theory contain
> > garbage or error pointers. So to allow fast access without
> > bpf_probe_read(), a JIT implementation must support BPF exception
> > tables. For each access to a BTF pointer, the JIT generates an entry
> > into an exception table appended to the BPF program. If the access
> > faults at runtime, the handler skips the faulting instruction. The
> > example above will display vm=0 for kernel threads.
>
> I'm trying with the example above (task->mm->total_vm) on x86 machine
> with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit
> explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*,
> I'm seeing the app getting killed with error in dmesg.
>
>    $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task
>    $ sudo cat /sys/fs/bpf/task
>    Killed
>
>    $ dmesg
>    [  188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8
>    [  188.810030] #PF: supervisor read access in kernel mode
>    [  188.810034] #PF: error_code(0x0000) - not-present page
>
> IIUC, this should be handled by bpf exception table rather than killing
> the app. Am I missing anything?

For PROBE_LDX the verifier guarantees that the address is either
a very likely valid kernel address or NULL. On x86 the user and kernel
address spaces are shared and NULL is a user address, so there cannot be
an exception table entry for NULL. Hence the x86-64 JIT inserts a NULL
check when it converts a PROBE_LDX into a load insn.

* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables
  2021-06-11  0:12   ` Alexei Starovoitov
@ 2021-06-17  6:58     ` Ravi Bangoria
  2021-06-18 16:34       ` Alexei Starovoitov
  0 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2021-06-17  6:58 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas,
	Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao,
	Ravi Bangoria

Hi Alexei,

Sorry to bother you again!

On 6/11/21 5:42 AM, Alexei Starovoitov wrote:
> On Wed, Jun 9, 2021 at 5:05 AM Ravi Bangoria
> <ravi.bangoria@linux.ibm.com> wrote:
>>
>> Hi Alexei,
>>
>> On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote:
>>> The following patch adds support for BPF_PROBE_MEM on arm64. The
>>> implementation is simple but I wanted to give a bit of background first.
>>> If you're familiar with recent BPF development you can skip to the patch
>>> (or fact-check the following blurb).
>>>
>>> BPF programs used for tracing can inspect any of the traced function's
>>> arguments and follow pointers in struct members. Traditionally the BPF
>>> program would get a struct pt_regs as argument and cast the register
>>> values to the appropriate struct pointer. The BPF verifier would mandate
>>> that any memory access uses the bpf_probe_read() helper, to suppress
>>> page faults (see samples/bpf/tracex1_kern.c).
>>>
>>> With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF),
>>> the verifier can now check the type of any access performed by a BPF
>>> program. It rejects for example programs that cast to a different
>>> structure and perform out-of-bounds accesses, or programs that attempt
>>> to dereference something that isn't a pointer, or that hasn't gone
>>> through a NULL check.
>>>
>>> As this makes tracing programs safer, the verifier now allows loading
>>> programs that access struct members without bpf_probe_read(). It is
>>> however still possible to trigger page faults. For example in the
>>> following example with which I've tested this patch, the verifier does
>>> not mandate a NULL check for the second-level pointer:
>>>
>>> /*
>>>    * From tools/testing/selftests/bpf/progs/bpf_iter_task.c
>>>    * dump_task() is called for each task.
>>>    */
>>> SEC("iter/task")
>>> int dump_task(struct bpf_iter__task *ctx)
>>> {
>>>        struct seq_file *seq = ctx->meta->seq;
>>>        struct task_struct *task = ctx->task;
>>>
>>>        /* Program would be rejected without this check */
>>>        if (task == NULL)
>>>                return 0;
>>>
>>>        /*
>>>         * However the verifier does not currently mandate
>>>         * checking task->mm, and the following faults for kernel
>>>         * threads.
>>>         */
>>>        BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm);
>>>        return 0;
>>> }
>>>
>>> Even if it checked this case, the verifier couldn't guarantee that all
>>> accesses are safe since kernel structures could in theory contain
>>> garbage or error pointers. So to allow fast access without
>>> bpf_probe_read(), a JIT implementation must support BPF exception
>>> tables. For each access to a BTF pointer, the JIT generates an entry
>>> into an exception table appended to the BPF program. If the access
>>> faults at runtime, the handler skips the faulting instruction. The
>>> example above will display vm=0 for kernel threads.
>>
>> I'm trying with the example above (task->mm->total_vm) on x86 machine
>> with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit
>> explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*,
>> I'm seeing the app getting killed with error in dmesg.
>>
>>     $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task
>>     $ sudo cat /sys/fs/bpf/task
>>     Killed
>>
>>     $ dmesg
>>     [  188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8
>>     [  188.810030] #PF: supervisor read access in kernel mode
>>     [  188.810034] #PF: error_code(0x0000) - not-present page
>>
>> IIUC, this should be handled by bpf exception table rather than killing
>> the app. Am I missing anything?
> 
> For PROBE_LDX the verifier guarantees that the address is either
> a very likely valid kernel address or NULL. On x86 the user and kernel
> address spaces are shared and NULL is a user address, so there cannot be
> an exception table for NULL. Hence x86-64 JIT inserts NULL check when
> it converts PROBE_LDX into load insn.

IIUC, there are three types of address a BPF prog can end up accessing:

1. A NULL pointer. This is handled by an additional check in the BPF
    program, so it won't cause a fault.
2. A valid kernel address. This will never cause a fault.
3. A bad address. This is very unlikely but possible, and IIUC the BPF
    extable was introduced to handle exactly this scenario. Unfortunately,
    with any type of bad address (user or kernel), the extable on x86
    never gets involved; the kernel always kills the application with an
    error in dmesg.

Please let me know if I've understood this incorrectly.

To check case 3 further, I tried bpf/master again, this time without
reverting your patch. I added a dummy structure, and a bad pointer to it,
in task_struct.

   diff --git a/include/linux/sched.h b/include/linux/sched.h
   index d2c881384517..4698188bcf45 100644
   --- a/include/linux/sched.h
   +++ b/include/linux/sched.h
   @@ -646,6 +646,10 @@ struct kmap_ctrl {
    #endif
    };
   +struct dummy_task_ele {
   +       int dummy;
   +};
   +
    struct task_struct {
    #ifdef CONFIG_THREAD_INFO_IN_TASK
           /*
   @@ -771,6 +775,8 @@ struct task_struct {
           struct mm_struct                *mm;
           struct mm_struct                *active_mm;
   +       struct dummy_task_ele           *dte;
   +
           /* Per-thread vma caching: */
           struct vmacache                 vmacache;
   diff --git a/kernel/fork.c b/kernel/fork.c
   index dc06afd725cb..ed01f25edd8e 100644
   --- a/kernel/fork.c
   +++ b/kernel/fork.c
   @@ -2116,6 +2116,9 @@ static __latent_entropy struct task_struct *copy_process(
           retval = copy_mm(clone_flags, p);
           if (retval)
                   goto bad_fork_cleanup_signal;
   +
   +       p->dte = (void *)0xd12345;
   +
           retval = copy_namespaces(clone_flags, p);
           if (retval)
                   goto bad_fork_cleanup_mm;

And with below change in testcase:

   diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
   index b7f32c160f4e..391c1b3da638 100644
   --- a/tools/testing/selftests/bpf/progs/bpf_iter_task.c
   +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task.c
   @@ -21,6 +21,6 @@ int dump_task(struct bpf_iter__task *ctx)
           if (ctx->meta->seq_num == 0)
                   BPF_SEQ_PRINTF(seq, "    tgid      gid\n");
   -       BPF_SEQ_PRINTF(seq, "%8d %8d\n", task->tgid, task->pid);
   +       BPF_SEQ_PRINTF(seq, "%8d %8d %d\n", task->tgid, task->pid, task->dte->dummy);
           return 0;
    }

I see the same issue:

   $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task
   $ sudo cat /sys/fs/bpf/task
   Killed

   $ dmesg
   [  166.864325] BUG: unable to handle page fault for address: 0000000000d12345
   [  166.864336] #PF: supervisor read access in kernel mode
   [  166.864338] #PF: error_code(0x0000) - not-present page

0xd12345 is an unallocated userspace address. Similarly, I also tried
p->dte = (void *)0xffffffffc1234567, after confirming it isn't allocated
to the kernel or to any module. I see the same failure with it too.

The same test with bpf_probe_read(task->dte->dummy) works fine, though.

Ravi

* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables
  2021-06-17  6:58     ` Ravi Bangoria
@ 2021-06-18 16:34       ` Alexei Starovoitov
  2021-06-22  7:10         ` Ravi Bangoria
  2021-06-22 11:00         ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria
  0 siblings, 2 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2021-06-18 16:34 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas,
	Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao

On Wed, Jun 16, 2021 at 11:58 PM Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
>
>    $ dmesg
>    [  166.864325] BUG: unable to handle page fault for address: 0000000000d12345
>    [  166.864336] #PF: supervisor read access in kernel mode
>    [  166.864338] #PF: error_code(0x0000) - not-present page
>
> 0xd12345 is unallocated userspace address. Similarly, I also tried with

That's unfortunately expected, since this is a user address.

> p->dte = (void *)0xffffffffc1234567 after confirming it's not allocated
> to kernel or any module address. I see the same failure with it too.

This one is surprising, though. It sounds like a bug in exception table
construction. Can you debug it to see what's causing it?
First check that do_kern_addr_fault() is invoked in this case,
then check fixup_exception() and why search_bpf_extables()
cannot find the entry.
Separately, we probably need to replace the NULL check with an
addr >= TASK_SIZE_MAX check to close this issue, though it's a bit artificial.

* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables
  2021-06-18 16:34       ` Alexei Starovoitov
@ 2021-06-22  7:10         ` Ravi Bangoria
  2021-06-22 11:00         ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria
  1 sibling, 0 replies; 19+ messages in thread
From: Ravi Bangoria @ 2021-06-22  7:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas,
	Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh,
	Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao,
	Ravi Bangoria

Hi Alexei,

On 6/18/21 10:04 PM, Alexei Starovoitov wrote:
> On Wed, Jun 16, 2021 at 11:58 PM Ravi Bangoria
> <ravi.bangoria@linux.ibm.com> wrote:
>>
>>     $ dmesg
>>     [  166.864325] BUG: unable to handle page fault for address: 0000000000d12345
>>     [  166.864336] #PF: supervisor read access in kernel mode
>>     [  166.864338] #PF: error_code(0x0000) - not-present page
>>
>> 0xd12345 is unallocated userspace address. Similarly, I also tried with
> 
> that's unfortunately expected, since this is a user address.

Sure. FWIW, it works with bpf_probe_read().

>> p->dte = (void *)0xffffffffc1234567 after confirming it's not allocated
>> to kernel or any module address. I see the same failure with it too.
> 
> This one is surprising though. Sounds like a bug in exception table
> construction. Can you debug it to see what's causing it?
> First check that do_kern_addr_fault() is invoked in this case.
> And then fixup_exception() and why search_bpf_extables()
> cannot find it.

It seems commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks
for PROBE_LDX instructions.") added a few instructions before the actual
load but did not account for those additional instructions when
calculating the extable offset. Let me prepare a fix.

> Separately we probably need to replace the NULL check
> with addr >= TASK_SIZE_MAX to close this issue though it's a bit artificial.

Ravi

* [PATCH] x86 bpf: Fix extable offset calculation
  2021-06-18 16:34       ` Alexei Starovoitov
  2021-06-22  7:10         ` Ravi Bangoria
@ 2021-06-22 11:00         ` Ravi Bangoria
  2021-06-25  4:01           ` Alexei Starovoitov
  1 sibling, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2021-06-22 11:00 UTC (permalink / raw)
  To: ast
  Cc: ravi.bangoria, davem, yoshfuji, dsahern, daniel, tglx, mingo, bp,
	x86, hpa, andrii, kafai, songliubraving, yhs, john.fastabend,
	kpsingh, netdev, bpf, linux-kernel, naveen.n.rao

Commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks
for PROBE_LDX instructions.") emits a couple of instructions
before the actual load. Consider those additional instructions
when calculating the extable offset.

Fixes: 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/x86/net/bpf_jit_comp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 2a2e290fa5d8..231a8178cc11 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1297,7 +1297,7 @@ st:			if (is_imm8(insn->off))
 			emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
 				struct exception_table_entry *ex;
-				u8 *_insn = image + proglen;
+				u8 *_insn = image + proglen + (u8)(start_of_ldx - temp);
 				s64 delta;
 
 				/* populate jmp_offset for JMP above */
-- 
2.30.2


* Re: [PATCH] x86 bpf: Fix extable offset calculation
  2021-06-22 11:00         ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria
@ 2021-06-25  4:01           ` Alexei Starovoitov
  2021-06-25  6:22             ` Ravi Bangoria
  0 siblings, 1 reply; 19+ messages in thread
From: Alexei Starovoitov @ 2021-06-25  4:01 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Alexei Starovoitov, David S. Miller, Hideaki YOSHIFUJI,
	David Ahern, Daniel Borkmann, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Network Development, bpf, LKML, Naveen N. Rao

On Tue, Jun 22, 2021 at 4:01 AM Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
>
> commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks
> for PROBE_LDX instructions.") is emitting couple of instructions
> before actual load. Consider those additional instructions while
> calculating extable offset.
>
> Fixes: 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
> ---
>  arch/x86/net/bpf_jit_comp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 2a2e290fa5d8..231a8178cc11 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1297,7 +1297,7 @@ st:                       if (is_imm8(insn->off))
>                         emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
>                         if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
>                                 struct exception_table_entry *ex;
> -                               u8 *_insn = image + proglen;
> +                               u8 *_insn = image + proglen + (u8)(start_of_ldx - temp);

Great debugging and a great fix, thanks a lot.
I've dropped the (u8) cast, kept the parentheses, and applied it to the
bpf tree; I think it looks cleaner without the cast.
Could you send a follow-up patch with a selftest, so I don't make
the same mistake again? ;)

* Re: [PATCH] x86 bpf: Fix extable offset calculation
  2021-06-25  4:01           ` Alexei Starovoitov
@ 2021-06-25  6:22             ` Ravi Bangoria
  2021-06-25 15:07               ` Alexei Starovoitov
  0 siblings, 1 reply; 19+ messages in thread
From: Ravi Bangoria @ 2021-06-25  6:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, David S. Miller, Hideaki YOSHIFUJI,
	David Ahern, Daniel Borkmann, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Network Development, bpf, LKML, Naveen N. Rao,
	Ravi Bangoria



On 6/25/21 9:31 AM, Alexei Starovoitov wrote:
> On Tue, Jun 22, 2021 at 4:01 AM Ravi Bangoria
> <ravi.bangoria@linux.ibm.com> wrote:
>>
>> commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks
>> for PROBE_LDX instructions.") is emitting couple of instructions
>> before actual load. Consider those additional instructions while
>> calculating extable offset.
>>
>> Fixes: 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.")
>> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
>> ---
>>   arch/x86/net/bpf_jit_comp.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
>> index 2a2e290fa5d8..231a8178cc11 100644
>> --- a/arch/x86/net/bpf_jit_comp.c
>> +++ b/arch/x86/net/bpf_jit_comp.c
>> @@ -1297,7 +1297,7 @@ st:                       if (is_imm8(insn->off))
>>                          emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off);
>>                          if (BPF_MODE(insn->code) == BPF_PROBE_MEM) {
>>                                  struct exception_table_entry *ex;
>> -                               u8 *_insn = image + proglen;
>> +                               u8 *_insn = image + proglen + (u8)(start_of_ldx - temp);
> 
> Great debugging and the fix. Thanks a lot.
> I've dropped (u8) cast, kept (), and applied to bpf tree.
> I think it looks cleaner without that cast.

Thanks.

> Could you send a followup patch with a selftest, so I don't make
> the same mistake again ? ;)

Unfortunately, the extable gets involved only for bad kernel pointers,
and ideally there should not be any bad pointers in the kernel, so there
is no easy way to create a proper selftest for this.

Ravi

* Re: [PATCH] x86 bpf: Fix extable offset calculation
  2021-06-25  6:22             ` Ravi Bangoria
@ 2021-06-25 15:07               ` Alexei Starovoitov
  0 siblings, 0 replies; 19+ messages in thread
From: Alexei Starovoitov @ 2021-06-25 15:07 UTC (permalink / raw)
  To: Ravi Bangoria
  Cc: Alexei Starovoitov, David S. Miller, Hideaki YOSHIFUJI,
	David Ahern, Daniel Borkmann, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, X86 ML, H. Peter Anvin, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh, Network Development, bpf, LKML, Naveen N. Rao

On Thu, Jun 24, 2021 at 11:22 PM Ravi Bangoria
<ravi.bangoria@linux.ibm.com> wrote:
> > Could you send a followup patch with a selftest, so I don't make
> > the same mistake again ? ;)
>
> Unfortunately extable gets involved only for bad kernel pointers and
> ideally there should not be any bad pointer in kernel. So there is no
> easy way to create a proper selftest for this.

Right. We have the lib/test_bpf.c kernel module for such cases.

end of thread, other threads:[~2021-06-25 15:07 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-28 15:21 [PATCH bpf-next 0/1] arm64: Add BPF exception tables Jean-Philippe Brucker
2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
2020-07-29 17:28   ` Song Liu
2020-07-29 21:29     ` Daniel Borkmann
2020-07-30  8:28       ` Jean-Philippe Brucker
2020-07-30 12:28   ` Qian Cai
2020-07-30 14:22     ` Jean-Philippe Brucker
2020-07-30 19:47       ` Daniel Borkmann
2020-07-30 21:14         ` Jean-Philippe Brucker
2020-07-30 22:45           ` Daniel Borkmann
2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
2021-06-11  0:12   ` Alexei Starovoitov
2021-06-17  6:58     ` Ravi Bangoria
2021-06-18 16:34       ` Alexei Starovoitov
2021-06-22  7:10         ` Ravi Bangoria
2021-06-22 11:00         ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria
2021-06-25  4:01           ` Alexei Starovoitov
2021-06-25  6:22             ` Ravi Bangoria
2021-06-25 15:07               ` Alexei Starovoitov
