* [PATCH bpf-next 0/1] arm64: Add BPF exception tables
@ 2020-07-28 15:21 Jean-Philippe Brucker
  2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
  2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
  0 siblings, 2 replies; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-28 15:21 UTC (permalink / raw)
  To: linux-arm-kernel, bpf
  Cc: catalin.marinas, will, daniel, ast, zlim.lnx, kafai,
	songliubraving, yhs, andriin, john.fastabend, kpsingh,
	Jean-Philippe Brucker

The following patch adds support for BPF_PROBE_MEM on arm64. The
implementation is simple but I wanted to give a bit of background first.
If you're familiar with recent BPF development you can skip to the patch
(or fact-check the following blurb).

BPF programs used for tracing can inspect any of the traced function's
arguments and follow pointers in struct members. Traditionally the BPF
program would get a struct pt_regs as argument and cast the register
values to the appropriate struct pointer. The BPF verifier would mandate
that any memory access use the bpf_probe_read() helper, to suppress page
faults (see samples/bpf/tracex1_kern.c).

With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF),
the verifier can now check the type of any access performed by a BPF
program. It rejects, for example, programs that cast to a different
structure and perform out-of-bounds accesses, or programs that attempt
to dereference something that isn't a pointer, or that hasn't gone
through a NULL check.

As this makes tracing programs safer, the verifier now allows loading
programs that access struct members without bpf_probe_read(). It is
however still possible to trigger page faults. For example, in the
following program, with which I've tested this patch, the verifier does
not mandate a NULL check for the second-level pointer:

	/*
	 * From tools/testing/selftests/bpf/progs/bpf_iter_task.c
	 * dump_task() is called for each task.
	 */
	SEC("iter/task")
	int dump_task(struct bpf_iter__task *ctx)
	{
		struct seq_file *seq = ctx->meta->seq;
		struct task_struct *task = ctx->task;

		/* Program would be rejected without this check */
		if (task == NULL)
			return 0;

		/*
		 * However the verifier does not currently mandate
		 * checking task->mm, and the following faults for kernel
		 * threads.
		 */
		BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid,
			       task->mm->total_vm);
		return 0;
	}

Even if it checked this case, the verifier couldn't guarantee that all
accesses are safe, since kernel structures could in theory contain
garbage or error pointers. So to allow fast access without
bpf_probe_read(), a JIT implementation must support BPF exception
tables. For each access to a BTF pointer, the JIT generates an entry
into an exception table appended to the BPF program. If the access
faults at runtime, the handler skips the faulting instruction. The
example above will display vm=0 for kernel threads.

See also:

* The original implementation on x86
  https://lore.kernel.org/bpf/20191016032505.2089704-1-ast@kernel.org/

* The s390 implementation
  https://lore.kernel.org/bpf/20200715233301.933201-1-iii@linux.ibm.com/

Jean-Philippe Brucker (1):
  arm64: bpf: Add BPF exception tables

 arch/arm64/include/asm/extable.h |  3 ++
 arch/arm64/mm/extable.c          | 11 ++--
 arch/arm64/net/bpf_jit_comp.c    | 93 +++++++++++++++++++++++++++++---
 3 files changed, 98 insertions(+), 9 deletions(-)

-- 
2.27.0

^ permalink raw reply	[flat|nested] 19+ messages in thread
* [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-28 15:21 [PATCH bpf-next 0/1] arm64: Add BPF exception tables Jean-Philippe Brucker
@ 2020-07-28 15:21 ` Jean-Philippe Brucker
  2020-07-29 17:28   ` Song Liu
  2020-07-30 12:28   ` Qian Cai
  2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
  1 sibling, 2 replies; 19+ messages in thread
From: Jean-Philippe Brucker @ 2020-07-28 15:21 UTC (permalink / raw)
  To: linux-arm-kernel, bpf
  Cc: catalin.marinas, will, daniel, ast, zlim.lnx, kafai,
	songliubraving, yhs, andriin, john.fastabend, kpsingh,
	Jean-Philippe Brucker

When a tracing BPF program attempts to read memory without using the
bpf_probe_read() helper, the verifier marks the load instruction with
the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
this flag it falls back to the interpreter.

Add support for BPF_PROBE_MEM, by appending an exception table to the
BPF program. If the load instruction causes a data abort, the fixup
infrastructure finds the exception table and fixes up the fault, by
clearing the destination register and jumping over the faulting
instruction.

To keep the compact exception table entry format, inspect the pc in
fixup_exception(). A more generic solution would add a "handler" field
to the table entry, like on x86 and s390.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
Note: the extable is aligned on 32 bits. Given that extable entries have
32-bit members I figured we don't need to align it to 64 bits.
---
 arch/arm64/include/asm/extable.h |  3 ++
 arch/arm64/mm/extable.c          | 11 ++--
 arch/arm64/net/bpf_jit_comp.c    | 93 +++++++++++++++++++++++++++++---
 3 files changed, 98 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index 56a4f68b262e..bcee40df1586 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -22,5 +22,8 @@ struct exception_table_entry
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs);
+
 extern int fixup_exception(struct pt_regs *regs);
 #endif
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 81e694af5f8c..1f42991cacdd 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -11,8 +11,13 @@ int fixup_exception(struct pt_regs *regs)
 	const struct exception_table_entry *fixup;
 
 	fixup = search_exception_tables(instruction_pointer(regs));
-	if (fixup)
-		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
+	if (!fixup)
+		return 0;
 
-	return fixup != NULL;
+	if (regs->pc >= BPF_JIT_REGION_START &&
+	    regs->pc < BPF_JIT_REGION_END)
+		return arm64_bpf_fixup_exception(fixup, regs);
+
+	regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
+	return 1;
 }
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index 3cb25b43b368..f8912e45be7a 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -7,6 +7,7 @@
 
 #define pr_fmt(fmt) "bpf_jit: " fmt
 
+#include <linux/bitfield.h>
 #include <linux/bpf.h>
 #include <linux/filter.h>
 #include <linux/printk.h>
@@ -56,6 +57,7 @@ struct jit_ctx {
 	int idx;
 	int epilogue_offset;
 	int *offset;
+	int exentry_idx;
 	__le32 *image;
 	u32 stack_size;
 };
@@ -351,6 +353,67 @@ static void build_epilogue(struct jit_ctx *ctx)
 	emit(A64_RET(A64_LR), ctx);
 }
 
+#define BPF_FIXUP_OFFSET_MASK	GENMASK(26, 0)
+#define BPF_FIXUP_REG_MASK	GENMASK(31, 27)
+
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs)
+{
+	off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
+	int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
+
+	regs->regs[dst_reg] = 0;
+	regs->pc = (unsigned long)&ex->fixup - offset;
+	return 1;
+}
+
+/* For accesses to BTF pointers, add an entry to the exception table */
+static int add_exception_handler(const struct bpf_insn *insn,
+				 struct jit_ctx *ctx,
+				 int dst_reg)
+{
+	off_t offset;
+	unsigned long pc;
+	struct exception_table_entry *ex;
+
+	if (!ctx->image)
+		/* First pass */
+		return 0;
+
+	if (BPF_MODE(insn->code) != BPF_PROBE_MEM)
+		return 0;
+
+	if (!ctx->prog->aux->extable ||
+	    WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
+		return -EINVAL;
+
+	ex = &ctx->prog->aux->extable[ctx->exentry_idx];
+	pc = (unsigned long)&ctx->image[ctx->idx - 1];
+
+	offset = pc - (long)&ex->insn;
+	if (WARN_ON_ONCE(offset >= 0 || offset < INT_MIN))
+		return -ERANGE;
+	ex->insn = offset;
+
+	/*
+	 * Since the extable follows the program, the fixup offset is always
+	 * negative and limited to BPF_JIT_REGION_SIZE. Store a positive value
+	 * to keep things simple, and put the destination register in the upper
+	 * bits. We don't need to worry about buildtime or runtime sort
+	 * modifying the upper bits because the table is already sorted, and
+	 * isn't part of the main exception table.
+	 */
+	offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
+	if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, offset))
+		return -ERANGE;
+
+	ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, offset) |
+		    FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
+
+	ctx->exentry_idx++;
+	return 0;
+}
+
 /* JITs an eBPF instruction.
  * Returns:
  * 0  - successfully JITed an 8-byte eBPF instruction.
@@ -375,6 +438,7 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	u8 jmp_cond, reg;
 	s32 jmp_offset;
 	u32 a64_insn;
+	int ret;
 
 #define check_imm(bits, imm) do {				\
 	if ((((imm) > 0) && ((imm) >> (bits))) ||		\
@@ -694,7 +758,6 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 		const u8 r0 = bpf2a64[BPF_REG_0];
 		bool func_addr_fixed;
 		u64 func_addr;
-		int ret;
 
 		ret = bpf_jit_get_func_addr(ctx->prog, insn, extra_pass,
 					    &func_addr, &func_addr_fixed);
@@ -738,6 +801,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 	case BPF_LDX | BPF_MEM | BPF_H:
 	case BPF_LDX | BPF_MEM | BPF_B:
 	case BPF_LDX | BPF_MEM | BPF_DW:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_DW:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_W:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_H:
+	case BPF_LDX | BPF_PROBE_MEM | BPF_B:
 		emit_a64_mov_i(1, tmp, off, ctx);
 		switch (BPF_SIZE(code)) {
 		case BPF_W:
@@ -753,6 +820,10 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
 			emit(A64_LDR64(dst, src, tmp), ctx);
 			break;
 		}
+
+		ret = add_exception_handler(insn, ctx, dst);
+		if (ret)
+			return ret;
 		break;
 
 	/* ST: *(size *)(dst + off) = imm */
@@ -868,6 +939,9 @@ static int validate_code(struct jit_ctx *ctx)
 			return -1;
 	}
 
+	if (WARN_ON_ONCE(ctx->exentry_idx != ctx->prog->aux->num_exentries))
+		return -1;
+
 	return 0;
 }
 
@@ -884,6 +958,7 @@ struct arm64_jit_data {
 
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 {
+	int image_size, prog_size, extable_size;
 	struct bpf_prog *tmp, *orig_prog = prog;
 	struct bpf_binary_header *header;
 	struct arm64_jit_data *jit_data;
@@ -891,7 +966,6 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	bool tmp_blinded = false;
 	bool extra_pass = false;
 	struct jit_ctx ctx;
-	int image_size;
 	u8 *image_ptr;
 
 	if (!prog->jit_requested)
@@ -922,7 +996,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 		image_ptr = jit_data->image;
 		header = jit_data->header;
 		extra_pass = true;
-		image_size = sizeof(u32) * ctx.idx;
+		prog_size = sizeof(u32) * ctx.idx;
 		goto skip_init_ctx;
 	}
 	memset(&ctx, 0, sizeof(ctx));
@@ -950,8 +1024,12 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	ctx.epilogue_offset = ctx.idx;
 	build_epilogue(&ctx);
 
+	extable_size = prog->aux->num_exentries *
+		       sizeof(struct exception_table_entry);
+
 	/* Now we know the actual image size. */
-	image_size = sizeof(u32) * ctx.idx;
+	prog_size = sizeof(u32) * ctx.idx;
+	image_size = prog_size + extable_size;
 	header = bpf_jit_binary_alloc(image_size, &image_ptr, sizeof(u32),
 				      jit_fill_hole);
 	if (header == NULL) {
@@ -962,8 +1040,11 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* 2. Now, the actual pass. */
 	ctx.image = (__le32 *)image_ptr;
+	if (extable_size)
+		prog->aux->extable = (void *)image_ptr + prog_size;
 skip_init_ctx:
 	ctx.idx = 0;
+	ctx.exentry_idx = 0;
 
 	build_prologue(&ctx, was_classic);
 
@@ -984,7 +1065,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 
 	/* And we're done. */
 	if (bpf_jit_enable > 1)
-		bpf_jit_dump(prog->len, image_size, 2, ctx.image);
+		bpf_jit_dump(prog->len, prog_size, 2, ctx.image);
 
 	bpf_flush_icache(header, ctx.image + ctx.idx);
 
@@ -1005,7 +1086,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog)
 	}
 	prog->bpf_func = (void *)ctx.image;
 	prog->jited = 1;
-	prog->jited_len = image_size;
+	prog->jited_len = prog_size;
 
 	if (!prog->is_func || extra_pass) {
 		bpf_prog_fill_jited_linfo(prog, ctx.offset);
-- 
2.27.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables
  2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
@ 2020-07-29 17:28   ` Song Liu
  2020-07-29 21:29     ` Daniel Borkmann
  2020-07-30 12:28   ` Qian Cai
  1 sibling, 1 reply; 19+ messages in thread
From: Song Liu @ 2020-07-29 17:28 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-kernel, bpf, Catalin Marinas, Will Deacon,
	Daniel Borkmann, Alexei Starovoitov, zlim.lnx, Martin KaFai Lau,
	Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh

On Tue, Jul 28, 2020 at 8:37 AM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
>
> When a tracing BPF program attempts to read memory without using the
> bpf_probe_read() helper, the verifier marks the load instruction with
> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize
> this flag it falls back to the interpreter.
>
> Add support for BPF_PROBE_MEM, by appending an exception table to the
> BPF program. If the load instruction causes a data abort, the fixup
> infrastructure finds the exception table and fixes up the fault, by
> clearing the destination register and jumping over the faulting
> instruction.
>
> To keep the compact exception table entry format, inspect the pc in
> fixup_exception(). A more generic solution would add a "handler" field
> to the table entry, like on x86 and s390.
>
> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

This patch looks good to me.

Acked-by: Song Liu <songliubraving@fb.com>

Is it possible to add a selftest for this? I thought about this a
little bit, but didn't get a good idea.

Thanks,
Song

^ permalink raw reply	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-29 17:28 ` Song Liu @ 2020-07-29 21:29 ` Daniel Borkmann 2020-07-30 8:28 ` Jean-Philippe Brucker 0 siblings, 1 reply; 19+ messages in thread From: Daniel Borkmann @ 2020-07-29 21:29 UTC (permalink / raw) To: Song Liu, Jean-Philippe Brucker Cc: linux-arm-kernel, bpf, Catalin Marinas, Will Deacon, Alexei Starovoitov, zlim.lnx, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh On 7/29/20 7:28 PM, Song Liu wrote: > On Tue, Jul 28, 2020 at 8:37 AM Jean-Philippe Brucker > <jean-philippe@linaro.org> wrote: >> >> When a tracing BPF program attempts to read memory without using the >> bpf_probe_read() helper, the verifier marks the load instruction with >> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize >> this flag it falls back to the interpreter. >> >> Add support for BPF_PROBE_MEM, by appending an exception table to the >> BPF program. If the load instruction causes a data abort, the fixup >> infrastructure finds the exception table and fixes up the fault, by >> clearing the destination register and jumping over the faulting >> instruction. >> >> To keep the compact exception table entry format, inspect the pc in >> fixup_exception(). A more generic solution would add a "handler" field >> to the table entry, like on x86 and s390. >> >> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > > This patch looks good to me. > > Acked-by: Song Liu <songliubraving@fb.com> +1, applied, thanks a lot! > It is possible to add a selftest for this? I thought about this a > little bit, but > didn't get a good idea. Why not adding a test_verifier.c test case which calls into bpf_get_current_task() to fetch pointer to current and then read out some field via BPF_PROBE_MEM which should then succeed on x86/s390x/arm64 but be skipped on the other archs? Jean-Philippe, could you look into following up with such test case(s)? 
Thanks, Daniel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-29 21:29 ` Daniel Borkmann @ 2020-07-30 8:28 ` Jean-Philippe Brucker 0 siblings, 0 replies; 19+ messages in thread From: Jean-Philippe Brucker @ 2020-07-30 8:28 UTC (permalink / raw) To: Daniel Borkmann Cc: Song Liu, linux-arm-kernel, bpf, Catalin Marinas, Will Deacon, Alexei Starovoitov, zlim.lnx, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh On Wed, Jul 29, 2020 at 11:29:43PM +0200, Daniel Borkmann wrote: > On 7/29/20 7:28 PM, Song Liu wrote: > > On Tue, Jul 28, 2020 at 8:37 AM Jean-Philippe Brucker > > <jean-philippe@linaro.org> wrote: > > > > > > When a tracing BPF program attempts to read memory without using the > > > bpf_probe_read() helper, the verifier marks the load instruction with > > > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize > > > this flag it falls back to the interpreter. > > > > > > Add support for BPF_PROBE_MEM, by appending an exception table to the > > > BPF program. If the load instruction causes a data abort, the fixup > > > infrastructure finds the exception table and fixes up the fault, by > > > clearing the destination register and jumping over the faulting > > > instruction. > > > > > > To keep the compact exception table entry format, inspect the pc in > > > fixup_exception(). A more generic solution would add a "handler" field > > > to the table entry, like on x86 and s390. > > > > > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > > This patch looks good to me. > > > > Acked-by: Song Liu <songliubraving@fb.com> > > +1, applied, thanks a lot! > > > It is possible to add a selftest for this? I thought about this a > > little bit, but > > didn't get a good idea. 
> > Why not adding a test_verifier.c test case which calls into bpf_get_current_task() > to fetch pointer to current and then read out some field via BPF_PROBE_MEM which > should then succeed on x86/s390x/arm64 but be skipped on the other archs? Jean-Philippe, > could you look into following up with such test case(s)? Sure I'll take a look. Ilya also added a selftests to trigger exceptions in https://lore.kernel.org/bpf/20200715233301.933201-5-iii@linux.ibm.com/ It's useful but I think it relies on the verifier not mandating NULL checks for next-level pointers (they are ptr_ instead of ptr_or_null_), which might change in the future. So I'm wondering if we can deliberately access an invalid pointer with the help of bpf_test_run, and check that the result is zero. Thanks, Jean ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker 2020-07-29 17:28 ` Song Liu @ 2020-07-30 12:28 ` Qian Cai 2020-07-30 14:22 ` Jean-Philippe Brucker 1 sibling, 1 reply; 19+ messages in thread From: Qian Cai @ 2020-07-30 12:28 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: linux-arm-kernel, bpf, songliubraving, andriin, daniel, catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs, will, kafai, sfr, linux-next, linux-kernel On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote: > When a tracing BPF program attempts to read memory without using the > bpf_probe_read() helper, the verifier marks the load instruction with > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize > this flag it falls back to the interpreter. > > Add support for BPF_PROBE_MEM, by appending an exception table to the > BPF program. If the load instruction causes a data abort, the fixup > infrastructure finds the exception table and fixes up the fault, by > clearing the destination register and jumping over the faulting > instruction. > > To keep the compact exception table entry format, inspect the pc in > fixup_exception(). A more generic solution would add a "handler" field > to the table entry, like on x86 and s390. > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> This will fail to compile on arm64, https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config arch/arm64/mm/extable.o: In function `fixup_exception': arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception' > --- > Note: the extable is aligned on 32 bits. Given that extable entries have > 32-bit members I figured we don't need to align it to 64 bits. 
> --- > arch/arm64/include/asm/extable.h | 3 ++ > arch/arm64/mm/extable.c | 11 ++-- > arch/arm64/net/bpf_jit_comp.c | 93 +++++++++++++++++++++++++++++--- > 3 files changed, 98 insertions(+), 9 deletions(-) > > diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h > index 56a4f68b262e..bcee40df1586 100644 > --- a/arch/arm64/include/asm/extable.h > +++ b/arch/arm64/include/asm/extable.h > @@ -22,5 +22,8 @@ struct exception_table_entry > > #define ARCH_HAS_RELATIVE_EXTABLE > > +int arm64_bpf_fixup_exception(const struct exception_table_entry *ex, > + struct pt_regs *regs); > + > extern int fixup_exception(struct pt_regs *regs); > #endif > diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c > index 81e694af5f8c..1f42991cacdd 100644 > --- a/arch/arm64/mm/extable.c > +++ b/arch/arm64/mm/extable.c > @@ -11,8 +11,13 @@ int fixup_exception(struct pt_regs *regs) > const struct exception_table_entry *fixup; > > fixup = search_exception_tables(instruction_pointer(regs)); > - if (fixup) > - regs->pc = (unsigned long)&fixup->fixup + fixup->fixup; > + if (!fixup) > + return 0; > > - return fixup != NULL; > + if (regs->pc >= BPF_JIT_REGION_START && > + regs->pc < BPF_JIT_REGION_END) > + return arm64_bpf_fixup_exception(fixup, regs); > + > + regs->pc = (unsigned long)&fixup->fixup + fixup->fixup; > + return 1; > } > diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c > index 3cb25b43b368..f8912e45be7a 100644 > --- a/arch/arm64/net/bpf_jit_comp.c > +++ b/arch/arm64/net/bpf_jit_comp.c > @@ -7,6 +7,7 @@ > > #define pr_fmt(fmt) "bpf_jit: " fmt > > +#include <linux/bitfield.h> > #include <linux/bpf.h> > #include <linux/filter.h> > #include <linux/printk.h> > @@ -56,6 +57,7 @@ struct jit_ctx { > int idx; > int epilogue_offset; > int *offset; > + int exentry_idx; > __le32 *image; > u32 stack_size; > }; > @@ -351,6 +353,67 @@ static void build_epilogue(struct jit_ctx *ctx) > emit(A64_RET(A64_LR), ctx); > } > > +#define 
BPF_FIXUP_OFFSET_MASK GENMASK(26, 0) > +#define BPF_FIXUP_REG_MASK GENMASK(31, 27) > + > +int arm64_bpf_fixup_exception(const struct exception_table_entry *ex, > + struct pt_regs *regs) > +{ > + off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup); > + int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup); > + > + regs->regs[dst_reg] = 0; > + regs->pc = (unsigned long)&ex->fixup - offset; > + return 1; > +} > + [] ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-30 12:28 ` Qian Cai @ 2020-07-30 14:22 ` Jean-Philippe Brucker 2020-07-30 19:47 ` Daniel Borkmann 0 siblings, 1 reply; 19+ messages in thread From: Jean-Philippe Brucker @ 2020-07-30 14:22 UTC (permalink / raw) To: Qian Cai Cc: linux-arm-kernel, bpf, songliubraving, andriin, daniel, catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs, will, kafai, sfr, linux-next, linux-kernel [-- Attachment #1: Type: text/plain, Size: 1499 bytes --] On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote: > On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote: > > When a tracing BPF program attempts to read memory without using the > > bpf_probe_read() helper, the verifier marks the load instruction with > > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize > > this flag it falls back to the interpreter. > > > > Add support for BPF_PROBE_MEM, by appending an exception table to the > > BPF program. If the load instruction causes a data abort, the fixup > > infrastructure finds the exception table and fixes up the fault, by > > clearing the destination register and jumping over the faulting > > instruction. > > > > To keep the compact exception table entry format, inspect the pc in > > fixup_exception(). A more generic solution would add a "handler" field > > to the table entry, like on x86 and s390. > > > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > > This will fail to compile on arm64, > > https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config > > arch/arm64/mm/extable.o: In function `fixup_exception': > arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception' Thanks for the report, I attached a fix. Daniel, can I squash it and resend as v2 or is it too late? I'd be more confident if my patches sat a little longer on the list so arm64 folks have a chance to review them. This isn't my first silly mistake... 
Thanks,
Jean

[-- Attachment #2: 0001-arm64-bpf-Fix-build-for-CONFIG_BPF_JIT.patch --]
[-- Type: text/plain, Size: 1688 bytes --]

From 17d0f041b57903cb2657dde15559cd1923498337 Mon Sep 17 00:00:00 2001
From: Jean-Philippe Brucker <jean-philippe@linaro.org>
Date: Thu, 30 Jul 2020 14:45:44 +0200
Subject: [PATCH] arm64: bpf: Fix build for !CONFIG_BPF_JIT

Add a stub for arm64_bpf_fixup_exception() when CONFIG_BPF_JIT isn't
enabled, and avoid the fixup in this case.

Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/include/asm/extable.h | 9 +++++++++
 arch/arm64/mm/extable.c          | 3 ++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/extable.h b/arch/arm64/include/asm/extable.h
index bcee40df1586..840a35ed92ec 100644
--- a/arch/arm64/include/asm/extable.h
+++ b/arch/arm64/include/asm/extable.h
@@ -22,8 +22,17 @@ struct exception_table_entry
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+#ifdef CONFIG_BPF_JIT
 int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
 			      struct pt_regs *regs);
+#else /* !CONFIG_BPF_JIT */
+static inline
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs)
+{
+	return 0;
+}
+#endif /* !CONFIG_BPF_JIT */
 
 extern int fixup_exception(struct pt_regs *regs);
 #endif
diff --git a/arch/arm64/mm/extable.c b/arch/arm64/mm/extable.c
index 1f42991cacdd..eee1732ab6cd 100644
--- a/arch/arm64/mm/extable.c
+++ b/arch/arm64/mm/extable.c
@@ -14,7 +14,8 @@ int fixup_exception(struct pt_regs *regs)
 	if (!fixup)
 		return 0;
 
-	if (regs->pc >= BPF_JIT_REGION_START &&
+	if (IS_ENABLED(CONFIG_BPF_JIT) &&
+	    regs->pc >= BPF_JIT_REGION_START &&
 	    regs->pc < BPF_JIT_REGION_END)
 		return arm64_bpf_fixup_exception(fixup, regs);
 
-- 
2.27.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-30 14:22 ` Jean-Philippe Brucker @ 2020-07-30 19:47 ` Daniel Borkmann 2020-07-30 21:14 ` Jean-Philippe Brucker 0 siblings, 1 reply; 19+ messages in thread From: Daniel Borkmann @ 2020-07-30 19:47 UTC (permalink / raw) To: Jean-Philippe Brucker, Qian Cai Cc: linux-arm-kernel, bpf, songliubraving, andriin, catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs, will, kafai, sfr, linux-next, linux-kernel On 7/30/20 4:22 PM, Jean-Philippe Brucker wrote: > On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote: >> On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote: >>> When a tracing BPF program attempts to read memory without using the >>> bpf_probe_read() helper, the verifier marks the load instruction with >>> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize >>> this flag it falls back to the interpreter. >>> >>> Add support for BPF_PROBE_MEM, by appending an exception table to the >>> BPF program. If the load instruction causes a data abort, the fixup >>> infrastructure finds the exception table and fixes up the fault, by >>> clearing the destination register and jumping over the faulting >>> instruction. >>> >>> To keep the compact exception table entry format, inspect the pc in >>> fixup_exception(). A more generic solution would add a "handler" field >>> to the table entry, like on x86 and s390. >>> >>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> >> >> This will fail to compile on arm64, >> >> https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config >> >> arch/arm64/mm/extable.o: In function `fixup_exception': >> arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception' > > Thanks for the report, I attached a fix. Daniel, can I squash it and > resend as v2 or is it too late? If you want I can squash your attached snippet into the original patch of yours. 
If you want to send a v2 that is fine as well of course. Let me know. Thanks, Daniel ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-30 19:47 ` Daniel Borkmann @ 2020-07-30 21:14 ` Jean-Philippe Brucker 2020-07-30 22:45 ` Daniel Borkmann 0 siblings, 1 reply; 19+ messages in thread From: Jean-Philippe Brucker @ 2020-07-30 21:14 UTC (permalink / raw) To: Daniel Borkmann Cc: Qian Cai, linux-arm-kernel, bpf, songliubraving, andriin, catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs, will, kafai, sfr, linux-next, linux-kernel On Thu, Jul 30, 2020 at 09:47:39PM +0200, Daniel Borkmann wrote: > On 7/30/20 4:22 PM, Jean-Philippe Brucker wrote: > > On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote: > > > On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote: > > > > When a tracing BPF program attempts to read memory without using the > > > > bpf_probe_read() helper, the verifier marks the load instruction with > > > > the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize > > > > this flag it falls back to the interpreter. > > > > > > > > Add support for BPF_PROBE_MEM, by appending an exception table to the > > > > BPF program. If the load instruction causes a data abort, the fixup > > > > infrastructure finds the exception table and fixes up the fault, by > > > > clearing the destination register and jumping over the faulting > > > > instruction. > > > > > > > > To keep the compact exception table entry format, inspect the pc in > > > > fixup_exception(). A more generic solution would add a "handler" field > > > > to the table entry, like on x86 and s390. > > > > > > > > Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> > > > > > > This will fail to compile on arm64, > > > > > > https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config > > > > > > arch/arm64/mm/extable.o: In function `fixup_exception': > > > arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception' > > > > Thanks for the report, I attached a fix. 
Daniel, can I squash it and > > resend as v2 or is it too late? > > If you want I can squash your attached snippet into the original patch of > yours. If you want to send a v2 that is fine as well of course. Let me know. Yes please squash it into the original patch, sorry for the mess Thanks, Jean ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 1/1] arm64: bpf: Add BPF exception tables 2020-07-30 21:14 ` Jean-Philippe Brucker @ 2020-07-30 22:45 ` Daniel Borkmann 0 siblings, 0 replies; 19+ messages in thread From: Daniel Borkmann @ 2020-07-30 22:45 UTC (permalink / raw) To: Jean-Philippe Brucker Cc: Qian Cai, linux-arm-kernel, bpf, songliubraving, andriin, catalin.marinas, john.fastabend, ast, zlim.lnx, kpsingh, yhs, will, kafai, sfr, linux-next, linux-kernel On 7/30/20 11:14 PM, Jean-Philippe Brucker wrote: > On Thu, Jul 30, 2020 at 09:47:39PM +0200, Daniel Borkmann wrote: >> On 7/30/20 4:22 PM, Jean-Philippe Brucker wrote: >>> On Thu, Jul 30, 2020 at 08:28:56AM -0400, Qian Cai wrote: >>>> On Tue, Jul 28, 2020 at 05:21:26PM +0200, Jean-Philippe Brucker wrote: >>>>> When a tracing BPF program attempts to read memory without using the >>>>> bpf_probe_read() helper, the verifier marks the load instruction with >>>>> the BPF_PROBE_MEM flag. Since the arm64 JIT does not currently recognize >>>>> this flag it falls back to the interpreter. >>>>> >>>>> Add support for BPF_PROBE_MEM, by appending an exception table to the >>>>> BPF program. If the load instruction causes a data abort, the fixup >>>>> infrastructure finds the exception table and fixes up the fault, by >>>>> clearing the destination register and jumping over the faulting >>>>> instruction. >>>>> >>>>> To keep the compact exception table entry format, inspect the pc in >>>>> fixup_exception(). A more generic solution would add a "handler" field >>>>> to the table entry, like on x86 and s390. >>>>> >>>>> Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org> >>>> >>>> This will fail to compile on arm64, >>>> >>>> https://gitlab.com/cailca/linux-mm/-/blob/master/arm64.config >>>> >>>> arch/arm64/mm/extable.o: In function `fixup_exception': >>>> arch/arm64/mm/extable.c:19: undefined reference to `arm64_bpf_fixup_exception' >>> >>> Thanks for the report, I attached a fix. 
Daniel, can I squash it and >>> resend as v2 or is it too late? >> >> If you want I can squash your attached snippet into the original patch of >> yours. If you want to send a v2 that is fine as well of course. Let me know. > > Yes please squash it into the original patch, sorry for the mess Done, thanks! ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables 2020-07-28 15:21 [PATCH bpf-next 0/1] arm64: Add BPF exception tables Jean-Philippe Brucker 2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker @ 2021-06-09 12:04 ` Ravi Bangoria 2021-06-11 0:12 ` Alexei Starovoitov 1 sibling, 1 reply; 19+ messages in thread From: Ravi Bangoria @ 2021-06-09 12:04 UTC (permalink / raw) To: daniel, ast Cc: catalin.marinas, will, zlim.lnx, kafai, songliubraving, yhs, andriin, john.fastabend, kpsingh, Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao Hi Alexei, On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote: > The following patch adds support for BPF_PROBE_MEM on arm64. The > implementation is simple but I wanted to give a bit of background first. > If you're familiar with recent BPF development you can skip to the patch > (or fact-check the following blurb). > > BPF programs used for tracing can inspect any of the traced function's > arguments and follow pointers in struct members. Traditionally the BPF > program would get a struct pt_regs as argument and cast the register > values to the appropriate struct pointer. The BPF verifier would mandate > that any memory access uses the bpf_probe_read() helper, to suppress > page faults (see samples/bpf/tracex1_kern.c). > > With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF), > the verifier can now check the type of any access performed by a BPF > program. It rejects for example programs that cast to a different > structure and perform out-of-bounds accesses, or programs that attempt > to dereference something that isn't a pointer, or that hasn't gone > through a NULL check. > > As this makes tracing programs safer, the verifier now allows loading > programs that access struct members without bpf_probe_read(). It is > however still possible to trigger page faults. 
For example in the > following example with which I've tested this patch, the verifier does > not mandate a NULL check for the second-level pointer: > > /* > * From tools/testing/selftests/bpf/progs/bpf_iter_task.c > * dump_task() is called for each task. > */ > SEC("iter/task") > int dump_task(struct bpf_iter__task *ctx) > { > struct seq_file *seq = ctx->meta->seq; > struct task_struct *task = ctx->task; > > /* Program would be rejected without this check */ > if (task == NULL) > return 0; > > /* > * However the verifier does not currently mandate > * checking task->mm, and the following faults for kernel > * threads. > */ > BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm); > return 0; > } > > Even if it checked this case, the verifier couldn't guarantee that all > accesses are safe since kernel structures could in theory contain > garbage or error pointers. So to allow fast access without > bpf_probe_read(), a JIT implementation must support BPF exception > tables. For each access to a BTF pointer, the JIT generates an entry > into an exception table appended to the BPF program. If the access > faults at runtime, the handler skips the faulting instruction. The > example above will display vm=0 for kernel threads. I'm trying with the example above (task->mm->total_vm) on x86 machine with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*, I'm seeing the app getting killed with error in dmesg. $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task $ sudo cat /sys/fs/bpf/task Killed $ dmesg [ 188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8 [ 188.810030] #PF: supervisor read access in kernel mode [ 188.810034] #PF: error_code(0x0000) - not-present page IIUC, this should be handled by bpf exception table rather than killing the app. Am I missing anything? Ravi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables 2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria @ 2021-06-11 0:12 ` Alexei Starovoitov 2021-06-17 6:58 ` Ravi Bangoria 0 siblings, 1 reply; 19+ messages in thread From: Alexei Starovoitov @ 2021-06-11 0:12 UTC (permalink / raw) To: Ravi Bangoria Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas, Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao On Wed, Jun 9, 2021 at 5:05 AM Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote: > > Hi Alexei, > > On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote: > > The following patch adds support for BPF_PROBE_MEM on arm64. The > > implementation is simple but I wanted to give a bit of background first. > > If you're familiar with recent BPF development you can skip to the patch > > (or fact-check the following blurb). > > > > BPF programs used for tracing can inspect any of the traced function's > > arguments and follow pointers in struct members. Traditionally the BPF > > program would get a struct pt_regs as argument and cast the register > > values to the appropriate struct pointer. The BPF verifier would mandate > > that any memory access uses the bpf_probe_read() helper, to suppress > > page faults (see samples/bpf/tracex1_kern.c). > > > > With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF), > > the verifier can now check the type of any access performed by a BPF > > program. It rejects for example programs that cast to a different > > structure and perform out-of-bounds accesses, or programs that attempt > > to dereference something that isn't a pointer, or that hasn't gone > > through a NULL check. > > > > As this makes tracing programs safer, the verifier now allows loading > > programs that access struct members without bpf_probe_read(). It is > > however still possible to trigger page faults. 
For example in the > > following example with which I've tested this patch, the verifier does > > not mandate a NULL check for the second-level pointer: > > > > /* > > * From tools/testing/selftests/bpf/progs/bpf_iter_task.c > > * dump_task() is called for each task. > > */ > > SEC("iter/task") > > int dump_task(struct bpf_iter__task *ctx) > > { > > struct seq_file *seq = ctx->meta->seq; > > struct task_struct *task = ctx->task; > > > > /* Program would be rejected without this check */ > > if (task == NULL) > > return 0; > > > > /* > > * However the verifier does not currently mandate > > * checking task->mm, and the following faults for kernel > > * threads. > > */ > > BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm); > > return 0; > > } > > > > Even if it checked this case, the verifier couldn't guarantee that all > > accesses are safe since kernel structures could in theory contain > > garbage or error pointers. So to allow fast access without > > bpf_probe_read(), a JIT implementation must support BPF exception > > tables. For each access to a BTF pointer, the JIT generates an entry > > into an exception table appended to the BPF program. If the access > > faults at runtime, the handler skips the faulting instruction. The > > example above will display vm=0 for kernel threads. > > I'm trying with the example above (task->mm->total_vm) on x86 machine > with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit > explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*, > I'm seeing the app getting killed with error in dmesg. > > $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task > $ sudo cat /sys/fs/bpf/task > Killed > > $ dmesg > [ 188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8 > [ 188.810030] #PF: supervisor read access in kernel mode > [ 188.810034] #PF: error_code(0x0000) - not-present page > > IIUC, this should be handled by bpf exception table rather than killing > the app. 
Am I missing anything? For PROBE_LDX the verifier guarantees that the address is either a very likely valid kernel address or NULL. On x86 the user and kernel address spaces are shared and NULL is a user address, so there cannot be an exception table for NULL. Hence x86-64 JIT inserts NULL check when it converts PROBE_LDX into load insn. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables 2021-06-11 0:12 ` Alexei Starovoitov @ 2021-06-17 6:58 ` Ravi Bangoria 2021-06-18 16:34 ` Alexei Starovoitov 0 siblings, 1 reply; 19+ messages in thread From: Ravi Bangoria @ 2021-06-17 6:58 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas, Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao, Ravi Bangoria Hi Alexei, Sorry to bother you again! On 6/11/21 5:42 AM, Alexei Starovoitov wrote: > On Wed, Jun 9, 2021 at 5:05 AM Ravi Bangoria > <ravi.bangoria@linux.ibm.com> wrote: >> >> Hi Alexei, >> >> On 7/28/20 8:51 PM, Jean-Philippe Brucker wrote: >>> The following patch adds support for BPF_PROBE_MEM on arm64. The >>> implementation is simple but I wanted to give a bit of background first. >>> If you're familiar with recent BPF development you can skip to the patch >>> (or fact-check the following blurb). >>> >>> BPF programs used for tracing can inspect any of the traced function's >>> arguments and follow pointers in struct members. Traditionally the BPF >>> program would get a struct pt_regs as argument and cast the register >>> values to the appropriate struct pointer. The BPF verifier would mandate >>> that any memory access uses the bpf_probe_read() helper, to suppress >>> page faults (see samples/bpf/tracex1_kern.c). >>> >>> With BPF Type Format embedded into the kernel (CONFIG_DEBUG_INFO_BTF), >>> the verifier can now check the type of any access performed by a BPF >>> program. It rejects for example programs that cast to a different >>> structure and perform out-of-bounds accesses, or programs that attempt >>> to dereference something that isn't a pointer, or that hasn't gone >>> through a NULL check. 
>>> >>> As this makes tracing programs safer, the verifier now allows loading >>> programs that access struct members without bpf_probe_read(). It is >>> however still possible to trigger page faults. For example in the >>> following example with which I've tested this patch, the verifier does >>> not mandate a NULL check for the second-level pointer: >>> >>> /* >>> * From tools/testing/selftests/bpf/progs/bpf_iter_task.c >>> * dump_task() is called for each task. >>> */ >>> SEC("iter/task") >>> int dump_task(struct bpf_iter__task *ctx) >>> { >>> struct seq_file *seq = ctx->meta->seq; >>> struct task_struct *task = ctx->task; >>> >>> /* Program would be rejected without this check */ >>> if (task == NULL) >>> return 0; >>> >>> /* >>> * However the verifier does not currently mandate >>> * checking task->mm, and the following faults for kernel >>> * threads. >>> */ >>> BPF_SEQ_PRINTF(seq, "pid=%d vm=%d", task->pid, task->mm->total_vm); >>> return 0; >>> } >>> >>> Even if it checked this case, the verifier couldn't guarantee that all >>> accesses are safe since kernel structures could in theory contain >>> garbage or error pointers. So to allow fast access without >>> bpf_probe_read(), a JIT implementation must support BPF exception >>> tables. For each access to a BTF pointer, the JIT generates an entry >>> into an exception table appended to the BPF program. If the access >>> faults at runtime, the handler skips the faulting instruction. The >>> example above will display vm=0 for kernel threads. >> >> I'm trying with the example above (task->mm->total_vm) on x86 machine >> with bpf/master (11fc79fc9f2e3) plus commit 4c5de127598e1 ("bpf: Emit >> explicit NULL pointer checks for PROBE_LDX instructions.") *reverted*, >> I'm seeing the app getting killed with error in dmesg. 
>> >> $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task >> $ sudo cat /sys/fs/bpf/task >> Killed >> >> $ dmesg >> [ 188.810020] BUG: kernel NULL pointer dereference, address: 00000000000000c8 >> [ 188.810030] #PF: supervisor read access in kernel mode >> [ 188.810034] #PF: error_code(0x0000) - not-present page >> >> IIUC, this should be handled by bpf exception table rather than killing >> the app. Am I missing anything? > > For PROBE_LDX the verifier guarantees that the address is either > a very likely valid kernel address or NULL. On x86 the user and kernel > address spaces are shared and NULL is a user address, so there cannot be > an exception table for NULL. Hence x86-64 JIT inserts NULL check when > it converts PROBE_LDX into load insn. IIUC, there could be 3 types of addresses a BPF prog can have: 1. NULL ptr. Which is handled by adding additional check in BPF program. So this won't cause a fault. 2. Valid kernel address. This will never cause a fault. 3. Bad address. This is very unlikely but possible. IIUC, BPF extable is introduced to handle such scenarios. Unfortunately, with any type of bad address (user or kernel), extable on x86 never gets involved. Kernel always kills the application with error in dmesg. Please let me know if I understood it incorrectly. TLDR; To check the case 3 further, I tried with bpf/master, this time without reverting your patch. I added a dummy structure and a bad pointer to it in task_struct. 
diff --git a/include/linux/sched.h b/include/linux/sched.h index d2c881384517..4698188bcf45 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -646,6 +646,10 @@ struct kmap_ctrl { #endif }; +struct dummy_task_ele { + int dummy; +}; + struct task_struct { #ifdef CONFIG_THREAD_INFO_IN_TASK /* @@ -771,6 +775,8 @@ struct task_struct { struct mm_struct *mm; struct mm_struct *active_mm; + struct dummy_task_ele *dte; + /* Per-thread vma caching: */ struct vmacache vmacache; diff --git a/kernel/fork.c b/kernel/fork.c index dc06afd725cb..ed01f25edd8e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2116,6 +2116,9 @@ static __latent_entropy struct task_struct *copy_process( retval = copy_mm(clone_flags, p); if (retval) goto bad_fork_cleanup_signal; + + p->dte = (void *)0xd12345; + retval = copy_namespaces(clone_flags, p); if (retval) goto bad_fork_cleanup_mm; And with below change in testcase: diff --git a/tools/testing/selftests/bpf/progs/bpf_iter_task.c b/tools/testing/selftests/bpf/progs/bpf_iter_task.c index b7f32c160f4e..391c1b3da638 100644 --- a/tools/testing/selftests/bpf/progs/bpf_iter_task.c +++ b/tools/testing/selftests/bpf/progs/bpf_iter_task.c @@ -21,6 +21,6 @@ int dump_task(struct bpf_iter__task *ctx) if (ctx->meta->seq_num == 0) BPF_SEQ_PRINTF(seq, " tgid gid\n"); - BPF_SEQ_PRINTF(seq, "%8d %8d\n", task->tgid, task->pid); + BPF_SEQ_PRINTF(seq, "%8d %8d %d\n", task->tgid, task->pid, task->dte->dummy); return 0; } I see the same issue: $ sudo bpftool iter pin bpf_iter_task.o /sys/fs/bpf/task $ sudo cat /sys/fs/bpf/task Killed $ dmesg [ 166.864325] BUG: unable to handle page fault for address: 0000000000d12345 [ 166.864336] #PF: supervisor read access in kernel mode [ 166.864338] #PF: error_code(0x0000) - not-present page 0xd12345 is unallocated userspace address. Similarly, I also tried with p->dte = (void *)0xffffffffc1234567 after confirming it's not allocated to kernel or any module address. I see the same failure with it too. 
Though, the same test with bpf_probe_read(task->dte->dummy) works fine. Ravi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables 2021-06-17 6:58 ` Ravi Bangoria @ 2021-06-18 16:34 ` Alexei Starovoitov 2021-06-22 7:10 ` Ravi Bangoria 2021-06-22 11:00 ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria 0 siblings, 2 replies; 19+ messages in thread From: Alexei Starovoitov @ 2021-06-18 16:34 UTC (permalink / raw) To: Ravi Bangoria Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas, Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao On Wed, Jun 16, 2021 at 11:58 PM Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote: > > $ dmesg > [ 166.864325] BUG: unable to handle page fault for address: 0000000000d12345 > [ 166.864336] #PF: supervisor read access in kernel mode > [ 166.864338] #PF: error_code(0x0000) - not-present page > > 0xd12345 is unallocated userspace address. Similarly, I also tried with that's unfortunately expected, since this is a user address. > p->dte = (void *)0xffffffffc1234567 after confirming it's not allocated > to kernel or any module address. I see the same failure with it too. This one is surprising though. Sounds like a bug in exception table construction. Can you debug it to see what's causing it? First check that do_kern_addr_fault() is invoked in this case. And then fixup_exception() and why search_bpf_extables() cannot find it. Separately we probably need to replace the NULL check with addr >= TASK_SIZE_MAX to close this issue though it's a bit artificial. ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH bpf-next 0/1] arm64: Add BPF exception tables 2021-06-18 16:34 ` Alexei Starovoitov @ 2021-06-22 7:10 ` Ravi Bangoria 2021-06-22 11:00 ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria 1 sibling, 0 replies; 19+ messages in thread From: Ravi Bangoria @ 2021-06-22 7:10 UTC (permalink / raw) To: Alexei Starovoitov Cc: Daniel Borkmann, Alexei Starovoitov, Catalin Marinas, Will Deacon, Zi Shen Lim, Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko, John Fastabend, KP Singh, Jean-Philippe Brucker, linux-arm-kernel, bpf, Naveen N. Rao, Ravi Bangoria Hi Alexei, On 6/18/21 10:04 PM, Alexei Starovoitov wrote: > On Wed, Jun 16, 2021 at 11:58 PM Ravi Bangoria > <ravi.bangoria@linux.ibm.com> wrote: >> >> $ dmesg >> [ 166.864325] BUG: unable to handle page fault for address: 0000000000d12345 >> [ 166.864336] #PF: supervisor read access in kernel mode >> [ 166.864338] #PF: error_code(0x0000) - not-present page >> >> 0xd12345 is unallocated userspace address. Similarly, I also tried with > > that's unfortunately expected, since this is a user address. Sure. fwiw, it works with bpf_probe_read(). >> p->dte = (void *)0xffffffffc1234567 after confirming it's not allocated >> to kernel or any module address. I see the same failure with it too. > > This one is surprising though. Sounds like a bug in exception table > construction. Can you debug it to see what's causing it? > First check that do_kern_addr_fault() is invoked in this case. > And then fixup_exception() and why search_bpf_extables() > cannot find it. It seems the commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") added few instructions before actual load but does not consider those additional instruction while calculating extable offset. Let me prepare a fix. > Separately we probably need to replace the NULL check > with addr >= TASK_SIZE_MAX to close this issue though it's a bit artificial. 
Ravi ^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH] x86 bpf: Fix extable offset calculation 2021-06-18 16:34 ` Alexei Starovoitov 2021-06-22 7:10 ` Ravi Bangoria @ 2021-06-22 11:00 ` Ravi Bangoria 2021-06-25 4:01 ` Alexei Starovoitov 1 sibling, 1 reply; 19+ messages in thread From: Ravi Bangoria @ 2021-06-22 11:00 UTC (permalink / raw) To: ast Cc: ravi.bangoria, davem, yoshfuji, dsahern, daniel, tglx, mingo, bp, x86, hpa, andrii, kafai, songliubraving, yhs, john.fastabend, kpsingh, netdev, bpf, linux-kernel, naveen.n.rao commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") is emitting couple of instructions before actual load. Consider those additional instructions while calculating extable offset. Fixes: 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> --- arch/x86/net/bpf_jit_comp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index 2a2e290fa5d8..231a8178cc11 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1297,7 +1297,7 @@ st: if (is_imm8(insn->off)) emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off); if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { struct exception_table_entry *ex; - u8 *_insn = image + proglen; + u8 *_insn = image + proglen + (u8)(start_of_ldx - temp); s64 delta; /* populate jmp_offset for JMP above */ -- 2.30.2 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [PATCH] x86 bpf: Fix extable offset calculation 2021-06-22 11:00 ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria @ 2021-06-25 4:01 ` Alexei Starovoitov 2021-06-25 6:22 ` Ravi Bangoria 0 siblings, 1 reply; 19+ messages in thread From: Alexei Starovoitov @ 2021-06-25 4:01 UTC (permalink / raw) To: Ravi Bangoria Cc: Alexei Starovoitov, David S. Miller, Hideaki YOSHIFUJI, David Ahern, Daniel Borkmann, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Network Development, bpf, LKML, Naveen N. Rao On Tue, Jun 22, 2021 at 4:01 AM Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote: > > commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks > for PROBE_LDX instructions.") is emitting couple of instructions > before actual load. Consider those additional instructions while > calculating extable offset. > > Fixes: 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") > Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> > --- > arch/x86/net/bpf_jit_comp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c > index 2a2e290fa5d8..231a8178cc11 100644 > --- a/arch/x86/net/bpf_jit_comp.c > +++ b/arch/x86/net/bpf_jit_comp.c > @@ -1297,7 +1297,7 @@ st: if (is_imm8(insn->off)) > emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off); > if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { > struct exception_table_entry *ex; > - u8 *_insn = image + proglen; > + u8 *_insn = image + proglen + (u8)(start_of_ldx - temp); Great debugging and the fix. Thanks a lot. I've dropped (u8) cast, kept (), and applied to bpf tree. I think it looks cleaner without that cast. Could you send a followup patch with a selftest, so I don't make the same mistake again ? ;) ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] x86 bpf: Fix extable offset calculation 2021-06-25 4:01 ` Alexei Starovoitov @ 2021-06-25 6:22 ` Ravi Bangoria 2021-06-25 15:07 ` Alexei Starovoitov 0 siblings, 1 reply; 19+ messages in thread From: Ravi Bangoria @ 2021-06-25 6:22 UTC (permalink / raw) To: Alexei Starovoitov Cc: Alexei Starovoitov, David S. Miller, Hideaki YOSHIFUJI, David Ahern, Daniel Borkmann, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Network Development, bpf, LKML, Naveen N. Rao, Ravi Bangoria On 6/25/21 9:31 AM, Alexei Starovoitov wrote: > On Tue, Jun 22, 2021 at 4:01 AM Ravi Bangoria > <ravi.bangoria@linux.ibm.com> wrote: >> >> commit 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks >> for PROBE_LDX instructions.") is emitting couple of instructions >> before actual load. Consider those additional instructions while >> calculating extable offset. >> >> Fixes: 4c5de127598e1 ("bpf: Emit explicit NULL pointer checks for PROBE_LDX instructions.") >> Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com> >> --- >> arch/x86/net/bpf_jit_comp.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c >> index 2a2e290fa5d8..231a8178cc11 100644 >> --- a/arch/x86/net/bpf_jit_comp.c >> +++ b/arch/x86/net/bpf_jit_comp.c >> @@ -1297,7 +1297,7 @@ st: if (is_imm8(insn->off)) >> emit_ldx(&prog, BPF_SIZE(insn->code), dst_reg, src_reg, insn->off); >> if (BPF_MODE(insn->code) == BPF_PROBE_MEM) { >> struct exception_table_entry *ex; >> - u8 *_insn = image + proglen; >> + u8 *_insn = image + proglen + (u8)(start_of_ldx - temp); > > Great debugging and the fix. Thanks a lot. > I've dropped (u8) cast, kept (), and applied to bpf tree. > I think it looks cleaner without that cast. Thanks. > Could you send a followup patch with a selftest, so I don't make > the same mistake again ? 
;) Unfortunately the extable only gets involved for bad kernel pointers, and ideally there should not be any bad pointers in the kernel, so there is no easy way to create a proper selftest for this. Ravi ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH] x86 bpf: Fix extable offset calculation 2021-06-25 6:22 ` Ravi Bangoria @ 2021-06-25 15:07 ` Alexei Starovoitov 0 siblings, 0 replies; 19+ messages in thread From: Alexei Starovoitov @ 2021-06-25 15:07 UTC (permalink / raw) To: Ravi Bangoria Cc: Alexei Starovoitov, David S. Miller, Hideaki YOSHIFUJI, David Ahern, Daniel Borkmann, Thomas Gleixner, Ingo Molnar, Borislav Petkov, X86 ML, H. Peter Anvin, Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend, KP Singh, Network Development, bpf, LKML, Naveen N. Rao On Thu, Jun 24, 2021 at 11:22 PM Ravi Bangoria <ravi.bangoria@linux.ibm.com> wrote: > > Could you send a followup patch with a selftest, so I don't make > > the same mistake again ? ;) > > Unfortunately extable gets involved only for bad kernel pointers and > ideally there should not be any bad pointer in kernel. So there is no > easy way to create a proper selftest for this. Right. We have lib/test_bpf.c kernel module for such cases. ^ permalink raw reply [flat|nested] 19+ messages in thread
Thread overview: 19+ messages
2020-07-28 15:21 [PATCH bpf-next 0/1] arm64: Add BPF exception tables Jean-Philippe Brucker
2020-07-28 15:21 ` [PATCH bpf-next 1/1] arm64: bpf: " Jean-Philippe Brucker
2020-07-29 17:28   ` Song Liu
2020-07-29 21:29   ` Daniel Borkmann
2020-07-30  8:28     ` Jean-Philippe Brucker
2020-07-30 12:28   ` Qian Cai
2020-07-30 14:22     ` Jean-Philippe Brucker
2020-07-30 19:47       ` Daniel Borkmann
2020-07-30 21:14         ` Jean-Philippe Brucker
2020-07-30 22:45           ` Daniel Borkmann
2021-06-09 12:04 ` [PATCH bpf-next 0/1] arm64: " Ravi Bangoria
2021-06-11  0:12   ` Alexei Starovoitov
2021-06-17  6:58     ` Ravi Bangoria
2021-06-18 16:34       ` Alexei Starovoitov
2021-06-22  7:10         ` Ravi Bangoria
2021-06-22 11:00         ` [PATCH] x86 bpf: Fix extable offset calculation Ravi Bangoria
2021-06-25  4:01           ` Alexei Starovoitov
2021-06-25  6:22             ` Ravi Bangoria
2021-06-25 15:07               ` Alexei Starovoitov