From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Jiri Olsa <jolsa@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@chromium.org>
Subject: Re: [PATCH bpf-next 06/29] bpf: Add bpf_arg/bpf_ret_value helpers for tracing programs
Date: Tue, 30 Nov 2021 23:13:08 -0800
Message-ID: <CAEf4BzbauHaDDJvGpx4oCRddd4KWpb4PkxUiUJvx-CXqEN2sdQ@mail.gmail.com>
In-Reply-To: <YaPFEpAqIREeUMU7@krava>

On Sun, Nov 28, 2021 at 10:06 AM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Nov 24, 2021 at 01:43:22PM -0800, Andrii Nakryiko wrote:
> > On Thu, Nov 18, 2021 at 3:25 AM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > Adding bpf_arg/bpf_ret_value helpers for tracing programs
> > > that return the traced function's arguments and return value.
> > >
> > > Get n-th argument of the traced function:
> > >   long bpf_arg(void *ctx, int n)
> > >
> > > Get return value of the traced function:
> > >   long bpf_ret_value(void *ctx)
> > >
> > > The trampoline now stores the number of arguments at the
> > > ctx-8 address, so it's easy to verify the argument index
> > > and find the return value.
> > >
> > > The function's ip address on the trampoline stack is moved
> > > behind the number of function arguments, so it's now stored
> > > at the ctx-16 address.
> > >
> > > Both helpers are inlined by the verifier.
> > >
> > > Signed-off-by: Jiri Olsa <jolsa@kernel.org>
> > > ---
> >
> > It would be great to land these changes separately from your huge
> > patch set. There are some upcoming BPF trampoline-related changes
> > that will touch this (to add BPF cookie support for fentry/fexit
> > progs), so it would be nice to minimize the interdependencies. So
> > maybe post this patch separately (probably after the holidays ;) ).
>
> ok
>
> >
> > >  arch/x86/net/bpf_jit_comp.c    | 18 +++++++++++---
> > >  include/uapi/linux/bpf.h       | 14 +++++++++++
> > >  kernel/bpf/verifier.c          | 45 ++++++++++++++++++++++++++++++++--
> > >  kernel/trace/bpf_trace.c       | 38 +++++++++++++++++++++++++++-
> > >  tools/include/uapi/linux/bpf.h | 14 +++++++++++
> > >  5 files changed, 122 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index 631847907786..67e8ac9aaf0d 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -1941,7 +1941,7 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> > >                                 void *orig_call)
> > >  {
> > >         int ret, i, nr_args = m->nr_args;
> > > -       int stack_size = nr_args * 8;
> > > +       int stack_size = nr_args * 8 + 8 /* nr_args */;
> >
> > this /* nr_args */ next to 8 is super confusing; it would be better
> > to expand the comment. It might also be a good idea to have some
> > sort of description of the possible stack layouts (e.g., fexit has
> > some extra stuff on the stack, I think, but it's impossible to
> > remember and you need to recover that knowledge from the assembly
> > code, basically).
> >
> > >         struct bpf_tramp_progs *fentry = &tprogs[BPF_TRAMP_FENTRY];
> > >         struct bpf_tramp_progs *fexit = &tprogs[BPF_TRAMP_FEXIT];
> > >         struct bpf_tramp_progs *fmod_ret = &tprogs[BPF_TRAMP_MODIFY_RETURN];
> > > @@ -1987,12 +1987,22 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *im, void *image, void *i
> > >                 EMIT4(0x48, 0x83, 0xe8, X86_PATCH_SIZE);
> > >                 emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -stack_size);
> > >
> > > -               /* Continue with stack_size for regs storage, stack will
> > > -                * be correctly restored with 'leave' instruction.
> > > -                */
> > > +               /* Continue with stack_size for 'nr_args' storage */
> >
> > same, I don't think this comment really helps, it just confuses
> > things some more
>
> ok, I'll add some more comments with a list of the possible stack
> layouts, something like:
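>
>   (assuming this patch's layout; for fexit the return value is
>   stored right after the arguments)
>
>     ctx + nr_args * 8 .............. return value (fexit only)
>     ctx .. ctx + (nr_args - 1) * 8 . arguments, 8 bytes each
>     ctx - 8 ........................ number of arguments
>     ctx - 16 ....................... traced function's ip
>                                      (with BPF_TRAMP_F_IP_ARG)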
>
> >
> > >                 stack_size -= 8;
> > >         }
> > >
> > > +       /* Store number of arguments of the traced function:
> > > +        *   mov rax, nr_args
> > > +        *   mov QWORD PTR [rbp - stack_size], rax
> > > +        */
> > > +       emit_mov_imm64(&prog, BPF_REG_0, 0, (u32) nr_args);
> > > +       emit_stx(&prog, BPF_DW, BPF_REG_FP, BPF_REG_0, -stack_size);
> > > +
> > > +       /* Continue with stack_size for regs storage, stack will
> > > +        * be correctly restored with 'leave' instruction.
> > > +        */
> > > +       stack_size -= 8;
> >
> > I think "stack_size" as a name outlived itself and it just makes
> > everything harder to understand. It's used more like a stack offset
> > (relative to rsp or rbp) for different things. Would it make code
> > worse if we had few offset variables instead (or rather in addition,
> > we still need to calculate a full stack_size; it's just it's constant
> > re-adjustment is what's hard to keep track of), like regs_off,
> > ret_ip_off, arg_cnt_off, etc?
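> >
> > Purely illustrative (hypothetical names; offsets derived from this
> > patch's layout, all relative to rbp):
> >
> >   int args_off    = nr_args * 8;     /* args at rbp - args_off */
> >   int arg_cnt_off = args_off + 8;    /* nr_args at rbp - arg_cnt_off */
> >   int ret_ip_off  = arg_cnt_off + 8; /* ip at rbp - ret_ip_off */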
>
> let's see, I'll try that
>
> >
> > > +
> > >         save_regs(m, &prog, nr_args, stack_size);
> > >
> > >         if (flags & BPF_TRAMP_F_CALL_ORIG) {
> > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > index a69e4b04ffeb..fc8b344eecba 100644
> > > --- a/include/uapi/linux/bpf.h
> > > +++ b/include/uapi/linux/bpf.h
> > > @@ -4957,6 +4957,18 @@ union bpf_attr {
> > >   *             **-ENOENT** if *task->mm* is NULL, or no vma contains *addr*.
> > >   *             **-EBUSY** if failed to try lock mmap_lock.
> > >   *             **-EINVAL** for invalid **flags**.
> > > + *
> > > + * long bpf_arg(void *ctx, int n)
> >
> > __u32 n ?
>
> ok
>
> >
> > > + *     Description
> > > + *             Get n-th argument of the traced function (for tracing programs).
> > > + *     Return
> > > + *             Value of the argument.
> >
> > What about errors? Those need to be documented.
>
> ok
>
> >
> > > + *
> > > + * long bpf_ret_value(void *ctx)
> > > + *     Description
> > > + *             Get return value of the traced function (for tracing programs).
> > > + *     Return
> > > + *             Return value of the traced function.
> >
> > Same, the errors are not documented. It would also be good to
> > document what happens when ret_value is requested in a context
> > where there is no return value (e.g., fentry)
>
> ugh, that's not handled at the moment.. should we fail when
> we see a bpf_ret_value helper call in an fentry program?

Well, two options, really. Either return zero, or detect it at
verification time and fail verification. I find myself leaning
towards fewer restrictions at verification time, so I'd probably go
with a runtime check and zero. This allows having the same BPF
subprogram callable from both fentry and fexit, with a proper if()
guard to not do anything with the result of bpf_ret_value (as one
example).
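
A rough sketch of that pattern (helper name per this patch; the
zero-on-fentry behavior is the assumption discussed above, and foo
and the is_fexit flag are made up for illustration):

  static __noinline void report(void *ctx, bool is_fexit)
  {
          long ret = 0;

          if (is_fexit)
                  ret = bpf_ret_value(ctx);
          /* ... common logic; ret is only meaningful for fexit ... */
  }

  SEC("fentry/foo")
  int BPF_PROG(fentry_foo)
  {
          report(ctx, false);
          return 0;
  }

  SEC("fexit/foo")
  int BPF_PROG(fexit_foo)
  {
          report(ctx, true);
          return 0;
  }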

>
> >
> > >   */
> > >  #define __BPF_FUNC_MAPPER(FN)          \
> > >         FN(unspec),                     \
> > > @@ -5140,6 +5152,8 @@ union bpf_attr {
> > >         FN(skc_to_unix_sock),           \
> > >         FN(kallsyms_lookup_name),       \
> > >         FN(find_vma),                   \
> > > +       FN(arg),                        \
> > > +       FN(ret_value),                  \
> >
> > We already have bpf_get_func_ip, so why not continue a tradition and
> > call these bpf_get_func_arg() and bpf_get_func_ret(). Nice, short,
> > clean, consistent.
>
> ok
>
> >
> > BTW, a wild thought. Wouldn't it be cool to have these functions work
> > with kprobe/kretprobe as well? Do you think it's possible?
>
> right, bpf_get_func_ip already works for kprobes
>
> struct kprobe could have the btf_func_model of the traced function,
> so in case we trace a function directly at its entry point, we could
> read the argument registers based on the btf_func_model
>
> I'll check with Masami

Hm... I'd actually try to keep kprobe BTF-free. We have fentry for
cases where BTF is present and the function is simple enough (like <=6
args, etc.). Kprobe is an escape-hatch mechanism for when all the BTF
fanciness just gets in the way (retsnoop being a primary example from
my side). What I meant here was that bpf_get_arg(int n) would read the
correct fields from pt_regs that map to the first N arguments passed
in registers. It's what we currently have with the PT_REGS_PARM macros
in bpf_tracing.h, but as a proper unified BPF helper.
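
For reference, a minimal sketch of how this looks today (do_unlinkat
is just an example target; its second argument happens to be a
struct filename *):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  SEC("kprobe/do_unlinkat")
  int kprobe_unlinkat(struct pt_regs *ctx)
  {
          /* PT_REGS_PARM2 picks the arch-specific register that
           * carries the second function argument
           */
          struct filename *name = (void *)PT_REGS_PARM2(ctx);

          bpf_printk("do_unlinkat arg2: %lx", (long)name);
          return 0;
  }

  char LICENSE[] SEC("license") = "GPL";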

>
> >
> > >         /* */
> > >
> > >  /* integer value in 'imm' field of BPF_CALL instruction selects which helper
> > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > index fac0c3518add..d4249ef6ca7e 100644
> > > --- a/kernel/bpf/verifier.c
> > > +++ b/kernel/bpf/verifier.c
> > > @@ -13246,11 +13246,52 @@ static int do_misc_fixups(struct bpf_verifier_env *env)
> > >                         continue;
> > >                 }
> > >
> > > +               /* Implement bpf_arg inline. */
> > > +               if (prog_type == BPF_PROG_TYPE_TRACING &&
> > > +                   insn->imm == BPF_FUNC_arg) {
> > > +                       /* Load nr_args from ctx - 8 */
> > > +                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
> > > +                       insn_buf[1] = BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 4);
> > > +                       insn_buf[2] = BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8);
> > > +                       insn_buf[3] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
> > > +                       insn_buf[4] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
> > > +                       insn_buf[5] = BPF_JMP_A(1);
> > > +                       insn_buf[6] = BPF_MOV64_IMM(BPF_REG_0, 0);
> > > +
> > > +                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 7);
> > > +                       if (!new_prog)
> > > +                               return -ENOMEM;
> > > +
> > > +                       delta    += 6;
> > > +                       env->prog = prog = new_prog;
> > > +                       insn      = new_prog->insnsi + i + delta;
> > > +                       continue;
> >
> > nit: this whole sequence of steps and calculations seems like
> > something that might be abstracted and hidden behind a macro or helper
> > func? Not related to your change, though. But wouldn't it be easier to
> > understand if it was just written as:
> >
> > PATCH_INSNS(
> >     BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
> >     BPF_JMP32_REG(BPF_JGE, BPF_REG_2, BPF_REG_0, 4);
> >     BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8);
> >     BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
> >     BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
> >     BPF_JMP_A(1);
> >     BPF_MOV64_IMM(BPF_REG_0, 0));
> > continue;
>
> yep, looks better ;-) I'll check

as Alexei mentioned, it might not be possible, but if the variadic
implementation turns out not to be too ugly, I think it might work.
The macro can assume that insn_buf and all the other variables are
there, so there shouldn't be any increase in stack usage, I think.

But this is just an item on a wishlist, so don't stress about it too much.
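
Something along these lines, maybe (an untested sketch: commas between
the instructions instead of semicolons, and it assumes env, i, delta,
prog, insn, new_prog and insn_buf are all in scope, just like today):

  #define PATCH_INSNS(...)                                           \
          do {                                                       \
                  struct bpf_insn __insns[] = { __VA_ARGS__ };       \
                  int __cnt = ARRAY_SIZE(__insns);                   \
                                                                     \
                  memcpy(insn_buf, __insns, sizeof(__insns));        \
                  new_prog = bpf_patch_insn_data(env, i + delta,     \
                                                 insn_buf, __cnt);   \
                  if (!new_prog)                                     \
                          return -ENOMEM;                            \
                  delta    += __cnt - 1;                             \
                  env->prog = prog = new_prog;                       \
                  insn      = new_prog->insnsi + i + delta;          \
          } while (0)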

>
> >
> > ?
> >
> >
> > > +               }
> > > +
> > > +               /* Implement bpf_ret_value inline. */
> > > +               if (prog_type == BPF_PROG_TYPE_TRACING &&
> > > +                   insn->imm == BPF_FUNC_ret_value) {
> > > +                       /* Load nr_args from ctx - 8 */
> > > +                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1, -8);
> > > +                       insn_buf[1] = BPF_ALU64_IMM(BPF_MUL, BPF_REG_2, 8);
> > > +                       insn_buf[2] = BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_1);
> > > +                       insn_buf[3] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0);
> > > +
> > > +                       new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 4);
> > > +                       if (!new_prog)
> > > +                               return -ENOMEM;
> > > +
> > > +                       delta    += 3;
> > > +                       env->prog = prog = new_prog;
> > > +                       insn      = new_prog->insnsi + i + delta;
> > > +                       continue;
> > > +               }
> > > +
> > >                 /* Implement bpf_get_func_ip inline. */
> > >                 if (prog_type == BPF_PROG_TYPE_TRACING &&
> > >                     insn->imm == BPF_FUNC_get_func_ip) {
> > > -                       /* Load IP address from ctx - 8 */
> > > -                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8);
> > > +                       /* Load IP address from ctx - 16 */
> > > +                       insn_buf[0] = BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -16);
> > >
> > >                         new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, 1);
> > >                         if (!new_prog)
> > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> > > index 25ea521fb8f1..3844cfb45490 100644
> > > --- a/kernel/trace/bpf_trace.c
> > > +++ b/kernel/trace/bpf_trace.c
> > > @@ -1012,7 +1012,7 @@ const struct bpf_func_proto bpf_snprintf_btf_proto = {
> > >  BPF_CALL_1(bpf_get_func_ip_tracing, void *, ctx)
> > >  {
> > >         /* This helper call is inlined by verifier. */
> > > -       return ((u64 *)ctx)[-1];
> > > +       return ((u64 *)ctx)[-2];
> > >  }
> > >
> > >  static const struct bpf_func_proto bpf_get_func_ip_proto_tracing = {
> > > @@ -1091,6 +1091,38 @@ static const struct bpf_func_proto bpf_get_branch_snapshot_proto = {
> > >         .arg2_type      = ARG_CONST_SIZE_OR_ZERO,
> > >  };
> > >
> > > +BPF_CALL_2(bpf_arg, void *, ctx, int, n)
> > > +{
> > > +       /* This helper call is inlined by verifier. */
> > > +       u64 nr_args = ((u64 *)ctx)[-1];
> > > +
> > > +       if ((u64) n >= nr_args)
> > > +               return 0;
> >
> > We'll need a bpf_get_func_arg_cnt() helper as well, to be able to
> > know the actual number of arguments the traced function has. It's
> > impossible to know whether the argument is zero or there is no
> > argument, otherwise.
>
> my idea was that the program would call those helpers with the
> proper argument indexes, based on get_func_ip

see my comments on multi-attach kprobes: get_func_ip() is nice, but
BPF cookies are often much better. So I wouldn't design everything
around the assumption that the user always has to use a hashmap +
get_func_ip().
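
E.g., with a cookie the per-attachment data comes for free (a sketch;
it assumes the loader stored some per-function index as the cookie at
attach time):

  SEC("kprobe")
  int BPF_KPROBE(generic_probe)
  {
          /* which function are we attached to? no hashmap keyed by
           * bpf_get_func_ip() needed
           */
          __u64 idx = bpf_get_attach_cookie(ctx);

          bpf_printk("hit function #%llu", idx);
          return 0;
  }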

>
> but with bpf_get_func_arg_cnt we could make a simple program that
> would just print a function with all its arguments easily, ok ;-)

right, and many other, more complicated programs that don't have to
do runtime ip lookups ;)
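
Something like this, e.g. (a sketch using the bpf_get_func_* naming
suggested above, this patch's return-the-value semantics, and a made
up target function foo):

  SEC("fentry/foo")
  int BPF_PROG(dump_args)
  {
          int i, cnt = bpf_get_func_arg_cnt(ctx);

          for (i = 0; i < cnt && i < 6; i++)
                  bpf_printk("arg%d = 0x%lx", i,
                             bpf_get_func_arg(ctx, i));
          return 0;
  }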

>
> >
> > > +       return ((u64 *)ctx)[n];
> > > +}
> > > +
> > > +static const struct bpf_func_proto bpf_arg_proto = {
> > > +       .func           = bpf_arg,
> > > +       .gpl_only       = true,
> > > +       .ret_type       = RET_INTEGER,
> > > +       .arg1_type      = ARG_PTR_TO_CTX,
> > > +       .arg2_type      = ARG_ANYTHING,
> > > +};
> > > +
> > > +BPF_CALL_1(bpf_ret_value, void *, ctx)
> > > +{
> > > +       /* This helper call is inlined by verifier. */
> > > +       u64 nr_args = ((u64 *)ctx)[-1];
> > > +
> > > +       return ((u64 *)ctx)[nr_args];
> >
> > should we return 0 for fentry, or disable this helper for anything
> > but fexit? It's going to return garbage otherwise.
>
> disabling seems like right choice to me
>

well, see above. I think we should prefer statically disabling
something only when allowing it would be harmful otherwise; for more
flexibility and less headache with "proving things to the BPF
verifier", I lean more and more towards runtime checks, as long as
they are safe and not overly expensive or complicated.

> thanks,
> jirka
>
