netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joanne Koong <joannelkoong@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: bpf@vger.kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	martin.lau@kernel.org, ast@kernel.org, netdev@vger.kernel.org,
	memxor@gmail.com, kernel-team@fb.com
Subject: Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs
Date: Tue, 31 Jan 2023 09:54:52 -0800	[thread overview]
Message-ID: <CAJnrk1Z_GB_ynL5kEaVQaxYsPFjad+3dk8JWKqDfvb1VHHavwg@mail.gmail.com> (raw)
In-Reply-To: <20230131053605.g7o75yylku6nusnp@macbook-pro-6.dhcp.thefacebook.com>

On Mon, Jan 30, 2023 at 9:36 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, Jan 30, 2023 at 04:44:12PM -0800, Joanne Koong wrote:
> > On Sun, Jan 29, 2023 at 3:39 PM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Jan 27, 2023 at 11:17:01AM -0800, Joanne Koong wrote:
> > > > Add skb dynptrs, which are dynptrs whose underlying pointer points
> > > > to a skb. The dynptr acts on skb data. skb dynptrs have two main
> > > > benefits. One is that they allow operations on sizes that are not
> > > > statically known at compile-time (eg variable-sized accesses).
> > > > Another is that parsing the packet data through dynptrs (instead of
> > > > through direct access of skb->data and skb->data_end) can be more
> > > > ergonomic and less brittle (eg does not need manual if checking for
> > > > being within bounds of data_end).
> > > >
> > > > For bpf prog types that don't support writes on skb data, the dynptr is
> > > > read-only (bpf_dynptr_write() will return an error and bpf_dynptr_data()
> > > > will return a data slice that is read-only where any writes to it will
> > > > be rejected by the verifier).
> > > >
> > > > For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
> > > > interfaces, reading and writing from/to data in the head as well as from/to
> > > > non-linear paged buffers is supported. For data slices (through the
> > > > bpf_dynptr_data() interface), if the data is in a paged buffer, the user
> > > > must first call bpf_skb_pull_data() to pull the data into the linear
> > > > portion.
> > >
> > > Looks like there is an assumption in parts of this patch that
> > > linear part of skb is always writeable. That's not the case.
> > > See if (ops->gen_prologue || env->seen_direct_write) in convert_ctx_accesses().
> > > For TC progs it calls bpf_unclone_prologue() which adds hidden
> > > bpf_skb_pull_data() in the beginning of the prog to make it writeable.
> >
> > I think we can make this assumption? For writable progs (referenced in
> > the may_access_direct_pkt_data() function), all of them have a
> > gen_prologue that unclones the buffer (eg tc_cls_act, lwt_xmit, sk_skb
> > progs) or their linear portion is okay to write into by default (eg
> > xdp, sk_msg, cg_sockopt progs).
>
> but the patch was preserving seen_direct_write in some cases.
> I'm still confused.

seen_direct_write is used to determine whether to actually unclone or
not in the program's prologue function (eg tc_cls_act_prologue() ->
bpf_unclone_prologue() where in bpf_unclone_prologue(), if
direct_write was not true, then it can skip doing the actual
uncloning).

I think the part of the patch you're talking about regarding
seen_direct_write is this in check_helper_call():

+ if (func_id == BPF_FUNC_dynptr_data &&
+    dynptr_type == BPF_DYNPTR_TYPE_SKB) {
+   bool seen_direct_write = env->seen_direct_write;
+
+   regs[BPF_REG_0].type |= DYNPTR_TYPE_SKB;
+   if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE))
+     regs[BPF_REG_0].type |= MEM_RDONLY;
+   else
+     /*
+     * Calling may_access_direct_pkt_data() will set
+     * env->seen_direct_write to true if the skb is
+     * writable. As an optimization, we can ignore
+     * setting env->seen_direct_write.
+     *
+     * env->seen_direct_write is used by skb
+     * programs to determine whether the skb's page
+     * buffers should be cloned. Since data slice
+     * writes would only be to the head, we can skip
+     * this.
+     */
+     env->seen_direct_write = seen_direct_write;
+ }

If the data slice for a skb dynptr is writable, then seen_direct_write
gets set to true (done internally in may_access_direct_pkt_data()) so
that the skb is actually uncloned, whereas if it's read-only, then
env->seen_direct_write gets reset to its original value (since the
may_access_direct_pkt_data() call will have set env->seen_direct_write
to true)

>
> > >
> > > > Any bpf_dynptr_write() automatically invalidates any prior data slices
> > > > to the skb dynptr. This is because a bpf_dynptr_write() may be writing
> > > > to data in a paged buffer, so it will need to pull the buffer first into
> > > > the head. The reason it needs to be pulled instead of writing directly to
> > > > the paged buffers is because they may be cloned (only the head of the skb
> > > > is by default uncloned). As such, any bpf_dynptr_write() will
> > > > automatically have its prior data slices invalidated, even if the write
> > > > is to data in the skb head (the verifier has no way of differentiating
> > > > whether the write is to the head or paged buffers during program load
> > > > time).
> > >
> > > Could you explain the workflow how bpf_dynptr_write() invalidates other
> > > pkt pointers ?
> > > I expected bpf_dynptr_write() to be in bpf_helper_changes_pkt_data().
> > > Looks like bpf_dynptr_write() calls bpf_skb_store_bytes() underneath,
> > > but that doesn't help the verifier.
> >
> > In the verifier in check_helper_call(), for the BPF_FUNC_dynptr_write
> > case (line 8236) the "changes_data" variable gets set to true if the
> > dynptr is an skb type. At the end of check_helper_call() on line 8474,
> > since "changes_data" is true, clear_all_pkt_pointer() gets called,
> > which invalidates the other packet pointers.
>
> Ahh. I see. Thanks for explaining.
>
> > >
> > > > Please note as well that any other helper calls that change the
> > > > underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
> > > > slices of the skb dynptr as well. The stack trace for this is
> > > > check_helper_call() -> clear_all_pkt_pointers() ->
> > > > __clear_all_pkt_pointers() -> mark_reg_unknown().
> > >
> > > __clear_all_pkt_pointers isn't present in the tree. Typo ?
> >
> > I'll update this message, clear_all_pkt_pointers() and
> > __clear_all_pkt_pointers() were combined in a previous commit.
> >
> > >
> > > >
> > > > For examples of how skb dynptrs can be used, please see the attached
> > > > selftests.
> > > >
> > > > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > > > ---
> > > >  include/linux/bpf.h            |  82 +++++++++------
> > > >  include/linux/filter.h         |  18 ++++
> > > >  include/uapi/linux/bpf.h       |  37 +++++--
> > > >  kernel/bpf/btf.c               |  18 ++++
> > > >  kernel/bpf/helpers.c           |  95 ++++++++++++++---
> > > >  kernel/bpf/verifier.c          | 185 ++++++++++++++++++++++++++-------
> > > >  net/core/filter.c              |  60 ++++++++++-
> > > >  tools/include/uapi/linux/bpf.h |  37 +++++--
> > > >  8 files changed, 432 insertions(+), 100 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 14a0264fac57..1ac061b64582 100644
> > [...]
> > > > @@ -8243,6 +8316,28 @@ static int check_helper_call(struct bpf_verifier_env *env, struct bpf_insn *insn
> > > >               mark_reg_known_zero(env, regs, BPF_REG_0);
> > > >               regs[BPF_REG_0].type = PTR_TO_MEM | ret_flag;
> > > >               regs[BPF_REG_0].mem_size = meta.mem_size;
> > > > +             if (func_id == BPF_FUNC_dynptr_data &&
> > > > +                 dynptr_type == BPF_DYNPTR_TYPE_SKB) {
> > > > +                     bool seen_direct_write = env->seen_direct_write;
> > > > +
> > > > +                     regs[BPF_REG_0].type |= DYNPTR_TYPE_SKB;
> > > > +                     if (!may_access_direct_pkt_data(env, NULL, BPF_WRITE))
> > > > +                             regs[BPF_REG_0].type |= MEM_RDONLY;
> > > > +                     else
> > > > +                             /*
> > > > +                              * Calling may_access_direct_pkt_data() will set
> > > > +                              * env->seen_direct_write to true if the skb is
> > > > +                              * writable. As an optimization, we can ignore
> > > > +                              * setting env->seen_direct_write.
> > > > +                              *
> > > > +                              * env->seen_direct_write is used by skb
> > > > +                              * programs to determine whether the skb's page
> > > > +                              * buffers should be cloned. Since data slice
> > > > +                              * writes would only be to the head, we can skip
> > > > +                              * this.
>
> I was talking about above comment. It reads as 'write to the head are allowed'.
> But they're not. seen_direct_write is needed to do hidden pull.
>

I will remove this line, I agree that it is confusing.

> > > > +                              */
> > > > +                             env->seen_direct_write = seen_direct_write;
> > >
> > > This looks incorrect. skb head might not be writeable.
> > >
> > > > +             }
> > > >               break;
> > > >       case RET_PTR_TO_MEM_OR_BTF_ID:
> > > >       {
> > > > @@ -8649,6 +8744,7 @@ enum special_kfunc_type {
> > > >       KF_bpf_list_pop_back,
> > > >       KF_bpf_cast_to_kern_ctx,
> > > >       KF_bpf_rdonly_cast,
> > > > +     KF_bpf_dynptr_from_skb,
> > > >       KF_bpf_rcu_read_lock,
> > > >       KF_bpf_rcu_read_unlock,
> > > >  };
> > > > @@ -8662,6 +8758,7 @@ BTF_ID(func, bpf_list_pop_front)
> > > >  BTF_ID(func, bpf_list_pop_back)
> > > >  BTF_ID(func, bpf_cast_to_kern_ctx)
> > > >  BTF_ID(func, bpf_rdonly_cast)
> > > > +BTF_ID(func, bpf_dynptr_from_skb)
> > > >  BTF_SET_END(special_kfunc_set)
> > > >
> > > >  BTF_ID_LIST(special_kfunc_list)
> > > > @@ -8673,6 +8770,7 @@ BTF_ID(func, bpf_list_pop_front)
> > > >  BTF_ID(func, bpf_list_pop_back)
> > > >  BTF_ID(func, bpf_cast_to_kern_ctx)
> > > >  BTF_ID(func, bpf_rdonly_cast)
> > > > +BTF_ID(func, bpf_dynptr_from_skb)
> > > >  BTF_ID(func, bpf_rcu_read_lock)
> > > >  BTF_ID(func, bpf_rcu_read_unlock)
> > > >
> > > > @@ -9263,17 +9361,26 @@ static int check_kfunc_args(struct bpf_verifier_env *env, struct bpf_kfunc_call_
> > > >                               return ret;
> > > >                       break;
> > > >               case KF_ARG_PTR_TO_DYNPTR:
> > > > +             {
> > > > +                     enum bpf_arg_type dynptr_arg_type = ARG_PTR_TO_DYNPTR;
> > > > +
> > > >                       if (reg->type != PTR_TO_STACK &&
> > > >                           reg->type != CONST_PTR_TO_DYNPTR) {
> > > >                               verbose(env, "arg#%d expected pointer to stack or dynptr_ptr\n", i);
> > > >                               return -EINVAL;
> > > >                       }
> > > >
> > > > -                     ret = process_dynptr_func(env, regno, insn_idx,
> > > > -                                               ARG_PTR_TO_DYNPTR | MEM_RDONLY);
> > > > +                     if (meta->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb])
> > > > +                             dynptr_arg_type |= MEM_UNINIT | DYNPTR_TYPE_SKB;
> > > > +                     else
> > > > +                             dynptr_arg_type |= MEM_RDONLY;
> > > > +
> > > > +                     ret = process_dynptr_func(env, regno, insn_idx, dynptr_arg_type,
> > > > +                                               meta->func_id);
> > > >                       if (ret < 0)
> > > >                               return ret;
> > > >                       break;
> > > > +             }
> > > >               case KF_ARG_PTR_TO_LIST_HEAD:
> > > >                       if (reg->type != PTR_TO_MAP_VALUE &&
> > > >                           reg->type != (PTR_TO_BTF_ID | MEM_ALLOC)) {
> > > > @@ -15857,6 +15964,14 @@ static int fixup_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
> > > >                  desc->func_id == special_kfunc_list[KF_bpf_rdonly_cast]) {
> > > >               insn_buf[0] = BPF_MOV64_REG(BPF_REG_0, BPF_REG_1);
> > > >               *cnt = 1;
> > > > +     } else if (desc->func_id == special_kfunc_list[KF_bpf_dynptr_from_skb]) {
> > > > +             bool is_rdonly = !may_access_direct_pkt_data(env, NULL, BPF_WRITE);
> > > > +             struct bpf_insn addr[2] = { BPF_LD_IMM64(BPF_REG_4, is_rdonly) };
> > >
> > > Why use 16-byte insn to pass boolean in R4 ?
> > > Single 8-byte MOV would do.
> >
> > Great, I'll change it to a 8-byte MOV
> >
> > >
> > > > +
> > > > +             insn_buf[0] = addr[0];
> > > > +             insn_buf[1] = addr[1];
> > > > +             insn_buf[2] = *insn;
> > > > +             *cnt = 3;
> > > >       }
> > > >       return 0;
> > > >  }
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 6da78b3d381e..ddb47126071a 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -1684,8 +1684,8 @@ static inline void bpf_pull_mac_rcsum(struct sk_buff *skb)
> > > >               skb_postpull_rcsum(skb, skb_mac_header(skb), skb->mac_len);
> > > >  }
> > > >
> > > > -BPF_CALL_5(bpf_skb_store_bytes, struct sk_buff *, skb, u32, offset,
> > > > -        const void *, from, u32, len, u64, flags)
> > > > +int __bpf_skb_store_bytes(struct sk_buff *skb, u32 offset, const void *from,
> > > > +                       u32 len, u64 flags)
> > >
> > > This change is just to be able to call __bpf_skb_store_bytes() ?
> > > If so, it's unnecessary.
> > > See:
> > > BPF_CALL_4(sk_reuseport_load_bytes,
> > >            const struct sk_reuseport_kern *, reuse_kern, u32, offset,
> > >            void *, to, u32, len)
> > > {
> > >         return ____bpf_skb_load_bytes(reuse_kern->skb, offset, to, len);
> > > }
> > >
> >
> > There was prior feedback [0] that using four underscores to call a
> > helper function is confusing and makes it ungreppable
>
> There are plenty of ungreppable funcs in the kernel.
> Try finding where folio_test_dirty() is defined.
> mm subsystem is full of such 'features'.
> Not friendly for casual kernel code reader, but useful.
>
> Since quadruple underscore is already used in the code base
> I see no reason to sacrifice bpf_skb_load_bytes performance with extra call.

I don't have a preference either way, I'll change it to use the
quadruple underscore in the next version

  reply	other threads:[~2023-01-31 17:55 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-27 19:16 [PATCH v9 bpf-next 0/5] Add skb + xdp dynptrs Joanne Koong
2023-01-27 19:16 ` [PATCH v9 bpf-next 1/5] bpf: Allow "sk_buff" and "xdp_buff" as valid kfunc arg types Joanne Koong
2023-01-27 19:17 ` [PATCH v9 bpf-next 2/5] bpf: Allow initializing dynptrs in kfuncs Joanne Koong
2023-01-27 19:17 ` [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs Joanne Koong
2023-01-29 23:39   ` Alexei Starovoitov
2023-01-31  0:44     ` Joanne Koong
2023-01-31  5:36       ` Alexei Starovoitov
2023-01-31 17:54         ` Joanne Koong [this message]
2023-01-31 19:50           ` Alexei Starovoitov
2023-01-31 21:29             ` Joanne Koong
2023-02-08 21:46           ` Joanne Koong
2023-02-08 23:22             ` Alexei Starovoitov
2023-01-30 22:04   ` Martin KaFai Lau
2023-01-30 22:31     ` Alexei Starovoitov
2023-01-30 23:11       ` Martin KaFai Lau
2023-01-31  1:04       ` Andrii Nakryiko
2023-01-31  1:49         ` Martin KaFai Lau
2023-01-31  4:43           ` Andrii Nakryiko
2023-01-31  5:30             ` Alexei Starovoitov
2023-01-31 22:07               ` Martin KaFai Lau
2023-01-31 23:17               ` Joanne Koong
2023-02-01  0:46                 ` Alexei Starovoitov
2023-02-01  0:11               ` Andrii Nakryiko
2023-02-01  0:40                 ` Alexei Starovoitov
2023-02-02  1:21                   ` Andrii Nakryiko
2023-02-02 11:43                     ` Alexei Starovoitov
2023-02-03 21:37                       ` Andrii Nakryiko
2023-02-08  2:25                         ` Alexei Starovoitov
2023-02-08 20:13                           ` Joanne Koong
2023-02-09  0:39                             ` Andrii Nakryiko
2023-01-31 18:30         ` Joanne Koong
2023-01-31 19:58           ` Alexei Starovoitov
2023-01-31 20:47             ` Joanne Koong
2023-01-31 21:10               ` Alexei Starovoitov
2023-01-31 21:33                 ` Joanne Koong
2023-01-31 18:18     ` Joanne Koong
2023-01-31  0:48   ` Andrii Nakryiko
2023-01-31  0:55     ` Joanne Koong
2023-01-31  1:06       ` Andrii Nakryiko
2023-01-31  1:13         ` Joanne Koong
2023-01-31  1:19           ` Andrii Nakryiko
2023-01-27 19:17 ` [PATCH v9 bpf-next 4/5] bpf: Add xdp dynptrs Joanne Koong
2023-01-27 19:17 ` [PATCH v9 bpf-next 5/5] selftests/bpf: tests for using dynptrs to parse skb and xdp buffers Joanne Koong
2023-01-31  0:49   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJnrk1Z_GB_ynL5kEaVQaxYsPFjad+3dk8JWKqDfvb1VHHavwg@mail.gmail.com \
    --to=joannelkoong@gmail.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).