From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>,
Joanne Koong <joannelkoong@gmail.com>,
daniel@iogearbox.net, andrii@kernel.org, martin.lau@kernel.org,
ast@kernel.org, netdev@vger.kernel.org, memxor@gmail.com,
kernel-team@fb.com, bpf@vger.kernel.org
Subject: Re: [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs
Date: Wed, 1 Feb 2023 17:21:39 -0800 [thread overview]
Message-ID: <CAEf4BzbTXqhsKqPd=hDANKeg75UDbKjtX318ucMGw7a1L3693w@mail.gmail.com> (raw)
In-Reply-To: <20230201004034.sea642affpiu7yfm@macbook-pro-6.dhcp.thefacebook.com>
On Tue, Jan 31, 2023 at 4:40 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Jan 31, 2023 at 04:11:47PM -0800, Andrii Nakryiko wrote:
> > >
> > > When prog is just parsing the packet it doesn't need to finalize with bpf_dynptr_write.
> > > The prog can always write into the pointer followed by if (p == buf) bpf_dynptr_write.
> > > No need for rdonly flag, but extra copy is there in case of cloned which
> > > could have been avoided with extra rd_only flag.
> >
> > Yep, given we are designing bpf_dynptr_slice for performance, extra
> > copy on reads is unfortunate. ro/rw flag or have separate
> > bpf_dynptr_slice_rw vs bpf_dynptr_slice_ro?
>
> Either flag or two kfuncs sound good to me.
Would it make sense to make bpf_dynptr_slice() as read-only variant,
and bpf_dynptr_slice_rw() for read/write? I think the common case is
read-only, right? And if users mistakenly use bpf_dynptr_slice() for
r/w case, they will get a verifier error when trying to write into the
returned pointer. While if we make bpf_dynptr_slice() as read-write,
users won't realize they are paying a performance penalty for
something that they don't actually need.
>
> > > Yes and No. bpf_skb_store_bytes is doing pull followed by memcpy,
> > > while xdp_store_bytes does scatter gather copy into frags.
> > > We should probably add similar copy to skb case to avoid allocations and pull.
> > > Then in case of:
> > > if (p == buf) {
> > > bpf_dynptr_write(dp, buf, 16);
> > > }
> > >
> > > the write will guarantee to succeed for both xdp and skb and the user
> > > doesn't need to add error checking for alloc failures in case of skb.
> > >
> >
> > That seems like a nice guarantee, agreed.
>
> Just grepped through few projects that use skb_store_bytes.
> Everywhere it looks like:
> if (bpf_skb_store_byte(...))
> return error;
>
> Not a pretty code to read.
> I should prioritize bpf_assert() work, so we can assert from inside of
> bpf_dynptr_write() eventually and remove all these IFs.
>
> > > > >
> > > > > > But I wonder if for simple cases when users are mostly sure that they
> > > > > > are going to access only header data directly we can have an option
> > > > > > for bpf_dynptr_from_skb() to specify what should be the behavior for
> > > > > > bpf_dynptr_slice():
> > > > > >
> > > > > > - either return NULL for anything that crosses into frags (no
> > > > > > surprising perf penalty, but surprising NULLs);
> > > > > > - do bpf_skb_pull_data() if bpf_dynptr_data() needs to point to data
> > > > > > beyond header (potential perf penalty, but on NULLs, if off+len is
> > > > > > within packet).
> > > > > >
> > > > > > And then bpf_dynptr_from_skb() can accept a flag specifying this
> > > > > > behavior and store it somewhere in struct bpf_dynptr.
> > > > >
> > > > > xdp does not have the bpf_skb_pull_data() equivalent, so xdp prog will still
> > > > > need the write back handling.
> > > > >
> > > >
> > > > Sure, unfortunately, can't have everything. I'm just thinking how to
> > > > make bpf_dynptr_data() generically usable. Think about some common BPF
> > > > routine that calculates hash for all bytes pointed to by dynptr,
> > > > regardless of underlying dynptr type; it can iterate in small chunks,
> > > > get memory slice, if possible, but fallback to generic
> > > > bpf_dynptr_read() if doesn't. This will work for skb, xdp, LOCAL,
> > > > RINGBUF, any other dynptr type.
> > >
> > > It looks to me that dynptr on top of skb, xdp, local can work as generic reader,
> > > but dynptr as a generic writer doesn't look possible.
> > > BPF_F_RECOMPUTE_CSUM and BPF_F_INVALIDATE_HASH are special to skb.
> > > There is also bpf_skb_change_proto and crazy complex bpf_skb_adjust_room.
> > > I don't think writing into skb vs xdp vs ringbuf are generalizable.
> > > The prog needs to do a ton more work to write into skb correctly.
> >
> > If that's the case, then yeah, bpf_dynptr_write() can just return
> > error for skb/xdp dynptrs?
>
> You mean to error when these skb only flags are present, but dynptr->type == xdp ?
> Yep. I don't see another option. My point was that dynptr doesn't quite work as an
> abstraction for writing into networking things.
agreed
> While libraries like: parse_http(&dynptr), compute_hash(&dynptr), find_string(&dynptr)
> can indeed be generic and work with raw bytes, skb, xdp as an input,
> which I think was on top of your wishlist for dynptr.
yep, it would be a great property
next prev parent reply other threads:[~2023-02-02 1:21 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-27 19:16 [PATCH v9 bpf-next 0/5] Add skb + xdp dynptrs Joanne Koong
2023-01-27 19:16 ` [PATCH v9 bpf-next 1/5] bpf: Allow "sk_buff" and "xdp_buff" as valid kfunc arg types Joanne Koong
2023-01-27 19:17 ` [PATCH v9 bpf-next 2/5] bpf: Allow initializing dynptrs in kfuncs Joanne Koong
2023-01-27 19:17 ` [PATCH v9 bpf-next 3/5] bpf: Add skb dynptrs Joanne Koong
2023-01-29 23:39 ` Alexei Starovoitov
2023-01-31 0:44 ` Joanne Koong
2023-01-31 5:36 ` Alexei Starovoitov
2023-01-31 17:54 ` Joanne Koong
2023-01-31 19:50 ` Alexei Starovoitov
2023-01-31 21:29 ` Joanne Koong
2023-02-08 21:46 ` Joanne Koong
2023-02-08 23:22 ` Alexei Starovoitov
2023-01-30 22:04 ` Martin KaFai Lau
2023-01-30 22:31 ` Alexei Starovoitov
2023-01-30 23:11 ` Martin KaFai Lau
2023-01-31 1:04 ` Andrii Nakryiko
2023-01-31 1:49 ` Martin KaFai Lau
2023-01-31 4:43 ` Andrii Nakryiko
2023-01-31 5:30 ` Alexei Starovoitov
2023-01-31 22:07 ` Martin KaFai Lau
2023-01-31 23:17 ` Joanne Koong
2023-02-01 0:46 ` Alexei Starovoitov
2023-02-01 0:11 ` Andrii Nakryiko
2023-02-01 0:40 ` Alexei Starovoitov
2023-02-02 1:21 ` Andrii Nakryiko [this message]
2023-02-02 11:43 ` Alexei Starovoitov
2023-02-03 21:37 ` Andrii Nakryiko
2023-02-08 2:25 ` Alexei Starovoitov
2023-02-08 20:13 ` Joanne Koong
2023-02-09 0:39 ` Andrii Nakryiko
2023-01-31 18:30 ` Joanne Koong
2023-01-31 19:58 ` Alexei Starovoitov
2023-01-31 20:47 ` Joanne Koong
2023-01-31 21:10 ` Alexei Starovoitov
2023-01-31 21:33 ` Joanne Koong
2023-01-31 18:18 ` Joanne Koong
2023-01-31 0:48 ` Andrii Nakryiko
2023-01-31 0:55 ` Joanne Koong
2023-01-31 1:06 ` Andrii Nakryiko
2023-01-31 1:13 ` Joanne Koong
2023-01-31 1:19 ` Andrii Nakryiko
2023-01-27 19:17 ` [PATCH v9 bpf-next 4/5] bpf: Add xdp dynptrs Joanne Koong
2023-01-27 19:17 ` [PATCH v9 bpf-next 5/5] selftests/bpf: tests for using dynptrs to parse skb and xdp buffers Joanne Koong
2023-01-31 0:49 ` Andrii Nakryiko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAEf4BzbTXqhsKqPd=hDANKeg75UDbKjtX318ucMGw7a1L3693w@mail.gmail.com' \
--to=andrii.nakryiko@gmail.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=joannelkoong@gmail.com \
--cc=kernel-team@fb.com \
--cc=martin.lau@kernel.org \
--cc=martin.lau@linux.dev \
--cc=memxor@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).