bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Martin KaFai Lau <kafai@fb.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <edumazet@google.com>, <kernel-team@fb.com>,
	Lawrence Brakmo <brakmo@fb.com>,
	Neal Cardwell <ncardwell@google.com>, <netdev@vger.kernel.org>,
	Yuchung Cheng <ycheng@google.com>
Subject: Re: [PATCH bpf-next 04/10] bpf: tcp: Allow bpf prog to write and parse BPF TCP header option
Date: Sun, 28 Jun 2020 17:34:48 -0700	[thread overview]
Message-ID: <20200629003448.hstswzhn4eakv36f@kafai-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <20200628182427.qt7vpjohwkxvixfi@ast-mbp.dhcp.thefacebook.com>

On Sun, Jun 28, 2020 at 11:24:27AM -0700, Alexei Starovoitov wrote:
> On Fri, Jun 26, 2020 at 10:55:26AM -0700, Martin KaFai Lau wrote:
> > 
> > Parsing BPF Header Option
> > ─────────────────────────
> > 
> > As mentioned earlier, the received SYN/SYNACK/ACK during the 3WHS
> > will be available to some specific CB (e.g. the *_ESTABLISHED_CB)
> > 
> > For established connection, if the kernel finds a bpf header
> > option (i.e. option with kind:254 and magic:0xeB9F) and the
> > the "PARSE_HDR_OPT_CB_FLAG" flag is set,  the
> > bpf prog will be called in the "BPF_SOCK_OPS_PARSE_HDR_OPT_CB" op.
> > The received skb will be available through sock_ops->skb_data
> > and the bpf header option offset will also be specified
> > in sock_ops->skb_bpf_hdr_opt_off.
> 
> TCP noob question:
> - can tcp header have two or more options with the same kind and magic?
> I scanned draft-ietf-tcpm-experimental-options-00.txt and it seems
> it's not prohibiting collisions.
> So should be ok?
I also think it is ok.  Regardless of kind, the kernel's tcp_parse_options()
seems to be ok on duplication also.

> Why I'm asking... I think existing bpf_sock_ops style of running
> multiple bpf progs is gonna be awkward to use.
> Picking the max of bpf_reserve_hdr_opt() from many calls and let
> parent bpf progs override children written headers feels a bit hackish.
> I feel the users will thank us if we design the progs to be more
> isolated and independent somehow.
> I was thinking may be each bpf prog will bpf_reserve_hdr_opt()
> and bpf_store_hdr_opt() into their own option?
> Then during option writing side the tcp header will have two or more
> options with the same kind and magic.
> Obviously it creates a headache during parsing. Which bpf prog
> should be called for each option?
> 
> I suspect tcp draft actually prefers all options to have unique kind+magic.
> Can we add an attribute to prog load time that will request particular magic ?
> Then only that _one_ program will be called for the given kind+magic.
> We can still have multiple progs attached to a cgroup (likely root cgroup)
> and different progs will take care of parsing and writing their own option.
> cgroup attaching side can make sure that multi progs have different magics.
Interesting idea.

If the magic can be specified at load time,
may be extend this for the "length" requirement too.  At load time,
both magic and length should be specified.  The total length can
be calculated during the attach time.  That will avoid making
an extra call to bpf prog to learn the length.

If we don't limit magic, I think we should discuss if we need to limit the
kind to 254 too.  How about we allow user to write any option kind?  That can
save 2 byte magic from the limited TCP option spaces.  At load
time, we can definitely reject the kind that the kernel is already
writing, e.g. timestamp, sack...etc.

When a tcp pkt is received at an established sk, this patch decides
if there is 0xeB9F option before calling into bpf prog.  That needs
to be adjusted as well.  It could be changed to: call into bpf prog
if there is option "kind" that the kernel cannot handle and the
PARSE_HDR_CB_FLAGS is set.

> Saving multiple bpf_hdr_opt_off in patch 2 for different magics becomes
> problematic. So clearly the implementation complexity shots through the roof
> with above proposal, but it feels more flexible and more user friendly?
> Not a strong opinion. The feature is great as-is.
If we go with the above route that multiple cgroup-bpf has multiple
kinds (or magics),  each of them has to parse the tcphdr to figure
out where is its own kind (or magic).  The intention of providing
bpf_hdr_opt_off in this patch is mostly for bpf-prog convenience only.
I think having it work nicer in multi bpf prog is the proper trade off.

  reply	other threads:[~2020-06-29  0:35 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-26 17:55 [PATCH bpf-next 00/10] BPF TCP header options Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 01/10] tcp: Use a struct to represent a saved_syn Martin KaFai Lau
2020-06-27 17:41   ` Eric Dumazet
2020-06-30 23:24     ` Martin KaFai Lau
2020-06-30 23:35       ` Eric Dumazet
2020-06-26 17:55 ` [PATCH bpf-next 02/10] tcp: bpf: Parse BPF experimental header option Martin KaFai Lau
2020-06-27 16:44   ` Eric Dumazet
2020-06-27 17:17   ` Eric Dumazet
2020-06-28 23:44     ` Martin KaFai Lau
2020-06-29  0:45     ` Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 03/10] bpf: sock_ops: Change some members of sock_ops_kern from u32 to u8 Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 04/10] bpf: tcp: Allow bpf prog to write and parse BPF TCP header option Martin KaFai Lau
2020-06-28 18:24   ` Alexei Starovoitov
2020-06-29  0:34     ` Martin KaFai Lau [this message]
2020-07-02  5:31       ` Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 05/10] bpf: selftests: A few improvements to network_helpers.c Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 06/10] bpf: selftests: Add fastopen_connect to network_helpers Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 07/10] bpf: selftests: Restore netns after each test Martin KaFai Lau
2020-06-26 22:45   ` Andrii Nakryiko
2020-06-27  0:23     ` Martin KaFai Lau
2020-06-27 20:31       ` Andrii Nakryiko
2020-06-29 18:00         ` Martin KaFai Lau
2020-06-29 18:13           ` Andrii Nakryiko
2020-06-29 18:24             ` Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 08/10] bpf: selftests: tcp header options Martin KaFai Lau
2020-06-26 17:55 ` [PATCH bpf-next 09/10] tcp: bpf: Add TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN to bpf_setsockopt Martin KaFai Lau
2020-06-27 17:30   ` Eric Dumazet
2020-06-26 17:56 ` [PATCH bpf-next 10/10] bpf: selftest: Add test for TCP_BPF_DELACK_MAX and TCP_BPF_RTO_MIN Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200629003448.hstswzhn4eakv36f@kafai-mbp.dhcp.thefacebook.com \
    --to=kafai@fb.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brakmo@fb.com \
    --cc=daniel@iogearbox.net \
    --cc=edumazet@google.com \
    --cc=kernel-team@fb.com \
    --cc=ncardwell@google.com \
    --cc=netdev@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).