From: Martin KaFai Lau <kafai@fb.com>
To: Eric Dumazet <edumazet@google.com>
Cc: bpf <bpf@vger.kernel.org>, Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
kernel-team <kernel-team@fb.com>, Lawrence Brakmo <brakmo@fb.com>,
Neal Cardwell <ncardwell@google.com>,
netdev <netdev@vger.kernel.org>,
Yuchung Cheng <ycheng@google.com>
Subject: Re: [PATCH v3 bpf-next 6/9] bpf: tcp: Allow bpf prog to write and parse TCP header option
Date: Fri, 31 Jul 2020 10:59:13 -0700
Message-ID: <20200731175913.v4r2qjcvflehtyii@kafai-mbp.dhcp.thefacebook.com>
In-Reply-To: <CANn89i+5RKTcBFqueEs48HUadC+dO54eR7Yp5pBJ6zgbosTDCQ@mail.gmail.com>

On Fri, Jul 31, 2020 at 09:06:57AM -0700, Eric Dumazet wrote:
> On Thu, Jul 30, 2020 at 1:57 PM Martin KaFai Lau <kafai@fb.com> wrote:
> >
> > The earlier BPF-TCP-CC effort allows a TCP congestion control
> > algorithm to be written in BPF. It opens up opportunities for
> > a faster turnaround time in testing and releasing new congestion
> > control ideas to production environments.
> >
> > The same flexibility can be extended to writing TCP header options.
> > It is not uncommon for people to want to test a new TCP header
> > option to improve TCP performance. Another use case is a data
> > center, which has a more controlled environment and more
> > flexibility in adding header options for internal-only use.
> >
> > For example, we want to test the idea of putting the maximum
> > delayed ACK in a TCP header option, which is similar to a
> > draft RFC proposal [1].
> >
> > This patch introduces the necessary BPF API and uses it in the
> > TCP stack to allow a BPF_PROG_TYPE_SOCK_OPS program to parse
> > and write TCP header options. It currently supports most
> > TCP packet types except RST.
> >
> > Supported TCP header options:
> > ───────────────────────────
> > This patch allows the bpf-prog to write any option kind.
> > Different bpf-progs can each write their own option by calling the
> > new helper bpf_store_hdr_opt(). The helper ensures that no option
> > is duplicated in the header.
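
To illustrate the "no duplicated option" guarantee described above, here is
a minimal user-space sketch in C. It is not the kernel helper: the names
store_opt() and opt_present(), the buffer layout, and the choice of errno
values are assumptions made only for this example. It appends a
kind/length/value option to an option area and refuses to do so if an
option of the same kind is already there.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OPT_EOL 0 /* End of Option List: single byte, no length field */
#define OPT_NOP 1 /* No-Operation: single byte, no length field */

/* Hypothetical helper: return 1 if an option of `kind` already
 * exists in opts[0..len), 0 otherwise. Stops at malformed input. */
static int opt_present(const uint8_t *opts, size_t len, uint8_t kind)
{
	size_t i = 0;

	while (i < len) {
		uint8_t k = opts[i];

		if (k == OPT_EOL)
			break;
		if (k == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			break; /* length byte is missing */
		uint8_t l = opts[i + 1];
		if (l < 2 || i + l > len)
			break; /* malformed or truncated option */
		if (k == kind)
			return 1;
		i += l;
	}
	return 0;
}

/* Hypothetical stand-in for the idea behind bpf_store_hdr_opt():
 * append opt[0..opt_len) to the option area unless the kind is
 * already present. The error codes are this sketch's own choice. */
static int store_opt(uint8_t *opts, size_t *used, size_t cap,
		     const uint8_t *opt, size_t opt_len)
{
	assert(opts && used && opt);

	/* Need kind + length bytes, and the length byte must match. */
	if (opt_len < 2 || opt[1] != opt_len)
		return -EINVAL;
	if (opt[0] == OPT_EOL || opt[0] == OPT_NOP)
		return -EINVAL;
	if (opt_present(opts, *used, opt[0]))
		return -EEXIST;
	if (opt_len > cap - *used)
		return -ENOSPC;

	memcpy(opts + *used, opt, opt_len);
	*used += opt_len;
	return 0;
}
```

Storing an option of kind 253 (an experimental kind) twice into the same
area succeeds the first time and is rejected the second time, which is the
behaviour the description above attributes to the helper.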
> >
> > Allowing the bpf-prog to write any option kind gives it a lot of
> > flexibility. It could also allow the bpf-prog to support a
> > recently standardized option on an older kernel.
> >
> > Sockops Callback Flags:
> > ──────────────────────
> > The header parsing and writing callbacks can be turned on
> > by enabling a few newly added callback flags:
> >
> > BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG:
> > Call the bpf-prog when the kernel has received a header option
> > that the kernel cannot handle. It is useful when the peer does
> > not send bpf-options very often.
> >
> > The bpf-prog can inspect the received header through
> > sock_ops->skb_data, which covers the whole header (including the
> > fixed fields such as ports and flags), or use the new
> > bpf_load_hdr_opt() to search for a particular TCP header option.
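
As a rough picture of what "search for a particular TCP header option"
involves, the following user-space C sketch walks the option area of a raw
TCP header that starts at the fixed fields, as the description above says
skb_data does. It is not the bpf_load_hdr_opt() implementation; the function
name find_tcp_opt() and its signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_FIXED_HDR_LEN 20 /* bytes before the option area */
#define OPT_EOL 0            /* End of Option List */
#define OPT_NOP 1            /* No-Operation (padding) */

/* Hypothetical helper: th points at the first byte of a TCP header
 * and len is the number of readable bytes. Returns a pointer to the
 * first option of `kind` (at its kind byte) and stores its total
 * length in *opt_len, or returns NULL if it is absent or the header
 * is malformed. */
static const uint8_t *find_tcp_opt(const uint8_t *th, size_t len,
				   uint8_t kind, uint8_t *opt_len)
{
	assert(th && opt_len);

	if (len < TCP_FIXED_HDR_LEN)
		return NULL;

	/* Data offset: upper 4 bits of byte 12, in 32-bit words. */
	size_t hdr_len = (size_t)(th[12] >> 4) * 4;
	if (hdr_len < TCP_FIXED_HDR_LEN || hdr_len > len)
		return NULL;

	size_t i = TCP_FIXED_HDR_LEN;
	while (i < hdr_len) {
		uint8_t k = th[i];

		if (k == OPT_EOL)
			break;
		if (k == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= hdr_len)
			break; /* length byte is missing */
		uint8_t l = th[i + 1];
		if (l < 2 || i + l > hdr_len)
			break; /* malformed or truncated option */
		if (k == kind) {
			*opt_len = l;
			return &th[i];
		}
		i += l;
	}
	return NULL;
}
```

A caller would pass the header bytes and the option kind of interest, then
read the option payload starting two bytes past the returned pointer. The
fixed fields (ports, flags and so on) stay reachable at their usual offsets
in the same buffer.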
> >
> > [1]: draft-wang-tcpm-low-latency-opt-00
> > https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00
> >
> > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > ---
> > include/linux/bpf-cgroup.h | 25 +++
> > include/linux/filter.h | 4 +
> > include/net/tcp.h | 53 ++++-
> > include/uapi/linux/bpf.h | 231 ++++++++++++++++++++-
> > net/core/filter.c | 365 +++++++++++++++++++++++++++++++++
> > net/ipv4/tcp_fastopen.c | 2 +-
> > net/ipv4/tcp_input.c | 86 +++++++-
> > net/ipv4/tcp_ipv4.c | 3 +-
> > net/ipv4/tcp_minisocks.c | 1 +
> > net/ipv4/tcp_output.c | 194 ++++++++++++++++--
> > net/ipv6/tcp_ipv6.c | 3 +-
> > tools/include/uapi/linux/bpf.h | 231 ++++++++++++++++++++-
> > 12 files changed, 1171 insertions(+), 27 deletions(-)
>
> This is a truly gigantic patch.
>
> Could you split it into maybe two parts?
Yes.
Most of the code changes in TCP call out to the bpf prog to parse and
write the header, so they are all in this one patch.
I will move those callout changes (and a few func arg changes) in TCP
to a separate patch but leave the bpf callout functions empty.
The next, bpf-specific patch will then fill in those empty bpf
callout functions.
>
> This way I could focus on the TCP changes, and let eBPF experts focus
> on BPF changes.
Thanks for the review!
Thread overview: 17+ messages
2020-07-30 20:56 [PATCH v3 bpf-next 0/9] BPF TCP header options Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 1/9] tcp: Use a struct to represent a saved_syn Martin KaFai Lau
2020-07-31 15:57 ` Eric Dumazet
2020-07-31 17:31 ` Eric Dumazet
2020-07-30 20:57 ` [PATCH v3 bpf-next 2/9] tcp: bpf: Add TCP_BPF_DELACK_MAX setsockopt Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 3/9] tcp: bpf: Add TCP_BPF_RTO_MIN for bpf_setsockopt Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 4/9] tcp: Add unknown_opt arg to tcp_parse_options Martin KaFai Lau
2020-07-31 16:12 ` Eric Dumazet
2020-07-31 17:37 ` Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 5/9] bpf: sock_ops: Change some members of sock_ops_kern from u32 to u8 Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 6/9] bpf: tcp: Allow bpf prog to write and parse TCP header option Martin KaFai Lau
2020-07-31 16:06 ` Eric Dumazet
2020-07-31 17:59 ` Martin KaFai Lau [this message]
2020-07-30 20:57 ` [PATCH v3 bpf-next 7/9] bpf: selftests: Add fastopen_connect to network_helpers Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 8/9] bpf: selftests: tcp header options Martin KaFai Lau
2020-07-30 20:57 ` [PATCH v3 bpf-next 9/9] tcp: bpf: Optionally store mac header in TCP_SAVE_SYN Martin KaFai Lau
2020-07-31 15:51 ` Eric Dumazet