bpf.vger.kernel.org archive mirror
* [PATCH v1 bpf-next 00/11] bpf: tcp: Add SYN Cookie generation/validation SOCK_OPS hooks.
@ 2023-10-13 22:04 Kuniyuki Iwashima
From: Kuniyuki Iwashima @ 2023-10-13 22:04 UTC
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	David Ahern, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, Song Liu, Yonghong Song,
	John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
	Mykola Lysenko
  Cc: Kuniyuki Iwashima, Kuniyuki Iwashima, bpf, netdev

Under a SYN flood, the TCP stack generates a SYN Cookie to remain stateless
for the connection request until a valid ACK arrives in response to the
SYN+ACK.

The cookie contains two kinds of host-specific bits, a timestamp and
secrets, so it can be validated only by the generator.  This means a SYN
Cookie consumes network resources between the client and the server;
intermediate nodes must remember which node to route the ACK for the
cookie to.

SYN Proxy reduces such unwanted resource allocation by handling 3WHS at
the edge network.  After SYN Proxy completes 3WHS, it forwards SYN to the
backend server and completes another 3WHS.  However, since the server's
ISN differs from the cookie, the proxy must manage the ISN mappings and
fix up SEQ/ACK numbers in every packet for each connection.  If a proxy
node is down, all the connections through it are also down.  Keeping
state at the proxy is painful from that perspective.
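
To illustrate the fix-up cost, here is a minimal sketch of the translation
a stateful SYN Proxy has to apply to every packet.  The struct and function
names are hypothetical and not taken from any existing proxy; the point is
only that one ISN pair must be stored per connection.

```c
#include <stdint.h>

/* Hypothetical per-connection state a stateful SYN Proxy must keep. */
struct isn_mapping {
	uint32_t cookie_isn;	/* ISN the proxy sent to the client */
	uint32_t server_isn;	/* ISN the backend chose in its SYN+ACK */
};

/* Server -> client: move SEQ from the server's space to the cookie's. */
uint32_t fixup_seq_to_client(const struct isn_mapping *m, uint32_t seq)
{
	/* Unsigned arithmetic wraps modulo 2^32, like TCP sequence numbers. */
	return seq - m->server_isn + m->cookie_isn;
}

/* Client -> server: move ACK from the cookie's space to the server's. */
uint32_t fixup_ack_to_server(const struct isn_mapping *m, uint32_t ack)
{
	return ack - m->cookie_isn + m->server_isn;
}
```

If the node holding struct isn_mapping goes away, no other node can
translate the connection's packets, which is why the connection is lost.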

At AWS, we use a dirty hack to build a truly stateless SYN Proxy at scale.
Our SYN Proxy consists of the front proxy layer and the backend kernel
module.  (See the netconf slides [0], p6 - p15.)

The cookie that SYN Proxy generates differs from the kernel's cookie in
that it contains a secret (called rolling salt) that is (i) shared by all
the proxy nodes so that any node can validate the ACK and (ii) updated
periodically so that old cookies cannot be validated.  Also, WScale, SACK,
and ECN are encoded in the ISN, not in TS val.  This avoids sacrificing
connection quality for customers who turn off the timestamp option because
of an old CVE.
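
As a rough illustration of the rolling-salt idea, the sketch below builds a
32-bit cookie whose low bits carry the option bits and whose remaining bits
are a salted hash.  Everything here is an assumption made for illustration:
the bit layout, the names, and toy_hash(), which merely stands in for a
keyed hash such as SipHash and is not secure.  It is not the format our
proxy uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPT_BITS	8u	/* assumed: low bits hold WScale/SACK/ECN/MSS */
#define OPT_MASK	((1u << OPT_BITS) - 1u)

struct flow {
	uint32_t saddr, daddr;
	uint16_t sport, dport;
};

/* Toy mixer standing in for a keyed hash such as SipHash; NOT secure. */
uint32_t toy_hash(const struct flow *f, uint32_t client_isn,
		  uint32_t opts, uint64_t salt)
{
	uint64_t h = salt ^ 0x9e3779b97f4a7c15ull;

	h = (h ^ f->saddr) * 0xff51afd7ed558ccdull;
	h = (h ^ f->daddr) * 0xc4ceb9fe1a85ec53ull;
	h = (h ^ ((uint32_t)f->sport << 16 | f->dport)) * 0xff51afd7ed558ccdull;
	h = (h ^ client_isn) * 0xc4ceb9fe1a85ec53ull;
	h = (h ^ opts) * 0xff51afd7ed558ccdull;
	return (uint32_t)(h >> 32);
}

/* Any proxy node holding the shared salt can generate the cookie. */
uint32_t make_cookie(const struct flow *f, uint32_t client_isn,
		     uint32_t opts, uint64_t salt)
{
	opts &= OPT_MASK;
	return (toy_hash(f, client_isn, opts, salt) & ~OPT_MASK) | opts;
}

/* Accept cookies made with the current or the previous salt only. */
bool check_cookie(const struct flow *f, uint32_t client_isn, uint32_t cookie,
		  const uint64_t salts[2], uint32_t *opts)
{
	uint32_t o = cookie & OPT_MASK;

	for (int i = 0; i < 2; i++) {
		uint32_t h = toy_hash(f, client_isn, o, salts[i]);

		if (((h ^ cookie) & ~OPT_MASK) == 0) {
			*opts = o;	/* options recovered without TS val */
			return true;
		}
	}
	return false;
}
```

Because the option bits are part of the hash input, flipping them in the
ACK invalidates the cookie, and once the salt has rotated twice the old
cookie no longer matches either accepted salt.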

After 3WHS, the proxy restores SYN and forwards it and ACK to the backend
server.  Our kernel module works at Netfilter input/output hooks and first
feeds SYN to the TCP stack to initiate 3WHS.  When the module is triggered
for SYN+ACK, it looks up the corresponding request socket and overwrites
tcp_rsk(req)->snt_isn with the proxy's cookie.  Then, the module can
complete 3WHS with the original ACK as is.

This way, our SYN Proxy does not manage the ISN mappings and can stay
stateless.  It works very well for high-bandwidth services handling
multiple Tbps, but we are looking for a way to drop the dirty hack and
further optimise the sequences.

If we could validate an arbitrary SYN Cookie on the backend server with
BPF, the proxy would not need to restore the SYN or pass it.  After
validating the ACK, the proxy node just needs to forward it, and then the
server can do lightweight validation (e.g. check that the ACK came from a
proxy node) and create a connection from the ACK.

This series adds two SOCK_OPS hooks to generate and validate an arbitrary
SYN Cookie.  Each hook is invoked if BPF_SOCK_OPS_SYNCOOKIE_CB_FLAG has
been set on the listening socket in advance by bpf_sock_ops_cb_flags_set().

The user interface looks like this:

  BPF_SOCK_OPS_GEN_SYNCOOKIE_CB

    input
    |- bpf_sock_ops.sk           : 4-tuple
    |- bpf_sock_ops.skb          : TCP header
    |- bpf_sock_ops.args[0]      : MSS
    `- bpf_sock_ops.args[1]      : BPF_SYNCOOKIE_XXX flags

    output
    |- bpf_sock_ops.replylong[0] : ISN (SYN Cookie) ------.
    `- bpf_sock_ops.replylong[1] : TS value -----------.  |
                                                       |  |
  BPF_SOCK_OPS_CHECK_SYNCOOKIE_CB                      |  |
                                                       |  |
    input                                              |  |
    |- bpf_sock_ops.sk           : 4-tuple             |  |
    |- bpf_sock_ops.skb          : TCP header          |  |
    |- bpf_sock_ops.args[0]      : ISN (SYN Cookie) <-----'
    `- bpf_sock_ops.args[1]      : TS value <----------'

    output
    |- bpf_sock_ops.replylong[0] : MSS
    `- bpf_sock_ops.replylong[1] : BPF_SYNCOOKIE_XXX flags

To establish a connection from a SYN Cookie, the
BPF_SOCK_OPS_CHECK_SYNCOOKIE_CB hook must set a valid MSS in
bpf_sock_ops.replylong[0], meaning that the BPF_SOCK_OPS_GEN_SYNCOOKIE_CB
hook must encode the MSS into the ISN or TS val so that it can be restored
in the validation hook.

If WScale, SACK, and ECN are detected to be available in the SYN packet,
the corresponding flags are passed in args[1] of
BPF_SOCK_OPS_GEN_SYNCOOKIE_CB so that the bpf prog need not parse the TCP
header.  The same flags can be set in replylong[1] of
BPF_SOCK_OPS_CHECK_SYNCOOKIE_CB to enable each feature on the connection.
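
The contract between the two hooks can be modelled in plain userspace C.
This is only a sketch, not a BPF program: struct sock_ops_model stands in
for the bpf_sock_ops fields in the diagram above (MSS in args[0] and flags
in args[1] on generation; MSS in replylong[0] and flags in replylong[1] on
validation), the MODEL_F_* flags are hypothetical placeholders for the
BPF_SYNCOOKIE_XXX flags, and the ISN layout, the MSS table, and the "tag"
argument (where a real program would use a keyed hash) are assumptions.

```c
#include <stdint.h>

/* Userspace stand-in for the bpf_sock_ops fields used by the two hooks. */
struct sock_ops_model {
	uint32_t args[4];
	uint32_t replylong[4];
};

/* Hypothetical flag bits, not the series' BPF_SYNCOOKIE_XXX values. */
#define MODEL_F_WSCALE	(1u << 0)
#define MODEL_F_SACK	(1u << 1)
#define MODEL_F_ECN	(1u << 2)
#define MODEL_F_MASK	(MODEL_F_WSCALE | MODEL_F_SACK | MODEL_F_ECN)

/* Illustrative MSS table; the cookie stores an index into it. */
static const uint16_t mss_tab[4] = { 536, 1300, 1440, 1460 };

/* Model of the generation hook: args[] in, replylong[] out. */
void model_gen_cookie(struct sock_ops_model *ops, uint32_t tag)
{
	uint32_t mss = ops->args[0];
	uint32_t flags = ops->args[1] & MODEL_F_MASK;
	uint32_t idx = 3;

	/* Largest entry not exceeding the peer's MSS, else the smallest. */
	while (idx > 0 && mss_tab[idx] > mss)
		idx--;

	/* Assumed ISN layout: [tag:27][flags:3][mss index:2]. */
	ops->replylong[0] = (tag << 5) | (flags << 2) | idx;
	ops->replylong[1] = 0;	/* TS val is unused in this sketch */
}

/* Model of the validation hook; returns 0 if the cookie is rejected. */
int model_check_cookie(struct sock_ops_model *ops, uint32_t tag)
{
	uint32_t isn = ops->args[0];

	if ((isn >> 5) != (tag & 0x07ffffffu))
		return 0;

	ops->replylong[0] = mss_tab[isn & 3];		/* restored MSS */
	ops->replylong[1] = (isn >> 2) & MODEL_F_MASK;	/* restored flags */
	return 1;
}
```

Note that the MSS restored on validation is the table entry, not the exact
value seen in the SYN: a client MSS of 1400 comes back as 1300 here.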

For details, please see each patch.  Here's an overview:

  patch 1 - 4 : Misc cleanup
  patch 5, 6  : Add SOCK_OPS hook (only ISN is available here)
  patch 7, 8  : Make TS val available as the second cookie storage
  patch 9, 10 : Make WScale, SACK, and ECN configurable from ACK
  patch 11    : selftest, need some help from BPF experts...

[0]: https://netdev.bots.linux.dev/netconf/2023/kuniyuki.pdf


Kuniyuki Iwashima (11):
  tcp: Clean up reverse xmas tree in cookie_v[46]_check().
  tcp: Cache sock_net(sk) in cookie_v[46]_check().
  tcp: Clean up goto labels in cookie_v[46]_check().
  tcp: Don't initialise tp->tsoffset in tcp_get_cookie_sock().
  bpf: tcp: Add SYN Cookie generation SOCK_OPS hook.
  bpf: tcp: Add SYN Cookie validation SOCK_OPS hook.
  bpf: Make bpf_sock_ops.replylong[1] writable.
  bpf: tcp: Make TS available for SYN Cookie storage.
  tcp: Split cookie_ecn_ok().
  bpf: tcp: Make WS, SACK, ECN configurable from BPF SYN Cookie.
  selftest: bpf: Test BPF_SOCK_OPS_(GEN|CHECK)_SYNCOOKIE_CB.

 include/net/inet_sock.h                       |   4 +-
 include/net/tcp.h                             |  46 +++-
 include/uapi/linux/bpf.h                      |  52 ++++-
 net/core/filter.c                             |   2 +-
 net/ipv4/syncookies.c                         | 219 +++++++++++-------
 net/ipv4/tcp_input.c                          |  53 ++++-
 net/ipv6/syncookies.c                         |  94 +++++---
 tools/include/uapi/linux/bpf.h                |  52 ++++-
 .../selftests/bpf/prog_tests/tcp_syncookie.c  |  84 +++++++
 .../selftests/bpf/progs/test_siphash.h        |  65 ++++++
 .../selftests/bpf/progs/test_tcp_syncookie.c  | 170 ++++++++++++++
 .../selftests/bpf/test_tcp_hdr_options.h      |   8 +-
 12 files changed, 715 insertions(+), 134 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/tcp_syncookie.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_siphash.h
 create mode 100644 tools/testing/selftests/bpf/progs/test_tcp_syncookie.c

-- 
2.30.2


Thread overview: 44+ messages
2023-10-13 22:04 [PATCH v1 bpf-next 00/11] bpf: tcp: Add SYN Cookie generation/validation SOCK_OPS hooks Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 01/11] tcp: Clean up reverse xmas tree in cookie_v[46]_check() Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 02/11] tcp: Cache sock_net(sk) " Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 03/11] tcp: Clean up goto labels " Kuniyuki Iwashima
2023-10-17  0:00   ` Kui-Feng Lee
2023-10-17  0:30     ` Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 04/11] tcp: Don't initialise tp->tsoffset in tcp_get_cookie_sock() Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 05/11] bpf: tcp: Add SYN Cookie generation SOCK_OPS hook Kuniyuki Iwashima
2023-10-18  0:54   ` Martin KaFai Lau
2023-10-18 17:00     ` Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 06/11] bpf: tcp: Add SYN Cookie validation " Kuniyuki Iwashima
2023-10-16 20:38   ` Stanislav Fomichev
2023-10-16 22:02     ` Kuniyuki Iwashima
2023-10-17 16:52   ` Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 07/11] bpf: Make bpf_sock_ops.replylong[1] writable Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 08/11] bpf: tcp: Make TS available for SYN Cookie storage Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 09/11] tcp: Split cookie_ecn_ok() Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 10/11] bpf: tcp: Make WS, SACK, ECN configurable from BPF SYN Cookie Kuniyuki Iwashima
2023-10-18  1:08   ` Martin KaFai Lau
2023-10-18 17:02     ` Kuniyuki Iwashima
2023-10-13 22:04 ` [PATCH v1 bpf-next 11/11] selftest: bpf: Test BPF_SOCK_OPS_(GEN|CHECK)_SYNCOOKIE_CB Kuniyuki Iwashima
2023-10-17  5:50   ` Martin KaFai Lau
2023-10-17 16:29     ` Kuniyuki Iwashima
2023-10-16 13:05 ` [PATCH v1 bpf-next 00/11] bpf: tcp: Add SYN Cookie generation/validation SOCK_OPS hooks Daniel Borkmann
2023-10-16 16:11   ` Kuniyuki Iwashima
2023-10-16 14:19 ` Willem de Bruijn
2023-10-16 16:46   ` Kuniyuki Iwashima
2023-10-16 18:41     ` Willem de Bruijn
2023-10-17  5:53 ` Martin KaFai Lau
2023-10-17 16:48   ` Kuniyuki Iwashima
2023-10-18  6:19     ` Martin KaFai Lau
2023-10-18  8:02       ` Eric Dumazet
2023-10-18 17:20         ` Kuniyuki Iwashima
2023-10-18 21:47           ` Kui-Feng Lee
2023-10-18 22:31             ` Kuniyuki Iwashima
2023-10-19  7:25               ` Martin KaFai Lau
2023-10-19 18:01                 ` Kuniyuki Iwashima
2023-10-20 19:59                   ` Martin KaFai Lau
2023-10-20 23:10                     ` Kuniyuki Iwashima
2023-10-21  6:48                       ` Kuniyuki Iwashima
2023-10-23 21:35                         ` Martin KaFai Lau
2023-10-24  0:37                           ` Kui-Feng Lee
2023-10-24  1:22                             ` Kuniyuki Iwashima
2023-10-24 17:55                               ` Kui-Feng Lee
