From: Jakub Sitnicki <jakub@cloudflare.com>
To: bpf@vger.kernel.org
Cc: netdev@vger.kernel.org, kernel-team@cloudflare.com,
	John Fastabend <john.fastabend@gmail.com>,
	Martin KaFai Lau <kafai@fb.com>
Subject: [PATCH bpf-next 0/8] Extend SOCKMAP to store listening sockets
Date: Sat, 23 Nov 2019 12:07:43 +0100
Message-ID: <20191123110751.6729-1-jakub@cloudflare.com>

This patch set makes SOCKMAP more flexible by allowing it to hold TCP
sockets that are either in established or listening state. With it, SOCKMAP
can act as a drop-in replacement for REUSEPORT_SOCKARRAY, which reuseport
BPF programs use. Granted, it is limited to TCP sockets only.

The idea started out at LPC '19 as feedback from John Fastabend on our
troubles with repurposing REUSEPORT_SOCKARRAY as a collection of listening
sockets accessed by a BPF program run on socket lookup [1]. Without going
into details, REUSEPORT_SOCKARRAY proved to be tightly coupled with
reuseport logic. The talk from LPC (see slides [2] or video [3]) highlights
the problems we ran into when trying to make REUSEPORT_SOCKARRAY work for
our use case.

The patches have evolved quite a bit since the RFC series from a month ago
[4]. To recap the RFC feedback, John pointed out that BPF redirect helpers
for SOCKMAP need sane semantics when used with listening sockets [5], and
that SOCKMAP lookup from BPF would be useful [6], while Martin asked for
UDP support [7].

As it happens, the patches needed more work to get SOCKMAP to actually
behave correctly with listening sockets. It turns out flexibility has its
price. The change log below outlines all the changes.

With more patches in the set than I would like, I left the new features,
lookup from BPF as well as UDP support, for another series. I'm quite happy
with how the changes turned out and with the test coverage, so I'm boldly
proposing it as v1 :-)

Curious to see what you think.

RFC -> v1:

- Switch from overriding proto->accept to af_ops->syn_recv_sock, which
  happens earlier. Clearing the psock state after accept() does not work
  for child sockets that become orphaned (never got accepted). v4-mapped
  sockets need special care.

- Return the socket cookie on SOCKMAP lookup from syscall to be on par with
  REUSEPORT_SOCKARRAY. Requires SOCKMAP to take u64 on lookup/update from
  syscall.

- Make bpf_sk_redirect_map (ingress) and bpf_msg_redirect_map (egress)
  SOCKMAP helpers fail when target socket is a listening one.

- Make bpf_sk_select_reuseport helper fail when target is a TCP established
  socket.

- Teach libbpf to recognize SK_REUSEPORT program type from section name.

- Add a dedicated set of tests for SOCKMAP holding listening sockets,
  covering map operations, overridden socket callbacks, and BPF helpers.
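
To make the syscall-side change above more concrete, here is a minimal
user-space sketch of storing a socket in a SOCKMAP with a u64 value and
reading its cookie back. It is an illustration under assumptions, not code
from the series: the helper names (sys_bpf, sockmap_create_u64,
sockmap_insert, sockmap_lookup_cookie) are made up for this example, and
whether a given kernel accepts 8-byte SOCKMAP values or listening sockets
depends on these patches being applied.

```c
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* Hypothetical helper: thin wrapper around the raw bpf(2) syscall. */
static int sys_bpf(int cmd, union bpf_attr *attr)
{
	return (int)syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/*
 * Hypothetical helper: create a SOCKMAP with u32 keys and u64 values.
 * The 8-byte value size is the assumption that comes from this series.
 * Returns a map fd, or -1 with errno set.
 */
static int sockmap_create_u64(uint32_t max_entries)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_type = BPF_MAP_TYPE_SOCKMAP;
	attr.key_size = sizeof(uint32_t);
	attr.value_size = sizeof(uint64_t);
	attr.max_entries = max_entries;

	return sys_bpf(BPF_MAP_CREATE, &attr);
}

/*
 * Hypothetical helper: on update, the value passed from user space is
 * the socket file descriptor, widened to u64.
 * Returns 0, or -1 with errno set.
 */
static int sockmap_insert(int map_fd, uint32_t key, int sock_fd)
{
	uint64_t value = (uint64_t)sock_fd;
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)map_fd;
	attr.key = (uint64_t)(uintptr_t)&key;
	attr.value = (uint64_t)(uintptr_t)&value;
	attr.flags = BPF_NOEXIST;

	return sys_bpf(BPF_MAP_UPDATE_ELEM, &attr);
}

/*
 * Hypothetical helper: on lookup, the kernel is expected to fill the
 * u64 value with the socket cookie. *cookie is left untouched on error.
 * Returns 0, or -1 with errno set.
 */
static int sockmap_lookup_cookie(int map_fd, uint32_t key, uint64_t *cookie)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = (uint32_t)map_fd;
	attr.key = (uint64_t)(uintptr_t)&key;
	attr.value = (uint64_t)(uintptr_t)cookie;

	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
}
```

A caller would create the map, set up a TCP socket with socket(), bind()
and listen(), pass its fd to sockmap_insert(), and then compare the result
of sockmap_lookup_cookie() with getsockopt(SOL_SOCKET, SO_COOKIE) on the
same socket. Creating the map normally requires elevated privileges.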

Thanks,
Jakub

[1] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[2] https://linuxplumbersconf.org/event/4/contributions/487/
[3] https://www.youtube.com/watch?v=qRDoUpqvYjY
[4] https://lore.kernel.org/bpf/20191022113730.29303-1-jakub@cloudflare.com/
[5] https://lore.kernel.org/bpf/5db1da20174b1_5c282ada047205c046@john-XPS-13-9370.notmuch/
[6] https://lore.kernel.org/bpf/5db1d7a810bdb_5c282ada047205c08f@john-XPS-13-9370.notmuch/
[7] https://lore.kernel.org/bpf/20191028213804.yv3xfjjlayfghkcr@kafai-mbp/


Jakub Sitnicki (8):
  bpf, sockmap: Return socket cookie on lookup from syscall
  bpf, sockmap: Let all kernel-land lookup values in SOCKMAP
  bpf, sockmap: Allow inserting listening TCP sockets into SOCKMAP
  bpf, sockmap: Don't let child socket inherit psock or its ops on copy
  bpf: Allow selecting reuseport socket from a SOCKMAP
  libbpf: Recognize SK_REUSEPORT programs from section name
  selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP
  selftests/bpf: Tests for SOCKMAP holding listening sockets

 include/linux/skmsg.h                         |  17 +-
 kernel/bpf/verifier.c                         |   6 +-
 net/core/filter.c                             |   2 +
 net/core/sock_map.c                           |  68 +-
 net/ipv4/tcp_bpf.c                            |  66 +-
 tools/lib/bpf/libbpf.c                        |   1 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   9 +-
 .../bpf/progs/test_sockmap_listen_kern.c      |  75 ++
 tools/testing/selftests/bpf/test_maps.c       |   6 +-
 .../selftests/bpf/test_select_reuseport.c     | 141 ++-
 .../selftests/bpf/test_select_reuseport.sh    |  14 +
 .../selftests/bpf/test_sockmap_listen.c       | 820 ++++++++++++++++++
 13 files changed, 1170 insertions(+), 56 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_sockmap_listen_kern.c
 create mode 100755 tools/testing/selftests/bpf/test_select_reuseport.sh
 create mode 100644 tools/testing/selftests/bpf/test_sockmap_listen.c

-- 
2.20.1


Thread overview: 39+ messages
2019-11-23 11:07 Jakub Sitnicki [this message]
2019-11-23 11:07 ` [PATCH bpf-next 1/8] bpf, sockmap: Return socket cookie on lookup from syscall Jakub Sitnicki
2019-11-24  5:32   ` John Fastabend
2019-11-23 11:07 ` [PATCH bpf-next 2/8] bpf, sockmap: Let all kernel-land lookup values in SOCKMAP Jakub Sitnicki
2019-11-24  5:35   ` John Fastabend
2019-11-23 11:07 ` [PATCH bpf-next 3/8] bpf, sockmap: Allow inserting listening TCP sockets into SOCKMAP Jakub Sitnicki
2019-11-24  5:38   ` John Fastabend
2019-11-23 11:07 ` [PATCH bpf-next 4/8] bpf, sockmap: Don't let child socket inherit psock or its ops on copy Jakub Sitnicki
2019-11-24  5:56   ` John Fastabend
2019-11-25 22:38   ` Martin Lau
2019-11-26 15:54     ` Jakub Sitnicki
2019-11-26 17:16       ` Martin Lau
2019-11-26 18:36         ` Jakub Sitnicki
     [not found]           ` <87sglsfdda.fsf@cloudflare.com>
2019-12-11 17:20             ` Martin Lau
2019-12-12 11:27               ` Jakub Sitnicki
2019-12-12 19:23                 ` Martin Lau
2019-12-17 15:06                   ` Jakub Sitnicki
2019-11-26 18:43         ` John Fastabend
2019-11-27 22:18           ` Jakub Sitnicki
2019-11-23 11:07 ` [PATCH bpf-next 5/8] bpf: Allow selecting reuseport socket from a SOCKMAP Jakub Sitnicki
2019-11-24  5:57   ` John Fastabend
2019-11-25  1:24   ` Alexei Starovoitov
2019-11-25  4:17     ` John Fastabend
2019-11-25 10:40       ` Jakub Sitnicki
2019-11-25 22:07         ` Martin Lau
2019-11-26 14:30           ` Jakub Sitnicki
2019-11-26 19:03             ` Martin Lau
2019-11-27 21:34               ` Jakub Sitnicki
2019-11-23 11:07 ` [PATCH bpf-next 6/8] libbpf: Recognize SK_REUSEPORT programs from section name Jakub Sitnicki
2019-11-24  5:57   ` John Fastabend
2019-11-23 11:07 ` [PATCH bpf-next 7/8] selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP Jakub Sitnicki
2019-11-24  6:00   ` John Fastabend
2019-11-25 22:30   ` Martin Lau
2019-11-26 14:32     ` Jakub Sitnicki
2019-12-12 10:30     ` Jakub Sitnicki
2019-11-23 11:07 ` [PATCH bpf-next 8/8] selftests/bpf: Tests for SOCKMAP holding listening sockets Jakub Sitnicki
2019-11-24  6:04   ` John Fastabend
2019-11-24  6:10 ` [PATCH bpf-next 0/8] Extend SOCKMAP to store " John Fastabend
2019-11-25  9:22   ` Jakub Sitnicki
