Netdev Archive on lore.kernel.org
From: Jakub Sitnicki <jakub@cloudflare.com>
To: netdev@vger.kernel.org, bpf@vger.kernel.org
Cc: kernel-team@cloudflare.com
Subject: [RFC bpf-next 0/7] Programming socket lookup with BPF
Date: Tue, 18 Jun 2019 15:00:43 +0200
Message-ID: <20190618130050.8344-1-jakub@cloudflare.com>

We have been exploring the idea of making the listening socket
lookup (inet_lookup) programmable with BPF.

Why? At the last Netdev conference, Marek talked [1] about two limitations
of the bind() API that we're hitting when running services on our edge
servers:

1) sharing a port between two services

   Services are accepting connections on different (disjoint) IP ranges but
   use the same port. Say, packets to 192.0.2.0/24 tcp/80 go to NGINX,
   while 198.51.100.0/24 tcp/80 is handled by Apache. Servers are running
   as different users, in a flat single-netns setup.

2) receiving traffic on all ports

   A proxy accepts connections on a specific IP range, but on any port [2].

In both cases we've found that bind(), combined with INADDR_ANY,
SO_REUSEADDR, or SO_REUSEPORT, doesn't allow for the setup we need, short of
binding each service to every IP:port pair :-)
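
For a rough sense of scale, the small C helper below (hypothetical,
written only for this illustration) counts the sockets a service would
need if it bound one socket per IP:port pair it serves. It counts ports
1-65535, since binding to port 0 requests an ephemeral port, and it
ignores whether the addresses are actually configured on the host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of sockets needed when a service binds one
 * socket per (address, port) pair. prefix_len is the CIDR prefix length
 * of the IPv4 range it serves, n_ports the number of distinct ports. */
static uint64_t bind_pairs_needed(unsigned int prefix_len, uint32_t n_ports)
{
        assert(prefix_len <= 32);
        uint64_t n_addrs = UINT64_C(1) << (32 - prefix_len);

        return n_addrs * n_ports;
}
```

With it, case 1 (a /24 on port 80) comes to bind_pairs_needed(24, 1) ==
256 sockets per service, while case 2 (a /24 on every port) comes to
bind_pairs_needed(24, 65535) == 16776960.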

At first we resorted to custom patches [3], and more recently to traffic
steering with TPROXY. The latter is not without pain points:

 - XDP programs that use the bpf_sk_lookup helpers, such as load balancers,
   can't find the listening socket to check for SYN cookies when traffic
   is redirected with TPROXY.

 - TPROXY takes a reference to the listening socket on dispatch, which
   raises lock contention concerns.

 - Traffic steering configuration is split over several iptables rules, at
   least one per service, which makes configuration changes error prone.

Now, back to the patch set. It introduces a new BPF program type, dubbed
inet_lookup, that runs before the listening socket lookup and can override
the destination IP:port pair used as the lookup key. The program attaches
to the network namespace (netns) in whose scope the lookup happens.
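
To make the control flow concrete, here is a userspace C model of a hook
that runs before the lookup and may replace the key. It is not kernel
code: the types, the HOOK_PASS/HOOK_REWRITE verdicts (loosely mirroring
BPF_OK/BPF_REDIRECT in the example program below), and the linear
listener table are assumptions made for illustration, and wildcard
(INADDR_ANY) binds are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace model only: every name here is hypothetical, not kernel API.
 * Addresses and ports are in host byte order. */
struct lookup_key {
        uint32_t local_ip4;
        uint16_t local_port;
};

/* Loosely mirrors BPF_OK / BPF_REDIRECT. */
enum hook_verdict { HOOK_PASS, HOOK_REWRITE };

typedef enum hook_verdict (*lookup_hook_fn)(struct lookup_key *key);

#define MAX_LISTENERS 16

struct listener_table {
        struct lookup_key keys[MAX_LISTENERS];
        int ids[MAX_LISTENERS];         /* stand-in for a listening socket */
        size_t n;
};

static void listen_on(struct listener_table *t, struct lookup_key key, int id)
{
        assert(t->n < MAX_LISTENERS);
        t->keys[t->n] = key;
        t->ids[t->n] = id;
        t->n++;
}

/* Run the per-netns hook (if attached) on a copy of the key and adopt the
 * copy only on HOOK_REWRITE. Then do an exact-match lookup.
 * Returns the listener id, or -1 if nothing matches. */
static int lookup_listener(const struct listener_table *t,
                           lookup_hook_fn hook, struct lookup_key key)
{
        struct lookup_key k = key;

        if (hook && hook(&k) == HOOK_REWRITE)
                key = k;

        for (size_t i = 0; i < t->n; i++)
                if (t->keys[i].local_ip4 == key.local_ip4 &&
                    t->keys[i].local_port == key.local_port)
                        return t->ids[i];
        return -1;
}

/* Example hook: steer anything destined to port 80 to 127.0.0.1:81. */
static enum hook_verdict steer_port80(struct lookup_key *key)
{
        if (key->local_port != 80)
                return HOOK_PASS;
        key->local_ip4 = 0x7f000001;    /* 127.0.0.1 */
        key->local_port = 81;
        return HOOK_REWRITE;
}
```

The point of the model is the ordering: the hook sees the original key
first, and the table is only consulted with whatever key the hook leaves
behind.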

What might an inet_lookup program look like? For the scenario mentioned
above, with two HTTP servers sharing port 80:

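#include <linux/bpf.h>    /* struct bpf_inet_lookup, with this series applied */
#include <sys/socket.h>   /* AF_INET */
#include "bpf_helpers.h"  /* SEC(), selftests header (assumed) */
#include "bpf_endian.h"   /* bpf_htonl(), bpf_ntohl() */

/* IPv4 address in host byte order (helper assumed by this example) */
#define IP4(a, b, c, d) \
        (((__u32)(a) << 24) | ((__u32)(b) << 16) | ((__u32)(c) << 8) | (__u32)(d))

/* /24 prefixes in host byte order */
#define NET1 (IP4(192,  0,   2, 0) >> 8)
#define NET2 (IP4(198, 51, 100, 0) >> 8)

SEC("inet_lookup/demo_two_http_servers")
int demo_two_http_servers(struct bpf_inet_lookup *ctx)
{
        /* Leave everything but IPv4 port 80 to the regular lookup */
        if (ctx->family != AF_INET)
                return BPF_OK;
        if (ctx->local_port != 80)
                return BPF_OK;

        /* Match on the /24 prefix of the destination address */
        switch (bpf_ntohl(ctx->local_ip4) >> 8) {
        case NET1:      /* 192.0.2.0/24 -> listener on 127.0.0.1:81 */
                ctx->local_ip4 = bpf_htonl(IP4(127, 0, 0, 1));
                ctx->local_port = 81;
                return BPF_REDIRECT;
        case NET2:      /* 198.51.100.0/24 -> listener on 127.0.0.1:82 */
                ctx->local_ip4 = bpf_htonl(IP4(127, 0, 0, 1));
                ctx->local_port = 82;
                return BPF_REDIRECT;
        }

        /* Other destinations: look up using the original key */
        return BPF_OK;
}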
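
Because the program's matching logic is plain integer arithmetic, it can
be checked outside the kernel. Below is a userspace C port of the
decision made by demo_two_http_servers(), as a sketch: the IP4()
definition is an assumption (the example above does not define it), all
values are in host byte order, and remap_http() is a hypothetical name.
The constants show what the >> 8 shift leaves behind: the top 24 bits of
the address, i.e. the /24 prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IP4() helper: IPv4 address in host byte order. */
#define IP4(a, b, c, d) \
        (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
         ((uint32_t)(c) << 8) | (uint32_t)(d))

#define NET1 (IP4(192,  0,   2, 0) >> 8)   /* 0xc00002 */
#define NET2 (IP4(198, 51, 100, 0) >> 8)   /* 0xc63364 */

/* Hypothetical userspace port of the steering decision. Returns true and
 * fills *ip / *port when the lookup key should be rewritten. */
static bool remap_http(uint32_t dst_ip, uint16_t dst_port,
                       uint32_t *ip, uint16_t *port)
{
        assert(ip && port);
        if (dst_port != 80)
                return false;

        switch (dst_ip >> 8) {          /* keep only the /24 prefix */
        case NET1:
                *ip = IP4(127, 0, 0, 1);
                *port = 81;
                return true;
        case NET2:
                *ip = IP4(127, 0, 0, 1);
                *port = 82;
                return true;
        }
        return false;
}
```

For example, remap_http(IP4(192, 0, 2, 7), 80, &ip, &port) returns true
with ip == IP4(127, 0, 0, 1) and port == 81, while any destination
outside both /24s, or on a port other than 80, is left alone.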

What are the downsides?

 - The BPF program, if attached, runs on the receive hot path.
 - Introspection is worse than with TPROXY iptables rules.

Also, UDP packet steering has to be reworked. In its current form, we run
the inet_lookup program before checking for any connected UDP sockets,
which is unexpected.
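
One plausible reading of the expected order: connected UDP sockets, which
match on the full 4-tuple, are found before the program gets a chance to
rewrite the destination, much as TCP looks up established sockets before
listeners. Below is a simplified userspace model of that ordering, not
the patch's code; all names are hypothetical, and the real kernel ranks
candidates with a score rather than in separate passes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model, host byte order. A socket with remote_port == 0 is
 * treated as unconnected. */
struct udp_sock {
        uint32_t local_ip, remote_ip;
        uint16_t local_port, remote_port;
        int id;
};

struct pkt {
        uint32_t saddr, daddr;
        uint16_t sport, dport;
};

/* Hypothetical hook: may rewrite the destination; true means "rewritten". */
typedef bool (*redirect_fn)(uint32_t *daddr, uint16_t *dport);

static int udp_lookup(const struct udp_sock *s, size_t n,
                      const struct pkt *p, redirect_fn hook)
{
        uint32_t daddr = p->daddr, d = p->daddr;
        uint16_t dport = p->dport, dp = p->dport;
        size_t i;

        assert(p != NULL);

        /* 1. Connected sockets: full 4-tuple match on the original packet. */
        for (i = 0; i < n; i++)
                if (s[i].remote_port != 0 &&
                    s[i].local_ip == p->daddr && s[i].local_port == p->dport &&
                    s[i].remote_ip == p->saddr && s[i].remote_port == p->sport)
                        return s[i].id;

        /* 2. Only then let the hook rewrite the destination. */
        if (hook && hook(&d, &dp)) {
                daddr = d;
                dport = dp;
        }

        /* 3. Unconnected sockets: match on local address and port. */
        for (i = 0; i < n; i++)
                if (s[i].remote_port == 0 &&
                    s[i].local_ip == daddr && s[i].local_port == dport)
                        return s[i].id;
        return -1;
}

/* Example hook: steer udp/53 to 127.0.0.1:5353. */
static bool steer_53(uint32_t *daddr, uint16_t *dport)
{
        if (*dport != 53)
                return false;
        *daddr = 0x7f000001;            /* 127.0.0.1 */
        *dport = 5353;
        return true;
}
```

In this model, a reply to a socket connected to 203.0.113.1:4000 still
reaches that socket even though steer_53() would otherwise redirect
traffic for port 53.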

The patches, while still in their early stages, show what we're trying to
solve. We're reaching out early for feedback to learn what the technical
concerns are and whether we can address them.

Just in time for the coming Netconf conference.

Thanks,
Jakub

[1] https://netdevconf.org/0x13/session.html?panel-industry-perspectives
[2] https://blog.cloudflare.com/how-we-built-spectrum/
[3] https://www.spinics.net/lists/netdev/msg370789.html


Jakub Sitnicki (7):
  bpf: Introduce inet_lookup program type
  ipv4: Run inet_lookup bpf program on socket lookup
  ipv6: Run inet_lookup bpf program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for inet_lookup program type
  bpf: Test destination address remapping with inet_lookup
  bpf: Add verifier tests for inet_lookup context access

 include/linux/bpf_types.h                     |   1 +
 include/linux/filter.h                        |  17 +
 include/net/inet6_hashtables.h                |  39 ++
 include/net/inet_hashtables.h                 |  39 ++
 include/net/net_namespace.h                   |   3 +
 include/uapi/linux/bpf.h                      |  27 +
 kernel/bpf/syscall.c                          |  10 +
 net/core/filter.c                             | 216 ++++++++
 net/ipv4/inet_hashtables.c                    |  11 +-
 net/ipv4/udp.c                                |   1 +
 net/ipv6/inet6_hashtables.c                   |  11 +-
 net/ipv6/udp.c                                |   6 +-
 tools/include/uapi/linux/bpf.h                |  27 +
 tools/lib/bpf/libbpf.c                        |   4 +
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   6 +-
 .../selftests/bpf/progs/inet_lookup_prog.c    |  68 +++
 .../testing/selftests/bpf/test_inet_lookup.c  | 392 ++++++++++++++
 .../testing/selftests/bpf/test_inet_lookup.sh |  35 ++
 .../selftests/bpf/verifier/ctx_inet_lookup.c  | 511 ++++++++++++++++++
 23 files changed, 1418 insertions(+), 12 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/inet_lookup_prog.c
 create mode 100644 tools/testing/selftests/bpf/test_inet_lookup.c
 create mode 100755 tools/testing/selftests/bpf/test_inet_lookup.sh
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c

-- 
2.20.1



Thread overview: 19+ messages
2019-06-18 13:00 Jakub Sitnicki [this message]
2019-06-18 13:00 ` [RFC bpf-next 1/7] bpf: Introduce inet_lookup program type Jakub Sitnicki
2019-06-18 13:00 ` [RFC bpf-next 2/7] ipv4: Run inet_lookup bpf program on socket lookup Jakub Sitnicki
2019-06-18 13:00 ` [RFC bpf-next 3/7] ipv6: " Jakub Sitnicki
2019-06-18 13:00 ` [RFC bpf-next 4/7] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2019-06-18 13:00 ` [RFC bpf-next 5/7] libbpf: Add support for inet_lookup program type Jakub Sitnicki
2019-06-18 13:00 ` [RFC bpf-next 6/7] bpf: Test destination address remapping with inet_lookup Jakub Sitnicki
2019-06-18 13:00 ` [RFC bpf-next 7/7] bpf: Add verifier tests for inet_lookup context access Jakub Sitnicki
2019-06-18 13:52 ` [RFC bpf-next 0/7] Programming socket lookup with BPF Florian Westphal
2019-06-19  9:13   ` Jakub Sitnicki
2019-06-20 11:56     ` Florian Westphal
2019-06-20 22:20     ` Joe Stringer
     [not found]       ` <CAGn+7TUmgsA8oKw-mM6S5iR4rmNt6sWxjUgw8=qSCHb=m0ROyg@mail.gmail.com>
2019-06-21 16:50         ` Joe Stringer
2019-06-25  8:11           ` Jakub Sitnicki
2019-06-25  7:28       ` Jakub Sitnicki
2019-06-21 12:51     ` Florian Westphal
2019-06-21 14:33       ` Eric Dumazet
2019-06-21 16:41         ` Florian Westphal
2019-06-21 16:54           ` Paolo Abeni
