bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup
@ 2020-07-17 10:35 Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 01/15] bpf, netns: Handle multiple link attachments Jakub Sitnicki
                   ` (16 more replies)
  0 siblings, 17 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko, Lorenz Bauer,
	Marek Majkowski, Martin KaFai Lau, Yonghong Song

Changelog
=========
v4 -> v5:
- Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii)
- Simplify prog runners now that only SK_DROP/PASS can be returned.
- Enable bpf_perf_event_output from the start. (Andrii)
- Drop patch
  "selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c"
- Remove tests for narrow loads from context at an offset wider in size
  than target field, while we are discussing how to fix it:
  https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/
- Rebase onto recent bpf-next (bfdfa51702de)
- Other minor changes called out in per-patch changelogs,
  see patches: 2, 4, 6, 13-15
- Carried over Andrii's Acks where nothing changed.

v3 -> v4:
- Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz)
- Default to drop on illegal return value from BPF prog (Lorenz)
- Extend bpf_sk_assign to accept NULL socket pointer.
- Switch to saner return values and add docs for new prog_array API (Andrii)
- Add support for narrow loads from BPF context fields (Yonghong)
- Fix broken build when IPv6 is compiled as a module (kernel test robot)
- Fix null/wild-ptr-deref on BPF context access
- Rebase to recent bpf-next (eef8a42d6ce0)
- Other minor changes called out in per-patch changelogs,
  see patches 1-2, 4, 6, 8, 10-12, 14, 16

v2 -> v3:
- Switch to link-based program attachment
- Support for multi-prog attachment
- Ability to skip reuseport socket selection
- Code on RX path is guarded by a static key
- struct in6_addr's are no longer copied into BPF prog context
- BPF prog context is initialized as late as possible
- Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
- Patches dropped:
  01/17 flow_dissector: Extract attach/detach/query helpers
  03/17 inet: Store layer 4 protocol in inet_hashinfo
  08/17 udp: Store layer 4 protocol in udp_table

v1 -> v2:
- Changes called out in patches 2, 13-15, 17
- Rebase to recent bpf-next (b4563facdcae)

RFCv2 -> v1:

- Switch to fetching a socket from a map and selecting a socket with
  bpf_sk_assign, instead of having a dedicated helper that does both.
- Run reuseport logic on sockets selected by BPF sk_lookup.
- Allow BPF sk_lookup to fail the lookup with no match.
- Go back to having just 2 hash table lookups in UDP.

RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.
- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP socket work as expected in the presence of
  inet_lookup prog.
- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.

Overview
========

This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
or BPF sk_lookup for short.

BPF sk_lookup program runs when transport layer is looking up a listening
socket for a new connection request (TCP), or when looking up an
unconnected socket for a packet (UDP).

This serves as a mechanism to overcome the limits of what bind() API allows
to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, fixed port to a single socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, any port to a single socket

     198.51.100.1, any port -> L7 proxy socket

In its context, program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection, and returns SK_PASS code. Transport layer
then uses the selected socket as a result of socket lookup.

Alternatively, program can also fail the lookup (SK_DROP), or let the
lookup continue as usual (SK_PASS without selecting a socket).

This lets the user match packets with listening (TCP) or receiving (UDP)
sockets freely at the last possible point on the receive path, where we
know that packets are destined for local delivery after undergoing
policing, filtering, and routing.

Program is attached to a network namespace, similar to BPF flow_dissector.
We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be
attached at the same time, in which case their return values are aggregated
according the rules outlined in patch #4 description.

Series structure
================

Patches are organized as so:

 1: enables multiple link-based prog attachments for bpf-netns
 2: introduces sk_lookup program type
 3-4: hook up the program to run on ipv4/tcp socket lookup
 5-6: hook up the program to run on ipv6/tcp socket lookup
 7-8: hook up the program to run on ipv4/udp socket lookup
 9-10: hook up the program to run on ipv6/udp socket lookup
 11-13: libbpf & bpftool support for sk_lookup
 14-15: verifier and selftests for sk_lookup

Patches are also available on GH:

  https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v5

Follow-up work
==============

I'll follow up with below items, which IMHO don't block the review:

- benchmark results for udp6 small packet flood scenario,
- user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
- timeout for accept() in tests after extending network_helper.[ch].

Thanks to the reviewers for their feedback to this patch series:

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>

-jkbs

[RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
[RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/
[v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
[v3] https://lore.kernel.org/bpf/20200702092416.11961-1-jakub@cloudflare.com/
[v4] https://lore.kernel.org/bpf/20200713174654.642628-1-jakub@cloudflare.com/

Jakub Sitnicki (15):
  bpf, netns: Handle multiple link attachments
  bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  inet: Extract helper for selecting socket from reuseport group
  inet: Run SK_LOOKUP BPF program on socket lookup
  inet6: Extract helper for selecting socket from reuseport group
  inet6: Run SK_LOOKUP BPF program on socket lookup
  udp: Extract helper for selecting socket from reuseport group
  udp: Run SK_LOOKUP BPF program on socket lookup
  udp6: Extract helper for selecting socket from reuseport group
  udp6: Run SK_LOOKUP BPF program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for SK_LOOKUP program type
  tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
  selftests/bpf: Add verifier tests for bpf_sk_lookup context access
  selftests/bpf: Tests for BPF_SK_LOOKUP attach point

 include/linux/bpf-netns.h                     |    3 +
 include/linux/bpf.h                           |    4 +
 include/linux/bpf_types.h                     |    2 +
 include/linux/filter.h                        |  147 ++
 include/uapi/linux/bpf.h                      |   77 +
 kernel/bpf/core.c                             |   55 +
 kernel/bpf/net_namespace.c                    |  127 +-
 kernel/bpf/syscall.c                          |    9 +
 kernel/bpf/verifier.c                         |   13 +-
 net/core/filter.c                             |  183 +++
 net/ipv4/inet_hashtables.c                    |   60 +-
 net/ipv4/udp.c                                |   93 +-
 net/ipv6/inet6_hashtables.c                   |   66 +-
 net/ipv6/udp.c                                |   97 +-
 scripts/bpf_helpers_doc.py                    |    9 +-
 .../bpftool/Documentation/bpftool-prog.rst    |    2 +-
 tools/bpf/bpftool/bash-completion/bpftool     |    2 +-
 tools/bpf/bpftool/common.c                    |    1 +
 tools/bpf/bpftool/prog.c                      |    3 +-
 tools/include/uapi/linux/bpf.h                |   77 +
 tools/lib/bpf/libbpf.c                        |    3 +
 tools/lib/bpf/libbpf.h                        |    2 +
 tools/lib/bpf/libbpf.map                      |    2 +
 tools/lib/bpf/libbpf_probes.c                 |    3 +
 tools/testing/selftests/bpf/network_helpers.c |   58 +-
 tools/testing/selftests/bpf/network_helpers.h |    2 +
 .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
 .../selftests/bpf/progs/test_sk_lookup.c      |  641 +++++++++
 .../selftests/bpf/verifier/ctx_sk_lookup.c    |  492 +++++++
 29 files changed, 3418 insertions(+), 97 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

-- 
2.25.4


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 01/15] bpf, netns: Handle multiple link attachments
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 02/15] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at given position when link gets attached/updated/released.

This let's us lift the limit of having just one link attached for the new
attach type introduced by subsequent patch.

No functional changes intended.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Document prog_array {delete_safe,update}_at() behavior. (Andrii)
    - Return -EINVAL/-ENOENT on failure in {delete_safe,update}_at(). (Andrii)
    - Return -ENOENT on index out of range in link_index(). (Andrii)
    
    v3:
    - New in v3 to support multi-prog attachments. (Alexei)

 include/linux/bpf.h        |  3 ++
 kernel/bpf/core.c          | 55 +++++++++++++++++++++++
 kernel/bpf/net_namespace.c | 90 ++++++++++++++++++++++++++++++++++----
 3 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 54ad426dbea1..c8c9eabcd106 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -928,6 +928,9 @@ int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs,
 
 void bpf_prog_array_delete_safe(struct bpf_prog_array *progs,
 				struct bpf_prog *old_prog);
+int bpf_prog_array_delete_safe_at(struct bpf_prog_array *array, int index);
+int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
+			     struct bpf_prog *prog);
 int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 			     u32 *prog_ids, u32 request_cnt,
 			     u32 *prog_cnt);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..7be02e555ab9 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1958,6 +1958,61 @@ void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
 		}
 }
 
+/**
+ * bpf_prog_array_delete_safe_at() - Replaces the program at the given
+ *                                   index into the program array with
+ *                                   a dummy no-op program.
+ * @array: a bpf_prog_array
+ * @index: the index of the program to replace
+ *
+ * Skips over dummy programs, by not counting them, when calculating
+ * the the position of the program to replace.
+ *
+ * Return:
+ * * 0		- Success
+ * * -EINVAL	- Invalid index value. Must be a non-negative integer.
+ * * -ENOENT	- Index out of range
+ */
+int bpf_prog_array_delete_safe_at(struct bpf_prog_array *array, int index)
+{
+	return bpf_prog_array_update_at(array, index, &dummy_bpf_prog.prog);
+}
+
+/**
+ * bpf_prog_array_update_at() - Updates the program at the given index
+ *                              into the program array.
+ * @array: a bpf_prog_array
+ * @index: the index of the program to update
+ * @prog: the program to insert into the array
+ *
+ * Skips over dummy programs, by not counting them, when calculating
+ * the position of the program to update.
+ *
+ * Return:
+ * * 0		- Success
+ * * -EINVAL	- Invalid index value. Must be a non-negative integer.
+ * * -ENOENT	- Index out of range
+ */
+int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
+			     struct bpf_prog *prog)
+{
+	struct bpf_prog_array_item *item;
+
+	if (unlikely(index < 0))
+		return -EINVAL;
+
+	for (item = array->items; item->prog; item++) {
+		if (item->prog == &dummy_bpf_prog.prog)
+			continue;
+		if (!index) {
+			WRITE_ONCE(item->prog, prog);
+			return 0;
+		}
+		index--;
+	}
+	return -ENOENT;
+}
+
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 310241ca7991..e9c8e26ac8f2 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -36,12 +36,50 @@ static void netns_bpf_run_array_detach(struct net *net,
 	bpf_prog_array_free(run_array);
 }
 
+static int link_index(struct net *net, enum netns_bpf_attach_type type,
+		      struct bpf_netns_link *link)
+{
+	struct bpf_netns_link *pos;
+	int i = 0;
+
+	list_for_each_entry(pos, &net->bpf.links[type], node) {
+		if (pos == link)
+			return i;
+		i++;
+	}
+	return -ENOENT;
+}
+
+static int link_count(struct net *net, enum netns_bpf_attach_type type)
+{
+	struct list_head *pos;
+	int i = 0;
+
+	list_for_each(pos, &net->bpf.links[type])
+		i++;
+	return i;
+}
+
+static void fill_prog_array(struct net *net, enum netns_bpf_attach_type type,
+			    struct bpf_prog_array *prog_array)
+{
+	struct bpf_netns_link *pos;
+	unsigned int i = 0;
+
+	list_for_each_entry(pos, &net->bpf.links[type], node) {
+		prog_array->items[i].prog = pos->link.prog;
+		i++;
+	}
+}
+
 static void bpf_netns_link_release(struct bpf_link *link)
 {
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *old_array, *new_array;
 	struct net *net;
+	int cnt, idx;
 
 	mutex_lock(&netns_bpf_mutex);
 
@@ -53,9 +91,27 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
-	netns_bpf_run_array_detach(net, type);
+	/* Remember link position in case of safe delete */
+	idx = link_index(net, type, net_link);
 	list_del(&net_link->node);
 
+	cnt = link_count(net, type);
+	if (!cnt) {
+		netns_bpf_run_array_detach(net, type);
+		goto out_unlock;
+	}
+
+	old_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	new_array = bpf_prog_array_alloc(cnt, GFP_KERNEL);
+	if (!new_array) {
+		WARN_ON(bpf_prog_array_delete_safe_at(old_array, idx));
+		goto out_unlock;
+	}
+	fill_prog_array(net, type, new_array);
+	rcu_assign_pointer(net->bpf.run_array[type], new_array);
+	bpf_prog_array_free(old_array);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 }
@@ -77,7 +133,7 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 	enum netns_bpf_attach_type type = net_link->netns_type;
 	struct bpf_prog_array *run_array;
 	struct net *net;
-	int ret = 0;
+	int idx, ret;
 
 	if (old_prog && old_prog != link->prog)
 		return -EPERM;
@@ -95,7 +151,10 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 
 	run_array = rcu_dereference_protected(net->bpf.run_array[type],
 					      lockdep_is_held(&netns_bpf_mutex));
-	WRITE_ONCE(run_array->items[0].prog, new_prog);
+	idx = link_index(net, type, net_link);
+	ret = bpf_prog_array_update_at(run_array, idx, new_prog);
+	if (ret)
+		goto out_unlock;
 
 	old_prog = xchg(&link->prog, new_prog);
 	bpf_prog_put(old_prog);
@@ -309,18 +368,28 @@ int netns_bpf_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype)
 	return ret;
 }
 
+static int netns_bpf_max_progs(enum netns_bpf_attach_type type)
+{
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	struct bpf_prog_array *run_array;
-	int err;
+	int cnt, err;
 
 	mutex_lock(&netns_bpf_mutex);
 
-	/* Allow attaching only one prog or link for now */
-	if (!list_empty(&net->bpf.links[type])) {
+	cnt = link_count(net, type);
+	if (cnt >= netns_bpf_max_progs(type)) {
 		err = -E2BIG;
 		goto out_unlock;
 	}
@@ -341,16 +410,19 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
-	run_array = bpf_prog_array_alloc(1, GFP_KERNEL);
+	run_array = bpf_prog_array_alloc(cnt + 1, GFP_KERNEL);
 	if (!run_array) {
 		err = -ENOMEM;
 		goto out_unlock;
 	}
-	run_array->items[0].prog = link->prog;
-	rcu_assign_pointer(net->bpf.run_array[type], run_array);
 
 	list_add_tail(&net_link->node, &net->bpf.links[type]);
 
+	fill_prog_array(net, type, run_array);
+	run_array = rcu_replace_pointer(net->bpf.run_array[type], run_array,
+					lockdep_is_held(&netns_bpf_mutex));
+	bpf_prog_array_free(run_array);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 	return err;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 02/15] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 01/15] bpf, netns: Handle multiple link attachments Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 03/15] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what
bind() API allows to express. Two use-cases driving this work are:

 (1) steer packets destined to an IP range, on fixed port to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context program receives information about the packet that
triggered the socket lookup. Namely IP version, L4 protocol identifier, and
address 4-tuple. Context can be further extended to include ingress
interface identifier.

To select a socket BPF program fetches it from a map holding socket
references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
helper to record the selection. Transport layer then uses the selected
socket as a result of socket lookup.

In its basic form, SK_LOOKUP acts as a filter and hence must return either
SK_PASS or SK_DROP. If the program returns with SK_PASS, transport should
look for a socket to receive the packet, or use the one selected by the
program if available, while SK_DROP informs the transport layer that the
lookup should fail.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on local delivery
path in ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v5:
    - Enforce that return value is either SK_PASS or SK_DROP. (Andrii)
    - Use a case {} block avoid a conditional variable declaration. (Andrii)
    - Enable bpf_perf_event_output from the start. (Andrii)
    
    v4:
    - Reintroduce narrow load support for most BPF context fields. (Yonghong)
    - Fix null-ptr-deref in BPF context access when IPv6 address not set.
    - Unpack v4/v6 IP address union in bpf_sk_lookup context type.
    - Add verifier support for ARG_PTR_TO_SOCKET_OR_NULL.
    - Allow resetting socket selection with bpf_sk_assign(ctx, NULL).
    - Document that bpf_sk_assign accepts a NULL socket.
    
    v3:
    - Allow bpf_sk_assign helper to replace previously selected socket only
      when BPF_SK_LOOKUP_F_REPLACE flag is set, as a precaution for multiple
      programs running in series to accidentally override each other's verdict.
    - Let BPF program decide that load-balancing within a reuseport socket group
      should be skipped for the socket selected with bpf_sk_assign() by passing
      BPF_SK_LOOKUP_F_NO_REUSEPORT flag. (Martin)
    - Extend struct bpf_sk_lookup program context with an 'sk' field containing
      the selected socket with an intention for multiple attached program
      running in series to see each other's choices. However, currently the
      verifier doesn't allow checking if pointer is set.
    - Use bpf-netns infra for link-based multi-program attachment. (Alexei)
    - Get rid of macros in convert_ctx_access to make it easier to read.
    - Disallow 1-,2-byte access to context fields containing IP addresses.
    
    v2:
    - Make bpf_sk_assign reject sockets that don't use RCU freeing.
      Update bpf_sk_assign docs accordingly. (Martin)
    - Change bpf_sk_assign proto to take PTR_TO_SOCKET as argument. (Martin)
    - Fix broken build when CONFIG_INET is not selected. (Martin)
    - Rename bpf_sk_lookup{} src_/dst_* fields remote_/local_*. (Martin)
    - Enforce BPF_SK_LOOKUP attach point on load & attach. (Martin)

 include/linux/bpf-netns.h  |   3 +
 include/linux/bpf.h        |   1 +
 include/linux/bpf_types.h  |   2 +
 include/linux/filter.h     |  17 ++++
 include/uapi/linux/bpf.h   |  77 ++++++++++++++++
 kernel/bpf/net_namespace.c |   5 ++
 kernel/bpf/syscall.c       |   9 ++
 kernel/bpf/verifier.c      |  13 ++-
 net/core/filter.c          | 180 +++++++++++++++++++++++++++++++++++++
 scripts/bpf_helpers_doc.py |   9 +-
 10 files changed, 312 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
index 47d5b0c708c9..722f799c1a2e 100644
--- a/include/linux/bpf-netns.h
+++ b/include/linux/bpf-netns.h
@@ -8,6 +8,7 @@
 enum netns_bpf_attach_type {
 	NETNS_BPF_INVALID = -1,
 	NETNS_BPF_FLOW_DISSECTOR = 0,
+	NETNS_BPF_SK_LOOKUP,
 	MAX_NETNS_BPF_ATTACH_TYPE
 };
 
@@ -17,6 +18,8 @@ to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
 	switch (attach_type) {
 	case BPF_FLOW_DISSECTOR:
 		return NETNS_BPF_FLOW_DISSECTOR;
+	case BPF_SK_LOOKUP:
+		return NETNS_BPF_SK_LOOKUP;
 	default:
 		return NETNS_BPF_INVALID;
 	}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c8c9eabcd106..adb16bdc5f0a 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -249,6 +249,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_INT,		/* pointer to int */
 	ARG_PTR_TO_LONG,	/* pointer to long */
 	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock (fullsock) */
+	ARG_PTR_TO_SOCKET_OR_NULL,	/* pointer to bpf_sock (fullsock) or NULL */
 	ARG_PTR_TO_BTF_ID,	/* pointer to in-kernel struct */
 	ARG_PTR_TO_ALLOC_MEM,	/* pointer to dynamically allocated memory */
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL,	/* pointer to dynamically allocated memory or NULL */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a18ae82a298a..a52a5688418e 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -64,6 +64,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
 	      struct sk_reuseport_md, struct sk_reuseport_kern)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SK_LOOKUP, sk_lookup,
+	      struct bpf_sk_lookup, struct bpf_sk_lookup_kern)
 #endif
 #if defined(CONFIG_BPF_JIT)
 BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 0b0144752d78..fa1ea12ad2cd 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1278,4 +1278,21 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+struct bpf_sk_lookup_kern {
+	u16		family;
+	u16		protocol;
+	struct {
+		__be32 saddr;
+		__be32 daddr;
+	} v4;
+	struct {
+		const struct in6_addr *saddr;
+		const struct in6_addr *daddr;
+	} v6;
+	__be16		sport;
+	u16		dport;
+	struct sock	*selected_sk;
+	bool		no_reuseport;
+};
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7ac3992dacfe..54d0c886e3ba 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -189,6 +189,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -228,6 +229,7 @@ enum bpf_attach_type {
 	BPF_XDP_DEVMAP,
 	BPF_CGROUP_INET_SOCK_RELEASE,
 	BPF_XDP_CPUMAP,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3069,6 +3071,10 @@ union bpf_attr {
  *
  * long bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3094,6 +3100,56 @@ union bpf_attr {
  *		**-ESOCKTNOSUPPORT** if the socket type is not supported
  *		(reuseport).
  *
+ * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
+ *		For the operation to succeed passed socket must be compatible
+ *		with the packet description provided by the *ctx* object.
+ *
+ *		L4 protocol (**IPPROTO_TCP** or **IPPROTO_UDP**) must
+ *		be an exact match. While IP family (**AF_INET** or
+ *		**AF_INET6**) must be compatible, that is IPv6 sockets
+ *		that are not v6-only can be selected for IPv4 packets.
+ *
+ *		Only TCP listeners and UDP unconnected sockets can be
+ *		selected. *sk* can also be NULL to reset any previous
+ *		selection.
+ *
+ *		*flags* argument can combination of following values:
+ *
+ *		* **BPF_SK_LOOKUP_F_REPLACE** to override the previous
+ *		  socket selection, potentially done by a BPF program
+ *		  that ran before us.
+ *
+ *		* **BPF_SK_LOOKUP_F_NO_REUSEPORT** to skip
+ *		  load-balancing within reuseport group for the socket
+ *		  being selected.
+ *
+ *		On success *ctx->sk* will point to the selected socket.
+ *
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EAFNOSUPPORT** if socket family (*sk->family*) is
+ *		  not compatible with packet family (*ctx->family*).
+ *
+ *		* **-EEXIST** if socket has been already selected,
+ *		  potentially by another program, and
+ *		  **BPF_SK_LOOKUP_F_REPLACE** flag was not specified.
+ *
+ *		* **-EINVAL** if unsupported flags were specified.
+ *
+ *		* **-EPROTOTYPE** if socket L4 protocol
+ *		  (*sk->protocol*) doesn't match packet protocol
+ *		  (*ctx->protocol*).
+ *
+ *		* **-ESOCKTNOSUPPORT** if socket is not in allowed
+ *		  state (TCP listening or UDP unconnected).
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -3607,6 +3663,12 @@ enum {
 	BPF_RINGBUF_HDR_SZ		= 8,
 };
 
+/* BPF_FUNC_sk_assign flags in bpf_sk_lookup context. */
+enum {
+	BPF_SK_LOOKUP_F_REPLACE		= (1ULL << 0),
+	BPF_SK_LOOKUP_F_NO_REUSEPORT	= (1ULL << 1),
+};
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
@@ -4349,4 +4411,19 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+
+	__u32 family;		/* Protocol family (AF_INET, AF_INET6) */
+	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
+	__u32 remote_ip4;	/* Network byte order */
+	__u32 remote_ip6[4];	/* Network byte order */
+	__u32 remote_port;	/* Network byte order */
+	__u32 local_ip4;	/* Network byte order */
+	__u32 local_ip6[4];	/* Network byte order */
+	__u32 local_port;	/* Host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index e9c8e26ac8f2..38b368bccda2 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -373,6 +373,8 @@ static int netns_bpf_max_progs(enum netns_bpf_attach_type type)
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
 		return 1;
+	case NETNS_BPF_SK_LOOKUP:
+		return 64;
 	default:
 		return 0;
 	}
@@ -403,6 +405,9 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	case NETNS_BPF_FLOW_DISSECTOR:
 		err = flow_dissector_bpf_prog_attach_check(net, link->prog);
 		break;
+	case NETNS_BPF_SK_LOOKUP:
+		err = 0; /* nothing to check */
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 7ea9dfbebd8c..d07417d17712 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2022,6 +2022,10 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		default:
 			return -EINVAL;
 		}
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		if (expected_attach_type == BPF_SK_LOOKUP)
+			return 0;
+		return -EINVAL;
 	case BPF_PROG_TYPE_EXT:
 		if (expected_attach_type)
 			return -EINVAL;
@@ -2756,6 +2760,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_CGROUP_SOCK:
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
 		if (!capable(CAP_NET_ADMIN))
@@ -2817,6 +2822,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
 	case BPF_TRACE_ITER:
 		return BPF_PROG_TYPE_TRACING;
+	case BPF_SK_LOOKUP:
+		return BPF_PROG_TYPE_SK_LOOKUP;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -2953,6 +2960,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
+	case BPF_SK_LOOKUP:
 		return netns_bpf_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
@@ -3891,6 +3899,7 @@ static int link_create(union bpf_attr *attr)
 		ret = tracing_bpf_link_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_SK_LOOKUP:
 		ret = netns_bpf_link_create(attr, prog);
 		break;
 	default:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 3c1efc9d08fd..9a6703bc3f36 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3878,10 +3878,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			}
 			meta->ref_obj_id = reg->ref_obj_id;
 		}
-	} else if (arg_type == ARG_PTR_TO_SOCKET) {
+	} else if (arg_type == ARG_PTR_TO_SOCKET ||
+		   arg_type == ARG_PTR_TO_SOCKET_OR_NULL) {
 		expected_type = PTR_TO_SOCKET;
-		if (type != expected_type)
-			goto err_type;
+		if (!(register_is_null(reg) &&
+		      arg_type == ARG_PTR_TO_SOCKET_OR_NULL)) {
+			if (type != expected_type)
+				goto err_type;
+		}
 	} else if (arg_type == ARG_PTR_TO_BTF_ID) {
 		expected_type = PTR_TO_BTF_ID;
 		if (type != expected_type)
@@ -7354,6 +7358,9 @@ static int check_return_code(struct bpf_verifier_env *env)
 			return -ENOTSUPP;
 		}
 		break;
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		range = tnum_range(SK_DROP, SK_PASS);
+		break;
 	case BPF_PROG_TYPE_EXT:
 		/* freplace program can return anything as its return value
 		 * depends on the to-be-replaced kernel func or bpf program.
diff --git a/net/core/filter.c b/net/core/filter.c
index bdd2382e655d..d099436b3ff5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9229,6 +9229,186 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
+BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
+	   struct sock *, sk, u64, flags)
+{
+	if (unlikely(flags & ~(BPF_SK_LOOKUP_F_REPLACE |
+			       BPF_SK_LOOKUP_F_NO_REUSEPORT)))
+		return -EINVAL;
+	if (unlikely(sk && sk_is_refcounted(sk)))
+		return -ESOCKTNOSUPPORT; /* reject non-RCU freed sockets */
+	if (unlikely(sk && sk->sk_state == TCP_ESTABLISHED))
+		return -ESOCKTNOSUPPORT; /* reject connected sockets */
+
+	/* Check if socket is suitable for packet L3/L4 protocol */
+	if (sk && sk->sk_protocol != ctx->protocol)
+		return -EPROTOTYPE;
+	if (sk && sk->sk_family != ctx->family &&
+	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
+		return -EAFNOSUPPORT;
+
+	if (ctx->selected_sk && !(flags & BPF_SK_LOOKUP_F_REPLACE))
+		return -EEXIST;
+
+	/* Select socket as lookup result */
+	ctx->selected_sk = sk;
+	ctx->no_reuseport = flags & BPF_SK_LOOKUP_F_NO_REUSEPORT;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sk_lookup_assign_proto = {
+	.func		= bpf_sk_lookup_assign,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_SOCKET_OR_NULL,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_perf_event_output:
+		return &bpf_event_output_data_proto;
+	case BPF_FUNC_sk_assign:
+		return &bpf_sk_lookup_assign_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static bool sk_lookup_is_valid_access(int off, int size,
+				      enum bpf_access_type type,
+				      const struct bpf_prog *prog,
+				      struct bpf_insn_access_aux *info)
+{
+	if (off < 0 || off >= sizeof(struct bpf_sk_lookup))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case offsetof(struct bpf_sk_lookup, sk):
+		info->reg_type = PTR_TO_SOCKET_OR_NULL;
+		return size == sizeof(__u64);
+
+	case bpf_ctx_range(struct bpf_sk_lookup, family):
+	case bpf_ctx_range(struct bpf_sk_lookup, protocol):
+	case bpf_ctx_range(struct bpf_sk_lookup, remote_ip4):
+	case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
+	case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
+	case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
+	case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
+	case bpf_ctx_range(struct bpf_sk_lookup, local_port):
+		bpf_ctx_record_field_size(info, sizeof(__u32));
+		return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
+
+	default:
+		return false;
+	}
+}
+
+static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
+					const struct bpf_insn *si,
+					struct bpf_insn *insn_buf,
+					struct bpf_prog *prog,
+					u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+
+	switch (si->off) {
+	case offsetof(struct bpf_sk_lookup, sk):
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sk_lookup_kern, selected_sk));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, family):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     family, 2, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, protocol):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     protocol, 2, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, remote_ip4):
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     v4.saddr, 4, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, local_ip4):
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     v4.daddr, 4, target_size));
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				remote_ip6[0], remote_ip6[3]): {
+#if IS_ENABLED(CONFIG_IPV6)
+		int off = si->off;
+
+		off -= offsetof(struct bpf_sk_lookup, remote_ip6[0]);
+		off += bpf_target_off(struct in6_addr, s6_addr32[0], 4, target_size);
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sk_lookup_kern, v6.saddr));
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+	}
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				local_ip6[0], local_ip6[3]): {
+#if IS_ENABLED(CONFIG_IPV6)
+		int off = si->off;
+
+		off -= offsetof(struct bpf_sk_lookup, local_ip6[0]);
+		off += bpf_target_off(struct in6_addr, s6_addr32[0], 4, target_size);
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sk_lookup_kern, v6.daddr));
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+	}
+	case offsetof(struct bpf_sk_lookup, remote_port):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     sport, 2, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, local_port):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     dport, 2, target_size));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_prog_ops sk_lookup_prog_ops = {
+};
+
+const struct bpf_verifier_ops sk_lookup_verifier_ops = {
+	.get_func_proto		= sk_lookup_func_proto,
+	.is_valid_access	= sk_lookup_is_valid_access,
+	.convert_ctx_access	= sk_lookup_convert_ctx_access,
+};
+
 #endif /* CONFIG_INET */
 
 DEFINE_BPF_DISPATCHER(xdp)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 6843376733df..5bfa448b4704 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -404,6 +404,7 @@ class PrinterHelpers(Printer):
 
     type_fwds = [
             'struct bpf_fib_lookup',
+            'struct bpf_sk_lookup',
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
@@ -450,6 +451,7 @@ class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
             'struct bpf_sock_ops',
@@ -487,6 +489,11 @@ class PrinterHelpers(Printer):
             'struct sk_msg_buff': 'struct sk_msg_md',
             'struct xdp_buff': 'struct xdp_md',
     }
+    # Helpers overloaded for different context types.
+    overloaded_helpers = [
+        'bpf_get_socket_cookie',
+        'bpf_sk_assign',
+    ]
 
     def print_header(self):
         header = '''\
@@ -543,7 +550,7 @@ class PrinterHelpers(Printer):
         for i, a in enumerate(proto['args']):
             t = a['type']
             n = a['name']
-            if proto['name'] == 'bpf_get_socket_cookie' and i == 0:
+            if proto['name'] in self.overloaded_helpers and i == 0:
                     t = 'void'
                     n = 'ctx'
             one_arg = '{}{}'.format(comma, self.map_type(t))
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 03/15] inet: Extract helper for selecting socket from reuseport group
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 01/15] bpf, netns: Handle multiple link attachments Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 02/15] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 04/15] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Prepare for calling into reuseport from __inet_lookup_listener as well.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/inet_hashtables.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2bbaaf0c7176..ab64834837c8 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -246,6 +246,21 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /*
  * Here are some nice properties to exploit here. The BSD API
  * does not allow a listening sock to specify the remote port nor the
@@ -265,21 +280,17 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr,
 				      dif, sdif, exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet_ehashfn(net, daddr, hnum,
-						     saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 04/15] inet: Run SK_LOOKUP BPF program on socket lookup
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 03/15] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 05/15] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Run a BPF program before looking up a listening socket on the receive path.
Program selects a listening socket to yield as result of socket lookup by
calling bpf_sk_assign() helper and returning SK_PASS code. Program can
revert its decision by assigning a NULL socket with bpf_sk_assign().

Alternatively, BPF program can also fail the lookup by returning with
SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
no socket has been selected with bpf_sk_assign().

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.

In case multiple programs are attached, they are run in series in the order
in which they were attached. The end result is determined from return codes
of all the programs according to following rules:

 1. If any program returned SK_PASS and selected a valid socket, the socket
    is used as result of socket lookup.
 2. If more than one program returned SK_PASS and selected a socket,
    last selection takes effect.
 3. If any program returned SK_DROP, and no program returned SK_PASS and
    selected a socket, socket lookup fails with -ECONNREFUSED.
 4. If all programs returned SK_PASS and none of them selected a socket,
    socket lookup continues to htable-based lookup.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v5:
    - Move variable initialization out of critical section. (Andrii)
    - Simplify prog runners now that only SK_DROP/PASS can be returned.
    
    v4:
    - Reduce BPF sk_lookup prog return codes to SK_PASS/SK_DROP. (Lorenz)
    - Default to drop & warn on illegal return value from BPF prog. (Lorenz)
    - Rename netns_bpf_attach_type_enable/disable to _need/unneed. (Lorenz)
    - Export bpf_sk_lookup_enabled symbol for CONFIG_IPV6=m (kernel test robot)
    - Invert return value from bpf_sk_lookup_run_v4 to true on skip reuseport.
    - Move dedicated prog_array runner close to its callers in filter.h.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 include/linux/filter.h     | 91 ++++++++++++++++++++++++++++++++++++++
 kernel/bpf/net_namespace.c | 32 +++++++++++++-
 net/core/filter.c          |  3 ++
 net/ipv4/inet_hashtables.c | 31 +++++++++++++
 4 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index fa1ea12ad2cd..c4f54c216347 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1295,4 +1295,95 @@ struct bpf_sk_lookup_kern {
 	bool		no_reuseport;
 };
 
+extern struct static_key_false bpf_sk_lookup_enabled;
+
+/* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup.
+ *
+ * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and
+ * SK_DROP. Their meaning is as follows:
+ *
+ *  SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result
+ *  SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup
+ *  SK_DROP                           : terminate lookup with -ECONNREFUSED
+ *
+ * This macro aggregates return values and selected sockets from
+ * multiple BPF programs according to following rules in order:
+ *
+ *  1. If any program returned SK_PASS and a non-NULL ctx.selected_sk,
+ *     macro result is SK_PASS and last ctx.selected_sk is used.
+ *  2. If any program returned SK_DROP return value,
+ *     macro result is SK_DROP.
+ *  3. Otherwise result is SK_PASS and ctx.selected_sk is NULL.
+ *
+ * Caller must ensure that the prog array is non-NULL, and that the
+ * array as well as the programs it contains remain valid.
+ */
+#define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func)			\
+	({								\
+		struct bpf_sk_lookup_kern *_ctx = &(ctx);		\
+		struct bpf_prog_array_item *_item;			\
+		struct sock *_selected_sk = NULL;			\
+		bool _no_reuseport = false;				\
+		struct bpf_prog *_prog;					\
+		bool _all_pass = true;					\
+		u32 _ret;						\
+									\
+		migrate_disable();					\
+		_item = &(array)->items[0];				\
+		while ((_prog = READ_ONCE(_item->prog))) {		\
+			/* restore most recent selection */		\
+			_ctx->selected_sk = _selected_sk;		\
+			_ctx->no_reuseport = _no_reuseport;		\
+									\
+			_ret = func(_prog, _ctx);			\
+			if (_ret == SK_PASS && _ctx->selected_sk) {	\
+				/* remember last non-NULL socket */	\
+				_selected_sk = _ctx->selected_sk;	\
+				_no_reuseport = _ctx->no_reuseport;	\
+			} else if (_ret == SK_DROP && _all_pass) {	\
+				_all_pass = false;			\
+			}						\
+			_item++;					\
+		}							\
+		_ctx->selected_sk = _selected_sk;			\
+		_ctx->no_reuseport = _no_reuseport;			\
+		migrate_enable();					\
+		_all_pass || _selected_sk ? SK_PASS : SK_DROP;		\
+	 })
+
+static inline bool bpf_sk_lookup_run_v4(struct net *net, int protocol,
+					const __be32 saddr, const __be16 sport,
+					const __be32 daddr, const u16 dport,
+					struct sock **psk)
+{
+	struct bpf_prog_array *run_array;
+	struct sock *selected_sk = NULL;
+	bool no_reuseport = false;
+
+	rcu_read_lock();
+	run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
+	if (run_array) {
+		struct bpf_sk_lookup_kern ctx = {
+			.family		= AF_INET,
+			.protocol	= protocol,
+			.v4.saddr	= saddr,
+			.v4.daddr	= daddr,
+			.sport		= sport,
+			.dport		= dport,
+		};
+		u32 act;
+
+		act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN);
+		if (act == SK_PASS) {
+			selected_sk = ctx.selected_sk;
+			no_reuseport = ctx.no_reuseport;
+		} else {
+			selected_sk = ERR_PTR(-ECONNREFUSED);
+		}
+	}
+	rcu_read_unlock();
+	*psk = selected_sk;
+	return no_reuseport;
+}
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 38b368bccda2..4e1bcaa2c3cb 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -25,6 +25,28 @@ struct bpf_netns_link {
 /* Protects updates to netns_bpf */
 DEFINE_MUTEX(netns_bpf_mutex);
 
+static void netns_bpf_attach_type_unneed(enum netns_bpf_attach_type type)
+{
+	switch (type) {
+	case NETNS_BPF_SK_LOOKUP:
+		static_branch_dec(&bpf_sk_lookup_enabled);
+		break;
+	default:
+		break;
+	}
+}
+
+static void netns_bpf_attach_type_need(enum netns_bpf_attach_type type)
+{
+	switch (type) {
+	case NETNS_BPF_SK_LOOKUP:
+		static_branch_inc(&bpf_sk_lookup_enabled);
+		break;
+	default:
+		break;
+	}
+}
+
 /* Must be called with netns_bpf_mutex held. */
 static void netns_bpf_run_array_detach(struct net *net,
 				       enum netns_bpf_attach_type type)
@@ -91,6 +113,9 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
+	/* Mark attach point as unused */
+	netns_bpf_attach_type_unneed(type);
+
 	/* Remember link position in case of safe delete */
 	idx = link_index(net, type, net_link);
 	list_del(&net_link->node);
@@ -428,6 +453,9 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 					lockdep_is_held(&netns_bpf_mutex));
 	bpf_prog_array_free(run_array);
 
+	/* Mark attach point as used */
+	netns_bpf_attach_type_need(type);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 	return err;
@@ -503,8 +531,10 @@ static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
 		netns_bpf_run_array_detach(net, type);
-		list_for_each_entry(net_link, &net->bpf.links[type], node)
+		list_for_each_entry(net_link, &net->bpf.links[type], node) {
 			net_link->net = NULL; /* auto-detach link */
+			netns_bpf_attach_type_unneed(type);
+		}
 		if (net->bpf.progs[type])
 			bpf_prog_put(net->bpf.progs[type]);
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index d099436b3ff5..2bd129b5ae74 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9230,6 +9230,9 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
 
+DEFINE_STATIC_KEY_FALSE(bpf_sk_lookup_enabled);
+EXPORT_SYMBOL(bpf_sk_lookup_enabled);
+
 BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
 	   struct sock *, sk, u64, flags)
 {
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ab64834837c8..4eb4cd8d20dd 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -299,6 +299,29 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	return result;
 }
 
+static inline struct sock *inet_lookup_run_bpf(struct net *net,
+					       struct inet_hashinfo *hashinfo,
+					       struct sk_buff *skb, int doff,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr, u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (hashinfo != &tcp_hashinfo)
+		return NULL; /* only TCP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_TCP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 struct sock *__inet_lookup_listener(struct net *net,
 				    struct inet_hashinfo *hashinfo,
 				    struct sk_buff *skb, int doff,
@@ -310,6 +333,14 @@ struct sock *__inet_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		result = inet_lookup_run_bpf(net, hashinfo, skb, doff,
+					     saddr, sport, daddr, hnum);
+		if (result)
+			goto done;
+	}
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 05/15] inet6: Extract helper for selecting socket from reuseport group
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (3 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 04/15] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 06/15] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Prepare for calling into reuseport from inet6_lookup_listener as well.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/inet6_hashtables.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index fbe9d4295eac..03942eef8ab6 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -111,6 +111,23 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *inet6_lhash2_lookup(struct net *net,
 		struct inet_listen_hashbucket *ilb2,
@@ -123,21 +140,17 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr, dif, sdif,
 				      exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet6_ehashfn(net, daddr, hnum,
-						      saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 06/15] inet6: Run SK_LOOKUP BPF program on socket lookup
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (4 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 05/15] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 07/15] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Following ipv4 stack changes, run a BPF program attached to netns before
looking up a listening socket. Program can return a listening socket to use
as result of socket lookup, fail the lookup, or take no action.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v5:
    - Simplify prog runners now that only SK_DROP/PASS can be returned.
    
    v4:
    - Adapt to changes in BPF prog return codes.
    - Invert return value from bpf_sk_lookup_run_v6 to true on skip reuseport.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Don't copy struct in6_addr when populating BPF prog context. (Martin)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 include/linux/filter.h      | 39 +++++++++++++++++++++++++++++++++++++
 net/ipv6/inet6_hashtables.c | 35 +++++++++++++++++++++++++++++++++
 2 files changed, 74 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index c4f54c216347..8252572db918 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1386,4 +1386,43 @@ static inline bool bpf_sk_lookup_run_v4(struct net *net, int protocol,
 	return no_reuseport;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline bool bpf_sk_lookup_run_v6(struct net *net, int protocol,
+					const struct in6_addr *saddr,
+					const __be16 sport,
+					const struct in6_addr *daddr,
+					const u16 dport,
+					struct sock **psk)
+{
+	struct bpf_prog_array *run_array;
+	struct sock *selected_sk = NULL;
+	bool no_reuseport = false;
+
+	rcu_read_lock();
+	run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
+	if (run_array) {
+		struct bpf_sk_lookup_kern ctx = {
+			.family		= AF_INET6,
+			.protocol	= protocol,
+			.v6.saddr	= saddr,
+			.v6.daddr	= daddr,
+			.sport		= sport,
+			.dport		= dport,
+		};
+		u32 act;
+
+		act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN);
+		if (act == SK_PASS) {
+			selected_sk = ctx.selected_sk;
+			no_reuseport = ctx.no_reuseport;
+		} else {
+			selected_sk = ERR_PTR(-ECONNREFUSED);
+		}
+	}
+	rcu_read_unlock();
+	*psk = selected_sk;
+	return no_reuseport;
+}
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 03942eef8ab6..2d3add9e6116 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -21,6 +21,8 @@
 #include <net/ip.h>
 #include <net/sock_reuseport.h>
 
+extern struct inet_hashinfo tcp_hashinfo;
+
 u32 inet6_ehashfn(const struct net *net,
 		  const struct in6_addr *laddr, const u16 lport,
 		  const struct in6_addr *faddr, const __be16 fport)
@@ -159,6 +161,31 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	return result;
 }
 
+static inline struct sock *inet6_lookup_run_bpf(struct net *net,
+						struct inet_hashinfo *hashinfo,
+						struct sk_buff *skb, int doff,
+						const struct in6_addr *saddr,
+						const __be16 sport,
+						const struct in6_addr *daddr,
+						const u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (hashinfo != &tcp_hashinfo)
+		return NULL; /* only TCP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_TCP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 struct sock *inet6_lookup_listener(struct net *net,
 		struct inet_hashinfo *hashinfo,
 		struct sk_buff *skb, int doff,
@@ -170,6 +197,14 @@ struct sock *inet6_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		result = inet6_lookup_run_bpf(net, hashinfo, skb, doff,
+					      saddr, sport, daddr, hnum);
+		if (result)
+			goto done;
+	}
+
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 07/15] udp: Extract helper for selecting socket from reuseport group
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (5 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 06/15] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 08/15] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Prepare for calling into reuseport from __udp4_lib_lookup as well.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 073d346f515c..9296faea3acf 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -408,6 +408,25 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
 			      udp_ehash_secret + net_hash_mix(net));
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 hash;
+
+	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
+		hash = udp_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, hash, skb,
+						 sizeof(struct udphdr));
+		/* Fall back to scoring if group has connections */
+		if (reuseport_has_conns(sk, false))
+			return NULL;
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 saddr, __be16 sport,
@@ -418,7 +437,6 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
-	u32 hash = 0;
 
 	result = NULL;
 	badness = 0;
@@ -426,15 +444,11 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport &&
-			    sk->sk_state != TCP_ESTABLISHED) {
-				hash = udp_ehashfn(net, daddr, hnum,
-						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			badness = score;
 			result = sk;
 		}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 08/15] udp: Run SK_LOOKUP BPF program on socket lookup
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (6 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 07/15] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 09/15] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski,
	Andrii Nakryiko

Following INET/TCP socket lookup changes, modify UDP socket lookup to let
BPF program select a receiving socket before searching for a socket by
destination address and port as usual.

Lookup of connected sockets that match packet 4-tuple is unaffected by this
change. BPF program runs, and potentially overrides the lookup result, only
if a 4-tuple match was not found.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Adapt to change in bpf_sk_lookup_run_v4 return value semantics.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 net/ipv4/udp.c | 59 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9296faea3acf..b738c63d7a77 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -456,6 +456,29 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	return result;
 }
 
+static inline struct sock *udp4_lookup_run_bpf(struct net *net,
+					       struct udp_table *udptable,
+					       struct sk_buff *skb,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr, u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (udptable != &udp_table)
+		return NULL; /* only UDP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_UDP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
  * harder than this. -DaveM
  */
@@ -463,27 +486,45 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		__be16 sport, __be32 daddr, __be16 dport, int dif,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
-	struct sock *result;
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
+	struct sock *result, *sk;
 
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
 				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-		slot2 = hash2 & udptable->mask;
-		hslot2 = &udptable->hash2[slot2];
-
-		result = udp4_lib_lookup2(net, saddr, sport,
-					  htonl(INADDR_ANY), hnum, dif, sdif,
-					  hslot2, skb);
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
+
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		sk = udp4_lookup_run_bpf(net, udptable, skb,
+					 saddr, sport, daddr, hnum);
+		if (sk) {
+			result = sk;
+			goto done;
+		}
 	}
+
+	/* Got non-wildcard socket or error on first lookup */
+	if (result)
+		goto done;
+
+	/* Lookup wildcard sockets */
+	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  htonl(INADDR_ANY), hnum, dif, sdif,
+				  hslot2, skb);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 09/15] udp6: Extract helper for selecting socket from reuseport group
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (7 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 08/15] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 10/15] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Prepare for calling into reuseport from __udp6_lib_lookup as well.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/udp.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 38c0d9350c6b..084205c18a33 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -141,6 +141,27 @@ static int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned int hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 hash;
+
+	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
+		hash = udp6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, hash, skb,
+						 sizeof(struct udphdr));
+		/* Fall back to scoring if group has connections */
+		if (reuseport_has_conns(sk, false))
+			return NULL;
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
@@ -150,7 +171,6 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
-	u32 hash = 0;
 
 	result = NULL;
 	badness = -1;
@@ -158,16 +178,11 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport &&
-			    sk->sk_state != TCP_ESTABLISHED) {
-				hash = udp6_ehashfn(net, daddr, hnum,
-						    saddr, sport);
-
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			badness = score;
 		}
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 10/15] udp6: Run SK_LOOKUP BPF program on socket lookup
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (8 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 09/15] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 11/15] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski,
	Andrii Nakryiko

Same as for udp4, let BPF program override the socket lookup result, by
selecting a receiving socket of its choice or failing the lookup, if no
connected UDP socket matched packet 4-tuple.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Adapt to change in bpf_sk_lookup_run_v6 return value semantics.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 net/ipv6/udp.c | 60 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 084205c18a33..ff8be202726a 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -190,6 +190,31 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 	return result;
 }
 
+static inline struct sock *udp6_lookup_run_bpf(struct net *net,
+					       struct udp_table *udptable,
+					       struct sk_buff *skb,
+					       const struct in6_addr *saddr,
+					       __be16 sport,
+					       const struct in6_addr *daddr,
+					       u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (udptable != &udp_table)
+		return NULL; /* only UDP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_UDP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 /* rcu_read_lock() must be held */
 struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
@@ -200,25 +225,42 @@ struct sock *__udp6_lib_lookup(struct net *net,
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
-	struct sock *result;
+	struct sock *result, *sk;
 
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected or non-wildcard sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
 				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-		slot2 = hash2 & udptable->mask;
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
+
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		sk = udp6_lookup_run_bpf(net, udptable, skb,
+					 saddr, sport, daddr, hnum);
+		if (sk) {
+			result = sk;
+			goto done;
+		}
+	}
 
-		hslot2 = &udptable->hash2[slot2];
+	/* Got non-wildcard socket or error on first lookup */
+	if (result)
+		goto done;
 
-		result = udp6_lib_lookup2(net, saddr, sport,
-					  &in6addr_any, hnum, dif, sdif,
-					  hslot2, skb);
-	}
+	/* Lookup wildcard sockets */
+	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  &in6addr_any, hnum, dif, sdif,
+				  hslot2, skb);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 11/15] bpf: Sync linux/bpf.h to tools/
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (9 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 10/15] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 12/15] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Newly added program, context type and helper is used by tests in a
subsequent patch. Synchronize the header file.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Update after changes to bpf.h in earlier patch.
    
    v3:
    - Update after changes to bpf.h in earlier patch.
    
    v2:
    - Update after changes to bpf.h in earlier patch.

 tools/include/uapi/linux/bpf.h | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 7ac3992dacfe..54d0c886e3ba 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -189,6 +189,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -228,6 +229,7 @@ enum bpf_attach_type {
 	BPF_XDP_DEVMAP,
 	BPF_CGROUP_INET_SOCK_RELEASE,
 	BPF_XDP_CPUMAP,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3069,6 +3071,10 @@ union bpf_attr {
  *
  * long bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3094,6 +3100,56 @@ union bpf_attr {
  *		**-ESOCKTNOSUPPORT** if the socket type is not supported
  *		(reuseport).
  *
+ * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
+ *		For the operation to succeed passed socket must be compatible
+ *		with the packet description provided by the *ctx* object.
+ *
+ *		L4 protocol (**IPPROTO_TCP** or **IPPROTO_UDP**) must
+ *		be an exact match. While IP family (**AF_INET** or
+ *		**AF_INET6**) must be compatible, that is IPv6 sockets
+ *		that are not v6-only can be selected for IPv4 packets.
+ *
+ *		Only TCP listeners and UDP unconnected sockets can be
+ *		selected. *sk* can also be NULL to reset any previous
+ *		selection.
+ *
+ *		*flags* argument can combination of following values:
+ *
+ *		* **BPF_SK_LOOKUP_F_REPLACE** to override the previous
+ *		  socket selection, potentially done by a BPF program
+ *		  that ran before us.
+ *
+ *		* **BPF_SK_LOOKUP_F_NO_REUSEPORT** to skip
+ *		  load-balancing within reuseport group for the socket
+ *		  being selected.
+ *
+ *		On success *ctx->sk* will point to the selected socket.
+ *
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EAFNOSUPPORT** if socket family (*sk->family*) is
+ *		  not compatible with packet family (*ctx->family*).
+ *
+ *		* **-EEXIST** if socket has been already selected,
+ *		  potentially by another program, and
+ *		  **BPF_SK_LOOKUP_F_REPLACE** flag was not specified.
+ *
+ *		* **-EINVAL** if unsupported flags were specified.
+ *
+ *		* **-EPROTOTYPE** if socket L4 protocol
+ *		  (*sk->protocol*) doesn't match packet protocol
+ *		  (*ctx->protocol*).
+ *
+ *		* **-ESOCKTNOSUPPORT** if socket is not in allowed
+ *		  state (TCP listening or UDP unconnected).
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -3607,6 +3663,12 @@ enum {
 	BPF_RINGBUF_HDR_SZ		= 8,
 };
 
+/* BPF_FUNC_sk_assign flags in bpf_sk_lookup context. */
+enum {
+	BPF_SK_LOOKUP_F_REPLACE		= (1ULL << 0),
+	BPF_SK_LOOKUP_F_NO_REUSEPORT	= (1ULL << 1),
+};
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
@@ -4349,4 +4411,19 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+
+	__u32 family;		/* Protocol family (AF_INET, AF_INET6) */
+	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
+	__u32 remote_ip4;	/* Network byte order */
+	__u32 remote_ip6[4];	/* Network byte order */
+	__u32 remote_port;	/* Network byte order */
+	__u32 local_ip4;	/* Network byte order */
+	__u32 local_ip6[4];	/* Network byte order */
+	__u32 local_port;	/* Host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 12/15] libbpf: Add support for SK_LOOKUP program type
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (10 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 11/15] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 13/15] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko

Make libbpf aware of the newly added program type, and assign it a
section name.

Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Add trailing slash to section prefix ("sk_lookup/"). (Andrii)
    
    v3:
    - Move new libbpf symbols to version 0.1.0.
    - Set expected_attach_type in probe_load for new prog type.
    
    v2:
    - Add new libbpf symbols to version 0.0.9. (Andrii)

 tools/lib/bpf/libbpf.c        | 3 +++
 tools/lib/bpf/libbpf.h        | 2 ++
 tools/lib/bpf/libbpf.map      | 2 ++
 tools/lib/bpf/libbpf_probes.c | 3 +++
 4 files changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index f55fd8a5c008..846164c79df1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6799,6 +6799,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
 BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
 BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
+BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
 
 enum bpf_attach_type
 bpf_program__get_expected_attach_type(struct bpf_program *prog)
@@ -6981,6 +6982,8 @@ static const struct bpf_sec_def section_defs[] = {
 	BPF_EAPROG_SEC("cgroup/setsockopt",	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 						BPF_CGROUP_SETSOCKOPT),
 	BPF_PROG_SEC("struct_ops",		BPF_PROG_TYPE_STRUCT_OPS),
+	BPF_EAPROG_SEC("sk_lookup/",		BPF_PROG_TYPE_SK_LOOKUP,
+						BPF_SK_LOOKUP),
 };
 
 #undef BPF_PROG_SEC_IMPL
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2335971ed0bd..c2272132e929 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -350,6 +350,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
+LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
 
 LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
 LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
@@ -377,6 +378,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
+LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);
 
 /*
  * No need for __attribute__((packed)), all members of 'bpf_map_def'
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c5d5c7664c3b..6f0856abe299 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -287,6 +287,8 @@ LIBBPF_0.1.0 {
 		bpf_map__type;
 		bpf_map__value_size;
 		bpf_program__autoload;
+		bpf_program__is_sk_lookup;
 		bpf_program__set_autoload;
+		bpf_program__set_sk_lookup;
 		btf__set_fd;
 } LIBBPF_0.0.9;
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 10cd8d1891f5..5a3d3f078408 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -78,6 +78,9 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 		xattr.expected_attach_type = BPF_CGROUP_INET4_CONNECT;
 		break;
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		xattr.expected_attach_type = BPF_SK_LOOKUP;
+		break;
 	case BPF_PROG_TYPE_KPROBE:
 		xattr.kern_version = get_kernel_version();
 		break;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 13/15] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (11 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 12/15] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 14/15] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Make bpftool show human-friendly identifiers for newly introduced program
and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP, respectively.

Also, add the new prog type bash-completion, man page and help message.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v5:
    - Update prog type list in bash-completion and bpftool-prog man page. (Andrii)
    
    v3:
    - New patch in v3.

 tools/bpf/bpftool/Documentation/bpftool-prog.rst | 2 +-
 tools/bpf/bpftool/bash-completion/bpftool        | 2 +-
 tools/bpf/bpftool/common.c                       | 1 +
 tools/bpf/bpftool/prog.c                         | 3 ++-
 4 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpftool/Documentation/bpftool-prog.rst b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
index 412ea3d9bf7f..82e356b664e8 100644
--- a/tools/bpf/bpftool/Documentation/bpftool-prog.rst
+++ b/tools/bpf/bpftool/Documentation/bpftool-prog.rst
@@ -45,7 +45,7 @@ PROG COMMANDS
 |               **cgroup/getsockname4** | **cgroup/getsockname6** | **cgroup/sendmsg4** | **cgroup/sendmsg6** |
 |		**cgroup/recvmsg4** | **cgroup/recvmsg6** | **cgroup/sysctl** |
 |		**cgroup/getsockopt** | **cgroup/setsockopt** |
-|		**struct_ops** | **fentry** | **fexit** | **freplace**
+|		**struct_ops** | **fentry** | **fexit** | **freplace** | **sk_lookup**
 |	}
 |       *ATTACH_TYPE* := {
 |		**msg_verdict** | **stream_verdict** | **stream_parser** | **flow_dissector**
diff --git a/tools/bpf/bpftool/bash-completion/bpftool b/tools/bpf/bpftool/bash-completion/bpftool
index 25b25aca1112..7b137264ea3a 100644
--- a/tools/bpf/bpftool/bash-completion/bpftool
+++ b/tools/bpf/bpftool/bash-completion/bpftool
@@ -479,7 +479,7 @@ _bpftool()
                                 cgroup/post_bind4 cgroup/post_bind6 \
                                 cgroup/sysctl cgroup/getsockopt \
                                 cgroup/setsockopt struct_ops \
-                                fentry fexit freplace" -- \
+                                fentry fexit freplace sk_lookup" -- \
                                                    "$cur" ) )
                             return 0
                             ;;
diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 29f4e7611ae8..9b28c69dd8e4 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -64,6 +64,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
 	[BPF_TRACE_FEXIT]		= "fexit",
 	[BPF_MODIFY_RETURN]		= "mod_ret",
 	[BPF_LSM_MAC]			= "lsm_mac",
+	[BPF_SK_LOOKUP]			= "sk_lookup",
 };
 
 void p_err(const char *fmt, ...)
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 6863c57effd0..3e6ecc6332e2 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -59,6 +59,7 @@ const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_TRACING]			= "tracing",
 	[BPF_PROG_TYPE_STRUCT_OPS]		= "struct_ops",
 	[BPF_PROG_TYPE_EXT]			= "ext",
+	[BPF_PROG_TYPE_SK_LOOKUP]		= "sk_lookup",
 };
 
 const size_t prog_type_name_size = ARRAY_SIZE(prog_type_name);
@@ -1905,7 +1906,7 @@ static int do_help(int argc, char **argv)
 		"                 cgroup/getsockname4 | cgroup/getsockname6 | cgroup/sendmsg4 |\n"
 		"                 cgroup/sendmsg6 | cgroup/recvmsg4 | cgroup/recvmsg6 |\n"
 		"                 cgroup/getsockopt | cgroup/setsockopt |\n"
-		"                 struct_ops | fentry | fexit | freplace }\n"
+		"                 struct_ops | fentry | fexit | freplace | sk_lookup }\n"
 		"       ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n"
 		"                        flow_dissector }\n"
 		"       METRIC := { cycles | instructions | l1d_loads | llc_misses }\n"
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 14/15] selftests/bpf: Add verifier tests for bpf_sk_lookup context access
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (12 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 13/15] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-17 10:35 ` [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Exercise verifier access checks for bpf_sk_lookup context fields.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v5:
    - Return an allowed value (SK_DROP) in all tests.
    
    v4:
    - Bring back tests for narrow loads.
    
    v3:
    - Consolidate ACCEPT tests into one.
    - Deduplicate REJECT tests and arrange them into logical groups.
    - Add tests for out-of-bounds and unaligned access.
    - Cover access to newly introduced 'sk' field.
    
    v2:
     - Adjust for fields renames in struct bpf_sk_lookup.

 .../selftests/bpf/verifier/ctx_sk_lookup.c    | 492 ++++++++++++++++++
 1 file changed, 492 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

diff --git a/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
new file mode 100644
index 000000000000..2ad5f974451c
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
@@ -0,0 +1,492 @@
+{
+	"valid 1,2,4,8-byte reads from bpf_sk_lookup",
+	.insns = {
+		/* 1-byte read from family field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 3),
+		/* 2-byte read from family field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 2),
+		/* 4-byte read from family field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+
+		/* 1-byte read from protocol field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 3),
+		/* 2-byte read from protocol field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 2),
+		/* 4-byte read from protocol field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+
+		/* 1-byte read from remote_ip4 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 3),
+		/* 2-byte read from remote_ip4 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 2),
+		/* 4-byte read from remote_ip4 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+
+		/* 1-byte read from remote_ip6 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 3),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 4),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 5),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 6),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 7),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 8),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 9),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 10),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 11),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 12),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 13),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 14),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 15),
+		/* 2-byte read from remote_ip6 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 2),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 4),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 6),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 8),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 10),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 12),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 14),
+		/* 4-byte read from remote_ip6 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 4),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 8),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 12),
+
+		/* 1-byte read from remote_port field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 3),
+		/* 2-byte read from remote_port field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 2),
+		/* 4-byte read from remote_port field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+
+		/* 1-byte read from local_ip4 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 3),
+		/* 2-byte read from local_ip4 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 2),
+		/* 4-byte read from local_ip4 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+
+		/* 1-byte read from local_ip6 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 3),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 4),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 5),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 6),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 7),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 8),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 9),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 10),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 11),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 12),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 13),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 14),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 15),
+		/* 2-byte read from local_ip6 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 2),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 4),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 6),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 8),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 10),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 12),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 14),
+		/* 4-byte read from local_ip6 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 4),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 8),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 12),
+
+		/* 1-byte read from local_port field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 3),
+		/* 2-byte read from local_port field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 2),
+		/* 4-byte read from local_port field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+
+		/* 8-byte read from sk field */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* invalid 8-byte reads from a 4-byte fields in bpf_sk_lookup */
+{
+	"invalid 8-byte read from bpf_sk_lookup family field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup protocol field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup remote_ip4 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup remote_ip6 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup remote_port field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup local_ip4 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup local_ip6 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup local_port field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* invalid 1,2,4-byte reads from 8-byte fields in bpf_sk_lookup */
+{
+	"invalid 4-byte read from bpf_sk_lookup sk field",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read from bpf_sk_lookup sk field",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read from bpf_sk_lookup sk field",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* out of bounds and unaligned reads from bpf_sk_lookup */
+{
+	"invalid 4-byte read past end of bpf_sk_lookup",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    sizeof(struct bpf_sk_lookup)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte unaligned read from bpf_sk_lookup at odd offset",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 1),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte unaligned read from bpf_sk_lookup at even offset",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 2),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* in-bound and out-of-bound writes to bpf_sk_lookup */
+{
+	"invalid 8-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0, 0),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write past end of bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    sizeof(struct bpf_sk_lookup)),
+		BPF_MOV32_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (13 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 14/15] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
@ 2020-07-17 10:35 ` Jakub Sitnicki
  2020-07-28 20:13   ` Andrii Nakryiko
  2020-07-17 16:40 ` [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Lorenz Bauer
  2020-08-18 15:49 ` BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks Jakub Sitnicki
  16 siblings, 1 reply; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-17 10:35 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Add tests to test_progs that exercise:

 - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
 - redirecting socket lookup to a socket selected by BPF program,
 - failing a socket lookup on BPF program's request,
 - error scenarios for selecting a socket from BPF program,
 - accessing BPF program context,
 - attaching and running multiple BPF programs.

Run log:

  bash-5.0# ./test_progs -n 70
  #70/1 query lookup prog:OK
  #70/2 TCP IPv4 redir port:OK
  #70/3 TCP IPv4 redir addr:OK
  #70/4 TCP IPv4 redir with reuseport:OK
  #70/5 TCP IPv4 redir skip reuseport:OK
  #70/6 TCP IPv6 redir port:OK
  #70/7 TCP IPv6 redir addr:OK
  #70/8 TCP IPv4->IPv6 redir port:OK
  #70/9 TCP IPv6 redir with reuseport:OK
  #70/10 TCP IPv6 redir skip reuseport:OK
  #70/11 UDP IPv4 redir port:OK
  #70/12 UDP IPv4 redir addr:OK
  #70/13 UDP IPv4 redir with reuseport:OK
  #70/14 UDP IPv4 redir skip reuseport:OK
  #70/15 UDP IPv6 redir port:OK
  #70/16 UDP IPv6 redir addr:OK
  #70/17 UDP IPv4->IPv6 redir port:OK
  #70/18 UDP IPv6 redir and reuseport:OK
  #70/19 UDP IPv6 redir skip reuseport:OK
  #70/20 TCP IPv4 drop on lookup:OK
  #70/21 TCP IPv6 drop on lookup:OK
  #70/22 UDP IPv4 drop on lookup:OK
  #70/23 UDP IPv6 drop on lookup:OK
  #70/24 TCP IPv4 drop on reuseport:OK
  #70/25 TCP IPv6 drop on reuseport:OK
  #70/26 UDP IPv4 drop on reuseport:OK
  #70/27 TCP IPv6 drop on reuseport:OK
  #70/28 sk_assign returns EEXIST:OK
  #70/29 sk_assign honors F_REPLACE:OK
  #70/30 sk_assign accepts NULL socket:OK
  #70/31 access ctx->sk:OK
  #70/32 narrow access to ctx v4:OK
  #70/33 narrow access to ctx v6:OK
  #70/34 sk_assign rejects TCP established:OK
  #70/35 sk_assign rejects UDP connected:OK
  #70/36 multi prog - pass, pass:OK
  #70/37 multi prog - drop, drop:OK
  #70/38 multi prog - pass, drop:OK
  #70/39 multi prog - drop, pass:OK
  #70/40 multi prog - pass, redir:OK
  #70/41 multi prog - redir, pass:OK
  #70/42 multi prog - drop, redir:OK
  #70/43 multi prog - redir, drop:OK
  #70/44 multi prog - redir, redir:OK
  #70 sk_lookup:OK
  Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v5:
    - Rename progs/test_sk_lookup_kern.c to progs/test_sk_lookup.c. (Andrii)
    - Adapt tests for BPF skeleton name change after above rename.
    - Remove tests for narrow loads at an offset wider in size than target field.
    
    v4:
    - Remove system("bpftool ...") call left over from debugging. (Lorenz)
    - Dedup BPF code that selects a socket. (Lorenz)
    - Switch from CHECK_FAIL to CHECK macro. (Andrii)
    - Extract a network_helper that wraps inet_pton.
    - Don't restore netns now that test_progs does it.
    - Cover bpf_sk_assign(ctx, NULL) in tests.
    - Cover narrow loads in tests.
    - Cover NULL ctx->sk access attempts in tests.
    - Cover accessing IPv6 ctx fields on IPv4 lookup.
    
    v3:
    - Extend tests to cover new functionality in v3:
      - multi-prog attachments (query, running, verdict precedence)
      - socket selecting for the second time with bpf_sk_assign
      - skipping over reuseport load-balancing
    
    v2:
     - Adjust for fields renames in struct bpf_sk_lookup.

 tools/testing/selftests/bpf/network_helpers.c |   58 +-
 tools/testing/selftests/bpf/network_helpers.h |    2 +
 .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
 .../selftests/bpf/progs/test_sk_lookup.c      |  641 +++++++++
 4 files changed, 1960 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup.c

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index acd08715be2e..f56655690f9b 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -73,29 +73,8 @@ int start_server(int family, int type, const char *addr_str, __u16 port,
 	socklen_t len;
 	int fd;
 
-	if (family == AF_INET) {
-		struct sockaddr_in *sin = (void *)&addr;
-
-		sin->sin_family = AF_INET;
-		sin->sin_port = htons(port);
-		if (addr_str &&
-		    inet_pton(AF_INET, addr_str, &sin->sin_addr) != 1) {
-			log_err("inet_pton(AF_INET, %s)", addr_str);
-			return -1;
-		}
-		len = sizeof(*sin);
-	} else {
-		struct sockaddr_in6 *sin6 = (void *)&addr;
-
-		sin6->sin6_family = AF_INET6;
-		sin6->sin6_port = htons(port);
-		if (addr_str &&
-		    inet_pton(AF_INET6, addr_str, &sin6->sin6_addr) != 1) {
-			log_err("inet_pton(AF_INET6, %s)", addr_str);
-			return -1;
-		}
-		len = sizeof(*sin6);
-	}
+	if (make_sockaddr(family, addr_str, port, &addr, &len))
+		return -1;
 
 	fd = socket(family, type, 0);
 	if (fd < 0) {
@@ -194,3 +173,36 @@ int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms)
 
 	return 0;
 }
+
+int make_sockaddr(int family, const char *addr_str, __u16 port,
+		  struct sockaddr_storage *addr, socklen_t *len)
+{
+	if (family == AF_INET) {
+		struct sockaddr_in *sin = (void *)addr;
+
+		sin->sin_family = AF_INET;
+		sin->sin_port = htons(port);
+		if (addr_str &&
+		    inet_pton(AF_INET, addr_str, &sin->sin_addr) != 1) {
+			log_err("inet_pton(AF_INET, %s)", addr_str);
+			return -1;
+		}
+		if (len)
+			*len = sizeof(*sin);
+		return 0;
+	} else if (family == AF_INET6) {
+		struct sockaddr_in6 *sin6 = (void *)addr;
+
+		sin6->sin6_family = AF_INET6;
+		sin6->sin6_port = htons(port);
+		if (addr_str &&
+		    inet_pton(AF_INET6, addr_str, &sin6->sin6_addr) != 1) {
+			log_err("inet_pton(AF_INET6, %s)", addr_str);
+			return -1;
+		}
+		if (len)
+			*len = sizeof(*sin6);
+		return 0;
+	}
+	return -1;
+}
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index f580e82fda58..c3728f6667e4 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -37,5 +37,7 @@ int start_server(int family, int type, const char *addr, __u16 port,
 		 int timeout_ms);
 int connect_to_fd(int server_fd, int timeout_ms);
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
+int make_sockaddr(int family, const char *addr_str, __u16 port,
+		  struct sockaddr_storage *addr, socklen_t *len);
 
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/sk_lookup.c b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
new file mode 100644
index 000000000000..f1784ae4565a
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
@@ -0,0 +1,1282 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+/*
+ * Test BPF attach point for INET socket lookup (BPF_SK_LOOKUP).
+ *
+ * Tests exercise:
+ *  - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
+ *  - redirecting socket lookup to a socket selected by BPF program,
+ *  - failing a socket lookup on BPF program's request,
+ *  - error scenarios for selecting a socket from BPF program,
+ *  - accessing BPF program context,
+ *  - attaching and running multiple BPF programs.
+ *
+ * Tests run in a dedicated network namespace.
+ */
+
+#define _GNU_SOURCE
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "test_progs.h"
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+#include "test_sk_lookup.skel.h"
+
+/* External (address, port) pairs the client sends packets to. */
+#define EXT_IP4		"127.0.0.1"
+#define EXT_IP6		"fd00::1"
+#define EXT_PORT	7007
+
+/* Internal (address, port) pairs the server listens/receives at. */
+#define INT_IP4		"127.0.0.2"
+#define INT_IP4_V6	"::ffff:127.0.0.2"
+#define INT_IP6		"fd00::2"
+#define INT_PORT	8008
+
+#define IO_TIMEOUT_SEC	3
+
+enum server {
+	SERVER_A = 0,
+	SERVER_B = 1,
+	MAX_SERVERS,
+};
+
+enum {
+	PROG1 = 0,
+	PROG2,
+};
+
+struct inet_addr {
+	const char *ip;
+	unsigned short port;
+};
+
+struct test {
+	const char *desc;
+	struct bpf_program *lookup_prog;
+	struct bpf_program *reuseport_prog;
+	struct bpf_map *sock_map;
+	int sotype;
+	struct inet_addr connect_to;
+	struct inet_addr listen_at;
+	enum server accept_on;
+};
+
+static __u32 duration;		/* for CHECK macro */
+
+static bool is_ipv6(const char *ip)
+{
+	return !!strchr(ip, ':');
+}
+
+static int attach_reuseport(int sock_fd, struct bpf_program *reuseport_prog)
+{
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(reuseport_prog);
+	if (prog_fd < 0) {
+		errno = -prog_fd;
+		return -1;
+	}
+
+	err = setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
+			 &prog_fd, sizeof(prog_fd));
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static socklen_t inetaddr_len(const struct sockaddr_storage *addr)
+{
+	return (addr->ss_family == AF_INET ? sizeof(struct sockaddr_in) :
+		addr->ss_family == AF_INET6 ? sizeof(struct sockaddr_in6) : 0);
+}
+
+static int make_socket(int sotype, const char *ip, int port,
+		       struct sockaddr_storage *addr)
+{
+	struct timeval timeo = { .tv_sec = IO_TIMEOUT_SEC };
+	int err, family, fd;
+
+	family = is_ipv6(ip) ? AF_INET6 : AF_INET;
+	err = make_sockaddr(family, ip, port, addr, NULL);
+	if (CHECK(err, "make_address", "failed\n"))
+		return -1;
+
+	fd = socket(addr->ss_family, sotype, 0);
+	if (CHECK(fd < 0, "socket", "failed\n")) {
+		log_err("failed to make socket");
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeo, sizeof(timeo));
+	if (CHECK(err, "setsockopt(SO_SNDTIMEO)", "failed\n")) {
+		log_err("failed to set SNDTIMEO");
+		close(fd);
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeo, sizeof(timeo));
+	if (CHECK(err, "setsockopt(SO_RCVTIMEO)", "failed\n")) {
+		log_err("failed to set RCVTIMEO");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int make_server(int sotype, const char *ip, int port,
+		       struct bpf_program *reuseport_prog)
+{
+	struct sockaddr_storage addr = {0};
+	const int one = 1;
+	int err, fd = -1;
+
+	fd = make_socket(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	/* Enabled for UDPv6 sockets for IPv4-mapped IPv6 to work. */
+	if (sotype == SOCK_DGRAM) {
+		err = setsockopt(fd, SOL_IP, IP_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(IP_RECVORIGDSTADDR)", "failed\n")) {
+			log_err("failed to enable IP_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_DGRAM && addr.ss_family == AF_INET6) {
+		err = setsockopt(fd, SOL_IPV6, IPV6_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(IPV6_RECVORIGDSTADDR)", "failed\n")) {
+			log_err("failed to enable IPV6_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(SO_REUSEADDR)", "failed\n")) {
+			log_err("failed to enable SO_REUSEADDR");
+			goto fail;
+		}
+	}
+
+	if (reuseport_prog) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(SO_REUSEPORT)", "failed\n")) {
+			log_err("failed to enable SO_REUSEPORT");
+			goto fail;
+		}
+	}
+
+	err = bind(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK(err, "bind", "failed\n")) {
+		log_err("failed to bind listen socket");
+		goto fail;
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = listen(fd, SOMAXCONN);
+		if (CHECK(err, "make_server", "listen")) {
+			log_err("failed to listen on port %d", port);
+			goto fail;
+		}
+	}
+
+	/* Late attach reuseport prog so we can have one init path */
+	if (reuseport_prog) {
+		err = attach_reuseport(fd, reuseport_prog);
+		if (CHECK(err, "attach_reuseport", "failed\n")) {
+			log_err("failed to attach reuseport prog");
+			goto fail;
+		}
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int make_client(int sotype, const char *ip, int port)
+{
+	struct sockaddr_storage addr = {0};
+	int err, fd;
+
+	fd = make_socket(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	err = connect(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK(err, "make_client", "connect")) {
+		log_err("failed to connect client socket");
+		goto fail;
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int send_byte(int fd)
+{
+	ssize_t n;
+
+	errno = 0;
+	n = send(fd, "a", 1, 0);
+	if (CHECK(n <= 0, "send_byte", "send")) {
+		log_err("failed/partial send");
+		return -1;
+	}
+	return 0;
+}
+
+static int recv_byte(int fd)
+{
+	char buf[1];
+	ssize_t n;
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK(n <= 0, "recv_byte", "recv")) {
+		log_err("failed/partial recv");
+		return -1;
+	}
+	return 0;
+}
+
+static int tcp_recv_send(int server_fd)
+{
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	fd = accept(server_fd, NULL, NULL);
+	if (CHECK(fd < 0, "accept", "failed\n")) {
+		log_err("failed to accept");
+		return -1;
+	}
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK(n <= 0, "recv", "failed\n")) {
+		log_err("failed/partial recv");
+		ret = -1;
+		goto close;
+	}
+
+	n = send(fd, buf, n, 0);
+	if (CHECK(n <= 0, "send", "failed\n")) {
+		log_err("failed/partial send");
+		ret = -1;
+		goto close;
+	}
+
+	ret = 0;
+close:
+	close(fd);
+	return ret;
+}
+
+static void v4_to_v6(struct sockaddr_storage *ss)
+{
+	struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;
+	struct sockaddr_in v4 = *(struct sockaddr_in *)ss;
+
+	v6->sin6_family = AF_INET6;
+	v6->sin6_port = v4.sin_port;
+	v6->sin6_addr.s6_addr[10] = 0xff;
+	v6->sin6_addr.s6_addr[11] = 0xff;
+	memcpy(&v6->sin6_addr.s6_addr[12], &v4.sin_addr.s_addr, 4);
+}
+
+static int udp_recv_send(int server_fd)
+{
+	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
+	struct sockaddr_storage _src_addr = { 0 };
+	struct sockaddr_storage *src_addr = &_src_addr;
+	struct sockaddr_storage *dst_addr = NULL;
+	struct msghdr msg = { 0 };
+	struct iovec iov = { 0 };
+	struct cmsghdr *cm;
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	iov.iov_base = buf;
+	iov.iov_len = sizeof(buf);
+
+	msg.msg_name = src_addr;
+	msg.msg_namelen = sizeof(*src_addr);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	errno = 0;
+	n = recvmsg(server_fd, &msg, 0);
+	if (CHECK(n <= 0, "recvmsg", "failed\n")) {
+		log_err("failed to receive");
+		return -1;
+	}
+	if (CHECK(msg.msg_flags & MSG_CTRUNC, "recvmsg", "truncated cmsg\n"))
+		return -1;
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if ((cm->cmsg_level == SOL_IP &&
+		     cm->cmsg_type == IP_ORIGDSTADDR) ||
+		    (cm->cmsg_level == SOL_IPV6 &&
+		     cm->cmsg_type == IPV6_ORIGDSTADDR)) {
+			dst_addr = (struct sockaddr_storage *)CMSG_DATA(cm);
+			break;
+		}
+		log_err("warning: ignored cmsg at level %d type %d",
+			cm->cmsg_level, cm->cmsg_type);
+	}
+	if (CHECK(!dst_addr, "recvmsg", "missing ORIGDSTADDR\n"))
+		return -1;
+
+	/* Server socket bound to IPv4-mapped IPv6 address */
+	if (src_addr->ss_family == AF_INET6 &&
+	    dst_addr->ss_family == AF_INET) {
+		v4_to_v6(dst_addr);
+	}
+
+	/* Reply from original destination address. */
+	fd = socket(dst_addr->ss_family, SOCK_DGRAM, 0);
+	if (CHECK(fd < 0, "socket", "failed\n")) {
+		log_err("failed to create tx socket");
+		return -1;
+	}
+
+	ret = bind(fd, (struct sockaddr *)dst_addr, sizeof(*dst_addr));
+	if (CHECK(ret, "bind", "failed\n")) {
+		log_err("failed to bind tx socket");
+		goto out;
+	}
+
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	n = sendmsg(fd, &msg, 0);
+	if (CHECK(n <= 0, "sendmsg", "failed\n")) {
+		log_err("failed to send echo reply");
+		ret = -1;
+		goto out;
+	}
+
+	ret = 0;
+out:
+	close(fd);
+	return ret;
+}
+
+static int tcp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = tcp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static int udp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = udp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static struct bpf_link *attach_lookup_prog(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+	int net_fd;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK(net_fd < 0, "open", "failed\n")) {
+		log_err("failed to open /proc/self/ns/net");
+		return NULL;
+	}
+
+	link = bpf_program__attach_netns(prog, net_fd);
+	if (CHECK(IS_ERR(link), "bpf_program__attach_netns", "failed\n")) {
+		errno = -PTR_ERR(link);
+		log_err("failed to attach program '%s' to netns",
+			bpf_program__name(prog));
+		link = NULL;
+	}
+
+	close(net_fd);
+	return link;
+}
+
+static int update_lookup_map(struct bpf_map *map, int index, int sock_fd)
+{
+	int err, map_fd;
+	uint64_t value;
+
+	map_fd = bpf_map__fd(map);
+	if (CHECK(map_fd < 0, "bpf_map__fd", "failed\n")) {
+		errno = -map_fd;
+		log_err("failed to get map FD");
+		return -1;
+	}
+
+	value = (uint64_t)sock_fd;
+	err = bpf_map_update_elem(map_fd, &index, &value, BPF_NOEXIST);
+	if (CHECK(err, "bpf_map_update_elem", "failed\n")) {
+		log_err("failed to update redir_map @ %d", index);
+		return -1;
+	}
+
+	return 0;
+}
+
+static __u32 link_info_prog_id(struct bpf_link *link)
+{
+	struct bpf_link_info info = {};
+	__u32 info_len = sizeof(info);
+	int link_fd, err;
+
+	link_fd = bpf_link__fd(link);
+	if (CHECK(link_fd < 0, "bpf_link__fd", "failed\n")) {
+		errno = -link_fd;
+		log_err("bpf_link__fd failed");
+		return 0;
+	}
+
+	err = bpf_obj_get_info_by_fd(link_fd, &info, &info_len);
+	if (CHECK(err, "bpf_obj_get_info_by_fd", "failed\n")) {
+		log_err("bpf_obj_get_info_by_fd");
+		return 0;
+	}
+	if (CHECK(info_len != sizeof(info), "bpf_obj_get_info_by_fd",
+		  "unexpected info len %u\n", info_len))
+		return 0;
+
+	return info.prog_id;
+}
+
+static void query_lookup_prog(struct test_sk_lookup *skel)
+{
+	struct bpf_link *link[3] = {};
+	__u32 attach_flags = 0;
+	__u32 prog_ids[3] = {};
+	__u32 prog_cnt = 3;
+	__u32 prog_id;
+	int net_fd;
+	int err;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK(net_fd < 0, "open", "failed\n")) {
+		log_err("failed to open /proc/self/ns/net");
+		return;
+	}
+
+	link[0] = attach_lookup_prog(skel->progs.lookup_pass);
+	if (!link[0])
+		goto close;
+	link[1] = attach_lookup_prog(skel->progs.lookup_pass);
+	if (!link[1])
+		goto detach;
+	link[2] = attach_lookup_prog(skel->progs.lookup_drop);
+	if (!link[2])
+		goto detach;
+
+	err = bpf_prog_query(net_fd, BPF_SK_LOOKUP, 0 /* query flags */,
+			     &attach_flags, prog_ids, &prog_cnt);
+	if (CHECK(err, "bpf_prog_query", "failed\n")) {
+		log_err("failed to query lookup prog");
+		goto detach;
+	}
+
+	errno = 0;
+	if (CHECK(attach_flags != 0, "bpf_prog_query",
+		  "wrong attach_flags on query: %u", attach_flags))
+		goto detach;
+	if (CHECK(prog_cnt != 3, "bpf_prog_query",
+		  "wrong program count on query: %u", prog_cnt))
+		goto detach;
+	prog_id = link_info_prog_id(link[0]);
+	CHECK(prog_ids[0] != prog_id, "bpf_prog_query",
+	      "invalid program #0 id on query: %u != %u\n",
+	      prog_ids[0], prog_id);
+	prog_id = link_info_prog_id(link[1]);
+	CHECK(prog_ids[1] != prog_id, "bpf_prog_query",
+	      "invalid program #1 id on query: %u != %u\n",
+	      prog_ids[1], prog_id);
+	prog_id = link_info_prog_id(link[2]);
+	CHECK(prog_ids[2] != prog_id, "bpf_prog_query",
+	      "invalid program #2 id on query: %u != %u\n",
+	      prog_ids[2], prog_id);
+
+detach:
+	if (link[2])
+		bpf_link__destroy(link[2]);
+	if (link[1])
+		bpf_link__destroy(link[1]);
+	if (link[0])
+		bpf_link__destroy(link[0]);
+close:
+	close(net_fd);
+}
+
+static void run_lookup_prog(const struct test *t)
+{
+	int client_fd, server_fds[MAX_SERVERS] = { -1 };
+	struct bpf_link *lookup_link;
+	int i, err;
+
+	lookup_link = attach_lookup_prog(t->lookup_prog);
+	if (!lookup_link)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		server_fds[i] = make_server(t->sotype, t->listen_at.ip,
+					    t->listen_at.port,
+					    t->reuseport_prog);
+		if (server_fds[i] < 0)
+			goto close;
+
+		err = update_lookup_map(t->sock_map, i, server_fds[i]);
+		if (err)
+			goto close;
+
+		/* want just one server for non-reuseport test */
+		if (!t->reuseport_prog)
+			break;
+	}
+
+	client_fd = make_client(t->sotype, t->connect_to.ip, t->connect_to.port);
+	if (client_fd < 0)
+		goto close;
+
+	if (t->sotype == SOCK_STREAM)
+		tcp_echo_test(client_fd, server_fds[t->accept_on]);
+	else
+		udp_echo_test(client_fd, server_fds[t->accept_on]);
+
+	close(client_fd);
+close:
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		if (server_fds[i] != -1)
+			close(server_fds[i]);
+	}
+	bpf_link__destroy(lookup_link);
+}
+
+static void test_redirect_lookup(struct test_sk_lookup *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir with reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "TCP IPv4 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+		{
+			.desc		= "TCP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4_V6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir with reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "TCP IPv6 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+		{
+			.desc		= "UDP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir with reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "UDP IPv4 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+		{
+			.desc		= "UDP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.listen_at	= { INT_IP4_V6, INT_PORT },
+			.connect_to	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "UDP IPv6 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			run_lookup_prog(t);
+	}
+}
+
+static void drop_on_lookup(const struct test *t)
+{
+	struct sockaddr_storage dst = {};
+	int client_fd, server_fd, err;
+	struct bpf_link *lookup_link;
+	ssize_t n;
+
+	lookup_link = attach_lookup_prog(t->lookup_prog);
+	if (!lookup_link)
+		return;
+
+	server_fd = make_server(t->sotype, t->listen_at.ip, t->listen_at.port,
+				t->reuseport_prog);
+	if (server_fd < 0)
+		goto detach;
+
+	client_fd = make_socket(t->sotype, t->connect_to.ip,
+				t->connect_to.port, &dst);
+	if (client_fd < 0)
+		goto close_srv;
+
+	err = connect(client_fd, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client_fd);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client_fd, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK(!err || errno != ECONNREFUSED, "connect",
+		  "unexpected success or error\n"))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client_fd);
+close_srv:
+	close(server_fd);
+detach:
+	bpf_link__destroy(lookup_link);
+}
+
+static void test_drop_on_lookup(struct test_sk_lookup *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, INT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_lookup(t);
+	}
+}
+
+static void drop_on_reuseport(const struct test *t)
+{
+	struct sockaddr_storage dst = { 0 };
+	int client, server1, server2, err;
+	struct bpf_link *lookup_link;
+	ssize_t n;
+
+	lookup_link = attach_lookup_prog(t->lookup_prog);
+	if (!lookup_link)
+		return;
+
+	server1 = make_server(t->sotype, t->listen_at.ip, t->listen_at.port,
+			      t->reuseport_prog);
+	if (server1 < 0)
+		goto detach;
+
+	err = update_lookup_map(t->sock_map, SERVER_A, server1);
+	if (err)
+		goto detach;
+
+	/* second server on destination address we should never reach */
+	server2 = make_server(t->sotype, t->connect_to.ip, t->connect_to.port,
+			      NULL /* reuseport prog */);
+	if (server2 < 0)
+		goto close_srv1;
+
+	client = make_socket(t->sotype, t->connect_to.ip,
+			     t->connect_to.port, &dst);
+	if (client < 0)
+		goto close_srv2;
+
+	err = connect(client, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK(!err || errno != ECONNREFUSED, "connect",
+		  "unexpected success or error\n"))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client);
+close_srv2:
+	close(server2);
+close_srv1:
+	close(server1);
+detach:
+	bpf_link__destroy(lookup_link);
+}
+
+static void test_drop_on_reuseport(struct test_sk_lookup *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_reuseport(t);
+	}
+}
+
+static void run_sk_assign(struct test_sk_lookup *skel,
+			  struct bpf_program *lookup_prog,
+			  const char *listen_ip, const char *connect_ip)
+{
+	int client_fd, peer_fd, server_fds[MAX_SERVERS] = { -1 };
+	struct bpf_link *lookup_link;
+	int i, err;
+
+	lookup_link = attach_lookup_prog(lookup_prog);
+	if (!lookup_link)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		server_fds[i] = make_server(SOCK_STREAM, listen_ip, 0, NULL);
+		if (server_fds[i] < 0)
+			goto close_servers;
+
+		err = update_lookup_map(skel->maps.redir_map, i,
+					server_fds[i]);
+		if (err)
+			goto close_servers;
+	}
+
+	client_fd = make_client(SOCK_STREAM, connect_ip, EXT_PORT);
+	if (client_fd < 0)
+		goto close_servers;
+
+	peer_fd = accept(server_fds[SERVER_B], NULL, NULL);
+	if (CHECK(peer_fd < 0, "accept", "failed\n"))
+		goto close_client;
+
+	close(peer_fd);
+close_client:
+	close(client_fd);
+close_servers:
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		if (server_fds[i] != -1)
+			close(server_fds[i]);
+	}
+	bpf_link__destroy(lookup_link);
+}
+
+static void run_sk_assign_v4(struct test_sk_lookup *skel,
+			     struct bpf_program *lookup_prog)
+{
+	run_sk_assign(skel, lookup_prog, INT_IP4, EXT_IP4);
+}
+
+static void run_sk_assign_v6(struct test_sk_lookup *skel,
+			     struct bpf_program *lookup_prog)
+{
+	run_sk_assign(skel, lookup_prog, INT_IP6, EXT_IP6);
+}
+
+static void run_sk_assign_connected(struct test_sk_lookup *skel,
+				    int sotype)
+{
+	int err, client_fd, connected_fd, server_fd;
+	struct bpf_link *lookup_link;
+
+	server_fd = make_server(sotype, EXT_IP4, EXT_PORT, NULL);
+	if (server_fd < 0)
+		return;
+
+	connected_fd = make_client(sotype, EXT_IP4, EXT_PORT);
+	if (connected_fd < 0)
+		goto out_close_server;
+
+	/* Put a connected socket in redirect map */
+	err = update_lookup_map(skel->maps.redir_map, SERVER_A, connected_fd);
+	if (err)
+		goto out_close_connected;
+
+	lookup_link = attach_lookup_prog(skel->progs.sk_assign_esocknosupport);
+	if (!lookup_link)
+		goto out_close_connected;
+
+	/* Try to redirect TCP SYN / UDP packet to a connected socket */
+	client_fd = make_client(sotype, EXT_IP4, EXT_PORT);
+	if (client_fd < 0)
+		goto out_unlink_prog;
+	if (sotype == SOCK_DGRAM) {
+		send_byte(client_fd);
+		recv_byte(server_fd);
+	}
+
+	close(client_fd);
+out_unlink_prog:
+	bpf_link__destroy(lookup_link);
+out_close_connected:
+	close(connected_fd);
+out_close_server:
+	close(server_fd);
+}
+
+static void test_sk_assign_helper(struct test_sk_lookup *skel)
+{
+	if (test__start_subtest("sk_assign returns EEXIST"))
+		run_sk_assign_v4(skel, skel->progs.sk_assign_eexist);
+	if (test__start_subtest("sk_assign honors F_REPLACE"))
+		run_sk_assign_v4(skel, skel->progs.sk_assign_replace_flag);
+	if (test__start_subtest("sk_assign accepts NULL socket"))
+		run_sk_assign_v4(skel, skel->progs.sk_assign_null);
+	if (test__start_subtest("access ctx->sk"))
+		run_sk_assign_v4(skel, skel->progs.access_ctx_sk);
+	if (test__start_subtest("narrow access to ctx v4"))
+		run_sk_assign_v4(skel, skel->progs.ctx_narrow_access);
+	if (test__start_subtest("narrow access to ctx v6"))
+		run_sk_assign_v6(skel, skel->progs.ctx_narrow_access);
+	if (test__start_subtest("sk_assign rejects TCP established"))
+		run_sk_assign_connected(skel, SOCK_STREAM);
+	if (test__start_subtest("sk_assign rejects UDP connected"))
+		run_sk_assign_connected(skel, SOCK_DGRAM);
+}
+
+struct test_multi_prog {
+	const char *desc;
+	struct bpf_program *prog1;
+	struct bpf_program *prog2;
+	struct bpf_map *redir_map;
+	struct bpf_map *run_map;
+	int expect_errno;
+	struct inet_addr listen_at;
+};
+
+static void run_multi_prog_lookup(const struct test_multi_prog *t)
+{
+	struct sockaddr_storage dst = {};
+	int map_fd, server_fd, client_fd;
+	struct bpf_link *link1, *link2;
+	int prog_idx, done, err;
+
+	map_fd = bpf_map__fd(t->run_map);
+
+	done = 0;
+	prog_idx = PROG1;
+	err = bpf_map_update_elem(map_fd, &prog_idx, &done, BPF_ANY);
+	if (CHECK(err, "bpf_map_update_elem", "failed\n"))
+		return;
+	prog_idx = PROG2;
+	err = bpf_map_update_elem(map_fd, &prog_idx, &done, BPF_ANY);
+	if (CHECK(err, "bpf_map_update_elem", "failed\n"))
+		return;
+
+	link1 = attach_lookup_prog(t->prog1);
+	if (!link1)
+		return;
+	link2 = attach_lookup_prog(t->prog2);
+	if (!link2)
+		goto out_unlink1;
+
+	server_fd = make_server(SOCK_STREAM, t->listen_at.ip,
+				t->listen_at.port, NULL);
+	if (server_fd < 0)
+		goto out_unlink2;
+
+	err = update_lookup_map(t->redir_map, SERVER_A, server_fd);
+	if (err)
+		goto out_close_server;
+
+	client_fd = make_socket(SOCK_STREAM, EXT_IP4, EXT_PORT, &dst);
+	if (client_fd < 0)
+		goto out_close_server;
+
+	err = connect(client_fd, (void *)&dst, inetaddr_len(&dst));
+	if (CHECK(err && !t->expect_errno, "connect",
+		  "unexpected error %d\n", errno))
+		goto out_close_client;
+	if (CHECK(err && t->expect_errno && errno != t->expect_errno,
+		  "connect", "unexpected error %d\n", errno))
+		goto out_close_client;
+
+	done = 0;
+	prog_idx = PROG1;
+	err = bpf_map_lookup_elem(map_fd, &prog_idx, &done);
+	CHECK(err, "bpf_map_lookup_elem", "failed\n");
+	CHECK(!done, "bpf_map_lookup_elem", "PROG1 !done\n");
+
+	done = 0;
+	prog_idx = PROG2;
+	err = bpf_map_lookup_elem(map_fd, &prog_idx, &done);
+	CHECK(err, "bpf_map_lookup_elem", "failed\n");
+	CHECK(!done, "bpf_map_lookup_elem", "PROG2 !done\n");
+
+out_close_client:
+	close(client_fd);
+out_close_server:
+	close(server_fd);
+out_unlink2:
+	bpf_link__destroy(link2);
+out_unlink1:
+	bpf_link__destroy(link1);
+}
+
+static void test_multi_prog_lookup(struct test_sk_lookup *skel)
+{
+	struct test_multi_prog tests[] = {
+		{
+			.desc		= "multi prog - pass, pass",
+			.prog1		= skel->progs.multi_prog_pass1,
+			.prog2		= skel->progs.multi_prog_pass2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "multi prog - drop, drop",
+			.prog1		= skel->progs.multi_prog_drop1,
+			.prog2		= skel->progs.multi_prog_drop2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+			.expect_errno	= ECONNREFUSED,
+		},
+		{
+			.desc		= "multi prog - pass, drop",
+			.prog1		= skel->progs.multi_prog_pass1,
+			.prog2		= skel->progs.multi_prog_drop2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+			.expect_errno	= ECONNREFUSED,
+		},
+		{
+			.desc		= "multi prog - drop, pass",
+			.prog1		= skel->progs.multi_prog_drop1,
+			.prog2		= skel->progs.multi_prog_pass2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+			.expect_errno	= ECONNREFUSED,
+		},
+		{
+			.desc		= "multi prog - pass, redir",
+			.prog1		= skel->progs.multi_prog_pass1,
+			.prog2		= skel->progs.multi_prog_redir2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - redir, pass",
+			.prog1		= skel->progs.multi_prog_redir1,
+			.prog2		= skel->progs.multi_prog_pass2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - drop, redir",
+			.prog1		= skel->progs.multi_prog_drop1,
+			.prog2		= skel->progs.multi_prog_redir2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - redir, drop",
+			.prog1		= skel->progs.multi_prog_redir1,
+			.prog2		= skel->progs.multi_prog_drop2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - redir, redir",
+			.prog1		= skel->progs.multi_prog_redir1,
+			.prog2		= skel->progs.multi_prog_redir2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+	};
+	struct test_multi_prog *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		t->redir_map = skel->maps.redir_map;
+		t->run_map = skel->maps.run_map;
+		if (test__start_subtest(t->desc))
+			run_multi_prog_lookup(t);
+	}
+}
+
+static void run_tests(struct test_sk_lookup *skel)
+{
+	if (test__start_subtest("query lookup prog"))
+		query_lookup_prog(skel);
+	test_redirect_lookup(skel);
+	test_drop_on_lookup(skel);
+	test_drop_on_reuseport(skel);
+	test_sk_assign_helper(skel);
+	test_multi_prog_lookup(skel);
+}
+
+static int switch_netns(void)
+{
+	static const char * const setup_script[] = {
+		"ip -6 addr add dev lo " EXT_IP6 "/128 nodad",
+		"ip -6 addr add dev lo " INT_IP6 "/128 nodad",
+		"ip link set dev lo up",
+		NULL,
+	};
+	const char * const *cmd;
+	int err;
+
+	err = unshare(CLONE_NEWNET);
+	if (CHECK(err, "unshare", "failed\n")) {
+		log_err("unshare(CLONE_NEWNET)");
+		return -1;
+	}
+
+	for (cmd = setup_script; *cmd; cmd++) {
+		err = system(*cmd);
+		if (CHECK(err, "system", "failed\n")) {
+			log_err("system(%s)", *cmd);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+void test_sk_lookup(void)
+{
+	struct test_sk_lookup *skel;
+	int err;
+
+	err = switch_netns();
+	if (err)
+		return;
+
+	skel = test_sk_lookup__open_and_load();
+	if (CHECK(!skel, "skel open_and_load", "failed\n"))
+		return;
+
+	run_tests(skel);
+
+	test_sk_lookup__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup.c b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
new file mode 100644
index 000000000000..bbf8296f4d66
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup.c
@@ -0,0 +1,641 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <linux/bpf.h>
+#include <linux/in.h>
+#include <sys/socket.h>
+
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+
+#define IP4(a, b, c, d)					\
+	bpf_htonl((((__u32)(a) & 0xffU) << 24) |	\
+		  (((__u32)(b) & 0xffU) << 16) |	\
+		  (((__u32)(c) & 0xffU) <<  8) |	\
+		  (((__u32)(d) & 0xffU) <<  0))
+#define IP6(aaaa, bbbb, cccc, dddd)			\
+	{ bpf_htonl(aaaa), bpf_htonl(bbbb), bpf_htonl(cccc), bpf_htonl(dddd) }
+
+#define MAX_SOCKS 32
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, MAX_SOCKS);
+	__type(key, __u32);
+	__type(value, __u64);
+} redir_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, int);
+} run_map SEC(".maps");
+
+enum {
+	PROG1 = 0,
+	PROG2,
+};
+
+enum {
+	SERVER_A = 0,
+	SERVER_B,
+};
+
+/* Addressable key/value constants for convenience */
+static const int KEY_PROG1 = PROG1;
+static const int KEY_PROG2 = PROG2;
+static const int PROG_DONE = 1;
+
+static const __u32 KEY_SERVER_A = SERVER_A;
+static const __u32 KEY_SERVER_B = SERVER_B;
+
+static const __u16 DST_PORT = 7007; /* Host byte order */
+static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
+static const __u32 DST_IP6[] = IP6(0xfd000000, 0x0, 0x0, 0x00000001);
+
+SEC("sk_lookup/lookup_pass")
+int lookup_pass(struct bpf_sk_lookup *ctx)
+{
+	return SK_PASS;
+}
+
+SEC("sk_lookup/lookup_drop")
+int lookup_drop(struct bpf_sk_lookup *ctx)
+{
+	return SK_DROP;
+}
+
+SEC("sk_reuseport/reuse_pass")
+int reuseport_pass(struct sk_reuseport_md *ctx)
+{
+	return SK_PASS;
+}
+
+SEC("sk_reuseport/reuse_drop")
+int reuseport_drop(struct sk_reuseport_md *ctx)
+{
+	return SK_DROP;
+}
+
+/* Redirect packets destined for port DST_PORT to socket at redir_map[0]. */
+SEC("sk_lookup/redir_port")
+int redir_port(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->local_port != DST_PORT)
+		return SK_PASS;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip4")
+int redir_ip4(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET)
+		return SK_PASS;
+	if (ctx->local_port != DST_PORT)
+		return SK_PASS;
+	if (ctx->local_ip4 != DST_IP4)
+		return SK_PASS;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+/* Redirect packets destined for DST_IP6 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip6")
+int redir_ip6(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET6)
+		return SK_PASS;
+	if (ctx->local_port != DST_PORT)
+		return SK_PASS;
+	if (ctx->local_ip6[0] != DST_IP6[0] ||
+	    ctx->local_ip6[1] != DST_IP6[1] ||
+	    ctx->local_ip6[2] != DST_IP6[2] ||
+	    ctx->local_ip6[3] != DST_IP6[3])
+		return SK_PASS;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+SEC("sk_lookup/select_sock_a")
+int select_sock_a(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+SEC("sk_lookup/select_sock_a_no_reuseport")
+int select_sock_a_no_reuseport(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_DROP;
+
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_NO_REUSEPORT);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+SEC("sk_reuseport/select_sock_b")
+int select_sock_b(struct sk_reuseport_md *ctx)
+{
+	__u32 key = KEY_SERVER_B;
+	int err;
+
+	err = bpf_sk_select_reuseport(ctx, &redir_map, &key, 0);
+	return err ? SK_DROP : SK_PASS;
+}
+
+/* Check that bpf_sk_assign() returns -EEXIST if socket already selected. */
+SEC("sk_lookup/sk_assign_eexist")
+int sk_assign_eexist(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, ret;
+
+	ret = SK_DROP;
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err)
+		goto out;
+	bpf_sk_release(sk);
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err != -EEXIST) {
+		bpf_printk("sk_assign returned %d, expected %d\n",
+			   err, -EEXIST);
+		goto out;
+	}
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+/* Check that bpf_sk_assign(BPF_SK_LOOKUP_F_REPLACE) can override selection. */
+SEC("sk_lookup/sk_assign_replace_flag")
+int sk_assign_replace_flag(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, ret;
+
+	ret = SK_DROP;
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err)
+		goto out;
+	bpf_sk_release(sk);
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE);
+	if (err) {
+		bpf_printk("sk_assign returned %d, expected 0\n", err);
+		goto out;
+	}
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+/* Check that bpf_sk_assign(sk=NULL) is accepted. */
+SEC("sk_lookup/sk_assign_null")
+int sk_assign_null(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk = NULL;
+	int err, ret;
+
+	ret = SK_DROP;
+
+	err = bpf_sk_assign(ctx, NULL, 0);
+	if (err) {
+		bpf_printk("sk_assign returned %d, expected 0\n", err);
+		goto out;
+	}
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE);
+	if (err) {
+		bpf_printk("sk_assign returned %d, expected 0\n", err);
+		goto out;
+	}
+
+	if (ctx->sk != sk)
+		goto out;
+	err = bpf_sk_assign(ctx, NULL, 0);
+	if (err != -EEXIST)
+		goto out;
+	err = bpf_sk_assign(ctx, NULL, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+/* Check that selected sk is accessible through context. */
+SEC("sk_lookup/access_ctx_sk")
+int access_ctx_sk(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk1 = NULL, *sk2 = NULL;
+	int err, ret;
+
+	ret = SK_DROP;
+
+	/* Try accessing unassigned (NULL) ctx->sk field */
+	if (ctx->sk && ctx->sk->family != AF_INET)
+		goto out;
+
+	/* Assign a value to ctx->sk */
+	sk1 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk1)
+		goto out;
+	err = bpf_sk_assign(ctx, sk1, 0);
+	if (err)
+		goto out;
+	if (ctx->sk != sk1)
+		goto out;
+
+	/* Access ctx->sk fields */
+	if (ctx->sk->family != AF_INET ||
+	    ctx->sk->type != SOCK_STREAM ||
+	    ctx->sk->state != BPF_TCP_LISTEN)
+		goto out;
+
+	/* Reset selection */
+	err = bpf_sk_assign(ctx, NULL, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+	if (ctx->sk)
+		goto out;
+
+	/* Assign another socket */
+	sk2 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk2)
+		goto out;
+	err = bpf_sk_assign(ctx, sk2, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+	if (ctx->sk != sk2)
+		goto out;
+
+	/* Access reassigned ctx->sk fields */
+	if (ctx->sk->family != AF_INET ||
+	    ctx->sk->type != SOCK_STREAM ||
+	    ctx->sk->state != BPF_TCP_LISTEN)
+		goto out;
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk1)
+		bpf_sk_release(sk1);
+	if (sk2)
+		bpf_sk_release(sk2);
+	return ret;
+}
+
+/* Check narrow loads from ctx fields that support them.
+ *
+ * Narrow loads of size >= target field size from a non-zero offset
+ * are not covered because they give bogus results, that is the
+ * verifier ignores the offset.
+ */
+SEC("sk_lookup/ctx_narrow_access")
+int ctx_narrow_access(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, family;
+	__u16 *half;
+	__u8 *byte;
+	bool v4;
+
+	v4 = (ctx->family == AF_INET);
+
+	/* Narrow loads from family field */
+	byte = (__u8 *)&ctx->family;
+	half = (__u16 *)&ctx->family;
+	if (byte[0] != (v4 ? AF_INET : AF_INET6) ||
+	    byte[1] != 0 || byte[2] != 0 || byte[3] != 0)
+		return SK_DROP;
+	if (half[0] != (v4 ? AF_INET : AF_INET6))
+		return SK_DROP;
+
+	byte = (__u8 *)&ctx->protocol;
+	if (byte[0] != IPPROTO_TCP ||
+	    byte[1] != 0 || byte[2] != 0 || byte[3] != 0)
+		return SK_DROP;
+	half = (__u16 *)&ctx->protocol;
+	if (half[0] != IPPROTO_TCP)
+		return SK_DROP;
+
+	/* Narrow loads from remote_port field. Expect non-0 value. */
+	byte = (__u8 *)&ctx->remote_port;
+	if (byte[0] == 0 && byte[1] == 0 && byte[2] == 0 && byte[3] == 0)
+		return SK_DROP;
+	half = (__u16 *)&ctx->remote_port;
+	if (half[0] == 0)
+		return SK_DROP;
+
+	/* Narrow loads from local_port field. Expect DST_PORT. */
+	byte = (__u8 *)&ctx->local_port;
+	if (byte[0] != ((DST_PORT >> 0) & 0xff) ||
+	    byte[1] != ((DST_PORT >> 8) & 0xff) ||
+	    byte[2] != 0 || byte[3] != 0)
+		return SK_DROP;
+	half = (__u16 *)&ctx->local_port;
+	if (half[0] != DST_PORT)
+		return SK_DROP;
+
+	/* Narrow loads from IPv4 fields */
+	if (v4) {
+		/* Expect non-0.0.0.0 in remote_ip4 */
+		byte = (__u8 *)&ctx->remote_ip4;
+		if (byte[0] == 0 && byte[1] == 0 &&
+		    byte[2] == 0 && byte[3] == 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip4;
+		if (half[0] == 0 && half[1] == 0)
+			return SK_DROP;
+
+		/* Expect DST_IP4 in local_ip4 */
+		byte = (__u8 *)&ctx->local_ip4;
+		if (byte[0] != ((DST_IP4 >>  0) & 0xff) ||
+		    byte[1] != ((DST_IP4 >>  8) & 0xff) ||
+		    byte[2] != ((DST_IP4 >> 16) & 0xff) ||
+		    byte[3] != ((DST_IP4 >> 24) & 0xff))
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip4;
+		if (half[0] != ((DST_IP4 >>  0) & 0xffff) ||
+		    half[1] != ((DST_IP4 >> 16) & 0xffff))
+			return SK_DROP;
+	} else {
+		/* Expect 0.0.0.0 IPs when family != AF_INET */
+		byte = (__u8 *)&ctx->remote_ip4;
+		if (byte[0] != 0 || byte[1] != 0 &&
+		    byte[2] != 0 || byte[3] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip4;
+		if (half[0] != 0 || half[1] != 0)
+			return SK_DROP;
+
+		byte = (__u8 *)&ctx->local_ip4;
+		if (byte[0] != 0 || byte[1] != 0 &&
+		    byte[2] != 0 || byte[3] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip4;
+		if (half[0] != 0 || half[1] != 0)
+			return SK_DROP;
+	}
+
+	/* Narrow loads from IPv6 fields */
+	if (!v4) {
+		/* Expenct non-:: IP in remote_ip6 */
+		byte = (__u8 *)&ctx->remote_ip6;
+		if (byte[0] == 0 && byte[1] == 0 &&
+		    byte[2] == 0 && byte[3] == 0 &&
+		    byte[4] == 0 && byte[5] == 0 &&
+		    byte[6] == 0 && byte[7] == 0 &&
+		    byte[8] == 0 && byte[9] == 0 &&
+		    byte[10] == 0 && byte[11] == 0 &&
+		    byte[12] == 0 && byte[13] == 0 &&
+		    byte[14] == 0 && byte[15] == 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip6;
+		if (half[0] == 0 && half[1] == 0 &&
+		    half[2] == 0 && half[3] == 0 &&
+		    half[4] == 0 && half[5] == 0 &&
+		    half[6] == 0 && half[7] == 0)
+			return SK_DROP;
+
+		/* Expect DST_IP6 in local_ip6 */
+		byte = (__u8 *)&ctx->local_ip6;
+		if (byte[0] != ((DST_IP6[0] >>  0) & 0xff) ||
+		    byte[1] != ((DST_IP6[0] >>  8) & 0xff) ||
+		    byte[2] != ((DST_IP6[0] >> 16) & 0xff) ||
+		    byte[3] != ((DST_IP6[0] >> 24) & 0xff) ||
+		    byte[4] != ((DST_IP6[1] >>  0) & 0xff) ||
+		    byte[5] != ((DST_IP6[1] >>  8) & 0xff) ||
+		    byte[6] != ((DST_IP6[1] >> 16) & 0xff) ||
+		    byte[7] != ((DST_IP6[1] >> 24) & 0xff) ||
+		    byte[8] != ((DST_IP6[2] >>  0) & 0xff) ||
+		    byte[9] != ((DST_IP6[2] >>  8) & 0xff) ||
+		    byte[10] != ((DST_IP6[2] >> 16) & 0xff) ||
+		    byte[11] != ((DST_IP6[2] >> 24) & 0xff) ||
+		    byte[12] != ((DST_IP6[3] >>  0) & 0xff) ||
+		    byte[13] != ((DST_IP6[3] >>  8) & 0xff) ||
+		    byte[14] != ((DST_IP6[3] >> 16) & 0xff) ||
+		    byte[15] != ((DST_IP6[3] >> 24) & 0xff))
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip6;
+		if (half[0] != ((DST_IP6[0] >>  0) & 0xffff) ||
+		    half[1] != ((DST_IP6[0] >> 16) & 0xffff) ||
+		    half[2] != ((DST_IP6[1] >>  0) & 0xffff) ||
+		    half[3] != ((DST_IP6[1] >> 16) & 0xffff) ||
+		    half[4] != ((DST_IP6[2] >>  0) & 0xffff) ||
+		    half[5] != ((DST_IP6[2] >> 16) & 0xffff) ||
+		    half[6] != ((DST_IP6[3] >>  0) & 0xffff) ||
+		    half[7] != ((DST_IP6[3] >> 16) & 0xffff))
+			return SK_DROP;
+	} else {
+		/* Expect :: IPs when family != AF_INET6 */
+		byte = (__u8 *)&ctx->remote_ip6;
+		if (byte[0] != 0 || byte[1] != 0 ||
+		    byte[2] != 0 || byte[3] != 0 ||
+		    byte[4] != 0 || byte[5] != 0 ||
+		    byte[6] != 0 || byte[7] != 0 ||
+		    byte[8] != 0 || byte[9] != 0 ||
+		    byte[10] != 0 || byte[11] != 0 ||
+		    byte[12] != 0 || byte[13] != 0 ||
+		    byte[14] != 0 || byte[15] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip6;
+		if (half[0] != 0 || half[1] != 0 ||
+		    half[2] != 0 || half[3] != 0 ||
+		    half[4] != 0 || half[5] != 0 ||
+		    half[6] != 0 || half[7] != 0)
+			return SK_DROP;
+
+		byte = (__u8 *)&ctx->local_ip6;
+		if (byte[0] != 0 || byte[1] != 0 ||
+		    byte[2] != 0 || byte[3] != 0 ||
+		    byte[4] != 0 || byte[5] != 0 ||
+		    byte[6] != 0 || byte[7] != 0 ||
+		    byte[8] != 0 || byte[9] != 0 ||
+		    byte[10] != 0 || byte[11] != 0 ||
+		    byte[12] != 0 || byte[13] != 0 ||
+		    byte[14] != 0 || byte[15] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip6;
+		if (half[0] != 0 || half[1] != 0 ||
+		    half[2] != 0 || half[3] != 0 ||
+		    half[4] != 0 || half[5] != 0 ||
+		    half[6] != 0 || half[7] != 0)
+			return SK_DROP;
+	}
+
+	/* Success, redirect to KEY_SERVER_B */
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (sk) {
+		bpf_sk_assign(ctx, sk, 0);
+		bpf_sk_release(sk);
+	}
+	return SK_PASS;
+}
+
+/* Check that sk_assign rejects SERVER_A socket with -ESOCKNOSUPPORT */
+SEC("sk_lookup/sk_assign_esocknosupport")
+int sk_assign_esocknosupport(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, ret;
+
+	ret = SK_DROP;
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		goto out;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err != -ESOCKTNOSUPPORT) {
+		bpf_printk("sk_assign returned %d, expected %d\n",
+			   err, -ESOCKTNOSUPPORT);
+		goto out;
+	}
+
+	ret = SK_PASS; /* Success, pass to regular lookup */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+SEC("sk_lookup/multi_prog_pass1")
+int multi_prog_pass1(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG1, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_pass2")
+int multi_prog_pass2(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG2, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_drop1")
+int multi_prog_drop1(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG1, &PROG_DONE, BPF_ANY);
+	return SK_DROP;
+}
+
+SEC("sk_lookup/multi_prog_drop2")
+int multi_prog_drop2(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG2, &PROG_DONE, BPF_ANY);
+	return SK_DROP;
+}
+
+static __always_inline int select_server_a(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_DROP;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	if (err)
+		return SK_DROP;
+
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_redir1")
+int multi_prog_redir1(struct bpf_sk_lookup *ctx)
+{
+	int ret;
+
+	ret = select_server_a(ctx);
+	bpf_map_update_elem(&run_map, &KEY_PROG1, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_redir2")
+int multi_prog_redir2(struct bpf_sk_lookup *ctx)
+{
+	int ret;
+
+	ret = select_server_a(ctx);
+	bpf_map_update_elem(&run_map, &KEY_PROG2, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
+__u32 _version SEC("version") = 1;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (14 preceding siblings ...)
  2020-07-17 10:35 ` [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
@ 2020-07-17 16:40 ` Lorenz Bauer
  2020-07-18  3:25   ` Alexei Starovoitov
  2020-08-18 15:49 ` BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks Jakub Sitnicki
  16 siblings, 1 reply; 32+ messages in thread
From: Lorenz Bauer @ 2020-07-17 16:40 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Fri, 17 Jul 2020 at 11:35, Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Changelog
> =========
> v4 -> v5:
> - Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii)
> - Simplify prog runners now that only SK_DROP/PASS can be returned.
> - Enable bpf_perf_event_output from the start. (Andrii)
> - Drop patch
>   "selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c"
> - Remove tests for narrow loads from context at an offset wider in size
>   than target field, while we are discussing how to fix it:
>   https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/
> - Rebase onto recent bpf-next (bfdfa51702de)
> - Other minor changes called out in per-patch changelogs,
>   see patches: 2, 4, 6, 13-15
> - Carried over Andrii's Acks where nothing changed.
>
> v3 -> v4:
> - Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz)
> - Default to drop on illegal return value from BPF prog (Lorenz)
> - Extend bpf_sk_assign to accept NULL socket pointer.
> - Switch to saner return values and add docs for new prog_array API (Andrii)
> - Add support for narrow loads from BPF context fields (Yonghong)
> - Fix broken build when IPv6 is compiled as a module (kernel test robot)
> - Fix null/wild-ptr-deref on BPF context access
> - Rebase to recent bpf-next (eef8a42d6ce0)
> - Other minor changes called out in per-patch changelogs,
>   see patches 1-2, 4, 6, 8, 10-12, 14, 16
>
> v2 -> v3:
> - Switch to link-based program attachment
> - Support for multi-prog attachment
> - Ability to skip reuseport socket selection
> - Code on RX path is guarded by a static key
> - struct in6_addr's are no longer copied into BPF prog context
> - BPF prog context is initialized as late as possible
> - Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
> - Patches dropped:
>   01/17 flow_dissector: Extract attach/detach/query helpers
>   03/17 inet: Store layer 4 protocol in inet_hashinfo
>   08/17 udp: Store layer 4 protocol in udp_table
>
> v1 -> v2:
> - Changes called out in patches 2, 13-15, 17
> - Rebase to recent bpf-next (b4563facdcae)
>
> RFCv2 -> v1:
>
> - Switch to fetching a socket from a map and selecting a socket with
>   bpf_sk_assign, instead of having a dedicated helper that does both.
> - Run reuseport logic on sockets selected by BPF sk_lookup.
> - Allow BPF sk_lookup to fail the lookup with no match.
> - Go back to having just 2 hash table lookups in UDP.
>
> RFCv1 -> RFCv2:
>
> - Make socket lookup redirection map-based. BPF program now uses a
>   dedicated helper and a SOCKARRAY map to select the socket to redirect to.
>   A consequence of this change is that bpf_inet_lookup context is now
>   read-only.
> - Look for connected UDP sockets before allowing redirection from BPF.
>   This makes connected UDP socket work as expected in the presence of
>   inet_lookup prog.
> - Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
>   the only other per-netns BPF prog type.
>
> Overview
> ========
>
> This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
> or BPF sk_lookup for short.
>
> BPF sk_lookup program runs when transport layer is looking up a listening
> socket for a new connection request (TCP), or when looking up an
> unconnected socket for a packet (UDP).
>
> This serves as a mechanism to overcome the limits of what bind() API allows
> to express. Two use-cases driving this work are:
>
>  (1) steer packets destined to an IP range, fixed port to a single socket
>
>      192.0.2.0/24, port 80 -> NGINX socket
>
>  (2) steer packets destined to an IP address, any port to a single socket
>
>      198.51.100.1, any port -> L7 proxy socket
>
> In its context, program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple.
>
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection, and returns SK_PASS code. Transport layer
> then uses the selected socket as a result of socket lookup.
>
> Alternatively, program can also fail the lookup (SK_DROP), or let the
> lookup continue as usual (SK_PASS without selecting a socket).
>
> This lets the user match packets with listening (TCP) or receiving (UDP)
> sockets freely at the last possible point on the receive path, where we
> know that packets are destined for local delivery after undergoing
> policing, filtering, and routing.
>
> Program is attached to a network namespace, similar to BPF flow_dissector.
> We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be
> attached at the same time, in which case their return values are aggregated
> according the rules outlined in patch #4 description.
>
> Series structure
> ================
>
> Patches are organized as so:
>
>  1: enables multiple link-based prog attachments for bpf-netns
>  2: introduces sk_lookup program type
>  3-4: hook up the program to run on ipv4/tcp socket lookup
>  5-6: hook up the program to run on ipv6/tcp socket lookup
>  7-8: hook up the program to run on ipv4/udp socket lookup
>  9-10: hook up the program to run on ipv6/udp socket lookup
>  11-13: libbpf & bpftool support for sk_lookup
>  14-15: verifier and selftests for sk_lookup
>
> Patches are also available on GH:
>
>   https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v5
>
> Follow-up work
> ==============
>
> I'll follow up with below items, which IMHO don't block the review:
>
> - benchmark results for udp6 small packet flood scenario,
> - user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
> - timeout for accept() in tests after extending network_helper.[ch].
>
> Thanks to the reviewers for their feedback to this patch series:
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Cc: Marek Majkowski <marek@cloudflare.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
>
> -jkbs

Phew, I have to admit that at the patch that adds 2k lines of tests my
eyes glazed over a bit, but other than that: thank you for your hard
work!

For the series:
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>

>
> [RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
> [RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> [v1] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/
> [v2] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
> [v3] https://lore.kernel.org/bpf/20200702092416.11961-1-jakub@cloudflare.com/
> [v4] https://lore.kernel.org/bpf/20200713174654.642628-1-jakub@cloudflare.com/
>
> Jakub Sitnicki (15):
>   bpf, netns: Handle multiple link attachments
>   bpf: Introduce SK_LOOKUP program type with a dedicated attach point
>   inet: Extract helper for selecting socket from reuseport group
>   inet: Run SK_LOOKUP BPF program on socket lookup
>   inet6: Extract helper for selecting socket from reuseport group
>   inet6: Run SK_LOOKUP BPF program on socket lookup
>   udp: Extract helper for selecting socket from reuseport group
>   udp: Run SK_LOOKUP BPF program on socket lookup
>   udp6: Extract helper for selecting socket from reuseport group
>   udp6: Run SK_LOOKUP BPF program on socket lookup
>   bpf: Sync linux/bpf.h to tools/
>   libbpf: Add support for SK_LOOKUP program type
>   tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
>   selftests/bpf: Add verifier tests for bpf_sk_lookup context access
>   selftests/bpf: Tests for BPF_SK_LOOKUP attach point
>
>  include/linux/bpf-netns.h                     |    3 +
>  include/linux/bpf.h                           |    4 +
>  include/linux/bpf_types.h                     |    2 +
>  include/linux/filter.h                        |  147 ++
>  include/uapi/linux/bpf.h                      |   77 +
>  kernel/bpf/core.c                             |   55 +
>  kernel/bpf/net_namespace.c                    |  127 +-
>  kernel/bpf/syscall.c                          |    9 +
>  kernel/bpf/verifier.c                         |   13 +-
>  net/core/filter.c                             |  183 +++
>  net/ipv4/inet_hashtables.c                    |   60 +-
>  net/ipv4/udp.c                                |   93 +-
>  net/ipv6/inet6_hashtables.c                   |   66 +-
>  net/ipv6/udp.c                                |   97 +-
>  scripts/bpf_helpers_doc.py                    |    9 +-
>  .../bpftool/Documentation/bpftool-prog.rst    |    2 +-
>  tools/bpf/bpftool/bash-completion/bpftool     |    2 +-
>  tools/bpf/bpftool/common.c                    |    1 +
>  tools/bpf/bpftool/prog.c                      |    3 +-
>  tools/include/uapi/linux/bpf.h                |   77 +
>  tools/lib/bpf/libbpf.c                        |    3 +
>  tools/lib/bpf/libbpf.h                        |    2 +
>  tools/lib/bpf/libbpf.map                      |    2 +
>  tools/lib/bpf/libbpf_probes.c                 |    3 +
>  tools/testing/selftests/bpf/network_helpers.c |   58 +-
>  tools/testing/selftests/bpf/network_helpers.h |    2 +
>  .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
>  .../selftests/bpf/progs/test_sk_lookup.c      |  641 +++++++++
>  .../selftests/bpf/verifier/ctx_sk_lookup.c    |  492 +++++++
>  29 files changed, 3418 insertions(+), 97 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup.c
>  create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
>
> --
> 2.25.4
>


-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup
  2020-07-17 16:40 ` [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Lorenz Bauer
@ 2020-07-18  3:25   ` Alexei Starovoitov
  0 siblings, 0 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2020-07-18  3:25 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Jakub Sitnicki, bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Fri, Jul 17, 2020 at 9:40 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Fri, 17 Jul 2020 at 11:35, Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >
> > Changelog
> > =========
> > v4 -> v5:
> > - Enforce BPF prog return value to be SK_DROP or SK_PASS. (Andrii)
> > - Simplify prog runners now that only SK_DROP/PASS can be returned.
> > - Enable bpf_perf_event_output from the start. (Andrii)
> > - Drop patch
> >   "selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c"
> > - Remove tests for narrow loads from context at an offset wider in size
> >   than target field, while we are discussing how to fix it:
> >   https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/
> > - Rebase onto recent bpf-next (bfdfa51702de)
> > - Other minor changes called out in per-patch changelogs,
> >   see patches: 2, 4, 6, 13-15
> > - Carried over Andrii's Acks where nothing changed.

Applied. Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-17 10:35 ` [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
@ 2020-07-28 20:13   ` Andrii Nakryiko
  2020-07-28 20:47     ` Daniel Borkmann
  2020-07-29  8:57     ` Jakub Sitnicki
  0 siblings, 2 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2020-07-28 20:13 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Fri, Jul 17, 2020 at 3:36 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add tests to test_progs that exercise:
>
>  - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
>  - redirecting socket lookup to a socket selected by BPF program,
>  - failing a socket lookup on BPF program's request,
>  - error scenarios for selecting a socket from BPF program,
>  - accessing BPF program context,
>  - attaching and running multiple BPF programs.
>
> Run log:
>
>   bash-5.0# ./test_progs -n 70
>   #70/1 query lookup prog:OK
>   #70/2 TCP IPv4 redir port:OK
>   #70/3 TCP IPv4 redir addr:OK
>   #70/4 TCP IPv4 redir with reuseport:OK
>   #70/5 TCP IPv4 redir skip reuseport:OK
>   #70/6 TCP IPv6 redir port:OK
>   #70/7 TCP IPv6 redir addr:OK
>   #70/8 TCP IPv4->IPv6 redir port:OK
>   #70/9 TCP IPv6 redir with reuseport:OK
>   #70/10 TCP IPv6 redir skip reuseport:OK
>   #70/11 UDP IPv4 redir port:OK
>   #70/12 UDP IPv4 redir addr:OK
>   #70/13 UDP IPv4 redir with reuseport:OK
>   #70/14 UDP IPv4 redir skip reuseport:OK
>   #70/15 UDP IPv6 redir port:OK
>   #70/16 UDP IPv6 redir addr:OK
>   #70/17 UDP IPv4->IPv6 redir port:OK
>   #70/18 UDP IPv6 redir and reuseport:OK
>   #70/19 UDP IPv6 redir skip reuseport:OK
>   #70/20 TCP IPv4 drop on lookup:OK
>   #70/21 TCP IPv6 drop on lookup:OK
>   #70/22 UDP IPv4 drop on lookup:OK
>   #70/23 UDP IPv6 drop on lookup:OK
>   #70/24 TCP IPv4 drop on reuseport:OK
>   #70/25 TCP IPv6 drop on reuseport:OK
>   #70/26 UDP IPv4 drop on reuseport:OK
>   #70/27 TCP IPv6 drop on reuseport:OK
>   #70/28 sk_assign returns EEXIST:OK
>   #70/29 sk_assign honors F_REPLACE:OK
>   #70/30 sk_assign accepts NULL socket:OK
>   #70/31 access ctx->sk:OK
>   #70/32 narrow access to ctx v4:OK
>   #70/33 narrow access to ctx v6:OK
>   #70/34 sk_assign rejects TCP established:OK
>   #70/35 sk_assign rejects UDP connected:OK
>   #70/36 multi prog - pass, pass:OK
>   #70/37 multi prog - drop, drop:OK
>   #70/38 multi prog - pass, drop:OK
>   #70/39 multi prog - drop, pass:OK
>   #70/40 multi prog - pass, redir:OK
>   #70/41 multi prog - redir, pass:OK
>   #70/42 multi prog - drop, redir:OK
>   #70/43 multi prog - redir, drop:OK
>   #70/44 multi prog - redir, redir:OK
>   #70 sk_lookup:OK
>   Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>

Hey Jakub!

We are getting this failure in Travis CI when syncing libbpf [0]:

```
ip: either "local" is duplicate, or "nodad" is garbage

switch_netns:PASS:unshare 0 nsec

switch_netns:FAIL:system failed

(/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310:
errno: No such file or directory) system(ip -6 addr add dev lo
fd00::1/128 nodad)

#73 sk_lookup:FAIL
```


Can you please help fix it so that it works in a Travis CI environment
as well? For now I disabled sk_lookup selftests altogether. You can
try to repro it locally by forking https://github.com/libbpf/libbpf
and enabling Travis CI for your account. See [1] for the PR that
disabled sk_lookup.


  [0] https://travis-ci.com/github/libbpf/libbpf/jobs/365878309#L5408
  [1] https://github.com/libbpf/libbpf/pull/182/commits/78368c2eaed8b0681381fc34d6016c9b5a443be8


Thanks for your help!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-28 20:13   ` Andrii Nakryiko
@ 2020-07-28 20:47     ` Daniel Borkmann
  2020-07-29  8:55       ` Jakub Sitnicki
  2020-07-29  8:57     ` Jakub Sitnicki
  1 sibling, 1 reply; 32+ messages in thread
From: Daniel Borkmann @ 2020-07-28 20:47 UTC (permalink / raw)
  To: Andrii Nakryiko, Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	David S. Miller, Jakub Kicinski

On 7/28/20 10:13 PM, Andrii Nakryiko wrote:
> On Fri, Jul 17, 2020 at 3:36 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Add tests to test_progs that exercise:
>>
>>   - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
>>   - redirecting socket lookup to a socket selected by BPF program,
>>   - failing a socket lookup on BPF program's request,
>>   - error scenarios for selecting a socket from BPF program,
>>   - accessing BPF program context,
>>   - attaching and running multiple BPF programs.
>>
>> Run log:
>>
>>    bash-5.0# ./test_progs -n 70
>>    #70/1 query lookup prog:OK
>>    #70/2 TCP IPv4 redir port:OK
>>    #70/3 TCP IPv4 redir addr:OK
>>    #70/4 TCP IPv4 redir with reuseport:OK
>>    #70/5 TCP IPv4 redir skip reuseport:OK
>>    #70/6 TCP IPv6 redir port:OK
>>    #70/7 TCP IPv6 redir addr:OK
>>    #70/8 TCP IPv4->IPv6 redir port:OK
>>    #70/9 TCP IPv6 redir with reuseport:OK
>>    #70/10 TCP IPv6 redir skip reuseport:OK
>>    #70/11 UDP IPv4 redir port:OK
>>    #70/12 UDP IPv4 redir addr:OK
>>    #70/13 UDP IPv4 redir with reuseport:OK
>>    #70/14 UDP IPv4 redir skip reuseport:OK
>>    #70/15 UDP IPv6 redir port:OK
>>    #70/16 UDP IPv6 redir addr:OK
>>    #70/17 UDP IPv4->IPv6 redir port:OK
>>    #70/18 UDP IPv6 redir and reuseport:OK
>>    #70/19 UDP IPv6 redir skip reuseport:OK
>>    #70/20 TCP IPv4 drop on lookup:OK
>>    #70/21 TCP IPv6 drop on lookup:OK
>>    #70/22 UDP IPv4 drop on lookup:OK
>>    #70/23 UDP IPv6 drop on lookup:OK
>>    #70/24 TCP IPv4 drop on reuseport:OK
>>    #70/25 TCP IPv6 drop on reuseport:OK
>>    #70/26 UDP IPv4 drop on reuseport:OK
>>    #70/27 TCP IPv6 drop on reuseport:OK
>>    #70/28 sk_assign returns EEXIST:OK
>>    #70/29 sk_assign honors F_REPLACE:OK
>>    #70/30 sk_assign accepts NULL socket:OK
>>    #70/31 access ctx->sk:OK
>>    #70/32 narrow access to ctx v4:OK
>>    #70/33 narrow access to ctx v6:OK
>>    #70/34 sk_assign rejects TCP established:OK
>>    #70/35 sk_assign rejects UDP connected:OK
>>    #70/36 multi prog - pass, pass:OK
>>    #70/37 multi prog - drop, drop:OK
>>    #70/38 multi prog - pass, drop:OK
>>    #70/39 multi prog - drop, pass:OK
>>    #70/40 multi prog - pass, redir:OK
>>    #70/41 multi prog - redir, pass:OK
>>    #70/42 multi prog - drop, redir:OK
>>    #70/43 multi prog - redir, drop:OK
>>    #70/44 multi prog - redir, redir:OK
>>    #70 sk_lookup:OK
>>    Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED
>>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>
> 
> Hey Jakub!
> 
> We are getting this failure in Travis CI when syncing libbpf [0]:
> 
> ```
> ip: either "local" is duplicate, or "nodad" is garbage
> 
> switch_netns:PASS:unshare 0 nsec
> 
> switch_netns:FAIL:system failed
> 
> (/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310:
> errno: No such file or directory) system(ip -6 addr add dev lo
> fd00::1/128 nodad)
> 
> #73 sk_lookup:FAIL
> ```

Jakub, I'm actually seeing a slightly different one on my test machine with sk_lookup:

# ./test_progs -t sk_lookup
#14 cgroup_skb_sk_lookup:OK
#73/1 query lookup prog:OK
#73/2 TCP IPv4 redir port:OK
#73/3 TCP IPv4 redir addr:OK
#73/4 TCP IPv4 redir with reuseport:OK
#73/5 TCP IPv4 redir skip reuseport:OK
#73/6 TCP IPv6 redir port:OK
#73/7 TCP IPv6 redir addr:OK
#73/8 TCP IPv4->IPv6 redir port:OK
#73/9 TCP IPv6 redir with reuseport:OK
#73/10 TCP IPv6 redir skip reuseport:OK
#73/11 UDP IPv4 redir port:OK
#73/12 UDP IPv4 redir addr:OK
#73/13 UDP IPv4 redir with reuseport:OK
attach_lookup_prog:PASS:open 0 nsec
attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
make_server:PASS:bind 0 nsec
make_server:PASS:attach_reuseport 0 nsec
update_lookup_map:PASS:bpf_map__fd 0 nsec
update_lookup_map:PASS:bpf_map_update_elem 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
make_server:PASS:bind 0 nsec
make_server:PASS:attach_reuseport 0 nsec
update_lookup_map:PASS:bpf_map__fd 0 nsec
update_lookup_map:PASS:bpf_map_update_elem 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
make_server:PASS:bind 0 nsec
make_server:PASS:attach_reuseport 0 nsec
run_lookup_prog:PASS:getsockname 0 nsec
run_lookup_prog:PASS:connect 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_client:PASS:make_client 0 nsec
send_byte:PASS:send_byte 0 nsec
udp_recv_send:FAIL:recvmsg failed
(/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: errno: Resource temporarily unavailable) failed to receive
#73/14 UDP IPv4 redir and reuseport with conns:FAIL
#73/15 UDP IPv4 redir skip reuseport:OK
#73/16 UDP IPv6 redir port:OK
#73/17 UDP IPv6 redir addr:OK
#73/18 UDP IPv4->IPv6 redir port:OK
#73/19 UDP IPv6 redir and reuseport:OK
attach_lookup_prog:PASS:open 0 nsec
attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
make_server:PASS:bind 0 nsec
make_server:PASS:attach_reuseport 0 nsec
update_lookup_map:PASS:bpf_map__fd 0 nsec
update_lookup_map:PASS:bpf_map_update_elem 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
make_server:PASS:bind 0 nsec
make_server:PASS:attach_reuseport 0 nsec
update_lookup_map:PASS:bpf_map__fd 0 nsec
update_lookup_map:PASS:bpf_map_update_elem 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
make_server:PASS:bind 0 nsec
make_server:PASS:attach_reuseport 0 nsec
run_lookup_prog:PASS:getsockname 0 nsec
run_lookup_prog:PASS:connect 0 nsec
make_socket:PASS:make_address 0 nsec
make_socket:PASS:socket 0 nsec
make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
make_client:PASS:make_client 0 nsec
send_byte:PASS:send_byte 0 nsec
udp_recv_send:FAIL:recvmsg failed
(/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: errno: Resource temporarily unavailable) failed to receive
#73/20 UDP IPv6 redir and reuseport with conns:FAIL
#73/21 UDP IPv6 redir skip reuseport:OK
#73/22 TCP IPv4 drop on lookup:OK
#73/23 TCP IPv6 drop on lookup:OK
#73/24 UDP IPv4 drop on lookup:OK
#73/25 UDP IPv6 drop on lookup:OK
#73/26 TCP IPv4 drop on reuseport:OK
#73/27 TCP IPv6 drop on reuseport:OK
#73/28 UDP IPv4 drop on reuseport:OK
#73/29 TCP IPv6 drop on reuseport:OK
#73/30 sk_assign returns EEXIST:OK
#73/31 sk_assign honors F_REPLACE:OK
#73/32 sk_assign accepts NULL socket:OK
#73/33 access ctx->sk:OK
#73/34 narrow access to ctx v4:OK
#73/35 narrow access to ctx v6:OK
#73/36 sk_assign rejects TCP established:OK
#73/37 sk_assign rejects UDP connected:OK
#73/38 multi prog - pass, pass:OK
#73/39 multi prog - drop, drop:OK
#73/40 multi prog - pass, drop:OK
#73/41 multi prog - drop, pass:OK
#73/42 multi prog - pass, redir:OK
#73/43 multi prog - redir, pass:OK
#73/44 multi prog - drop, redir:OK
#73/45 multi prog - redir, drop:OK
#73/46 multi prog - redir, redir:OK
#73 sk_lookup:FAIL
Summary: 1/44 PASSED, 0 SKIPPED, 3 FAILED

> Can you please help fix it so that it works in a Travis CI environment
> as well? For now I disabled sk_lookup selftests altogether. You can
> try to repro it locally by forking https://github.com/libbpf/libbpf
> and enabling Travis CI for your account. See [1] for the PR that
> disabled sk_lookup.
> 
> 
>    [0] https://travis-ci.com/github/libbpf/libbpf/jobs/365878309#L5408
>    [1] https://github.com/libbpf/libbpf/pull/182/commits/78368c2eaed8b0681381fc34d6016c9b5a443be8
> 
> 
> Thanks for your help!
> 


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-28 20:47     ` Daniel Borkmann
@ 2020-07-29  8:55       ` Jakub Sitnicki
  2020-07-31  0:04         ` Daniel Borkmann
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-29  8:55 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Andrii Nakryiko, bpf, Networking, kernel-team,
	Alexei Starovoitov, David S. Miller, Jakub Kicinski

Hi Daniel,

On Tue, Jul 28, 2020 at 10:47 PM CEST, Daniel Borkmann wrote:

[...]

> Jakub, I'm actually seeing a slightly different one on my test machine with sk_lookup:
>
> # ./test_progs -t sk_lookup
> #14 cgroup_skb_sk_lookup:OK
> #73/1 query lookup prog:OK
> #73/2 TCP IPv4 redir port:OK
> #73/3 TCP IPv4 redir addr:OK
> #73/4 TCP IPv4 redir with reuseport:OK
> #73/5 TCP IPv4 redir skip reuseport:OK
> #73/6 TCP IPv6 redir port:OK
> #73/7 TCP IPv6 redir addr:OK
> #73/8 TCP IPv4->IPv6 redir port:OK
> #73/9 TCP IPv6 redir with reuseport:OK
> #73/10 TCP IPv6 redir skip reuseport:OK
> #73/11 UDP IPv4 redir port:OK
> #73/12 UDP IPv4 redir addr:OK
> #73/13 UDP IPv4 redir with reuseport:OK
> attach_lookup_prog:PASS:open 0 nsec
> attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
> make_server:PASS:bind 0 nsec
> make_server:PASS:attach_reuseport 0 nsec
> update_lookup_map:PASS:bpf_map__fd 0 nsec
> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
> make_server:PASS:bind 0 nsec
> make_server:PASS:attach_reuseport 0 nsec
> update_lookup_map:PASS:bpf_map__fd 0 nsec
> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
> make_server:PASS:bind 0 nsec
> make_server:PASS:attach_reuseport 0 nsec
> run_lookup_prog:PASS:getsockname 0 nsec
> run_lookup_prog:PASS:connect 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_client:PASS:make_client 0 nsec
> send_byte:PASS:send_byte 0 nsec
> udp_recv_send:FAIL:recvmsg failed
> (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: errno: Resource temporarily unavailable) failed to receive
> #73/14 UDP IPv4 redir and reuseport with conns:FAIL
> #73/15 UDP IPv4 redir skip reuseport:OK
> #73/16 UDP IPv6 redir port:OK
> #73/17 UDP IPv6 redir addr:OK
> #73/18 UDP IPv4->IPv6 redir port:OK
> #73/19 UDP IPv6 redir and reuseport:OK
> attach_lookup_prog:PASS:open 0 nsec
> attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
> make_server:PASS:bind 0 nsec
> make_server:PASS:attach_reuseport 0 nsec
> update_lookup_map:PASS:bpf_map__fd 0 nsec
> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
> make_server:PASS:bind 0 nsec
> make_server:PASS:attach_reuseport 0 nsec
> update_lookup_map:PASS:bpf_map__fd 0 nsec
> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
> make_server:PASS:bind 0 nsec
> make_server:PASS:attach_reuseport 0 nsec
> run_lookup_prog:PASS:getsockname 0 nsec
> run_lookup_prog:PASS:connect 0 nsec
> make_socket:PASS:make_address 0 nsec
> make_socket:PASS:socket 0 nsec
> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
> make_client:PASS:make_client 0 nsec
> send_byte:PASS:send_byte 0 nsec
> udp_recv_send:FAIL:recvmsg failed
> (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: errno: Resource temporarily unavailable) failed to receive
> #73/20 UDP IPv6 redir and reuseport with conns:FAIL
> #73/21 UDP IPv6 redir skip reuseport:OK
> #73/22 TCP IPv4 drop on lookup:OK
> #73/23 TCP IPv6 drop on lookup:OK
> #73/24 UDP IPv4 drop on lookup:OK
> #73/25 UDP IPv6 drop on lookup:OK
> #73/26 TCP IPv4 drop on reuseport:OK
> #73/27 TCP IPv6 drop on reuseport:OK
> #73/28 UDP IPv4 drop on reuseport:OK
> #73/29 TCP IPv6 drop on reuseport:OK
> #73/30 sk_assign returns EEXIST:OK
> #73/31 sk_assign honors F_REPLACE:OK
> #73/32 sk_assign accepts NULL socket:OK
> #73/33 access ctx->sk:OK
> #73/34 narrow access to ctx v4:OK
> #73/35 narrow access to ctx v6:OK
> #73/36 sk_assign rejects TCP established:OK
> #73/37 sk_assign rejects UDP connected:OK
> #73/38 multi prog - pass, pass:OK
> #73/39 multi prog - drop, drop:OK
> #73/40 multi prog - pass, drop:OK
> #73/41 multi prog - drop, pass:OK
> #73/42 multi prog - pass, redir:OK
> #73/43 multi prog - redir, pass:OK
> #73/44 multi prog - drop, redir:OK
> #73/45 multi prog - redir, drop:OK
> #73/46 multi prog - redir, redir:OK
> #73 sk_lookup:FAIL
> Summary: 1/44 PASSED, 0 SKIPPED, 3 FAILED

This patch addresses the failures:

  https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com/

It spawned a discussion on what to do about reuseport bpf returning
connected udp sockets, and the conclusion was that it would be best to
change reuseport logic itself if no one is relying on said behavior.

IOW, I belive the fix does the right thing and can be applied as is. We
get the same reuseport behavior everywhere, that is with regular socket
lookup and BPF sk lookup, even if that behavior needs to be changed.

[...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-28 20:13   ` Andrii Nakryiko
  2020-07-28 20:47     ` Daniel Borkmann
@ 2020-07-29  8:57     ` Jakub Sitnicki
  2020-07-30 13:10       ` Jakub Sitnicki
  1 sibling, 1 reply; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-29  8:57 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

Hi Andrii,

On Tue, Jul 28, 2020 at 10:13 PM CEST, Andrii Nakryiko wrote:

[...]

> We are getting this failure in Travis CI when syncing libbpf [0]:
>
> ```
> ip: either "local" is duplicate, or "nodad" is garbage
>
> switch_netns:PASS:unshare 0 nsec
>
> switch_netns:FAIL:system failed
>
> (/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310:
> errno: No such file or directory) system(ip -6 addr add dev lo
> fd00::1/128 nodad)
>
> #73 sk_lookup:FAIL
> ```
>
>
> Can you please help fix it so that it works in a Travis CI environment
> as well? For now I disabled sk_lookup selftests altogether. You can
> try to repro it locally by forking https://github.com/libbpf/libbpf
> and enabling Travis CI for your account. See [1] for the PR that
> disabled sk_lookup.
>
>
>   [0] https://travis-ci.com/github/libbpf/libbpf/jobs/365878309#L5408
>   [1] https://github.com/libbpf/libbpf/pull/182/commits/78368c2eaed8b0681381fc34d6016c9b5a443be8
>
>
> Thanks for your help!

"nodad is garbage" message smells like old iproute2. I will take a look.

Thanks for letting me know.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-29  8:57     ` Jakub Sitnicki
@ 2020-07-30 13:10       ` Jakub Sitnicki
  2020-07-30 19:43         ` Andrii Nakryiko
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Sitnicki @ 2020-07-30 13:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Wed, Jul 29, 2020 at 10:57 AM CEST, Jakub Sitnicki wrote:
> On Tue, Jul 28, 2020 at 10:13 PM CEST, Andrii Nakryiko wrote:
>
> [...]
>
>> We are getting this failure in Travis CI when syncing libbpf [0]:
>>
>> ```
>> ip: either "local" is duplicate, or "nodad" is garbage
>>
>> switch_netns:PASS:unshare 0 nsec
>>
>> switch_netns:FAIL:system failed
>>
>> (/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310:
>> errno: No such file or directory) system(ip -6 addr add dev lo
>> fd00::1/128 nodad)
>>
>> #73 sk_lookup:FAIL
>> ```
>>
>>
>> Can you please help fix it so that it works in a Travis CI environment
>> as well? For now I disabled sk_lookup selftests altogether. You can
>> try to repro it locally by forking https://github.com/libbpf/libbpf
>> and enabling Travis CI for your account. See [1] for the PR that
>> disabled sk_lookup.

[...]

Once this fix-up finds its way to bpf-next, we will be able to re-enable
sk_loookup tests:

  https://lore.kernel.org/bpf/20200730125325.1869363-1-jakub@cloudflare.com/

And I now know that I need to test shell commands against BusyBox 'ip'
command implementation, that libbpf project uses in CI env.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-30 13:10       ` Jakub Sitnicki
@ 2020-07-30 19:43         ` Andrii Nakryiko
  0 siblings, 0 replies; 32+ messages in thread
From: Andrii Nakryiko @ 2020-07-30 19:43 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Thu, Jul 30, 2020 at 6:10 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Wed, Jul 29, 2020 at 10:57 AM CEST, Jakub Sitnicki wrote:
> > On Tue, Jul 28, 2020 at 10:13 PM CEST, Andrii Nakryiko wrote:
> >
> > [...]
> >
> >> We are getting this failure in Travis CI when syncing libbpf [0]:
> >>
> >> ```
> >> ip: either "local" is duplicate, or "nodad" is garbage
> >>
> >> switch_netns:PASS:unshare 0 nsec
> >>
> >> switch_netns:FAIL:system failed
> >>
> >> (/home/travis/build/libbpf/libbpf/travis-ci/vmtest/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:1310:
> >> errno: No such file or directory) system(ip -6 addr add dev lo
> >> fd00::1/128 nodad)
> >>
> >> #73 sk_lookup:FAIL
> >> ```
> >>
> >>
> >> Can you please help fix it so that it works in a Travis CI environment
> >> as well? For now I disabled sk_lookup selftests altogether. You can
> >> try to repro it locally by forking https://github.com/libbpf/libbpf
> >> and enabling Travis CI for your account. See [1] for the PR that
> >> disabled sk_lookup.
>
> [...]
>
> Once this fix-up finds its way to bpf-next, we will be able to re-enable
> sk_loookup tests:
>
>   https://lore.kernel.org/bpf/20200730125325.1869363-1-jakub@cloudflare.com/
>
> And I now know that I need to test shell commands against BusyBox 'ip'
> command implementation, that libbpf project uses in CI env.


Thanks! I still see some (other) failures, it might be that our
environment is not full enough or something (you also mentioned some
other fix to Daniel, that might help as well, dunno). But your fix is
good nevertheless.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-29  8:55       ` Jakub Sitnicki
@ 2020-07-31  0:04         ` Daniel Borkmann
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel Borkmann @ 2020-07-31  0:04 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: Andrii Nakryiko, bpf, Networking, kernel-team,
	Alexei Starovoitov, David S. Miller, Jakub Kicinski

On 7/29/20 10:55 AM, Jakub Sitnicki wrote:
> Hi Daniel,
> 
> On Tue, Jul 28, 2020 at 10:47 PM CEST, Daniel Borkmann wrote:
> 
> [...]
> 
>> Jakub, I'm actually seeing a slightly different one on my test machine with sk_lookup:
>>
>> # ./test_progs -t sk_lookup
>> #14 cgroup_skb_sk_lookup:OK
>> #73/1 query lookup prog:OK
>> #73/2 TCP IPv4 redir port:OK
>> #73/3 TCP IPv4 redir addr:OK
>> #73/4 TCP IPv4 redir with reuseport:OK
>> #73/5 TCP IPv4 redir skip reuseport:OK
>> #73/6 TCP IPv6 redir port:OK
>> #73/7 TCP IPv6 redir addr:OK
>> #73/8 TCP IPv4->IPv6 redir port:OK
>> #73/9 TCP IPv6 redir with reuseport:OK
>> #73/10 TCP IPv6 redir skip reuseport:OK
>> #73/11 UDP IPv4 redir port:OK
>> #73/12 UDP IPv4 redir addr:OK
>> #73/13 UDP IPv4 redir with reuseport:OK
>> attach_lookup_prog:PASS:open 0 nsec
>> attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
>> make_server:PASS:bind 0 nsec
>> make_server:PASS:attach_reuseport 0 nsec
>> update_lookup_map:PASS:bpf_map__fd 0 nsec
>> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
>> make_server:PASS:bind 0 nsec
>> make_server:PASS:attach_reuseport 0 nsec
>> update_lookup_map:PASS:bpf_map__fd 0 nsec
>> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
>> make_server:PASS:bind 0 nsec
>> make_server:PASS:attach_reuseport 0 nsec
>> run_lookup_prog:PASS:getsockname 0 nsec
>> run_lookup_prog:PASS:connect 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_client:PASS:make_client 0 nsec
>> send_byte:PASS:send_byte 0 nsec
>> udp_recv_send:FAIL:recvmsg failed
>> (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: errno: Resource temporarily unavailable) failed to receive
>> #73/14 UDP IPv4 redir and reuseport with conns:FAIL
>> #73/15 UDP IPv4 redir skip reuseport:OK
>> #73/16 UDP IPv6 redir port:OK
>> #73/17 UDP IPv6 redir addr:OK
>> #73/18 UDP IPv4->IPv6 redir port:OK
>> #73/19 UDP IPv6 redir and reuseport:OK
>> attach_lookup_prog:PASS:open 0 nsec
>> attach_lookup_prog:PASS:bpf_program__attach_netns 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
>> make_server:PASS:bind 0 nsec
>> make_server:PASS:attach_reuseport 0 nsec
>> update_lookup_map:PASS:bpf_map__fd 0 nsec
>> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
>> make_server:PASS:bind 0 nsec
>> make_server:PASS:attach_reuseport 0 nsec
>> update_lookup_map:PASS:bpf_map__fd 0 nsec
>> update_lookup_map:PASS:bpf_map_update_elem 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_server:PASS:setsockopt(IP_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(IPV6_RECVORIGDSTADDR) 0 nsec
>> make_server:PASS:setsockopt(SO_REUSEPORT) 0 nsec
>> make_server:PASS:bind 0 nsec
>> make_server:PASS:attach_reuseport 0 nsec
>> run_lookup_prog:PASS:getsockname 0 nsec
>> run_lookup_prog:PASS:connect 0 nsec
>> make_socket:PASS:make_address 0 nsec
>> make_socket:PASS:socket 0 nsec
>> make_socket:PASS:setsockopt(SO_SNDTIMEO) 0 nsec
>> make_socket:PASS:setsockopt(SO_RCVTIMEO) 0 nsec
>> make_client:PASS:make_client 0 nsec
>> send_byte:PASS:send_byte 0 nsec
>> udp_recv_send:FAIL:recvmsg failed
>> (/root/bpf-next/tools/testing/selftests/bpf/prog_tests/sk_lookup.c:339: errno: Resource temporarily unavailable) failed to receive
>> #73/20 UDP IPv6 redir and reuseport with conns:FAIL
>> #73/21 UDP IPv6 redir skip reuseport:OK
>> #73/22 TCP IPv4 drop on lookup:OK
>> #73/23 TCP IPv6 drop on lookup:OK
>> #73/24 UDP IPv4 drop on lookup:OK
>> #73/25 UDP IPv6 drop on lookup:OK
>> #73/26 TCP IPv4 drop on reuseport:OK
>> #73/27 TCP IPv6 drop on reuseport:OK
>> #73/28 UDP IPv4 drop on reuseport:OK
>> #73/29 TCP IPv6 drop on reuseport:OK
>> #73/30 sk_assign returns EEXIST:OK
>> #73/31 sk_assign honors F_REPLACE:OK
>> #73/32 sk_assign accepts NULL socket:OK
>> #73/33 access ctx->sk:OK
>> #73/34 narrow access to ctx v4:OK
>> #73/35 narrow access to ctx v6:OK
>> #73/36 sk_assign rejects TCP established:OK
>> #73/37 sk_assign rejects UDP connected:OK
>> #73/38 multi prog - pass, pass:OK
>> #73/39 multi prog - drop, drop:OK
>> #73/40 multi prog - pass, drop:OK
>> #73/41 multi prog - drop, pass:OK
>> #73/42 multi prog - pass, redir:OK
>> #73/43 multi prog - redir, pass:OK
>> #73/44 multi prog - drop, redir:OK
>> #73/45 multi prog - redir, drop:OK
>> #73/46 multi prog - redir, redir:OK
>> #73 sk_lookup:FAIL
>> Summary: 1/44 PASSED, 0 SKIPPED, 3 FAILED
> 
> This patch addresses the failures:
> 
>    https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com/
> 
> It spawned a discussion on what to do about reuseport bpf returning
> connected udp sockets, and the conclusion was that it would be best to
> change reuseport logic itself if no one is relying on said behavior.
> 
> IOW, I belive the fix does the right thing and can be applied as is. We
> get the same reuseport behavior everywhere, that is with regular socket
> lookup and BPF sk lookup, even if that behavior needs to be changed.

Seems reasonable to me, I've applied it to bpf-next, thanks Jakub!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (15 preceding siblings ...)
  2020-07-17 16:40 ` [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Lorenz Bauer
@ 2020-08-18 15:49 ` Jakub Sitnicki
  2020-08-18 18:19   ` Alexei Starovoitov
  16 siblings, 1 reply; 32+ messages in thread
From: Jakub Sitnicki @ 2020-08-18 15:49 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko, Lorenz Bauer,
	Marek Majkowski, Martin KaFai Lau, Yonghong Song

I got around to re-running flood benchmarks. Mainly to confirm that
introduction of static key had the desired effect - users not attaching
BPF sk_lookup programs won't notice a performance hit in Linux v5.9.

But also to check for any unexpected bottlenecks when BPF sk_lookup
program is attached, like struct in6_addr copying that turned out to be
a bad idea in v1.

The test setup has been already covered in the cover letter for v1 of
the series so I'm not going to repeat it here. Please take a look at
"Performance considerations" in [0].

BPF program [1] used during benchmarks has been updated to work with the
BPF sk_lookup uAPI in v5.

RX pps and CPU cycles events were recorded in 3 configurations:

 1. 5.8-rc7 w/o this BPF sk_lookup patch series (baseline),
 2. 5.8-rc7 with patches applied, but no SK_LOOKUP program attached,
 3. 5.8-rc7 with patches applied, and SK_LOOKUP program attached;
    BPF program [1] is doing a lookup LPM_TRIE map with 200 entries.

RX pps measured with `ifpps -d <dev> -t 1000 --csv --loop` for 60 sec.

| tcp4 SYN flood               | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   | 899,875 ± 1.0%         |        - |
| no SK_LOOKUP prog attached   | 889,798 ± 0.6%         |    -1.1% |
| with SK_LOOKUP prog attached | 868,885 ± 1.4%         |    -3.4% |

| tcp6 SYN flood               | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   | 823,364 ± 0.6%         |        - |
| no SK_LOOKUP prog attached   | 832,667 ± 0.7%         |     1.1% |
| with SK_LOOKUP prog attached | 820,505 ± 0.4%         |    -0.3% |

| udp4 0-len flood             | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   | 2,486,313 ± 1.2%       |        - |
| no SK_LOOKUP prog attached   | 2,486,932 ± 0.4%       |     0.0% |
| with SK_LOOKUP prog attached | 2,340,425 ± 1.6%       |    -5.9% |

| udp6 0-len flood             | rx pps (mean ± sstdev) | Δ rx pps |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   | 2,505,270 ± 1.3%       |        - |
| no SK_LOOKUP prog attached   | 2,522,286 ± 1.3%       |     0.7% |
| with SK_LOOKUP prog attached | 2,418,737 ± 1.3%       |    -3.5% |

cpu-cycles measured with `perf record -F 999 --cpu 1-4 -g -- sleep 60`.

|                              |      cpu-cycles events |          |
| tcp4 SYN flood               | __inet_lookup_listener | Δ events |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   |                  1.31% |        - |
| no SK_LOOKUP prog attached   |                  1.24% |    -0.1% |
| with SK_LOOKUP prog attached |                  2.59% |     1.3% |

|                              |      cpu-cycles events |          |
| tcp6 SYN flood               |  inet6_lookup_listener | Δ events |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   |                  1.28% |        - |
| no SK_LOOKUP prog attached   |                  1.22% |    -0.1% |
| with SK_LOOKUP prog attached |                  3.15% |     1.4% |

|                              |      cpu-cycles events |          |
| udp4 0-len flood             |      __udp4_lib_lookup | Δ events |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   |                  3.70% |        - |
| no SK_LOOKUP prog attached   |                  4.13% |     0.4% |
| with SK_LOOKUP prog attached |                  7.55% |     3.9% |

|                              |      cpu-cycles events |          |
| udp6 0-len flood             |      __udp6_lib_lookup | Δ events |
|------------------------------+------------------------+----------|
| 5.8-rc7 vanilla (baseline)   |                  4.94% |        - |
| no SK_LOOKUP prog attached   |                  4.32% |    -0.6% |
| with SK_LOOKUP prog attached |                  8.07% |     3.1% |

Couple comments:

1. udp6 outperformed udp4 in our setup. The likely suspect is
   CONFIG_IP_FIB_TRIE_STATS which put fib_table_lookup at the top of
   perf report when it comes to cpu-cycles w/o counting children. It
   should have been disabled.

2. When BPF sk_lookup program is attached, the hot spot remains to be
   copying data to populate BPF context object before each program run.

   For example, snippet from perf annotate for __udp4_lib_lookup:

---8<---
         :                      rcu_read_lock();
         :                      run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
    0.01 :   ffffffff817f8624:       mov    0xd68(%r12),%rsi
         :                      if (run_array) {
    0.00 :   ffffffff817f862c:       test   %rsi,%rsi
    0.00 :   ffffffff817f862f:       je     ffffffff817f87a9 <__udp4_lib_lookup+0x2c9>
         :                      struct bpf_sk_lookup_kern ctx = {
    1.05 :   ffffffff817f8635:       xor    %eax,%eax
    0.00 :   ffffffff817f8637:       mov    $0x6,%ecx
    0.01 :   ffffffff817f863c:       movl   $0x110002,0x40(%rsp)
    0.00 :   ffffffff817f8644:       lea    0x48(%rsp),%rdi
   18.76 :   ffffffff817f8649:       rep stos %rax,%es:(%rdi)
    1.12 :   ffffffff817f864c:       mov    0xc(%rsp),%eax
    0.00 :   ffffffff817f8650:       mov    %ebp,0x48(%rsp)
    0.00 :   ffffffff817f8654:       mov    %eax,0x44(%rsp)
    0.00 :   ffffffff817f8658:       movzwl 0x10(%rsp),%eax
    1.21 :   ffffffff817f865d:       mov    %ax,0x60(%rsp)
    0.00 :   ffffffff817f8662:       movzwl 0x20(%rsp),%eax
    0.00 :   ffffffff817f8667:       mov    %ax,0x62(%rsp)
         :                      .sport          = sport,
         :                      .dport          = dport,
         :                      };
         :                      u32 act;
         :
         :                      act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN);
--->8---

   Looking at the RX pps drop this is not something we're concerned with
   ATM. The overhead will drown in cycles burned in iptables, which were
   intentionally unloaded for the benchmark.

   If someone has an idea how to tune it, though, I'm all ears.

Thanks,
-jkbs

[0] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
[1] https://github.com/majek/inet-tool/blob/master/ebpf/inet-kern.c

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-08-18 15:49 ` BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks Jakub Sitnicki
@ 2020-08-18 18:19   ` Alexei Starovoitov
  2020-08-20 10:29     ` Jakub Sitnicki
  2020-08-24  8:17     ` Paolo Abeni
  0 siblings, 2 replies; 32+ messages in thread
From: Alexei Starovoitov @ 2020-08-18 18:19 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Network Development, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Tue, Aug 18, 2020 at 8:49 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>          :                      rcu_read_lock();
>          :                      run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
>     0.01 :   ffffffff817f8624:       mov    0xd68(%r12),%rsi
>          :                      if (run_array) {
>     0.00 :   ffffffff817f862c:       test   %rsi,%rsi
>     0.00 :   ffffffff817f862f:       je     ffffffff817f87a9 <__udp4_lib_lookup+0x2c9>
>          :                      struct bpf_sk_lookup_kern ctx = {
>     1.05 :   ffffffff817f8635:       xor    %eax,%eax
>     0.00 :   ffffffff817f8637:       mov    $0x6,%ecx
>     0.01 :   ffffffff817f863c:       movl   $0x110002,0x40(%rsp)
>     0.00 :   ffffffff817f8644:       lea    0x48(%rsp),%rdi
>    18.76 :   ffffffff817f8649:       rep stos %rax,%es:(%rdi)
>     1.12 :   ffffffff817f864c:       mov    0xc(%rsp),%eax
>     0.00 :   ffffffff817f8650:       mov    %ebp,0x48(%rsp)
>     0.00 :   ffffffff817f8654:       mov    %eax,0x44(%rsp)
>     0.00 :   ffffffff817f8658:       movzwl 0x10(%rsp),%eax
>     1.21 :   ffffffff817f865d:       mov    %ax,0x60(%rsp)
>     0.00 :   ffffffff817f8662:       movzwl 0x20(%rsp),%eax
>     0.00 :   ffffffff817f8667:       mov    %ax,0x62(%rsp)
>          :                      .sport          = sport,
>          :                      .dport          = dport,
>          :                      };

Such heavy hit to zero init 56-byte structure is surprising.
There are two 4-byte holes in this struct. You can try to pack it and
make sure that 'rep stoq' is used instead of 'rep stos' (8 byte at a time vs 4).

Long term we should probably stop doing *_kern style of ctx passing
into bpf progs.
We have BTF, CO-RE and freplace now. This old style of memset *_kern and manual
ctx conversion has performance implications and annoying copy-paste of ctx
conversion routines.
For this particular case instead of introducing udp4_lookup_run_bpf()
and copying registers into stack we could have used freplace of
udp4_lib_lookup2.
More verifier work needed, of course.
My main point that existing approach "lets prep args for bpf prog to
run" that is used
pretty much in every bpf hook is no longer necessary.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-08-18 18:19   ` Alexei Starovoitov
@ 2020-08-20 10:29     ` Jakub Sitnicki
  2020-08-20 12:20       ` David Laight
  2020-08-20 22:18       ` Alexei Starovoitov
  2020-08-24  8:17     ` Paolo Abeni
  1 sibling, 2 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-08-20 10:29 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Network Development, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Tue, Aug 18, 2020 at 08:19 PM CEST, Alexei Starovoitov wrote:
> On Tue, Aug 18, 2020 at 8:49 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>          :                      rcu_read_lock();
>>          :                      run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
>>     0.01 :   ffffffff817f8624:       mov    0xd68(%r12),%rsi
>>          :                      if (run_array) {
>>     0.00 :   ffffffff817f862c:       test   %rsi,%rsi
>>     0.00 :   ffffffff817f862f:       je     ffffffff817f87a9 <__udp4_lib_lookup+0x2c9>
>>          :                      struct bpf_sk_lookup_kern ctx = {
>>     1.05 :   ffffffff817f8635:       xor    %eax,%eax
>>     0.00 :   ffffffff817f8637:       mov    $0x6,%ecx
>>     0.01 :   ffffffff817f863c:       movl   $0x110002,0x40(%rsp)
>>     0.00 :   ffffffff817f8644:       lea    0x48(%rsp),%rdi
>>    18.76 :   ffffffff817f8649:       rep stos %rax,%es:(%rdi)
>>     1.12 :   ffffffff817f864c:       mov    0xc(%rsp),%eax
>>     0.00 :   ffffffff817f8650:       mov    %ebp,0x48(%rsp)
>>     0.00 :   ffffffff817f8654:       mov    %eax,0x44(%rsp)
>>     0.00 :   ffffffff817f8658:       movzwl 0x10(%rsp),%eax
>>     1.21 :   ffffffff817f865d:       mov    %ax,0x60(%rsp)
>>     0.00 :   ffffffff817f8662:       movzwl 0x20(%rsp),%eax
>>     0.00 :   ffffffff817f8667:       mov    %ax,0x62(%rsp)
>>          :                      .sport          = sport,
>>          :                      .dport          = dport,
>>          :                      };
>
> Such heavy hit to zero init 56-byte structure is surprising.
> There are two 4-byte holes in this struct. You can try to pack it and
> make sure that 'rep stoq' is used instead of 'rep stos' (8 byte at a time vs 4).

Thanks for the tip. I'll give it a try.

> Long term we should probably stop doing *_kern style of ctx passing
> into bpf progs.
> We have BTF, CO-RE and freplace now. This old style of memset *_kern and manual
> ctx conversion has performance implications and annoying copy-paste of ctx
> conversion routines.
> For this particular case instead of introducing udp4_lookup_run_bpf()
> and copying registers into stack we could have used freplace of
> udp4_lib_lookup2.
> More verifier work needed, of course.
> My main point that existing approach "lets prep args for bpf prog to
> run" that is used
> pretty much in every bpf hook is no longer necessary.

Andrii has also suggested leveraging BTF [0], but to expose the *_kern
struct directly to BPF prog instead of emitting ctx access instructions.

What I'm curious about is if we get rid of prepping args and ctx
conversion, then how do we limit what memory BPF prog can access?

Say, I'm passing a struct sock * to my BPF prog. If it's not a tracing
prog, then I don't want it to have access to everything that is
reachable from struct sock *. This is where this approach currently
breaks down for me.

[0] https://lore.kernel.org/bpf/CAEf4BzZ7-0TFD4+NqpK9X=Yuiem89Ug27v90fev=nn+3anCTpA@mail.gmail.com/

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-08-20 10:29     ` Jakub Sitnicki
@ 2020-08-20 12:20       ` David Laight
  2020-08-20 22:18       ` Alexei Starovoitov
  1 sibling, 0 replies; 32+ messages in thread
From: David Laight @ 2020-08-20 12:20 UTC (permalink / raw)
  To: 'Jakub Sitnicki', Alexei Starovoitov
  Cc: bpf, Network Development, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

From: Jakub Sitnicki
> Sent: 20 August 2020 11:30
> Subject: Re: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
> 
> On Tue, Aug 18, 2020 at 08:19 PM CEST, Alexei Starovoitov wrote:
> > On Tue, Aug 18, 2020 at 8:49 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>          :                      rcu_read_lock();
> >>          :                      run_array = rcu_dereference(net-
> >bpf.run_array[NETNS_BPF_SK_LOOKUP]);
> >>     0.01 :   ffffffff817f8624:       mov    0xd68(%r12),%rsi
> >>          :                      if (run_array) {
> >>     0.00 :   ffffffff817f862c:       test   %rsi,%rsi
> >>     0.00 :   ffffffff817f862f:       je     ffffffff817f87a9 <__udp4_lib_lookup+0x2c9>
> >>          :                      struct bpf_sk_lookup_kern ctx = {
> >>     1.05 :   ffffffff817f8635:       xor    %eax,%eax
> >>     0.00 :   ffffffff817f8637:       mov    $0x6,%ecx
> >>     0.01 :   ffffffff817f863c:       movl   $0x110002,0x40(%rsp)
> >>     0.00 :   ffffffff817f8644:       lea    0x48(%rsp),%rdi
> >>    18.76 :   ffffffff817f8649:       rep stos %rax,%es:(%rdi)
> >>     1.12 :   ffffffff817f864c:       mov    0xc(%rsp),%eax
> >>     0.00 :   ffffffff817f8650:       mov    %ebp,0x48(%rsp)
> >>     0.00 :   ffffffff817f8654:       mov    %eax,0x44(%rsp)
> >>     0.00 :   ffffffff817f8658:       movzwl 0x10(%rsp),%eax
> >>     1.21 :   ffffffff817f865d:       mov    %ax,0x60(%rsp)
> >>     0.00 :   ffffffff817f8662:       movzwl 0x20(%rsp),%eax
> >>     0.00 :   ffffffff817f8667:       mov    %ax,0x62(%rsp)
> >>          :                      .sport          = sport,
> >>          :                      .dport          = dport,
> >>          :                      };
> >
> > Such heavy hit to zero init 56-byte structure is surprising.
> > There are two 4-byte holes in this struct. You can try to pack it and
> > make sure that 'rep stoq' is used instead of 'rep stos' (8 byte at a time vs 4).
> 
> Thanks for the tip. I'll give it a try.

You probably don't want to use 'rep stos' in any of its forms.
The instruction 'setup' time is horrid on most cpu variants.
For a 48 byte structure six writes of a zero register will be faster.

If gcc is generating the 'rep stos' then the compiler source code for that
pessimisation needs deleting...

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-08-20 10:29     ` Jakub Sitnicki
  2020-08-20 12:20       ` David Laight
@ 2020-08-20 22:18       ` Alexei Starovoitov
  2020-08-21 10:22         ` Jakub Sitnicki
  1 sibling, 1 reply; 32+ messages in thread
From: Alexei Starovoitov @ 2020-08-20 22:18 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Network Development, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Thu, Aug 20, 2020 at 3:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Tue, Aug 18, 2020 at 08:19 PM CEST, Alexei Starovoitov wrote:
> > On Tue, Aug 18, 2020 at 8:49 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>          :                      rcu_read_lock();
> >>          :                      run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
> >>     0.01 :   ffffffff817f8624:       mov    0xd68(%r12),%rsi
> >>          :                      if (run_array) {
> >>     0.00 :   ffffffff817f862c:       test   %rsi,%rsi
> >>     0.00 :   ffffffff817f862f:       je     ffffffff817f87a9 <__udp4_lib_lookup+0x2c9>
> >>          :                      struct bpf_sk_lookup_kern ctx = {
> >>     1.05 :   ffffffff817f8635:       xor    %eax,%eax
> >>     0.00 :   ffffffff817f8637:       mov    $0x6,%ecx
> >>     0.01 :   ffffffff817f863c:       movl   $0x110002,0x40(%rsp)
> >>     0.00 :   ffffffff817f8644:       lea    0x48(%rsp),%rdi
> >>    18.76 :   ffffffff817f8649:       rep stos %rax,%es:(%rdi)
> >>     1.12 :   ffffffff817f864c:       mov    0xc(%rsp),%eax
> >>     0.00 :   ffffffff817f8650:       mov    %ebp,0x48(%rsp)
> >>     0.00 :   ffffffff817f8654:       mov    %eax,0x44(%rsp)
> >>     0.00 :   ffffffff817f8658:       movzwl 0x10(%rsp),%eax
> >>     1.21 :   ffffffff817f865d:       mov    %ax,0x60(%rsp)
> >>     0.00 :   ffffffff817f8662:       movzwl 0x20(%rsp),%eax
> >>     0.00 :   ffffffff817f8667:       mov    %ax,0x62(%rsp)
> >>          :                      .sport          = sport,
> >>          :                      .dport          = dport,
> >>          :                      };
> >
> > Such heavy hit to zero init 56-byte structure is surprising.
> > There are two 4-byte holes in this struct. You can try to pack it and
> > make sure that 'rep stoq' is used instead of 'rep stos' (8 byte at a time vs 4).
>
> Thanks for the tip. I'll give it a try.
>
> > Long term we should probably stop doing *_kern style of ctx passing
> > into bpf progs.
> > We have BTF, CO-RE and freplace now. This old style of memset *_kern and manual
> > ctx conversion has performance implications and annoying copy-paste of ctx
> > conversion routines.
> > For this particular case instead of introducing udp4_lookup_run_bpf()
> > and copying registers into stack we could have used freplace of
> > udp4_lib_lookup2.
> > More verifier work needed, of course.
> > My main point that existing approach "lets prep args for bpf prog to
> > run" that is used
> > pretty much in every bpf hook is no longer necessary.
>
> Andrii has also suggested leveraging BTF [0], but to expose the *_kern
> struct directly to BPF prog instead of emitting ctx access instructions.
>
> What I'm curious about is if we get rid of prepping args and ctx
> conversion, then how do we limit what memory BPF prog can access?
>
> Say, I'm passing a struct sock * to my BPF prog. If it's not a tracing
> prog, then I don't want it to have access to everything that is
> reachable from struct sock *. This is where this approach currently
> breaks down for me.

Why do you want to limit it?
Time after time we keep extending structs in uapi/bpf.h because new
use cases are coming up. Just let the prog access everything.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-08-20 22:18       ` Alexei Starovoitov
@ 2020-08-21 10:22         ` Jakub Sitnicki
  0 siblings, 0 replies; 32+ messages in thread
From: Jakub Sitnicki @ 2020-08-21 10:22 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: bpf, Network Development, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Fri, Aug 21, 2020 at 12:18 AM CEST, Alexei Starovoitov wrote:
> On Thu, Aug 20, 2020 at 3:29 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> On Tue, Aug 18, 2020 at 08:19 PM CEST, Alexei Starovoitov wrote:

[...]

>> > Long term we should probably stop doing *_kern style of ctx passing
>> > into bpf progs.
>> > We have BTF, CO-RE and freplace now. This old style of memset *_kern and manual
>> > ctx conversion has performance implications and annoying copy-paste of ctx
>> > conversion routines.
>> > For this particular case instead of introducing udp4_lookup_run_bpf()
>> > and copying registers into stack we could have used freplace of
>> > udp4_lib_lookup2.
>> > More verifier work needed, of course.
>> > My main point that existing approach "lets prep args for bpf prog to
>> > run" that is used
>> > pretty much in every bpf hook is no longer necessary.
>>
>> Andrii has also suggested leveraging BTF [0], but to expose the *_kern
>> struct directly to BPF prog instead of emitting ctx access instructions.
>>
>> What I'm curious about is if we get rid of prepping args and ctx
>> conversion, then how do we limit what memory BPF prog can access?
>>
>> Say, I'm passing a struct sock * to my BPF prog. If it's not a tracing
>> prog, then I don't want it to have access to everything that is
>> reachable from struct sock *. This is where this approach currently
>> breaks down for me.
>
> Why do you want to limit it?
> Time after time we keep extending structs in uapi/bpf.h because new
> use cases are coming up. Just let the prog access everything.

I guess I wasn't thinking big enough :-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks
  2020-08-18 18:19   ` Alexei Starovoitov
  2020-08-20 10:29     ` Jakub Sitnicki
@ 2020-08-24  8:17     ` Paolo Abeni
  1 sibling, 0 replies; 32+ messages in thread
From: Paolo Abeni @ 2020-08-24  8:17 UTC (permalink / raw)
  To: Alexei Starovoitov, Jakub Sitnicki
  Cc: bpf, Network Development, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Tue, 2020-08-18 at 11:19 -0700, Alexei Starovoitov wrote:
> On Tue, Aug 18, 2020 at 8:49 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >          :                      rcu_read_lock();
> >          :                      run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
> >     0.01 :   ffffffff817f8624:       mov    0xd68(%r12),%rsi
> >          :                      if (run_array) {
> >     0.00 :   ffffffff817f862c:       test   %rsi,%rsi
> >     0.00 :   ffffffff817f862f:       je     ffffffff817f87a9 <__udp4_lib_lookup+0x2c9>
> >          :                      struct bpf_sk_lookup_kern ctx = {
> >     1.05 :   ffffffff817f8635:       xor    %eax,%eax
> >     0.00 :   ffffffff817f8637:       mov    $0x6,%ecx
> >     0.01 :   ffffffff817f863c:       movl   $0x110002,0x40(%rsp)
> >     0.00 :   ffffffff817f8644:       lea    0x48(%rsp),%rdi
> >    18.76 :   ffffffff817f8649:       rep stos %rax,%es:(%rdi)
> >     1.12 :   ffffffff817f864c:       mov    0xc(%rsp),%eax
> >     0.00 :   ffffffff817f8650:       mov    %ebp,0x48(%rsp)
> >     0.00 :   ffffffff817f8654:       mov    %eax,0x44(%rsp)
> >     0.00 :   ffffffff817f8658:       movzwl 0x10(%rsp),%eax
> >     1.21 :   ffffffff817f865d:       mov    %ax,0x60(%rsp)
> >     0.00 :   ffffffff817f8662:       movzwl 0x20(%rsp),%eax
> >     0.00 :   ffffffff817f8667:       mov    %ax,0x62(%rsp)
> >          :                      .sport          = sport,
> >          :                      .dport          = dport,
> >          :                      };
> 
> Such heavy hit to zero init 56-byte structure is surprising.
> There are two 4-byte holes in this struct. You can try to pack it and
> make sure that 'rep stoq' is used instead of 'rep stos' (8 byte at a time vs 4).

I think here rep stos is copying 8 bytes at a time (%rax operand, %ecx
initalized with '6').

I think that you can avoid the costly instruction explicitly
initializing each field individually:

	struct bpf_sk_lookup_kern ctx;

	ctx.family = AF_INET;
	ctx.protocol = protocol;
	// ...

note, you likely want to explicitly zero the v6 addresses, too.

Cheers,

Paolo


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2020-08-24  8:17 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17 10:35 [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 01/15] bpf, netns: Handle multiple link attachments Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 02/15] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 03/15] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 04/15] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 05/15] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 06/15] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 07/15] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 08/15] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 09/15] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 10/15] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 11/15] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 12/15] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 13/15] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 14/15] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
2020-07-17 10:35 ` [PATCH bpf-next v5 15/15] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
2020-07-28 20:13   ` Andrii Nakryiko
2020-07-28 20:47     ` Daniel Borkmann
2020-07-29  8:55       ` Jakub Sitnicki
2020-07-31  0:04         ` Daniel Borkmann
2020-07-29  8:57     ` Jakub Sitnicki
2020-07-30 13:10       ` Jakub Sitnicki
2020-07-30 19:43         ` Andrii Nakryiko
2020-07-17 16:40 ` [PATCH bpf-next v5 00/15] Run a BPF program on socket lookup Lorenz Bauer
2020-07-18  3:25   ` Alexei Starovoitov
2020-08-18 15:49 ` BPF sk_lookup v5 - TCP SYN and UDP 0-len flood benchmarks Jakub Sitnicki
2020-08-18 18:19   ` Alexei Starovoitov
2020-08-20 10:29     ` Jakub Sitnicki
2020-08-20 12:20       ` David Laight
2020-08-20 22:18       ` Alexei Starovoitov
2020-08-21 10:22         ` Jakub Sitnicki
2020-08-24  8:17     ` Paolo Abeni

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).