* [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup
@ 2020-07-13 17:46 Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
                   ` (16 more replies)
  0 siblings, 17 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Andrii Nakryiko, Lorenz Bauer,
	Marek Majkowski, Martin KaFai Lau, Yonghong Song

Dependencies
============

This patch series depends on:

1. 'bpf-multi-prog-prep' series in 'bpf' [0]
   (commit 951f38cf0835 ("Merge branch 'bpf-multi-prog-prep'"))
2. "bpf: Shift and mask loads narrower than context field size" patch
   https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/

Changelog
=========

v3 -> v4:
- Reduce BPF prog return codes to SK_DROP/SK_PASS (Lorenz)
- Default to drop on illegal return value from BPF prog (Lorenz)
- Extend bpf_sk_assign to accept NULL socket pointer.
- Switch to saner return values and add docs for new prog_array API (Andrii)
- Add support for narrow loads from BPF context fields (Yonghong)
- Fix broken build when IPv6 is compiled as a module (kernel test robot)
- Fix null/wild-ptr-deref on BPF context access
- Rebase to recent bpf-next (eef8a42d6ce0)
- Other minor changes called out in per-patch changelogs,
  see patches 1-2, 4, 6, 8, 10-12, 14, 16

v2 -> v3:
- Switch to link-based program attachment
- Support for multi-prog attachment
- Ability to skip reuseport socket selection
- Code on RX path is guarded by a static key
- struct in6_addr's are no longer copied into BPF prog context
- BPF prog context is initialized as late as possible
- Changes called out in patches 1-2, 4, 6, 8, 10-14, 16
- Patches dropped:
  01/17 flow_dissector: Extract attach/detach/query helpers
  03/17 inet: Store layer 4 protocol in inet_hashinfo
  08/17 udp: Store layer 4 protocol in udp_table

v1 -> v2:
- Changes called out in patches 2, 13-15, 17
- Rebase to recent bpf-next (b4563facdcae)

RFCv2 -> v1:

- Switch to fetching a socket from a map and selecting a socket with
  bpf_sk_assign, instead of having a dedicated helper that does both.
- Run reuseport logic on sockets selected by BPF sk_lookup.
- Allow BPF sk_lookup to fail the lookup with no match.
- Go back to having just 2 hash table lookups in UDP.

RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.
- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP sockets work as expected in the presence of an
  inet_lookup prog.
- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.

Overview
========

This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
or BPF sk_lookup for short.

A BPF sk_lookup program runs when the transport layer is looking up a
listening socket for a new connection request (TCP), or when it is looking
up an unconnected socket for a packet (UDP).

This serves as a mechanism to overcome the limits of what the bind() API
allows us to express. The two use-cases driving this work are:

 (1) steer packets destined to an IP range, on a fixed port, to a single socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port, to a single socket

     198.51.100.1, any port -> L7 proxy socket

In its context, the program receives information about the packet that
triggered the socket lookup: namely the IP version, L4 protocol identifier,
and address 4-tuple.

To select a socket, the BPF program fetches it from a map holding socket
references, such as SOCKMAP or SOCKHASH, calls the bpf_sk_assign(ctx, sk,
...) helper to record the selection, and returns the SK_PASS code. The
transport layer then uses the selected socket as the result of the socket
lookup.

Alternatively, the program can fail the lookup (SK_DROP), or let the lookup
continue as usual (SK_PASS without selecting a socket).

This lets the user match packets with listening (TCP) or receiving (UDP)
sockets freely at the last possible point on the receive path, where we
know that packets are destined for local delivery after undergoing
policing, filtering, and routing.
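
To make the mechanics concrete, here is a minimal sketch of such a program
for use-case (1). It is illustrative only; the map name, key, and matched
address range are assumptions made up for this example, not part of the
series:

  #include <linux/bpf.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_endian.h>

  struct {
          __uint(type, BPF_MAP_TYPE_SOCKMAP);
          __uint(max_entries, 1);
          __type(key, __u32);
          __type(value, __u64);
  } redir_map SEC(".maps");

  SEC("sk_lookup")
  int select_sock(struct bpf_sk_lookup *ctx)
  {
          const __u32 zero = 0;
          struct bpf_sock *sk;
          long err;

          /* Match TCP packets destined to 192.0.2.0/24, port 80 */
          if (ctx->protocol != IPPROTO_TCP ||
              ctx->family != AF_INET ||
              ctx->local_port != 80 ||
              (bpf_ntohl(ctx->local_ip4) & 0xffffff00) != 0xc0000200)
                  return SK_PASS; /* no selection, lookup continues */

          sk = bpf_map_lookup_elem(&redir_map, &zero);
          if (!sk)
                  return SK_DROP; /* service socket gone, refuse */

          err = bpf_sk_assign(ctx, sk, 0); /* record the selection */
          bpf_sk_release(sk);
          return err ? SK_DROP : SK_PASS;
  }

  char _license[] SEC("license") = "GPL";

For non-matching packets the program returns SK_PASS without a selection,
so the regular htable-based lookup proceeds untouched.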

The program is attached to a network namespace, similar to BPF
flow_dissector. We add a new attach type, BPF_SK_LOOKUP, for this. Multiple
programs can be attached at the same time, in which case their return
values are aggregated according to the rules outlined in the patch #4
description.
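
Attaching from user space reuses the netns link API. A rough sketch with
libbpf (assumes the support added in patches 12-13; file and section names
are examples, error handling omitted):

  #include <fcntl.h>
  #include <bpf/libbpf.h>

  int attach_sk_lookup(void)
  {
          struct bpf_object *obj;
          struct bpf_program *prog;
          struct bpf_link *link;
          int netns_fd;

          netns_fd = open("/proc/self/ns/net", O_RDONLY);

          obj = bpf_object__open_file("sk_lookup_prog.o", NULL);
          bpf_object__load(obj);

          prog = bpf_object__find_program_by_title(obj, "sk_lookup");

          /* Creates a bpf_link with attach type BPF_SK_LOOKUP */
          link = bpf_program__attach_netns(prog, netns_fd);
          return libbpf_get_error(link) ? -1 : 0;
  }

The link keeps the program attached until it is closed or pinned, or until
the netns goes away (auto-detach).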

Series structure
================

The patches are organized as follows:

 1: enables multiple link-based prog attachments for bpf-netns
 2: introduces sk_lookup program type
 3-4: hook up the program to run on ipv4/tcp socket lookup
 5-6: hook up the program to run on ipv6/tcp socket lookup
 7-8: hook up the program to run on ipv4/udp socket lookup
 9-10: hook up the program to run on ipv6/udp socket lookup
 11-13: libbpf & bpftool support for sk_lookup
 14-16: verifier and selftests for sk_lookup

Patches are also available on GH:

  https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v4

Follow-up work
==============

I'll follow up with the items below, which IMHO don't block the review:

- benchmark results for udp6 small packet flood scenario,
- user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
- timeout for accept() in tests after extending network_helpers.[ch].

Thanks to the reviewers for their feedback on this patch series:

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andriin@fb.com>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Cc: Marek Majkowski <marek@cloudflare.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Yonghong Song <yhs@fb.com>

-jkbs

[RFCv1] https://lore.kernel.org/bpf/20190618130050.8344-1-jakub@cloudflare.com/
[RFCv2] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
[v1] https://lore.kernel.org/bpf/20200506125514.1020829-1-jakub@cloudflare.com/
[v2] https://lore.kernel.org/bpf/20200511185218.1422406-18-jakub@cloudflare.com/
[0] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git/commit/?id=951f38cf08350884e72e0936adf147a8d764cc5d

Jakub Sitnicki (16):
  bpf, netns: Handle multiple link attachments
  bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  inet: Extract helper for selecting socket from reuseport group
  inet: Run SK_LOOKUP BPF program on socket lookup
  inet6: Extract helper for selecting socket from reuseport group
  inet6: Run SK_LOOKUP BPF program on socket lookup
  udp: Extract helper for selecting socket from reuseport group
  udp: Run SK_LOOKUP BPF program on socket lookup
  udp6: Extract helper for selecting socket from reuseport group
  udp6: Run SK_LOOKUP BPF program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for SK_LOOKUP program type
  tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
  selftests/bpf: Add verifier tests for bpf_sk_lookup context access
  selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
  selftests/bpf: Tests for BPF_SK_LOOKUP attach point

 include/linux/bpf-netns.h                     |    3 +
 include/linux/bpf.h                           |    4 +
 include/linux/bpf_types.h                     |    2 +
 include/linux/filter.h                        |  163 +++
 include/uapi/linux/bpf.h                      |   77 +
 kernel/bpf/core.c                             |   55 +
 kernel/bpf/net_namespace.c                    |  127 +-
 kernel/bpf/syscall.c                          |    9 +
 kernel/bpf/verifier.c                         |   10 +-
 net/core/filter.c                             |  182 +++
 net/ipv4/inet_hashtables.c                    |   60 +-
 net/ipv4/udp.c                                |   93 +-
 net/ipv6/inet6_hashtables.c                   |   66 +-
 net/ipv6/udp.c                                |   97 +-
 scripts/bpf_helpers_doc.py                    |    9 +-
 tools/bpf/bpftool/common.c                    |    1 +
 tools/bpf/bpftool/prog.c                      |    3 +-
 tools/include/uapi/linux/bpf.h                |   77 +
 tools/lib/bpf/libbpf.c                        |    3 +
 tools/lib/bpf/libbpf.h                        |    2 +
 tools/lib/bpf/libbpf.map                      |    2 +
 tools/lib/bpf/libbpf_probes.c                 |    3 +
 tools/testing/selftests/bpf/network_helpers.c |   58 +-
 tools/testing/selftests/bpf/network_helpers.h |    2 +
 .../bpf/prog_tests/reference_tracking.c       |    2 +-
 .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
 .../selftests/bpf/progs/test_ref_track_kern.c |  181 +++
 .../selftests/bpf/progs/test_sk_lookup_kern.c |  688 +++++++--
 .../selftests/bpf/verifier/ctx_sk_lookup.c    |  471 ++++++
 29 files changed, 3521 insertions(+), 211 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_ref_track_kern.c
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

-- 
2.25.4



* [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-15 21:30   ` Andrii Nakryiko
  2020-07-13 17:46 ` [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
prog_array at a given position when a link gets attached/updated/released.

This lets us lift the limit of having just one link attached for the new
attach type introduced by a subsequent patch.

No functional changes intended.
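
To illustrate the index semantics with a hypothetical array state (not
taken from this patch):

  /* run_array: [ prog0, dummy, prog1, prog2 ]
   *
   * bpf_prog_array_update_at(array, 0, p) replaces prog0.
   * bpf_prog_array_update_at(array, 1, p) replaces prog1, since the
   * dummy entry is skipped over and not counted.
   * bpf_prog_array_update_at(array, 3, p) returns -ENOENT; only
   * three non-dummy programs are present.
   */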

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Document prog_array {delete_safe,update}_at() behavior. (Andrii)
    - Return -EINVAL/-ENOENT on failure in {delete_safe,update}_at(). (Andrii)
    - Return -ENOENT on index out of range in link_index(). (Andrii)
    
    v3:
    - New in v3 to support multi-prog attachments. (Alexei)

 include/linux/bpf.h        |  3 ++
 kernel/bpf/core.c          | 55 +++++++++++++++++++++++
 kernel/bpf/net_namespace.c | 90 ++++++++++++++++++++++++++++++++++----
 3 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 0cd7f6884c5c..ad9c61ae8640 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -928,6 +928,9 @@ int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs,
 
 void bpf_prog_array_delete_safe(struct bpf_prog_array *progs,
 				struct bpf_prog *old_prog);
+int bpf_prog_array_delete_safe_at(struct bpf_prog_array *array, int index);
+int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
+			     struct bpf_prog *prog);
 int bpf_prog_array_copy_info(struct bpf_prog_array *array,
 			     u32 *prog_ids, u32 request_cnt,
 			     u32 *prog_cnt);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 9df4cc9a2907..7be02e555ab9 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1958,6 +1958,61 @@ void bpf_prog_array_delete_safe(struct bpf_prog_array *array,
 		}
 }
 
+/**
+ * bpf_prog_array_delete_safe_at() - Replaces the program at the given
+ *                                   index into the program array with
+ *                                   a dummy no-op program.
+ * @array: a bpf_prog_array
+ * @index: the index of the program to replace
+ *
+ * Skips over dummy programs, by not counting them, when calculating
+ * the position of the program to replace.
+ *
+ * Return:
+ * * 0		- Success
+ * * -EINVAL	- Invalid index value. Must be a non-negative integer.
+ * * -ENOENT	- Index out of range
+ */
+int bpf_prog_array_delete_safe_at(struct bpf_prog_array *array, int index)
+{
+	return bpf_prog_array_update_at(array, index, &dummy_bpf_prog.prog);
+}
+
+/**
+ * bpf_prog_array_update_at() - Updates the program at the given index
+ *                              into the program array.
+ * @array: a bpf_prog_array
+ * @index: the index of the program to update
+ * @prog: the program to insert into the array
+ *
+ * Skips over dummy programs, by not counting them, when calculating
+ * the position of the program to update.
+ *
+ * Return:
+ * * 0		- Success
+ * * -EINVAL	- Invalid index value. Must be a non-negative integer.
+ * * -ENOENT	- Index out of range
+ */
+int bpf_prog_array_update_at(struct bpf_prog_array *array, int index,
+			     struct bpf_prog *prog)
+{
+	struct bpf_prog_array_item *item;
+
+	if (unlikely(index < 0))
+		return -EINVAL;
+
+	for (item = array->items; item->prog; item++) {
+		if (item->prog == &dummy_bpf_prog.prog)
+			continue;
+		if (!index) {
+			WRITE_ONCE(item->prog, prog);
+			return 0;
+		}
+		index--;
+	}
+	return -ENOENT;
+}
+
 int bpf_prog_array_copy(struct bpf_prog_array *old_array,
 			struct bpf_prog *exclude_prog,
 			struct bpf_prog *include_prog,
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 247543380fa6..988c2766ec97 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -36,12 +36,50 @@ static void netns_bpf_run_array_detach(struct net *net,
 	bpf_prog_array_free(run_array);
 }
 
+static int link_index(struct net *net, enum netns_bpf_attach_type type,
+		      struct bpf_netns_link *link)
+{
+	struct bpf_netns_link *pos;
+	int i = 0;
+
+	list_for_each_entry(pos, &net->bpf.links[type], node) {
+		if (pos == link)
+			return i;
+		i++;
+	}
+	return -ENOENT;
+}
+
+static int link_count(struct net *net, enum netns_bpf_attach_type type)
+{
+	struct list_head *pos;
+	int i = 0;
+
+	list_for_each(pos, &net->bpf.links[type])
+		i++;
+	return i;
+}
+
+static void fill_prog_array(struct net *net, enum netns_bpf_attach_type type,
+			    struct bpf_prog_array *prog_array)
+{
+	struct bpf_netns_link *pos;
+	unsigned int i = 0;
+
+	list_for_each_entry(pos, &net->bpf.links[type], node) {
+		prog_array->items[i].prog = pos->link.prog;
+		i++;
+	}
+}
+
 static void bpf_netns_link_release(struct bpf_link *link)
 {
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	enum netns_bpf_attach_type type = net_link->netns_type;
+	struct bpf_prog_array *old_array, *new_array;
 	struct net *net;
+	int cnt, idx;
 
 	mutex_lock(&netns_bpf_mutex);
 
@@ -53,9 +91,27 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
-	netns_bpf_run_array_detach(net, type);
+	/* Remember link position in case of safe delete */
+	idx = link_index(net, type, net_link);
 	list_del(&net_link->node);
 
+	cnt = link_count(net, type);
+	if (!cnt) {
+		netns_bpf_run_array_detach(net, type);
+		goto out_unlock;
+	}
+
+	old_array = rcu_dereference_protected(net->bpf.run_array[type],
+					      lockdep_is_held(&netns_bpf_mutex));
+	new_array = bpf_prog_array_alloc(cnt, GFP_KERNEL);
+	if (!new_array) {
+		WARN_ON(bpf_prog_array_delete_safe_at(old_array, idx));
+		goto out_unlock;
+	}
+	fill_prog_array(net, type, new_array);
+	rcu_assign_pointer(net->bpf.run_array[type], new_array);
+	bpf_prog_array_free(old_array);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 }
@@ -77,7 +133,7 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 	enum netns_bpf_attach_type type = net_link->netns_type;
 	struct bpf_prog_array *run_array;
 	struct net *net;
-	int ret = 0;
+	int idx, ret;
 
 	if (old_prog && old_prog != link->prog)
 		return -EPERM;
@@ -95,7 +151,10 @@ static int bpf_netns_link_update_prog(struct bpf_link *link,
 
 	run_array = rcu_dereference_protected(net->bpf.run_array[type],
 					      lockdep_is_held(&netns_bpf_mutex));
-	WRITE_ONCE(run_array->items[0].prog, new_prog);
+	idx = link_index(net, type, net_link);
+	ret = bpf_prog_array_update_at(run_array, idx, new_prog);
+	if (ret)
+		goto out_unlock;
 
 	old_prog = xchg(&link->prog, new_prog);
 	bpf_prog_put(old_prog);
@@ -295,18 +354,28 @@ int netns_bpf_prog_detach(const union bpf_attr *attr)
 	return ret;
 }
 
+static int netns_bpf_max_progs(enum netns_bpf_attach_type type)
+{
+	switch (type) {
+	case NETNS_BPF_FLOW_DISSECTOR:
+		return 1;
+	default:
+		return 0;
+	}
+}
+
 static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 				 enum netns_bpf_attach_type type)
 {
 	struct bpf_netns_link *net_link =
 		container_of(link, struct bpf_netns_link, link);
 	struct bpf_prog_array *run_array;
-	int err;
+	int cnt, err;
 
 	mutex_lock(&netns_bpf_mutex);
 
-	/* Allow attaching only one prog or link for now */
-	if (!list_empty(&net->bpf.links[type])) {
+	cnt = link_count(net, type);
+	if (cnt >= netns_bpf_max_progs(type)) {
 		err = -E2BIG;
 		goto out_unlock;
 	}
@@ -327,16 +396,19 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	if (err)
 		goto out_unlock;
 
-	run_array = bpf_prog_array_alloc(1, GFP_KERNEL);
+	run_array = bpf_prog_array_alloc(cnt + 1, GFP_KERNEL);
 	if (!run_array) {
 		err = -ENOMEM;
 		goto out_unlock;
 	}
-	run_array->items[0].prog = link->prog;
-	rcu_assign_pointer(net->bpf.run_array[type], run_array);
 
 	list_add_tail(&net_link->node, &net->bpf.links[type]);
 
+	fill_prog_array(net, type, run_array);
+	run_array = rcu_replace_pointer(net->bpf.run_array[type], run_array,
+					lockdep_is_held(&netns_bpf_mutex));
+	bpf_prog_array_free(run_array);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 	return err;
-- 
2.25.4



* [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-16  1:41   ` Andrii Nakryiko
  2020-07-13 17:46 ` [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
when looking up a listening socket for a new connection request for
connection-oriented protocols, or when looking up an unconnected socket for
a packet for connection-less protocols.

When called, an SK_LOOKUP BPF program can select a socket that will receive
the packet. This serves as a mechanism to overcome the limits of what the
bind() API allows us to express. The two use-cases driving this work are:

 (1) steer packets destined to an IP range, on a fixed port, to a socket

     192.0.2.0/24, port 80 -> NGINX socket

 (2) steer packets destined to an IP address, on any port, to a socket

     198.51.100.1, any port -> L7 proxy socket

In its run-time context, the program receives information about the packet
that triggered the socket lookup: namely the IP version, L4 protocol
identifier, and address 4-tuple. The context can be further extended to
include the ingress interface identifier.

To select a socket, the BPF program fetches it from a map holding socket
references, such as SOCKMAP or SOCKHASH, and calls the bpf_sk_assign(ctx,
sk, ...) helper to record the selection. The transport layer then uses the
selected socket as the result of the socket lookup.

This patch only enables the user to attach an SK_LOOKUP program to a
network namespace. Subsequent patches hook it up to run on the local
delivery path in the ipv4 and ipv6 stacks.
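
As an illustration, the flags to the new helper compose as follows (a
sketch only; whether to override an earlier selection is a policy choice
of the program):

  /* Pin the lookup to sk and skip reuseport load-balancing */
  err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_NO_REUSEPORT);
  if (err == -EEXIST) {
          /* A program that ran before us already made a selection;
           * insist on ours by replacing it.
           */
          err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE |
                                       BPF_SK_LOOKUP_F_NO_REUSEPORT);
  }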

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Reintroduce narrow load support for most BPF context fields. (Yonghong)
    - Fix null-ptr-deref in BPF context access when IPv6 address not set.
    - Unpack v4/v6 IP address union in bpf_sk_lookup context type.
    - Add verifier support for ARG_PTR_TO_SOCKET_OR_NULL.
    - Allow resetting socket selection with bpf_sk_assign(ctx, NULL).
    - Document that bpf_sk_assign accepts a NULL socket.
    
    v3:
    - Allow bpf_sk_assign helper to replace previously selected socket only
      when BPF_SK_LOOKUP_F_REPLACE flag is set, as a precaution for multiple
      programs running in series to accidentally override each other's verdict.
    - Let BPF program decide that load-balancing within a reuseport socket group
      should be skipped for the socket selected with bpf_sk_assign() by passing
      BPF_SK_LOOKUP_F_NO_REUSEPORT flag. (Martin)
    - Extend struct bpf_sk_lookup program context with an 'sk' field containing
      the selected socket with an intention for multiple attached program
      running in series to see each other's choices. However, currently the
      verifier doesn't allow checking if pointer is set.
    - Use bpf-netns infra for link-based multi-program attachment. (Alexei)
    - Get rid of macros in convert_ctx_access to make it easier to read.
    - Disallow 1-,2-byte access to context fields containing IP addresses.
    
    v2:
    - Make bpf_sk_assign reject sockets that don't use RCU freeing.
      Update bpf_sk_assign docs accordingly. (Martin)
    - Change bpf_sk_assign proto to take PTR_TO_SOCKET as argument. (Martin)
    - Fix broken build when CONFIG_INET is not selected. (Martin)
    - Rename bpf_sk_lookup{} src_/dst_* fields remote_/local_*. (Martin)
    - Enforce BPF_SK_LOOKUP attach point on load & attach. (Martin)

 include/linux/bpf-netns.h  |   3 +
 include/linux/bpf.h        |   1 +
 include/linux/bpf_types.h  |   2 +
 include/linux/filter.h     |  17 ++++
 include/uapi/linux/bpf.h   |  77 ++++++++++++++++
 kernel/bpf/net_namespace.c |   5 ++
 kernel/bpf/syscall.c       |   9 ++
 kernel/bpf/verifier.c      |  10 ++-
 net/core/filter.c          | 179 +++++++++++++++++++++++++++++++++++++
 scripts/bpf_helpers_doc.py |   9 +-
 10 files changed, 308 insertions(+), 4 deletions(-)

diff --git a/include/linux/bpf-netns.h b/include/linux/bpf-netns.h
index 4052d649f36d..cb1d849c5d4f 100644
--- a/include/linux/bpf-netns.h
+++ b/include/linux/bpf-netns.h
@@ -8,6 +8,7 @@
 enum netns_bpf_attach_type {
 	NETNS_BPF_INVALID = -1,
 	NETNS_BPF_FLOW_DISSECTOR = 0,
+	NETNS_BPF_SK_LOOKUP,
 	MAX_NETNS_BPF_ATTACH_TYPE
 };
 
@@ -17,6 +18,8 @@ to_netns_bpf_attach_type(enum bpf_attach_type attach_type)
 	switch (attach_type) {
 	case BPF_FLOW_DISSECTOR:
 		return NETNS_BPF_FLOW_DISSECTOR;
+	case BPF_SK_LOOKUP:
+		return NETNS_BPF_SK_LOOKUP;
 	default:
 		return NETNS_BPF_INVALID;
 	}
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index ad9c61ae8640..f092d13bdd08 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -249,6 +249,7 @@ enum bpf_arg_type {
 	ARG_PTR_TO_INT,		/* pointer to int */
 	ARG_PTR_TO_LONG,	/* pointer to long */
 	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock (fullsock) */
+	ARG_PTR_TO_SOCKET_OR_NULL,	/* pointer to bpf_sock (fullsock) or NULL */
 	ARG_PTR_TO_BTF_ID,	/* pointer to in-kernel struct */
 	ARG_PTR_TO_ALLOC_MEM,	/* pointer to dynamically allocated memory */
 	ARG_PTR_TO_ALLOC_MEM_OR_NULL,	/* pointer to dynamically allocated memory or NULL */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index a18ae82a298a..a52a5688418e 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -64,6 +64,8 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2,
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport,
 	      struct sk_reuseport_md, struct sk_reuseport_kern)
+BPF_PROG_TYPE(BPF_PROG_TYPE_SK_LOOKUP, sk_lookup,
+	      struct bpf_sk_lookup, struct bpf_sk_lookup_kern)
 #endif
 #if defined(CONFIG_BPF_JIT)
 BPF_PROG_TYPE(BPF_PROG_TYPE_STRUCT_OPS, bpf_struct_ops,
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 259377723603..380746f47fa1 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1278,4 +1278,21 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+struct bpf_sk_lookup_kern {
+	u16		family;
+	u16		protocol;
+	struct {
+		__be32 saddr;
+		__be32 daddr;
+	} v4;
+	struct {
+		const struct in6_addr *saddr;
+		const struct in6_addr *daddr;
+	} v6;
+	__be16		sport;
+	u16		dport;
+	struct sock	*selected_sk;
+	bool		no_reuseport;
+};
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 548a749aebb3..e2ffeb150d0f 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -189,6 +189,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -227,6 +228,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_GETSOCKNAME,
 	BPF_XDP_DEVMAP,
 	BPF_CGROUP_INET_SOCK_RELEASE,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3068,6 +3070,10 @@ union bpf_attr {
  *
  * long bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3093,6 +3099,56 @@ union bpf_attr {
  *		**-ESOCKTNOSUPPORT** if the socket type is not supported
  *		(reuseport).
  *
+ * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
+ *		For the operation to succeed, the passed socket must be
+ *		compatible with the packet description provided by the *ctx*
+ *		object.
+ *
+ *		The L4 protocol (**IPPROTO_TCP** or **IPPROTO_UDP**) must
+ *		be an exact match, while the IP family (**AF_INET** or
+ *		**AF_INET6**) must be compatible; that is, IPv6 sockets
+ *		that are not v6-only can be selected for IPv4 packets.
+ *
+ *		Only TCP listeners and UDP unconnected sockets can be
+ *		selected. *sk* can also be NULL to reset any previous
+ *		selection.
+ *
+ *		The *flags* argument can be a combination of the following
+ *		values:
+ *
+ *		* **BPF_SK_LOOKUP_F_REPLACE** to override the previous
+ *		  socket selection, potentially done by a BPF program
+ *		  that ran before us.
+ *
+ *		* **BPF_SK_LOOKUP_F_NO_REUSEPORT** to skip
+ *		  load-balancing within reuseport group for the socket
+ *		  being selected.
+ *
+ *		On success *ctx->sk* will point to the selected socket.
+ *
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EAFNOSUPPORT** if socket family (*sk->family*) is
+ *		  not compatible with packet family (*ctx->family*).
+ *
+ *		* **-EEXIST** if a socket has already been selected,
+ *		  potentially by another program, and
+ *		  **BPF_SK_LOOKUP_F_REPLACE** flag was not specified.
+ *
+ *		* **-EINVAL** if unsupported flags were specified.
+ *
+ *		* **-EPROTOTYPE** if socket L4 protocol
+ *		  (*sk->protocol*) doesn't match packet protocol
+ *		  (*ctx->protocol*).
+ *
+ *		* **-ESOCKTNOSUPPORT** if socket is not in allowed
+ *		  state (TCP listening or UDP unconnected).
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -3605,6 +3661,12 @@ enum {
 	BPF_RINGBUF_HDR_SZ		= 8,
 };
 
+/* BPF_FUNC_sk_assign flags in bpf_sk_lookup context. */
+enum {
+	BPF_SK_LOOKUP_F_REPLACE		= (1ULL << 0),
+	BPF_SK_LOOKUP_F_NO_REUSEPORT	= (1ULL << 1),
+};
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
@@ -4334,4 +4396,19 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+
+	__u32 family;		/* Protocol family (AF_INET, AF_INET6) */
+	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
+	__u32 remote_ip4;	/* Network byte order */
+	__u32 remote_ip6[4];	/* Network byte order */
+	__u32 remote_port;	/* Network byte order */
+	__u32 local_ip4;	/* Network byte order */
+	__u32 local_ip6[4];	/* Network byte order */
+	__u32 local_port;	/* Host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 988c2766ec97..596c30b963f3 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -359,6 +359,8 @@ static int netns_bpf_max_progs(enum netns_bpf_attach_type type)
 	switch (type) {
 	case NETNS_BPF_FLOW_DISSECTOR:
 		return 1;
+	case NETNS_BPF_SK_LOOKUP:
+		return 64;
 	default:
 		return 0;
 	}
@@ -389,6 +391,9 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 	case NETNS_BPF_FLOW_DISSECTOR:
 		err = flow_dissector_bpf_prog_attach_check(net, link->prog);
 		break;
+	case NETNS_BPF_SK_LOOKUP:
+		err = 0; /* nothing to check */
+		break;
 	default:
 		err = -EINVAL;
 		break;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 156f51ffada2..945975a14582 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2022,6 +2022,10 @@ bpf_prog_load_check_attach(enum bpf_prog_type prog_type,
 		default:
 			return -EINVAL;
 		}
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		if (expected_attach_type == BPF_SK_LOOKUP)
+			return 0;
+		return -EINVAL;
 	case BPF_PROG_TYPE_EXT:
 		if (expected_attach_type)
 			return -EINVAL;
@@ -2756,6 +2760,7 @@ static int bpf_prog_attach_check_attach_type(const struct bpf_prog *prog,
 	case BPF_PROG_TYPE_CGROUP_SOCK:
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_SK_LOOKUP:
 		return attach_type == prog->expected_attach_type ? 0 : -EINVAL;
 	case BPF_PROG_TYPE_CGROUP_SKB:
 		if (!capable(CAP_NET_ADMIN))
@@ -2817,6 +2822,8 @@ attach_type_to_prog_type(enum bpf_attach_type attach_type)
 		return BPF_PROG_TYPE_CGROUP_SOCKOPT;
 	case BPF_TRACE_ITER:
 		return BPF_PROG_TYPE_TRACING;
+	case BPF_SK_LOOKUP:
+		return BPF_PROG_TYPE_SK_LOOKUP;
 	default:
 		return BPF_PROG_TYPE_UNSPEC;
 	}
@@ -2955,6 +2962,7 @@ static int bpf_prog_query(const union bpf_attr *attr,
 	case BPF_LIRC_MODE2:
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
+	case BPF_SK_LOOKUP:
 		return netns_bpf_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
@@ -3888,6 +3896,7 @@ static int link_create(union bpf_attr *attr)
 		ret = tracing_bpf_link_attach(attr, prog);
 		break;
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
+	case BPF_PROG_TYPE_SK_LOOKUP:
 		ret = netns_bpf_link_create(attr, prog);
 		break;
 	default:
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 2196523c9716..cc6cc92c19e2 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3879,10 +3879,14 @@ static int check_func_arg(struct bpf_verifier_env *env, u32 arg,
 			}
 			meta->ref_obj_id = reg->ref_obj_id;
 		}
-	} else if (arg_type == ARG_PTR_TO_SOCKET) {
+	} else if (arg_type == ARG_PTR_TO_SOCKET ||
+		   arg_type == ARG_PTR_TO_SOCKET_OR_NULL) {
 		expected_type = PTR_TO_SOCKET;
-		if (type != expected_type)
-			goto err_type;
+		if (!(register_is_null(reg) &&
+		      arg_type == ARG_PTR_TO_SOCKET_OR_NULL)) {
+			if (type != expected_type)
+				goto err_type;
+		}
 	} else if (arg_type == ARG_PTR_TO_BTF_ID) {
 		expected_type = PTR_TO_BTF_ID;
 		if (type != expected_type)
diff --git a/net/core/filter.c b/net/core/filter.c
index ddcc0d6209e1..81c462881133 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9220,6 +9220,185 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
+BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
+	   struct sock *, sk, u64, flags)
+{
+	if (unlikely(flags & ~(BPF_SK_LOOKUP_F_REPLACE |
+			       BPF_SK_LOOKUP_F_NO_REUSEPORT)))
+		return -EINVAL;
+	if (unlikely(sk && sk_is_refcounted(sk)))
+		return -ESOCKTNOSUPPORT; /* reject non-RCU freed sockets */
+	if (unlikely(sk && sk->sk_state == TCP_ESTABLISHED))
+		return -ESOCKTNOSUPPORT; /* reject connected sockets */
+
+	/* Check if socket is suitable for packet L3/L4 protocol */
+	if (sk && sk->sk_protocol != ctx->protocol)
+		return -EPROTOTYPE;
+	if (sk && sk->sk_family != ctx->family &&
+	    (sk->sk_family == AF_INET || ipv6_only_sock(sk)))
+		return -EAFNOSUPPORT;
+
+	if (ctx->selected_sk && !(flags & BPF_SK_LOOKUP_F_REPLACE))
+		return -EEXIST;
+
+	/* Select socket as lookup result */
+	ctx->selected_sk = sk;
+	ctx->no_reuseport = flags & BPF_SK_LOOKUP_F_NO_REUSEPORT;
+	return 0;
+}
+
+static const struct bpf_func_proto bpf_sk_lookup_assign_proto = {
+	.func		= bpf_sk_lookup_assign,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_PTR_TO_SOCKET_OR_NULL,
+	.arg3_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_sk_assign:
+		return &bpf_sk_lookup_assign_proto;
+	case BPF_FUNC_sk_release:
+		return &bpf_sk_release_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+static bool sk_lookup_is_valid_access(int off, int size,
+				      enum bpf_access_type type,
+				      const struct bpf_prog *prog,
+				      struct bpf_insn_access_aux *info)
+{
+	if (off < 0 || off >= sizeof(struct bpf_sk_lookup))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case offsetof(struct bpf_sk_lookup, sk):
+		info->reg_type = PTR_TO_SOCKET_OR_NULL;
+		return size == sizeof(__u64);
+
+	case bpf_ctx_range(struct bpf_sk_lookup, family):
+	case bpf_ctx_range(struct bpf_sk_lookup, protocol):
+	case bpf_ctx_range(struct bpf_sk_lookup, remote_ip4):
+	case bpf_ctx_range(struct bpf_sk_lookup, local_ip4):
+	case bpf_ctx_range_till(struct bpf_sk_lookup, remote_ip6[0], remote_ip6[3]):
+	case bpf_ctx_range_till(struct bpf_sk_lookup, local_ip6[0], local_ip6[3]):
+	case bpf_ctx_range(struct bpf_sk_lookup, remote_port):
+	case bpf_ctx_range(struct bpf_sk_lookup, local_port):
+		bpf_ctx_record_field_size(info, sizeof(__u32));
+		return bpf_ctx_narrow_access_ok(off, size, sizeof(__u32));
+
+	default:
+		return false;
+	}
+}
+
+static u32 sk_lookup_convert_ctx_access(enum bpf_access_type type,
+					const struct bpf_insn *si,
+					struct bpf_insn *insn_buf,
+					struct bpf_prog *prog,
+					u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+#if IS_ENABLED(CONFIG_IPV6)
+	int off;
+#endif
+
+	switch (si->off) {
+	case offsetof(struct bpf_sk_lookup, sk):
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sk_lookup_kern, selected_sk));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, family):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     family, 2, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, protocol):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     protocol, 2, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, remote_ip4):
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     v4.saddr, 4, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, local_ip4):
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     v4.daddr, 4, target_size));
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				remote_ip6[0], remote_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_sk_lookup, remote_ip6[0]);
+		off += bpf_target_off(struct in6_addr, s6_addr32[0], 4, target_size);
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sk_lookup_kern, v6.saddr));
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case bpf_ctx_range_till(struct bpf_sk_lookup,
+				local_ip6[0], local_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_sk_lookup, local_ip6[0]);
+		off += bpf_target_off(struct in6_addr, s6_addr32[0], 4, target_size);
+		*insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
+				      offsetof(struct bpf_sk_lookup_kern, v6.daddr));
+		*insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
+#else
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_sk_lookup, remote_port):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     sport, 2, target_size));
+		break;
+
+	case offsetof(struct bpf_sk_lookup, local_port):
+		*insn++ = BPF_LDX_MEM(BPF_H, si->dst_reg, si->src_reg,
+				      bpf_target_off(struct bpf_sk_lookup_kern,
+						     dport, 2, target_size));
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_prog_ops sk_lookup_prog_ops = {
+};
+
+const struct bpf_verifier_ops sk_lookup_verifier_ops = {
+	.get_func_proto		= sk_lookup_func_proto,
+	.is_valid_access	= sk_lookup_is_valid_access,
+	.convert_ctx_access	= sk_lookup_convert_ctx_access,
+};
+
 #endif /* CONFIG_INET */
 
 DEFINE_BPF_DISPATCHER(xdp)
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 6843376733df..5bfa448b4704 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -404,6 +404,7 @@ class PrinterHelpers(Printer):
 
     type_fwds = [
             'struct bpf_fib_lookup',
+            'struct bpf_sk_lookup',
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
@@ -450,6 +451,7 @@ class PrinterHelpers(Printer):
             'struct bpf_perf_event_data',
             'struct bpf_perf_event_value',
             'struct bpf_pidns_info',
+            'struct bpf_sk_lookup',
             'struct bpf_sock',
             'struct bpf_sock_addr',
             'struct bpf_sock_ops',
@@ -487,6 +489,11 @@ class PrinterHelpers(Printer):
             'struct sk_msg_buff': 'struct sk_msg_md',
             'struct xdp_buff': 'struct xdp_md',
     }
+    # Helpers overloaded for different context types.
+    overloaded_helpers = [
+        'bpf_get_socket_cookie',
+        'bpf_sk_assign',
+    ]
 
     def print_header(self):
         header = '''\
@@ -543,7 +550,7 @@ class PrinterHelpers(Printer):
         for i, a in enumerate(proto['args']):
             t = a['type']
             n = a['name']
-            if proto['name'] == 'bpf_get_socket_cookie' and i == 0:
+            if proto['name'] in self.overloaded_helpers and i == 0:
                     t = 'void'
                     n = 'ctx'
             one_arg = '{}{}'.format(comma, self.map_type(t))
-- 
2.25.4



* [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-16  1:44   ` Andrii Nakryiko
  2020-07-13 17:46 ` [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (13 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Prepare for calling into reuseport from __inet_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/inet_hashtables.c | 29 ++++++++++++++++++++---------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 2bbaaf0c7176..ab64834837c8 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -246,6 +246,21 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /*
  * Here are some nice properties to exploit here. The BSD API
  * does not allow a listening sock to specify the remote port nor the
@@ -265,21 +280,17 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr,
 				      dif, sdif, exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet_ehashfn(net, daddr, hnum,
-						     saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.4



* [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-16  2:23   ` Andrii Nakryiko
  2020-07-13 17:46 ` [PATCH bpf-next v4 05/16] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Run a BPF program before looking up a listening socket on the receive path.
The program selects a listening socket to yield as the result of the socket
lookup by calling the bpf_sk_assign() helper and returning the SK_PASS
code. The program can revert its decision by assigning a NULL socket with
bpf_sk_assign().

Alternatively, the BPF program can fail the lookup by returning SK_DROP, or
let the lookup continue as usual by returning SK_PASS when no socket has
been selected with bpf_sk_assign(). Any other return value is treated the
same as SK_DROP.

This lets the user match packets with listening sockets freely at the last
possible point on the receive path, where we know that packets are destined
for local delivery after undergoing policing, filtering, and routing.

With BPF code selecting the socket, directing packets destined to an IP
range or to a port range to a single socket becomes possible.

In case multiple programs are attached, they are run in series, in the
order in which they were attached. The end result is determined from the
return codes of all the programs according to the following rules, with a
worked example after the list:

 1. If any program returned SK_PASS and selected a valid socket, the socket
    is used as result of socket lookup.
 2. If more than one program returned SK_PASS and selected a socket,
    last selection takes effect.
 3. If any program returned SK_DROP or an invalid return code, and no
    program returned SK_PASS and selected a socket, socket lookup fails
    with -ECONNREFUSED.
 4. If all programs returned SK_PASS and none of them selected a socket,
    socket lookup continues to htable-based lookup.
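
For example, say three programs are attached and run in order. If they
return SK_PASS without a selection, SK_PASS selecting socket A, and SK_PASS
selecting socket B, the lookup yields socket B (rules 1 and 2). If the last
program instead returned SK_DROP, the lookup would still yield socket A
(rule 1); only when no program selects a socket does that SK_DROP fail the
lookup (rule 3).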

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Reduce BPF sk_lookup prog return codes to SK_PASS/SK_DROP. (Lorenz)
    - Default to drop & warn on illegal return value from BPF prog. (Lorenz)
    - Rename netns_bpf_attach_type_enable/disable to _need/unneed. (Lorenz)
    - Export bpf_sk_lookup_enabled symbol for CONFIG_IPV6=m (kernel test robot)
    - Invert return value from bpf_sk_lookup_run_v4 to true on skip reuseport.
    - Move dedicated prog_array runner close to its callers in filter.h.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 include/linux/filter.h     | 102 +++++++++++++++++++++++++++++++++++++
 kernel/bpf/net_namespace.c |  32 +++++++++++-
 net/core/filter.c          |   3 ++
 net/ipv4/inet_hashtables.c |  31 +++++++++++
 4 files changed, 167 insertions(+), 1 deletion(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 380746f47fa1..b9ad0fdabca5 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1295,4 +1295,106 @@ struct bpf_sk_lookup_kern {
 	bool		no_reuseport;
 };
 
+extern struct static_key_false bpf_sk_lookup_enabled;
+
+/* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup.
+ *
+ * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and
+ * SK_DROP. Any other return value is treated as SK_DROP. Their
+ * meaning is as follows:
+ *
+ *  SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result
+ *  SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup
+ *  SK_DROP                           : terminate lookup with -ECONNREFUSED
+ *
+ * This macro aggregates return values and selected sockets from
+ * multiple BPF programs according to following rules:
+ *
+ *  1. If any program returned SK_PASS and a non-NULL ctx.selected_sk,
+ *     macro result is SK_PASS and last ctx.selected_sk is used.
+ *  2. If any program returned non-SK_PASS return value,
+ *     macro result is the last non-SK_PASS return value.
+ *  3. Otherwise result is SK_PASS and ctx.selected_sk is NULL.
+ *
+ * Caller must ensure that the prog array is non-NULL, and that the
+ * array as well as the programs it contains remain valid.
+ */
+#define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func)			\
+	({								\
+		struct bpf_sk_lookup_kern *_ctx = &(ctx);		\
+		struct bpf_prog_array_item *_item;			\
+		struct sock *_selected_sk;				\
+		struct bpf_prog *_prog;					\
+		u32 _ret, _last_ret;					\
+		bool _no_reuseport;					\
+									\
+		migrate_disable();					\
+		_last_ret = SK_PASS;					\
+		_selected_sk = NULL;					\
+		_no_reuseport = false;					\
+		_item = &(array)->items[0];				\
+		while ((_prog = READ_ONCE(_item->prog))) {		\
+			/* restore most recent selection */		\
+			_ctx->selected_sk = _selected_sk;		\
+			_ctx->no_reuseport = _no_reuseport;		\
+									\
+			_ret = func(_prog, _ctx);			\
+			if (_ret == SK_PASS) {				\
+				/* remember last non-NULL socket */	\
+				if (_ctx->selected_sk) {		\
+					_selected_sk = _ctx->selected_sk;	\
+					_no_reuseport = _ctx->no_reuseport;	\
+				}					\
+			} else {					\
+				/* remember last non-PASS ret code */	\
+				_last_ret = _ret;			\
+			}						\
+			_item++;					\
+		}							\
+		_ctx->selected_sk = _selected_sk;			\
+		_ctx->no_reuseport = _no_reuseport;			\
+		migrate_enable();					\
+		_ctx->selected_sk ? SK_PASS : _last_ret;		\
+	 })
+
+static inline bool bpf_sk_lookup_run_v4(struct net *net, int protocol,
+					const __be32 saddr, const __be16 sport,
+					const __be32 daddr, const u16 dport,
+					struct sock **psk)
+{
+	struct bpf_prog_array *run_array;
+	struct sock *selected_sk = NULL;
+	bool no_reuseport = false;
+
+	rcu_read_lock();
+	run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
+	if (run_array) {
+		struct bpf_sk_lookup_kern ctx = {
+			.family		= AF_INET,
+			.protocol	= protocol,
+			.v4.saddr	= saddr,
+			.v4.daddr	= daddr,
+			.sport		= sport,
+			.dport		= dport,
+		};
+		u32 act;
+
+		act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN);
+		if (act == SK_PASS) {
+			selected_sk = ctx.selected_sk;
+			no_reuseport = ctx.no_reuseport;
+			goto unlock;
+		}
+
+		selected_sk = ERR_PTR(-ECONNREFUSED);
+		WARN_ONCE(act != SK_DROP,
+			  "Illegal BPF SK_LOOKUP return value %u, expect packet loss!\n",
+			  act);
+	}
+unlock:
+	rcu_read_unlock();
+	*psk = selected_sk;
+	return no_reuseport;
+}
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
index 596c30b963f3..ee3599a51891 100644
--- a/kernel/bpf/net_namespace.c
+++ b/kernel/bpf/net_namespace.c
@@ -25,6 +25,28 @@ struct bpf_netns_link {
 /* Protects updates to netns_bpf */
 DEFINE_MUTEX(netns_bpf_mutex);
 
+static void netns_bpf_attach_type_unneed(enum netns_bpf_attach_type type)
+{
+	switch (type) {
+	case NETNS_BPF_SK_LOOKUP:
+		static_branch_dec(&bpf_sk_lookup_enabled);
+		break;
+	default:
+		break;
+	}
+}
+
+static void netns_bpf_attach_type_need(enum netns_bpf_attach_type type)
+{
+	switch (type) {
+	case NETNS_BPF_SK_LOOKUP:
+		static_branch_inc(&bpf_sk_lookup_enabled);
+		break;
+	default:
+		break;
+	}
+}
+
 /* Must be called with netns_bpf_mutex held. */
 static void netns_bpf_run_array_detach(struct net *net,
 				       enum netns_bpf_attach_type type)
@@ -91,6 +113,9 @@ static void bpf_netns_link_release(struct bpf_link *link)
 	if (!net)
 		goto out_unlock;
 
+	/* Mark attach point as unused */
+	netns_bpf_attach_type_unneed(type);
+
 	/* Remember link position in case of safe delete */
 	idx = link_index(net, type, net_link);
 	list_del(&net_link->node);
@@ -414,6 +439,9 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
 					lockdep_is_held(&netns_bpf_mutex));
 	bpf_prog_array_free(run_array);
 
+	/* Mark attach point as used */
+	netns_bpf_attach_type_need(type);
+
 out_unlock:
 	mutex_unlock(&netns_bpf_mutex);
 	return err;
@@ -489,8 +517,10 @@ static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
 	mutex_lock(&netns_bpf_mutex);
 	for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
 		netns_bpf_run_array_detach(net, type);
-		list_for_each_entry(net_link, &net->bpf.links[type], node)
+		list_for_each_entry(net_link, &net->bpf.links[type], node) {
 			net_link->net = NULL; /* auto-detach link */
+			netns_bpf_attach_type_unneed(type);
+		}
 		if (net->bpf.progs[type])
 			bpf_prog_put(net->bpf.progs[type]);
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index 81c462881133..3fcb9c8cec4c 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9221,6 +9221,9 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
 
+DEFINE_STATIC_KEY_FALSE(bpf_sk_lookup_enabled);
+EXPORT_SYMBOL(bpf_sk_lookup_enabled);
+
 BPF_CALL_3(bpf_sk_lookup_assign, struct bpf_sk_lookup_kern *, ctx,
 	   struct sock *, sk, u64, flags)
 {
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index ab64834837c8..4eb4cd8d20dd 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -299,6 +299,29 @@ static struct sock *inet_lhash2_lookup(struct net *net,
 	return result;
 }
 
+static inline struct sock *inet_lookup_run_bpf(struct net *net,
+					       struct inet_hashinfo *hashinfo,
+					       struct sk_buff *skb, int doff,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr, u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (hashinfo != &tcp_hashinfo)
+		return NULL; /* only TCP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_TCP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 struct sock *__inet_lookup_listener(struct net *net,
 				    struct inet_hashinfo *hashinfo,
 				    struct sk_buff *skb, int doff,
@@ -310,6 +333,14 @@ struct sock *__inet_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		result = inet_lookup_run_bpf(net, hashinfo, skb, doff,
+					     saddr, sport, daddr, hnum);
+		if (result)
+			goto done;
+	}
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.4



* [PATCH bpf-next v4 05/16] inet6: Extract helper for selecting socket from reuseport group
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (3 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 06/16] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Prepare for calling into reuseport from inet6_lookup_listener as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/inet6_hashtables.c | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index fbe9d4295eac..03942eef8ab6 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -111,6 +111,23 @@ static inline int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb, int doff,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 phash;
+
+	if (sk->sk_reuseport) {
+		phash = inet6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, phash, skb, doff);
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *inet6_lhash2_lookup(struct net *net,
 		struct inet_listen_hashbucket *ilb2,
@@ -123,21 +140,17 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	struct inet_connection_sock *icsk;
 	struct sock *sk, *result = NULL;
 	int score, hiscore = 0;
-	u32 phash = 0;
 
 	inet_lhash2_for_each_icsk_rcu(icsk, &ilb2->head) {
 		sk = (struct sock *)icsk;
 		score = compute_score(sk, net, hnum, daddr, dif, sdif,
 				      exact_dif);
 		if (score > hiscore) {
-			if (sk->sk_reuseport) {
-				phash = inet6_ehashfn(net, daddr, hnum,
-						      saddr, sport);
-				result = reuseport_select_sock(sk, phash,
-							       skb, doff);
-				if (result)
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb, doff,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			hiscore = score;
 		}
-- 
2.25.4



* [PATCH bpf-next v4 06/16] inet6: Run SK_LOOKUP BPF program on socket lookup
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (4 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 05/16] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 07/16] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Following the ipv4 stack changes, run a BPF program attached to the netns
before looking up a listening socket. The program can return a listening
socket to use as the result of the socket lookup, fail the lookup, or take
no action.
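
For illustration, a minimal sk_lookup program exercising both verdicts could
look as follows. This is only a sketch (the section and function names are
made up), but it relies solely on the bpf_sk_lookup context and the
SK_PASS/SK_DROP return codes introduced by this series:

	/* Minimal sk_lookup program (sketch, not part of this series). */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("sk_lookup/demo")
	int demo_lookup(struct bpf_sk_lookup *ctx)
	{
		/* Fail lookups for local port 7; connections get refused. */
		if (ctx->local_port == 7)
			return SK_DROP;

		/* Take no action; fall back to regular hashtable lookup. */
		return SK_PASS;
	}

	char _license[] SEC("license") = "GPL";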

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Adapt to changes in BPF prog return codes.
    - Invert return value from bpf_sk_lookup_run_v6 to true on skip reuseport.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Don't copy struct in6_addr when populating BPF prog context. (Martin)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 include/linux/filter.h      | 44 +++++++++++++++++++++++++++++++++++++
 net/ipv6/inet6_hashtables.c | 35 +++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index b9ad0fdabca5..900b71af5580 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1397,4 +1397,48 @@ static inline bool bpf_sk_lookup_run_v4(struct net *net, int protocol,
 	return no_reuseport;
 }
 
+#if IS_ENABLED(CONFIG_IPV6)
+static inline bool bpf_sk_lookup_run_v6(struct net *net, int protocol,
+					const struct in6_addr *saddr,
+					const __be16 sport,
+					const struct in6_addr *daddr,
+					const u16 dport,
+					struct sock **psk)
+{
+	struct bpf_prog_array *run_array;
+	struct sock *selected_sk = NULL;
+	bool no_reuseport = false;
+
+	rcu_read_lock();
+	run_array = rcu_dereference(net->bpf.run_array[NETNS_BPF_SK_LOOKUP]);
+	if (run_array) {
+		struct bpf_sk_lookup_kern ctx = {
+			.family		= AF_INET6,
+			.protocol	= protocol,
+			.v6.saddr	= saddr,
+			.v6.daddr	= daddr,
+			.sport		= sport,
+			.dport		= dport,
+		};
+		u32 act;
+
+		act = BPF_PROG_SK_LOOKUP_RUN_ARRAY(run_array, ctx, BPF_PROG_RUN);
+		if (act == SK_PASS) {
+			selected_sk = ctx.selected_sk;
+			no_reuseport = ctx.no_reuseport;
+			goto unlock;
+		}
+
+		selected_sk = ERR_PTR(-ECONNREFUSED);
+		WARN_ONCE(act != SK_DROP,
+			  "Illegal BPF SK_LOOKUP return value %u, expect packet loss!\n",
+			  act);
+	}
+unlock:
+	rcu_read_unlock();
+	*psk = selected_sk;
+	return no_reuseport;
+}
+#endif /* IS_ENABLED(CONFIG_IPV6) */
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 03942eef8ab6..2d3add9e6116 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -21,6 +21,8 @@
 #include <net/ip.h>
 #include <net/sock_reuseport.h>
 
+extern struct inet_hashinfo tcp_hashinfo;
+
 u32 inet6_ehashfn(const struct net *net,
 		  const struct in6_addr *laddr, const u16 lport,
 		  const struct in6_addr *faddr, const __be16 fport)
@@ -159,6 +161,31 @@ static struct sock *inet6_lhash2_lookup(struct net *net,
 	return result;
 }
 
+static inline struct sock *inet6_lookup_run_bpf(struct net *net,
+						struct inet_hashinfo *hashinfo,
+						struct sk_buff *skb, int doff,
+						const struct in6_addr *saddr,
+						const __be16 sport,
+						const struct in6_addr *daddr,
+						const u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (hashinfo != &tcp_hashinfo)
+		return NULL; /* only TCP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_TCP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, doff, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 struct sock *inet6_lookup_listener(struct net *net,
 		struct inet_hashinfo *hashinfo,
 		struct sk_buff *skb, int doff,
@@ -170,6 +197,14 @@ struct sock *inet6_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		result = inet6_lookup_run_bpf(net, hashinfo, skb, doff,
+					      saddr, sport, daddr, hnum);
+		if (result)
+			goto done;
+	}
+
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.25.4



* [PATCH bpf-next v4 07/16] udp: Extract helper for selecting socket from reuseport group
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (5 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 06/16] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 08/16] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Prepare for calling into reuseport from __udp4_lib_lookup as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c | 34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 31530129f137..0d03e0277263 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -408,6 +408,25 @@ static u32 udp_ehashfn(const struct net *net, const __be32 laddr,
 			      udp_ehash_secret + net_hash_mix(net));
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb,
+					    __be32 saddr, __be16 sport,
+					    __be32 daddr, unsigned short hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 hash;
+
+	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
+		hash = udp_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, hash, skb,
+						 sizeof(struct udphdr));
+		/* Fall back to scoring if group has connections */
+		if (reuseport_has_conns(sk, false))
+			return NULL;
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 saddr, __be16 sport,
@@ -418,7 +437,6 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
-	u32 hash = 0;
 
 	result = NULL;
 	badness = 0;
@@ -426,15 +444,11 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport &&
-			    sk->sk_state != TCP_ESTABLISHED) {
-				hash = udp_ehashfn(net, daddr, hnum,
-						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			badness = score;
 			result = sk;
 		}
-- 
2.25.4



* [PATCH bpf-next v4 08/16] udp: Run SK_LOOKUP BPF program on socket lookup
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (6 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 07/16] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 09/16] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Following the INET/TCP socket lookup changes, modify the UDP socket lookup
to let a BPF program select a receiving socket before searching for a
socket by destination address and port as usual.

Lookup of connected sockets that match the packet 4-tuple is unaffected by
this change. The BPF program runs, and potentially overrides the lookup
result, only if a 4-tuple match was not found.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Adapt to change in bpf_sk_lookup_run_v4 return value semantics.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 net/ipv4/udp.c | 59 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 50 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 0d03e0277263..e82db3ab49d3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -456,6 +456,29 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	return result;
 }
 
+static inline struct sock *udp4_lookup_run_bpf(struct net *net,
+					       struct udp_table *udptable,
+					       struct sk_buff *skb,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr, u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (udptable != &udp_table)
+		return NULL; /* only UDP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v4(net, IPPROTO_UDP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 /* UDP is nearly always wildcards out the wazoo, it makes no sense to try
  * harder than this. -DaveM
  */
@@ -463,27 +486,45 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 		__be16 sport, __be32 daddr, __be16 dport, int dif,
 		int sdif, struct udp_table *udptable, struct sk_buff *skb)
 {
-	struct sock *result;
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
+	struct sock *result, *sk;
 
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected or non-wildcard socket */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
 				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-		slot2 = hash2 & udptable->mask;
-		hslot2 = &udptable->hash2[slot2];
-
-		result = udp4_lib_lookup2(net, saddr, sport,
-					  htonl(INADDR_ANY), hnum, dif, sdif,
-					  hslot2, skb);
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
+
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		sk = udp4_lookup_run_bpf(net, udptable, skb,
+					 saddr, sport, daddr, hnum);
+		if (sk) {
+			result = sk;
+			goto done;
+		}
 	}
+
+	/* Got non-wildcard socket or error on first lookup */
+	if (result)
+		goto done;
+
+	/* Lookup wildcard sockets */
+	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  htonl(INADDR_ANY), hnum, dif, sdif,
+				  hslot2, skb);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.25.4



* [PATCH bpf-next v4 09/16] udp6: Extract helper for selecting socket from reuseport group
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (7 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 08/16] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 10/16] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Prepare for calling into reuseport from __udp6_lib_lookup as well.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/udp.c | 37 ++++++++++++++++++++++++++-----------
 1 file changed, 26 insertions(+), 11 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7d4151747340..65b843e7acde 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -141,6 +141,27 @@ static int compute_score(struct sock *sk, struct net *net,
 	return score;
 }
 
+static inline struct sock *lookup_reuseport(struct net *net, struct sock *sk,
+					    struct sk_buff *skb,
+					    const struct in6_addr *saddr,
+					    __be16 sport,
+					    const struct in6_addr *daddr,
+					    unsigned int hnum)
+{
+	struct sock *reuse_sk = NULL;
+	u32 hash;
+
+	if (sk->sk_reuseport && sk->sk_state != TCP_ESTABLISHED) {
+		hash = udp6_ehashfn(net, daddr, hnum, saddr, sport);
+		reuse_sk = reuseport_select_sock(sk, hash, skb,
+						 sizeof(struct udphdr));
+		/* Fall back to scoring if group has connections */
+		if (reuseport_has_conns(sk, false))
+			return NULL;
+	}
+	return reuse_sk;
+}
+
 /* called with rcu_read_lock() */
 static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
@@ -150,7 +171,6 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 {
 	struct sock *sk, *result;
 	int score, badness;
-	u32 hash = 0;
 
 	result = NULL;
 	badness = -1;
@@ -158,16 +178,11 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
-			if (sk->sk_reuseport &&
-			    sk->sk_state != TCP_ESTABLISHED) {
-				hash = udp6_ehashfn(net, daddr, hnum,
-						    saddr, sport);
-
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
-			}
+			result = lookup_reuseport(net, sk, skb,
+						  saddr, sport, daddr, hnum);
+			if (result)
+				return result;
+
 			result = sk;
 			badness = score;
 		}
-- 
2.25.4



* [PATCH bpf-next v4 10/16] udp6: Run SK_LOOKUP BPF program on socket lookup
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (8 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 09/16] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 11/16] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski, Marek Majkowski

Same as for udp4, let the BPF program override the socket lookup result by
selecting a receiving socket of its choice or failing the lookup, if no
connected UDP socket matched the packet 4-tuple.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Adapt to change in bpf_sk_lookup_run_v6 return value semantics.
    
    v3:
    - Use a static_key to minimize the hook overhead when not used. (Alexei)
    - Adapt for running an array of attached programs. (Alexei)
    - Adapt for optionally skipping reuseport selection. (Martin)

 net/ipv6/udp.c | 60 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 51 insertions(+), 9 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 65b843e7acde..d46c62976b5b 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -190,6 +190,31 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 	return result;
 }
 
+static inline struct sock *udp6_lookup_run_bpf(struct net *net,
+					       struct udp_table *udptable,
+					       struct sk_buff *skb,
+					       const struct in6_addr *saddr,
+					       __be16 sport,
+					       const struct in6_addr *daddr,
+					       u16 hnum)
+{
+	struct sock *sk, *reuse_sk;
+	bool no_reuseport;
+
+	if (udptable != &udp_table)
+		return NULL; /* only UDP is supported */
+
+	no_reuseport = bpf_sk_lookup_run_v6(net, IPPROTO_UDP,
+					    saddr, sport, daddr, hnum, &sk);
+	if (no_reuseport || IS_ERR_OR_NULL(sk))
+		return sk;
+
+	reuse_sk = lookup_reuseport(net, sk, skb, saddr, sport, daddr, hnum);
+	if (reuse_sk)
+		sk = reuse_sk;
+	return sk;
+}
+
 /* rcu_read_lock() must be held */
 struct sock *__udp6_lib_lookup(struct net *net,
 			       const struct in6_addr *saddr, __be16 sport,
@@ -200,25 +225,42 @@ struct sock *__udp6_lib_lookup(struct net *net,
 	unsigned short hnum = ntohs(dport);
 	unsigned int hash2, slot2;
 	struct udp_hslot *hslot2;
-	struct sock *result;
+	struct sock *result, *sk;
 
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected or non-wildcard sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
 				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-		slot2 = hash2 & udptable->mask;
+	if (!IS_ERR_OR_NULL(result) && result->sk_state == TCP_ESTABLISHED)
+		goto done;
+
+	/* Lookup redirect from BPF */
+	if (static_branch_unlikely(&bpf_sk_lookup_enabled)) {
+		sk = udp6_lookup_run_bpf(net, udptable, skb,
+					 saddr, sport, daddr, hnum);
+		if (sk) {
+			result = sk;
+			goto done;
+		}
+	}
 
-		hslot2 = &udptable->hash2[slot2];
+	/* Got non-wildcard socket or error on first lookup */
+	if (result)
+		goto done;
 
-		result = udp6_lib_lookup2(net, saddr, sport,
-					  &in6addr_any, hnum, dif, sdif,
-					  hslot2, skb);
-	}
+	/* Lookup wildcard sockets */
+	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  &in6addr_any, hnum, dif, sdif,
+				  hslot2, skb);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.25.4



* [PATCH bpf-next v4 11/16] bpf: Sync linux/bpf.h to tools/
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (9 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 10/16] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 12/16] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

The newly added program type, context type and helper are used by tests in
a subsequent patch. Synchronize the header file.
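
For orientation, here is a sketch of how a program might use the
bpf_sk_assign() overload documented in the hunk below. The map, key, and
function names are made up; the overall flow mirrors the selftests added
later in this series:

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	struct {
		__uint(type, BPF_MAP_TYPE_SOCKMAP);
		__uint(max_entries, 1);
		__type(key, __u32);
		__type(value, __u64);
	} redir_map SEC(".maps");

	SEC("sk_lookup/redir")
	int redir_lookup(struct bpf_sk_lookup *ctx)
	{
		struct bpf_sock *sk;
		__u32 key = 0;
		long err;

		sk = bpf_map_lookup_elem(&redir_map, &key);
		if (!sk)
			return SK_PASS; /* nothing to steer to; no action */

		/* Select sk; fails on family/protocol mismatch, etc. */
		err = bpf_sk_assign(ctx, sk, 0);
		bpf_sk_release(sk);

		return err ? SK_DROP : SK_PASS;
	}

	char _license[] SEC("license") = "GPL";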

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Update after changes to bpf.h in earlier patch.
    
    v3:
    - Update after changes to bpf.h in earlier patch.
    
    v2:
    - Update after changes to bpf.h in earlier patch.

 tools/include/uapi/linux/bpf.h | 77 ++++++++++++++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 548a749aebb3..e2ffeb150d0f 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -189,6 +189,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_STRUCT_OPS,
 	BPF_PROG_TYPE_EXT,
 	BPF_PROG_TYPE_LSM,
+	BPF_PROG_TYPE_SK_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -227,6 +228,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_INET6_GETSOCKNAME,
 	BPF_XDP_DEVMAP,
 	BPF_CGROUP_INET_SOCK_RELEASE,
+	BPF_SK_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -3068,6 +3070,10 @@ union bpf_attr {
  *
  * long bpf_sk_assign(struct sk_buff *skb, struct bpf_sock *sk, u64 flags)
  *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SCHED_CLS** and
+ *		**BPF_PROG_TYPE_SCHED_ACT** programs.
+ *
  *		Assign the *sk* to the *skb*. When combined with appropriate
  *		routing configuration to receive the packet towards the socket,
  *		will cause *skb* to be delivered to the specified socket.
@@ -3093,6 +3099,56 @@ union bpf_attr {
  *		**-ESOCKTNOSUPPORT** if the socket type is not supported
  *		(reuseport).
  *
+ * long bpf_sk_assign(struct bpf_sk_lookup *ctx, struct bpf_sock *sk, u64 flags)
+ *	Description
+ *		Helper is overloaded depending on BPF program type. This
+ *		description applies to **BPF_PROG_TYPE_SK_LOOKUP** programs.
+ *
+ *		Select the *sk* as a result of a socket lookup.
+ *
+ *		For the operation to succeed, the passed socket must be
+ *		compatible with the packet description provided by *ctx*.
+ *
+ *		The L4 protocol (**IPPROTO_TCP** or **IPPROTO_UDP**) must
+ *		be an exact match, while the IP family (**AF_INET** or
+ *		**AF_INET6**) must be compatible; that is, IPv6 sockets
+ *		that are not v6-only can be selected for IPv4 packets.
+ *
+ *		Only TCP listeners and UDP unconnected sockets can be
+ *		selected. *sk* can also be NULL to reset any previous
+ *		selection.
+ *
+ *		*flags* argument can be a combination of the following values:
+ *
+ *		* **BPF_SK_LOOKUP_F_REPLACE** to override the previous
+ *		  socket selection, potentially done by a BPF program
+ *		  that ran before us.
+ *
+ *		* **BPF_SK_LOOKUP_F_NO_REUSEPORT** to skip
+ *		  load-balancing within reuseport group for the socket
+ *		  being selected.
+ *
+ *		On success *ctx->sk* will point to the selected socket.
+ *
+ *	Return
+ *		0 on success, or a negative errno in case of failure.
+ *
+ *		* **-EAFNOSUPPORT** if socket family (*sk->family*) is
+ *		  not compatible with packet family (*ctx->family*).
+ *
+ *		* **-EEXIST** if a socket has already been selected,
+ *		  potentially by another program, and the
+ *		  **BPF_SK_LOOKUP_F_REPLACE** flag was not specified.
+ *
+ *		* **-EINVAL** if unsupported flags were specified.
+ *
+ *		* **-EPROTOTYPE** if socket L4 protocol
+ *		  (*sk->protocol*) doesn't match packet protocol
+ *		  (*ctx->protocol*).
+ *
+ *		* **-ESOCKTNOSUPPORT** if socket is not in allowed
+ *		  state (TCP listening or UDP unconnected).
+ *
  * u64 bpf_ktime_get_boot_ns(void)
  * 	Description
  * 		Return the time elapsed since system boot, in nanoseconds.
@@ -3605,6 +3661,12 @@ enum {
 	BPF_RINGBUF_HDR_SZ		= 8,
 };
 
+/* BPF_FUNC_sk_assign flags in bpf_sk_lookup context. */
+enum {
+	BPF_SK_LOOKUP_F_REPLACE		= (1ULL << 0),
+	BPF_SK_LOOKUP_F_NO_REUSEPORT	= (1ULL << 1),
+};
+
 /* Mode for BPF_FUNC_skb_adjust_room helper. */
 enum bpf_adj_room_mode {
 	BPF_ADJ_ROOM_NET,
@@ -4334,4 +4396,19 @@ struct bpf_pidns_info {
 	__u32 pid;
 	__u32 tgid;
 };
+
+/* User accessible data for SK_LOOKUP programs. Add new fields at the end. */
+struct bpf_sk_lookup {
+	__bpf_md_ptr(struct bpf_sock *, sk); /* Selected socket */
+
+	__u32 family;		/* Protocol family (AF_INET, AF_INET6) */
+	__u32 protocol;		/* IP protocol (IPPROTO_TCP, IPPROTO_UDP) */
+	__u32 remote_ip4;	/* Network byte order */
+	__u32 remote_ip6[4];	/* Network byte order */
+	__u32 remote_port;	/* Network byte order */
+	__u32 local_ip4;	/* Network byte order */
+	__u32 local_ip6[4];	/* Network byte order */
+	__u32 local_port;	/* Host byte order */
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
-- 
2.25.4



* [PATCH bpf-next v4 12/16] libbpf: Add support for SK_LOOKUP program type
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (10 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 11/16] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Make libbpf aware of the newly added program type, and assign it a
section name.
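
With the section prefix in place, attaching such a program from userspace
could look like the sketch below. Error handling is trimmed, the object
path and program name are hypothetical, and bpf_program__attach_netns() is
the link-based netns attach API this series builds on:

	#include <fcntl.h>
	#include <unistd.h>
	#include <bpf/libbpf.h>

	static struct bpf_link *attach_sk_lookup(const char *obj_path,
						 const char *prog_name)
	{
		struct bpf_program *prog;
		struct bpf_link *link = NULL;
		struct bpf_object *obj;
		int netns_fd;

		netns_fd = open("/proc/self/ns/net", O_RDONLY);
		if (netns_fd < 0)
			return NULL;

		obj = bpf_object__open_file(obj_path, NULL);
		if (libbpf_get_error(obj))
			goto out;
		if (bpf_object__load(obj))
			goto out;

		/* "sk_lookup/" section set prog type and attach type. */
		prog = bpf_object__find_program_by_name(obj, prog_name);
		if (prog)
			link = bpf_program__attach_netns(prog, netns_fd);
	out:
		close(netns_fd);
		return link;
	}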

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Add trailing slash to section prefix ("sk_lookup/"). (Andrii)
    
    v3:
    - Move new libbpf symbols to version 0.1.0.
    - Set expected_attach_type in probe_load for new prog type.
    
    v2:
    - Add new libbpf symbols to version 0.0.9. (Andrii)

 tools/lib/bpf/libbpf.c        | 3 +++
 tools/lib/bpf/libbpf.h        | 2 ++
 tools/lib/bpf/libbpf.map      | 2 ++
 tools/lib/bpf/libbpf_probes.c | 3 +++
 4 files changed, 10 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 25e4f77be8d7..1dfdf7d36352 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6793,6 +6793,7 @@ BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
 BPF_PROG_TYPE_FNS(tracing, BPF_PROG_TYPE_TRACING);
 BPF_PROG_TYPE_FNS(struct_ops, BPF_PROG_TYPE_STRUCT_OPS);
 BPF_PROG_TYPE_FNS(extension, BPF_PROG_TYPE_EXT);
+BPF_PROG_TYPE_FNS(sk_lookup, BPF_PROG_TYPE_SK_LOOKUP);
 
 enum bpf_attach_type
 bpf_program__get_expected_attach_type(struct bpf_program *prog)
@@ -6973,6 +6974,8 @@ static const struct bpf_sec_def section_defs[] = {
 	BPF_EAPROG_SEC("cgroup/setsockopt",	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 						BPF_CGROUP_SETSOCKOPT),
 	BPF_PROG_SEC("struct_ops",		BPF_PROG_TYPE_STRUCT_OPS),
+	BPF_EAPROG_SEC("sk_lookup/",		BPF_PROG_TYPE_SK_LOOKUP,
+						BPF_SK_LOOKUP),
 };
 
 #undef BPF_PROG_SEC_IMPL
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 2335971ed0bd..c2272132e929 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -350,6 +350,7 @@ LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_tracing(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_struct_ops(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_extension(struct bpf_program *prog);
+LIBBPF_API int bpf_program__set_sk_lookup(struct bpf_program *prog);
 
 LIBBPF_API enum bpf_prog_type bpf_program__get_type(struct bpf_program *prog);
 LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
@@ -377,6 +378,7 @@ LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_tracing(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_struct_ops(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_extension(const struct bpf_program *prog);
+LIBBPF_API bool bpf_program__is_sk_lookup(const struct bpf_program *prog);
 
 /*
  * No need for __attribute__((packed)), all members of 'bpf_map_def'
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index c5d5c7664c3b..6f0856abe299 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -287,6 +287,8 @@ LIBBPF_0.1.0 {
 		bpf_map__type;
 		bpf_map__value_size;
 		bpf_program__autoload;
+		bpf_program__is_sk_lookup;
 		bpf_program__set_autoload;
+		bpf_program__set_sk_lookup;
 		btf__set_fd;
 } LIBBPF_0.0.9;
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 10cd8d1891f5..5a3d3f078408 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -78,6 +78,9 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_CGROUP_SOCK_ADDR:
 		xattr.expected_attach_type = BPF_CGROUP_INET4_CONNECT;
 		break;
+	case BPF_PROG_TYPE_SK_LOOKUP:
+		xattr.expected_attach_type = BPF_SK_LOOKUP;
+		break;
 	case BPF_PROG_TYPE_KPROBE:
 		xattr.kern_version = get_kernel_version();
 		break;
-- 
2.25.4



* [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (11 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 12/16] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-16  2:10   ` Andrii Nakryiko
  2020-07-13 17:46 ` [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Make bpftool show human-friendly identifiers for the newly introduced
program and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP,
respectively.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v3:
    - New patch in v3.

 tools/bpf/bpftool/common.c | 1 +
 tools/bpf/bpftool/prog.c   | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/common.c b/tools/bpf/bpftool/common.c
index 29f4e7611ae8..9b28c69dd8e4 100644
--- a/tools/bpf/bpftool/common.c
+++ b/tools/bpf/bpftool/common.c
@@ -64,6 +64,7 @@ const char * const attach_type_name[__MAX_BPF_ATTACH_TYPE] = {
 	[BPF_TRACE_FEXIT]		= "fexit",
 	[BPF_MODIFY_RETURN]		= "mod_ret",
 	[BPF_LSM_MAC]			= "lsm_mac",
+	[BPF_SK_LOOKUP]			= "sk_lookup",
 };
 
 void p_err(const char *fmt, ...)
diff --git a/tools/bpf/bpftool/prog.c b/tools/bpf/bpftool/prog.c
index 6863c57effd0..3e6ecc6332e2 100644
--- a/tools/bpf/bpftool/prog.c
+++ b/tools/bpf/bpftool/prog.c
@@ -59,6 +59,7 @@ const char * const prog_type_name[] = {
 	[BPF_PROG_TYPE_TRACING]			= "tracing",
 	[BPF_PROG_TYPE_STRUCT_OPS]		= "struct_ops",
 	[BPF_PROG_TYPE_EXT]			= "ext",
+	[BPF_PROG_TYPE_SK_LOOKUP]		= "sk_lookup",
 };
 
 const size_t prog_type_name_size = ARRAY_SIZE(prog_type_name);
@@ -1905,7 +1906,7 @@ static int do_help(int argc, char **argv)
 		"                 cgroup/getsockname4 | cgroup/getsockname6 | cgroup/sendmsg4 |\n"
 		"                 cgroup/sendmsg6 | cgroup/recvmsg4 | cgroup/recvmsg6 |\n"
 		"                 cgroup/getsockopt | cgroup/setsockopt |\n"
-		"                 struct_ops | fentry | fexit | freplace }\n"
+		"                 struct_ops | fentry | fexit | freplace | sk_lookup }\n"
 		"       ATTACH_TYPE := { msg_verdict | stream_verdict | stream_parser |\n"
 		"                        flow_dissector }\n"
 		"       METRIC := { cycles | instructions | l1d_loads | llc_misses }\n"
-- 
2.25.4



* [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (12 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-16  2:13   ` Andrii Nakryiko
  2020-07-13 17:46 ` [PATCH bpf-next v4 15/16] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Exercise verifier access checks for bpf_sk_lookup context fields.
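
For context, a narrow load as seen from BPF C looks roughly like the sketch
below (which byte an index picks out is endianness-dependent; the tests in
this patch exercise the same accesses directly at the instruction level):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("sk_lookup/narrow")
	int narrow_loads(struct bpf_sk_lookup *ctx)
	{
		/* Partial-width reads of 4-byte context fields; clang emits
		 * 1- and 2-byte loads, which the verifier must accept and
		 * convert to shifted/masked full-width loads.
		 */
		if (((__u8 *)&ctx->protocol)[0] == 0 &&
		    ((__u16 *)&ctx->local_port)[0] == 0)
			return SK_DROP;

		return SK_PASS;
	}

	char _license[] SEC("license") = "GPL";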

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Bring back tests for narrow loads.
    
    v3:
    - Consolidate ACCEPT tests into one.
    - Deduplicate REJECT tests and arrange them into logical groups.
    - Add tests for out-of-bounds and unaligned access.
    - Cover access to newly introduced 'sk' field.
    
    v2:
     - Adjust for fields renames in struct bpf_sk_lookup.

 .../selftests/bpf/verifier/ctx_sk_lookup.c    | 471 ++++++++++++++++++
 1 file changed, 471 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c

diff --git a/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
new file mode 100644
index 000000000000..24b0852984cf
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
@@ -0,0 +1,471 @@
+{
+	"valid 1,2,4,8-byte reads from bpf_sk_lookup",
+	.insns = {
+		/* 1-byte read from family field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 3),
+		/* 2-byte read from family field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family) + 2),
+		/* 4-byte read from family field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+
+		/* 1-byte read from protocol field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 3),
+		/* 2-byte read from protocol field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol) + 2),
+		/* 4-byte read from protocol field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+
+		/* 1-byte read from remote_ip4 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 3),
+		/* 2-byte read from remote_ip4 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4) + 2),
+		/* 4-byte read from remote_ip4 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+
+		/* 1-byte read from remote_ip6 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 3),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 4),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 5),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 6),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 7),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 8),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 9),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 10),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 11),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 12),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 13),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 14),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 15),
+		/* 2-byte read from remote_ip6 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 2),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 4),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 6),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 8),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 10),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 12),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 14),
+		/* 4-byte read from remote_ip6 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 4),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 8),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6) + 12),
+
+		/* 1-byte read from remote_port field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 3),
+		/* 2-byte read from remote_port field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port) + 2),
+		/* 4-byte read from remote_port field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+
+		/* 1-byte read from local_ip4 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 3),
+		/* 2-byte read from local_ip4 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4) + 2),
+		/* 4-byte read from local_ip4 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+
+		/* 1-byte read from local_ip6 field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 3),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 4),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 5),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 6),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 7),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 8),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 9),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 10),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 11),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 12),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 13),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 14),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 15),
+		/* 2-byte read from local_ip6 field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 2),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 4),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 6),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 8),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 10),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 12),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 14),
+		/* 4-byte read from local_ip6 field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 4),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 8),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6) + 12),
+
+		/* 1-byte read from local_port field */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 1),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 2),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 3),
+		/* 2-byte read from local_port field */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port) + 2),
+		/* 4-byte read from local_port field */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+
+		/* 8-byte read from sk field */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* invalid 8-byte reads from 4-byte fields in bpf_sk_lookup */
+{
+	"invalid 8-byte read from bpf_sk_lookup family field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup protocol field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup remote_ip4 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup remote_ip6 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_ip6)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup remote_port field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup local_ip4 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup local_ip6 field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_ip6)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 8-byte read from bpf_sk_lookup local_port field",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* invalid 1,2,4-byte reads from 8-byte fields in bpf_sk_lookup */
+{
+	"invalid 4-byte read from bpf_sk_lookup sk field",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 2-byte read from bpf_sk_lookup sk field",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 1-byte read from bpf_sk_lookup sk field",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_sk_lookup, sk)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* out of bounds and unaligned reads from bpf_sk_lookup */
+{
+	"invalid 4-byte read past end of bpf_sk_lookup",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    sizeof(struct bpf_sk_lookup)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte unaligned read from bpf_sk_lookup at odd offset",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 1),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte unaligned read from bpf_sk_lookup at even offset",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1, 2),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+/* in-bounds and out-of-bounds writes to bpf_sk_lookup */
+{
+	"invalid 8-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 2-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 1-byte write to bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
+{
+	"invalid 4-byte write past end of bpf_sk_lookup",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0xcafe4a11U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    sizeof(struct bpf_sk_lookup)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_SK_LOOKUP,
+	.expected_attach_type = BPF_SK_LOOKUP,
+},
-- 
2.25.4



* [PATCH bpf-next v4 15/16] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (13 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-13 17:46 ` [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
  2020-07-16  2:25 ` [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Andrii Nakryiko
  16 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Name the BPF C file after the test case that uses it.

This frees up the "test_sk_lookup" namespace for the BPF sk_lookup program
tests introduced by the following patch.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/prog_tests/reference_tracking.c     | 2 +-
 .../bpf/progs/{test_sk_lookup_kern.c => test_ref_track_kern.c}  | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename tools/testing/selftests/bpf/progs/{test_sk_lookup_kern.c => test_ref_track_kern.c} (100%)

diff --git a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
index fc0d7f4f02cf..106ca8bb2a8f 100644
--- a/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
+++ b/tools/testing/selftests/bpf/prog_tests/reference_tracking.c
@@ -3,7 +3,7 @@
 
 void test_reference_tracking(void)
 {
-	const char *file = "test_sk_lookup_kern.o";
+	const char *file = "test_ref_track_kern.o";
 	const char *obj_name = "ref_track";
 	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, open_opts,
 		.object_name = obj_name,
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/progs/test_ref_track_kern.c
similarity index 100%
rename from tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
rename to tools/testing/selftests/bpf/progs/test_ref_track_kern.c
-- 
2.25.4



* [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (14 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 15/16] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
@ 2020-07-13 17:46 ` Jakub Sitnicki
  2020-07-16  2:19   ` Andrii Nakryiko
  2020-07-16  2:25 ` [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Andrii Nakryiko
  16 siblings, 1 reply; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-13 17:46 UTC (permalink / raw)
  To: bpf
  Cc: netdev, kernel-team, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, Jakub Kicinski

Add tests to test_progs that exercise:

 - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
 - redirecting socket lookup to a socket selected by BPF program,
 - failing a socket lookup on BPF program's request,
 - error scenarios for selecting a socket from BPF program,
 - accessing BPF program context,
 - attaching and running multiple BPF programs.

Run log:

  # ./test_progs -n 69
  #69/1 query lookup prog:OK
  #69/2 TCP IPv4 redir port:OK
  #69/3 TCP IPv4 redir addr:OK
  #69/4 TCP IPv4 redir with reuseport:OK
  #69/5 TCP IPv4 redir skip reuseport:OK
  #69/6 TCP IPv6 redir port:OK
  #69/7 TCP IPv6 redir addr:OK
  #69/8 TCP IPv4->IPv6 redir port:OK
  #69/9 TCP IPv6 redir with reuseport:OK
  #69/10 TCP IPv6 redir skip reuseport:OK
  #69/11 UDP IPv4 redir port:OK
  #69/12 UDP IPv4 redir addr:OK
  #69/13 UDP IPv4 redir with reuseport:OK
  #69/14 UDP IPv4 redir skip reuseport:OK
  #69/15 UDP IPv6 redir port:OK
  #69/16 UDP IPv6 redir addr:OK
  #69/17 UDP IPv4->IPv6 redir port:OK
  #69/18 UDP IPv6 redir and reuseport:OK
  #69/19 UDP IPv6 redir skip reuseport:OK
  #69/20 TCP IPv4 drop on lookup:OK
  #69/21 TCP IPv6 drop on lookup:OK
  #69/22 UDP IPv4 drop on lookup:OK
  #69/23 UDP IPv6 drop on lookup:OK
  #69/24 TCP IPv4 drop on reuseport:OK
  #69/25 TCP IPv6 drop on reuseport:OK
  #69/26 UDP IPv4 drop on reuseport:OK
  #69/27 TCP IPv6 drop on reuseport:OK
  #69/28 sk_assign returns EEXIST:OK
  #69/29 sk_assign honors F_REPLACE:OK
  #69/30 sk_assign accepts NULL socket:OK
  #69/31 access ctx->sk:OK
  #69/32 narrow access to ctx v4:OK
  #69/33 narrow access to ctx v6:OK
  #69/34 sk_assign rejects TCP established:OK
  #69/35 sk_assign rejects UDP connected:OK
  #69/36 multi prog - pass, pass:OK
  #69/37 multi prog - drop, drop:OK
  #69/38 multi prog - pass, drop:OK
  #69/39 multi prog - drop, pass:OK
  #69/40 multi prog - pass, redir:OK
  #69/41 multi prog - redir, pass:OK
  #69/42 multi prog - drop, redir:OK
  #69/43 multi prog - redir, drop:OK
  #69/44 multi prog - redir, redir:OK
  #69 sk_lookup:OK
  Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---

Notes:
    v4:
    - Remove system("bpftool ...") call left over from debugging. (Lorenz)
    - Dedup BPF code that selects a socket. (Lorenz)
    - Switch from CHECK_FAIL to CHECK macro. (Andrii)
    - Extract a network_helper that wraps inet_pton.
    - Don't restore netns now that test_progs does it.
    - Cover bpf_sk_assign(ctx, NULL) in tests.
    - Cover narrow loads in tests.
    - Cover NULL ctx->sk access attempts in tests.
    - Cover accessing IPv6 ctx fields on IPv4 lookup.
    
    v3:
    - Extend tests to cover new functionality in v3:
      - multi-prog attachments (query, running, verdict precedence)
      - selecting a socket for a second time with bpf_sk_assign
      - skipping over reuseport load-balancing
    
    v2:
    - Adjust for field renames in struct bpf_sk_lookup.

 tools/testing/selftests/bpf/network_helpers.c |   58 +-
 tools/testing/selftests/bpf/network_helpers.h |    2 +
 .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
 .../selftests/bpf/progs/test_sk_lookup_kern.c |  639 ++++++++
 4 files changed, 1958 insertions(+), 23 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c

diff --git a/tools/testing/selftests/bpf/network_helpers.c b/tools/testing/selftests/bpf/network_helpers.c
index acd08715be2e..f56655690f9b 100644
--- a/tools/testing/selftests/bpf/network_helpers.c
+++ b/tools/testing/selftests/bpf/network_helpers.c
@@ -73,29 +73,8 @@ int start_server(int family, int type, const char *addr_str, __u16 port,
 	socklen_t len;
 	int fd;
 
-	if (family == AF_INET) {
-		struct sockaddr_in *sin = (void *)&addr;
-
-		sin->sin_family = AF_INET;
-		sin->sin_port = htons(port);
-		if (addr_str &&
-		    inet_pton(AF_INET, addr_str, &sin->sin_addr) != 1) {
-			log_err("inet_pton(AF_INET, %s)", addr_str);
-			return -1;
-		}
-		len = sizeof(*sin);
-	} else {
-		struct sockaddr_in6 *sin6 = (void *)&addr;
-
-		sin6->sin6_family = AF_INET6;
-		sin6->sin6_port = htons(port);
-		if (addr_str &&
-		    inet_pton(AF_INET6, addr_str, &sin6->sin6_addr) != 1) {
-			log_err("inet_pton(AF_INET6, %s)", addr_str);
-			return -1;
-		}
-		len = sizeof(*sin6);
-	}
+	if (make_sockaddr(family, addr_str, port, &addr, &len))
+		return -1;
 
 	fd = socket(family, type, 0);
 	if (fd < 0) {
@@ -194,3 +173,36 @@ int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms)
 
 	return 0;
 }
+
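+/* Fill in sockaddr and length for the given family and address string.
+ * A NULL addr_str leaves the (caller-zeroed) address as the wildcard.
+ */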
+int make_sockaddr(int family, const char *addr_str, __u16 port,
+		  struct sockaddr_storage *addr, socklen_t *len)
+{
+	if (family == AF_INET) {
+		struct sockaddr_in *sin = (void *)addr;
+
+		sin->sin_family = AF_INET;
+		sin->sin_port = htons(port);
+		if (addr_str &&
+		    inet_pton(AF_INET, addr_str, &sin->sin_addr) != 1) {
+			log_err("inet_pton(AF_INET, %s)", addr_str);
+			return -1;
+		}
+		if (len)
+			*len = sizeof(*sin);
+		return 0;
+	} else if (family == AF_INET6) {
+		struct sockaddr_in6 *sin6 = (void *)addr;
+
+		sin6->sin6_family = AF_INET6;
+		sin6->sin6_port = htons(port);
+		if (addr_str &&
+		    inet_pton(AF_INET6, addr_str, &sin6->sin6_addr) != 1) {
+			log_err("inet_pton(AF_INET6, %s)", addr_str);
+			return -1;
+		}
+		if (len)
+			*len = sizeof(*sin6);
+		return 0;
+	}
+	return -1;
+}
diff --git a/tools/testing/selftests/bpf/network_helpers.h b/tools/testing/selftests/bpf/network_helpers.h
index f580e82fda58..c3728f6667e4 100644
--- a/tools/testing/selftests/bpf/network_helpers.h
+++ b/tools/testing/selftests/bpf/network_helpers.h
@@ -37,5 +37,7 @@ int start_server(int family, int type, const char *addr, __u16 port,
 		 int timeout_ms);
 int connect_to_fd(int server_fd, int timeout_ms);
 int connect_fd_to_fd(int client_fd, int server_fd, int timeout_ms);
+int make_sockaddr(int family, const char *addr_str, __u16 port,
+		  struct sockaddr_storage *addr, socklen_t *len);
 
 #endif
diff --git a/tools/testing/selftests/bpf/prog_tests/sk_lookup.c b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
new file mode 100644
index 000000000000..992e35ee2bc8
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/sk_lookup.c
@@ -0,0 +1,1282 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+/*
+ * Test BPF attach point for INET socket lookup (BPF_SK_LOOKUP).
+ *
+ * Tests exercise:
+ *  - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
+ *  - redirecting socket lookup to a socket selected by BPF program,
+ *  - failing a socket lookup on BPF program's request,
+ *  - error scenarios for selecting a socket from BPF program,
+ *  - accessing BPF program context,
+ *  - attaching and running multiple BPF programs.
+ *
+ * Tests run in a dedicated network namespace.
+ */
+
+#define _GNU_SOURCE
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "test_progs.h"
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+#include "cgroup_helpers.h"
+#include "network_helpers.h"
+#include "test_sk_lookup_kern.skel.h"
+
+/* External (address, port) pairs the client sends packets to. */
+#define EXT_IP4		"127.0.0.1"
+#define EXT_IP6		"fd00::1"
+#define EXT_PORT	7007
+
+/* Internal (address, port) pairs the server listens/receives at. */
+#define INT_IP4		"127.0.0.2"
+#define INT_IP4_V6	"::ffff:127.0.0.2"
+#define INT_IP6		"fd00::2"
+#define INT_PORT	8008
+
+#define IO_TIMEOUT_SEC	3
+
+enum server {
+	SERVER_A = 0,
+	SERVER_B = 1,
+	MAX_SERVERS,
+};
+
+enum {
+	PROG1 = 0,
+	PROG2,
+};
+
+struct inet_addr {
+	const char *ip;
+	unsigned short port;
+};
+
+struct test {
+	const char *desc;
+	struct bpf_program *lookup_prog;
+	struct bpf_program *reuseport_prog;
+	struct bpf_map *sock_map;
+	int sotype;
+	struct inet_addr connect_to;
+	struct inet_addr listen_at;
+	enum server accept_on;
+};
+
+static __u32 duration;		/* for CHECK macro */
+
+static bool is_ipv6(const char *ip)
+{
+	return !!strchr(ip, ':');
+}
+
+static int attach_reuseport(int sock_fd, struct bpf_program *reuseport_prog)
+{
+	int err, prog_fd;
+
+	prog_fd = bpf_program__fd(reuseport_prog);
+	if (prog_fd < 0) {
+		errno = -prog_fd;
+		return -1;
+	}
+
+	err = setsockopt(sock_fd, SOL_SOCKET, SO_ATTACH_REUSEPORT_EBPF,
+			 &prog_fd, sizeof(prog_fd));
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static socklen_t inetaddr_len(const struct sockaddr_storage *addr)
+{
+	return (addr->ss_family == AF_INET ? sizeof(struct sockaddr_in) :
+		addr->ss_family == AF_INET6 ? sizeof(struct sockaddr_in6) : 0);
+}
+
+static int make_socket(int sotype, const char *ip, int port,
+		       struct sockaddr_storage *addr)
+{
+	struct timeval timeo = { .tv_sec = IO_TIMEOUT_SEC };
+	int err, family, fd;
+
+	family = is_ipv6(ip) ? AF_INET6 : AF_INET;
+	err = make_sockaddr(family, ip, port, addr, NULL);
+	if (CHECK(err, "make_address", "failed\n"))
+		return -1;
+
+	fd = socket(addr->ss_family, sotype, 0);
+	if (CHECK(fd < 0, "socket", "failed\n")) {
+		log_err("failed to make socket");
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &timeo, sizeof(timeo));
+	if (CHECK(err, "setsockopt(SO_SNDTIMEO)", "failed\n")) {
+		log_err("failed to set SNDTIMEO");
+		close(fd);
+		return -1;
+	}
+
+	err = setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &timeo, sizeof(timeo));
+	if (CHECK(err, "setsockopt(SO_RCVTIMEO)", "failed\n")) {
+		log_err("failed to set RCVTIMEO");
+		close(fd);
+		return -1;
+	}
+
+	return fd;
+}
+
+static int make_server(int sotype, const char *ip, int port,
+		       struct bpf_program *reuseport_prog)
+{
+	struct sockaddr_storage addr = {0};
+	const int one = 1;
+	int err, fd = -1;
+
+	fd = make_socket(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	/* Enabled also for UDPv6 sockets so that IPv4-mapped IPv6 works. */
+	if (sotype == SOCK_DGRAM) {
+		err = setsockopt(fd, SOL_IP, IP_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(IP_RECVORIGDSTADDR)", "failed\n")) {
+			log_err("failed to enable IP_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_DGRAM && addr.ss_family == AF_INET6) {
+		err = setsockopt(fd, SOL_IPV6, IPV6_RECVORIGDSTADDR, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(IPV6_RECVORIGDSTADDR)", "failed\n")) {
+			log_err("failed to enable IPV6_RECVORIGDSTADDR");
+			goto fail;
+		}
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(SO_REUSEADDR)", "failed\n")) {
+			log_err("failed to enable SO_REUSEADDR");
+			goto fail;
+		}
+	}
+
+	if (reuseport_prog) {
+		err = setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one,
+				 sizeof(one));
+		if (CHECK(err, "setsockopt(SO_REUSEPORT)", "failed\n")) {
+			log_err("failed to enable SO_REUSEPORT");
+			goto fail;
+		}
+	}
+
+	err = bind(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK(err, "bind", "failed\n")) {
+		log_err("failed to bind listen socket");
+		goto fail;
+	}
+
+	if (sotype == SOCK_STREAM) {
+		err = listen(fd, SOMAXCONN);
+		if (CHECK(err, "make_server", "listen")) {
+			log_err("failed to listen on port %d", port);
+			goto fail;
+		}
+	}
+
+	/* Attach the reuseport prog late so there is a single init path */
+	if (reuseport_prog) {
+		err = attach_reuseport(fd, reuseport_prog);
+		if (CHECK(err, "attach_reuseport", "failed\n")) {
+			log_err("failed to attach reuseport prog");
+			goto fail;
+		}
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int make_client(int sotype, const char *ip, int port)
+{
+	struct sockaddr_storage addr = {0};
+	int err, fd;
+
+	fd = make_socket(sotype, ip, port, &addr);
+	if (fd < 0)
+		return -1;
+
+	err = connect(fd, (void *)&addr, inetaddr_len(&addr));
+	if (CHECK(err, "make_client", "connect")) {
+		log_err("failed to connect client socket");
+		goto fail;
+	}
+
+	return fd;
+fail:
+	close(fd);
+	return -1;
+}
+
+static int send_byte(int fd)
+{
+	ssize_t n;
+
+	errno = 0;
+	n = send(fd, "a", 1, 0);
+	if (CHECK(n <= 0, "send_byte", "send")) {
+		log_err("failed/partial send");
+		return -1;
+	}
+	return 0;
+}
+
+static int recv_byte(int fd)
+{
+	char buf[1];
+	ssize_t n;
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK(n <= 0, "recv_byte", "recv")) {
+		log_err("failed/partial recv");
+		return -1;
+	}
+	return 0;
+}
+
+static int tcp_recv_send(int server_fd)
+{
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
+	fd = accept(server_fd, NULL, NULL);
+	if (CHECK(fd < 0, "accept", "failed\n")) {
+		log_err("failed to accept");
+		return -1;
+	}
+
+	n = recv(fd, buf, sizeof(buf), 0);
+	if (CHECK(n <= 0, "recv", "failed\n")) {
+		log_err("failed/partial recv");
+		ret = -1;
+		goto close;
+	}
+
+	n = send(fd, buf, n, 0);
+	if (CHECK(n <= 0, "send", "failed\n")) {
+		log_err("failed/partial send");
+		ret = -1;
+		goto close;
+	}
+
+	ret = 0;
+close:
+	close(fd);
+	return ret;
+}
+
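+/* Rewrite an IPv4 sockaddr in place as the equivalent IPv4-mapped
+ * IPv6 sockaddr (::ffff:a.b.c.d).
+ */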
+static void v4_to_v6(struct sockaddr_storage *ss)
+{
+	struct sockaddr_in6 *v6 = (struct sockaddr_in6 *)ss;
+	struct sockaddr_in v4 = *(struct sockaddr_in *)ss;
+
+	v6->sin6_family = AF_INET6;
+	v6->sin6_port = v4.sin_port;
+	v6->sin6_addr.s6_addr[10] = 0xff;
+	v6->sin6_addr.s6_addr[11] = 0xff;
+	memcpy(&v6->sin6_addr.s6_addr[12], &v4.sin_addr.s_addr, 4);
+}
+
+static int udp_recv_send(int server_fd)
+{
+	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
+	struct sockaddr_storage _src_addr = { 0 };
+	struct sockaddr_storage *src_addr = &_src_addr;
+	struct sockaddr_storage *dst_addr = NULL;
+	struct msghdr msg = { 0 };
+	struct iovec iov = { 0 };
+	struct cmsghdr *cm;
+	char buf[1];
+	int ret, fd;
+	ssize_t n;
+
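+	/* The packet's original destination, recovered from ORIGDSTADDR
+	 * ancillary data below, can differ from the address this socket
+	 * is bound to when the lookup prog has redirected the packet.
+	 */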
+	iov.iov_base = buf;
+	iov.iov_len = sizeof(buf);
+
+	msg.msg_name = src_addr;
+	msg.msg_namelen = sizeof(*src_addr);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	errno = 0;
+	n = recvmsg(server_fd, &msg, 0);
+	if (CHECK(n <= 0, "recvmsg", "failed\n")) {
+		log_err("failed to receive");
+		return -1;
+	}
+	if (CHECK(msg.msg_flags & MSG_CTRUNC, "recvmsg", "truncated cmsg\n"))
+		return -1;
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if ((cm->cmsg_level == SOL_IP &&
+		     cm->cmsg_type == IP_ORIGDSTADDR) ||
+		    (cm->cmsg_level == SOL_IPV6 &&
+		     cm->cmsg_type == IPV6_ORIGDSTADDR)) {
+			dst_addr = (struct sockaddr_storage *)CMSG_DATA(cm);
+			break;
+		}
+		log_err("warning: ignored cmsg at level %d type %d",
+			cm->cmsg_level, cm->cmsg_type);
+	}
+	if (CHECK(!dst_addr, "recvmsg", "missing ORIGDSTADDR\n"))
+		return -1;
+
+	/* Server socket bound to IPv4-mapped IPv6 address */
+	if (src_addr->ss_family == AF_INET6 &&
+	    dst_addr->ss_family == AF_INET) {
+		v4_to_v6(dst_addr);
+	}
+
+	/* Reply from original destination address. */
+	fd = socket(dst_addr->ss_family, SOCK_DGRAM, 0);
+	if (CHECK(fd < 0, "socket", "failed\n")) {
+		log_err("failed to create tx socket");
+		return -1;
+	}
+
+	ret = bind(fd, (struct sockaddr *)dst_addr, sizeof(*dst_addr));
+	if (CHECK(ret, "bind", "failed\n")) {
+		log_err("failed to bind tx socket");
+		goto out;
+	}
+
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	n = sendmsg(fd, &msg, 0);
+	if (CHECK(n <= 0, "sendmsg", "failed\n")) {
+		log_err("failed to send echo reply");
+		ret = -1;
+		goto out;
+	}
+
+	ret = 0;
+out:
+	close(fd);
+	return ret;
+}
+
+static int tcp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = tcp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
+static int udp_echo_test(int client_fd, int server_fd)
+{
+	int err;
+
+	err = send_byte(client_fd);
+	if (err)
+		return -1;
+	err = udp_recv_send(server_fd);
+	if (err)
+		return -1;
+	err = recv_byte(client_fd);
+	if (err)
+		return -1;
+
+	return 0;
+}
+
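+/* Attach a lookup prog to the caller's netns with a bpf_link. */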
+static struct bpf_link *attach_lookup_prog(struct bpf_program *prog)
+{
+	struct bpf_link *link;
+	int net_fd;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK(net_fd < 0, "open", "failed\n")) {
+		log_err("failed to open /proc/self/ns/net");
+		return NULL;
+	}
+
+	link = bpf_program__attach_netns(prog, net_fd);
+	if (CHECK(IS_ERR(link), "bpf_program__attach_netns", "failed\n")) {
+		errno = -PTR_ERR(link);
+		log_err("failed to attach program '%s' to netns",
+			bpf_program__name(prog));
+		link = NULL;
+	}
+
+	close(net_fd);
+	return link;
+}
+
+static int update_lookup_map(struct bpf_map *map, int index, int sock_fd)
+{
+	int err, map_fd;
+	uint64_t value;
+
+	map_fd = bpf_map__fd(map);
+	if (CHECK(map_fd < 0, "bpf_map__fd", "failed\n")) {
+		errno = -map_fd;
+		log_err("failed to get map FD");
+		return -1;
+	}
+
+	value = (uint64_t)sock_fd;
+	err = bpf_map_update_elem(map_fd, &index, &value, BPF_NOEXIST);
+	if (CHECK(err, "bpf_map_update_elem", "failed\n")) {
+		log_err("failed to update redir_map @ %d", index);
+		return -1;
+	}
+
+	return 0;
+}
+
+static __u32 link_info_prog_id(struct bpf_link *link)
+{
+	struct bpf_link_info info = {};
+	__u32 info_len = sizeof(info);
+	int link_fd, err;
+
+	link_fd = bpf_link__fd(link);
+	if (CHECK(link_fd < 0, "bpf_link__fd", "failed\n")) {
+		errno = -link_fd;
+		log_err("bpf_link__fd failed");
+		return 0;
+	}
+
+	err = bpf_obj_get_info_by_fd(link_fd, &info, &info_len);
+	if (CHECK(err, "bpf_obj_get_info_by_fd", "failed\n")) {
+		log_err("bpf_obj_get_info_by_fd");
+		return 0;
+	}
+	if (CHECK(info_len != sizeof(info), "bpf_obj_get_info_by_fd",
+		  "unexpected info len %u\n", info_len))
+		return 0;
+
+	return info.prog_id;
+}
+
+static void query_lookup_prog(struct test_sk_lookup_kern *skel)
+{
+	struct bpf_link *link[3] = {};
+	__u32 attach_flags = 0;
+	__u32 prog_ids[3] = {};
+	__u32 prog_cnt = 3;
+	__u32 prog_id;
+	int net_fd;
+	int err;
+
+	net_fd = open("/proc/self/ns/net", O_RDONLY);
+	if (CHECK(net_fd < 0, "open", "failed\n")) {
+		log_err("failed to open /proc/self/ns/net");
+		return;
+	}
+
+	link[0] = attach_lookup_prog(skel->progs.lookup_pass);
+	if (!link[0])
+		goto close;
+	link[1] = attach_lookup_prog(skel->progs.lookup_pass);
+	if (!link[1])
+		goto detach;
+	link[2] = attach_lookup_prog(skel->progs.lookup_drop);
+	if (!link[2])
+		goto detach;
+
+	err = bpf_prog_query(net_fd, BPF_SK_LOOKUP, 0 /* query flags */,
+			     &attach_flags, prog_ids, &prog_cnt);
+	if (CHECK(err, "bpf_prog_query", "failed\n")) {
+		log_err("failed to query lookup prog");
+		goto detach;
+	}
+
+	errno = 0;
+	if (CHECK(attach_flags != 0, "bpf_prog_query",
+		  "wrong attach_flags on query: %u", attach_flags))
+		goto detach;
+	if (CHECK(prog_cnt != 3, "bpf_prog_query",
+		  "wrong program count on query: %u", prog_cnt))
+		goto detach;
+	prog_id = link_info_prog_id(link[0]);
+	CHECK(prog_ids[0] != prog_id, "bpf_prog_query",
+	      "invalid program #0 id on query: %u != %u\n",
+	      prog_ids[0], prog_id);
+	prog_id = link_info_prog_id(link[1]);
+	CHECK(prog_ids[1] != prog_id, "bpf_prog_query",
+	      "invalid program #1 id on query: %u != %u\n",
+	      prog_ids[1], prog_id);
+	prog_id = link_info_prog_id(link[2]);
+	CHECK(prog_ids[2] != prog_id, "bpf_prog_query",
+	      "invalid program #2 id on query: %u != %u\n",
+	      prog_ids[2], prog_id);
+
+detach:
+	if (link[2])
+		bpf_link__destroy(link[2]);
+	if (link[1])
+		bpf_link__destroy(link[1]);
+	if (link[0])
+		bpf_link__destroy(link[0]);
+close:
+	close(net_fd);
+}
+
+static void run_lookup_prog(const struct test *t)
+{
+	int server_fds[MAX_SERVERS] = { [0 ... MAX_SERVERS - 1] = -1 };
+	int client_fd;
+	struct bpf_link *lookup_link;
+	int i, err;
+
+	lookup_link = attach_lookup_prog(t->lookup_prog);
+	if (!lookup_link)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		server_fds[i] = make_server(t->sotype, t->listen_at.ip,
+					    t->listen_at.port,
+					    t->reuseport_prog);
+		if (server_fds[i] < 0)
+			goto close;
+
+		err = update_lookup_map(t->sock_map, i, server_fds[i]);
+		if (err)
+			goto close;
+
+		/* want just one server for non-reuseport test */
+		if (!t->reuseport_prog)
+			break;
+	}
+
+	client_fd = make_client(t->sotype, t->connect_to.ip, t->connect_to.port);
+	if (client_fd < 0)
+		goto close;
+
+	if (t->sotype == SOCK_STREAM)
+		tcp_echo_test(client_fd, server_fds[t->accept_on]);
+	else
+		udp_echo_test(client_fd, server_fds[t->accept_on]);
+
+	close(client_fd);
+close:
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		if (server_fds[i] != -1)
+			close(server_fds[i]);
+	}
+	bpf_link__destroy(lookup_link);
+}
+
+static void test_redirect_lookup(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4 redir with reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "TCP IPv4 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+		{
+			.desc		= "TCP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4_V6, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 redir with reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "TCP IPv6 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+		{
+			.desc		= "UDP IPv4 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir addr",
+			.lookup_prog	= skel->progs.redir_ip4,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 redir with reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "UDP IPv4 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+		{
+			.desc		= "UDP IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir addr",
+			.lookup_prog	= skel->progs.redir_ip6,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4->IPv6 redir port",
+			.lookup_prog	= skel->progs.redir_port,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4_V6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 redir and reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_B,
+		},
+		{
+			.desc		= "UDP IPv6 redir skip reuseport",
+			.lookup_prog	= skel->progs.select_sock_a_no_reuseport,
+			.reuseport_prog	= skel->progs.select_sock_b,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+			.accept_on	= SERVER_A,
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			run_lookup_prog(t);
+	}
+}
+
+static void drop_on_lookup(const struct test *t)
+{
+	struct sockaddr_storage dst = {};
+	int client_fd, server_fd, err;
+	struct bpf_link *lookup_link;
+	ssize_t n;
+
+	lookup_link = attach_lookup_prog(t->lookup_prog);
+	if (!lookup_link)
+		return;
+
+	server_fd = make_server(t->sotype, t->listen_at.ip, t->listen_at.port,
+				t->reuseport_prog);
+	if (server_fd < 0)
+		goto detach;
+
+	client_fd = make_socket(t->sotype, t->connect_to.ip,
+				t->connect_to.port, &dst);
+	if (client_fd < 0)
+		goto close_srv;
+
+	err = connect(client_fd, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client_fd);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client_fd, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK(!err || errno != ECONNREFUSED, "connect",
+		  "unexpected success or error\n"))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client_fd);
+close_srv:
+	close(server_fd);
+detach:
+	bpf_link__destroy(lookup_link);
+}
+
+static void test_drop_on_lookup(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "UDP IPv6 drop on lookup",
+			.lookup_prog	= skel->progs.lookup_drop,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { EXT_IP6, EXT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_lookup(t);
+	}
+}
+
+static void drop_on_reuseport(const struct test *t)
+{
+	struct sockaddr_storage dst = { 0 };
+	int client, server1, server2, err;
+	struct bpf_link *lookup_link;
+	ssize_t n;
+
+	lookup_link = attach_lookup_prog(t->lookup_prog);
+	if (!lookup_link)
+		return;
+
+	server1 = make_server(t->sotype, t->listen_at.ip, t->listen_at.port,
+			      t->reuseport_prog);
+	if (server1 < 0)
+		goto detach;
+
+	err = update_lookup_map(t->sock_map, SERVER_A, server1);
+	if (err)
+		goto detach;
+
+	/* Second server at the destination address; it should never be reached */
+	server2 = make_server(t->sotype, t->connect_to.ip, t->connect_to.port,
+			      NULL /* reuseport prog */);
+	if (server2 < 0)
+		goto close_srv1;
+
+	client = make_socket(t->sotype, t->connect_to.ip,
+			     t->connect_to.port, &dst);
+	if (client < 0)
+		goto close_srv2;
+
+	err = connect(client, (void *)&dst, inetaddr_len(&dst));
+	if (t->sotype == SOCK_DGRAM) {
+		err = send_byte(client);
+		if (err)
+			goto close_all;
+
+		/* Read out asynchronous error */
+		n = recv(client, NULL, 0, 0);
+		err = n == -1;
+	}
+	if (CHECK(!err || errno != ECONNREFUSED, "connect",
+		  "unexpected success or error\n"))
+		log_err("expected ECONNREFUSED on connect");
+
+close_all:
+	close(client);
+close_srv2:
+	close(server2);
+close_srv1:
+	close(server1);
+detach:
+	bpf_link__destroy(lookup_link);
+}
+
+static void test_drop_on_reuseport(struct test_sk_lookup_kern *skel)
+{
+	const struct test tests[] = {
+		{
+			.desc		= "TCP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_STREAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+		},
+		{
+			.desc		= "UDP IPv4 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP4, EXT_PORT },
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "TCP IPv6 drop on reuseport",
+			.lookup_prog	= skel->progs.select_sock_a,
+			.reuseport_prog	= skel->progs.reuseport_drop,
+			.sock_map	= skel->maps.redir_map,
+			.sotype		= SOCK_DGRAM,
+			.connect_to	= { EXT_IP6, EXT_PORT },
+			.listen_at	= { INT_IP6, INT_PORT },
+		},
+	};
+	const struct test *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		if (test__start_subtest(t->desc))
+			drop_on_reuseport(t);
+	}
+}
+
+static void run_sk_assign(struct test_sk_lookup_kern *skel,
+			  struct bpf_program *lookup_prog,
+			  const char *listen_ip, const char *connect_ip)
+{
+	int server_fds[MAX_SERVERS] = { [0 ... MAX_SERVERS - 1] = -1 };
+	int client_fd, peer_fd;
+	struct bpf_link *lookup_link;
+	int i, err;
+
+	lookup_link = attach_lookup_prog(lookup_prog);
+	if (!lookup_link)
+		return;
+
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		server_fds[i] = make_server(SOCK_STREAM, listen_ip, 0, NULL);
+		if (server_fds[i] < 0)
+			goto close_servers;
+
+		err = update_lookup_map(skel->maps.redir_map, i,
+					server_fds[i]);
+		if (err)
+			goto close_servers;
+	}
+
+	client_fd = make_client(SOCK_STREAM, connect_ip, EXT_PORT);
+	if (client_fd < 0)
+		goto close_servers;
+
+	peer_fd = accept(server_fds[SERVER_B], NULL, NULL);
+	if (CHECK(peer_fd < 0, "accept", "failed\n"))
+		goto close_client;
+
+	close(peer_fd);
+close_client:
+	close(client_fd);
+close_servers:
+	for (i = 0; i < ARRAY_SIZE(server_fds); i++) {
+		if (server_fds[i] != -1)
+			close(server_fds[i]);
+	}
+	bpf_link__destroy(lookup_link);
+}
+
+static void run_sk_assign_v4(struct test_sk_lookup_kern *skel,
+			     struct bpf_program *lookup_prog)
+{
+	run_sk_assign(skel, lookup_prog, INT_IP4, EXT_IP4);
+}
+
+static void run_sk_assign_v6(struct test_sk_lookup_kern *skel,
+			     struct bpf_program *lookup_prog)
+{
+	run_sk_assign(skel, lookup_prog, INT_IP6, EXT_IP6);
+}
+
+static void run_sk_assign_connected(struct test_sk_lookup_kern *skel,
+				    int sotype)
+{
+	int err, client_fd, connected_fd, server_fd;
+	struct bpf_link *lookup_link;
+
+	server_fd = make_server(sotype, EXT_IP4, EXT_PORT, NULL);
+	if (server_fd < 0)
+		return;
+
+	connected_fd = make_client(sotype, EXT_IP4, EXT_PORT);
+	if (connected_fd < 0)
+		goto out_close_server;
+
+	/* Put a connected socket in redirect map */
+	err = update_lookup_map(skel->maps.redir_map, SERVER_A, connected_fd);
+	if (err)
+		goto out_close_connected;
+
+	lookup_link = attach_lookup_prog(skel->progs.sk_assign_esocknosupport);
+	if (!lookup_link)
+		goto out_close_connected;
+
+	/* Try to redirect TCP SYN / UDP packet to a connected socket */
+	client_fd = make_client(sotype, EXT_IP4, EXT_PORT);
+	if (client_fd < 0)
+		goto out_unlink_prog;
+	if (sotype == SOCK_DGRAM) {
+		send_byte(client_fd);
+		recv_byte(server_fd);
+	}
+
+	close(client_fd);
+out_unlink_prog:
+	bpf_link__destroy(lookup_link);
+out_close_connected:
+	close(connected_fd);
+out_close_server:
+	close(server_fd);
+}
+
+static void test_sk_assign_helper(struct test_sk_lookup_kern *skel)
+{
+	if (test__start_subtest("sk_assign returns EEXIST"))
+		run_sk_assign_v4(skel, skel->progs.sk_assign_eexist);
+	if (test__start_subtest("sk_assign honors F_REPLACE"))
+		run_sk_assign_v4(skel, skel->progs.sk_assign_replace_flag);
+	if (test__start_subtest("sk_assign accepts NULL socket"))
+		run_sk_assign_v4(skel, skel->progs.sk_assign_null);
+	if (test__start_subtest("access ctx->sk"))
+		run_sk_assign_v4(skel, skel->progs.access_ctx_sk);
+	if (test__start_subtest("narrow access to ctx v4"))
+		run_sk_assign_v4(skel, skel->progs.ctx_narrow_access);
+	if (test__start_subtest("narrow access to ctx v6"))
+		run_sk_assign_v6(skel, skel->progs.ctx_narrow_access);
+	if (test__start_subtest("sk_assign rejects TCP established"))
+		run_sk_assign_connected(skel, SOCK_STREAM);
+	if (test__start_subtest("sk_assign rejects UDP connected"))
+		run_sk_assign_connected(skel, SOCK_DGRAM);
+}
+
+struct test_multi_prog {
+	const char *desc;
+	struct bpf_program *prog1;
+	struct bpf_program *prog2;
+	struct bpf_map *redir_map;
+	struct bpf_map *run_map;
+	int expect_errno;
+	struct inet_addr listen_at;
+};
+
+static void run_multi_prog_lookup(const struct test_multi_prog *t)
+{
+	struct sockaddr_storage dst = {};
+	int map_fd, server_fd, client_fd;
+	struct bpf_link *link1, *link2;
+	int prog_idx, done, err;
+
+	map_fd = bpf_map__fd(t->run_map);
+
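+	/* Clear the run markers. Each prog flags its run_map slot with
+	 * PROG_DONE when invoked, so we can verify both programs ran.
+	 */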
+	done = 0;
+	prog_idx = PROG1;
+	err = bpf_map_update_elem(map_fd, &prog_idx, &done, BPF_ANY);
+	if (CHECK(err, "bpf_map_update_elem", "failed\n"))
+		return;
+	prog_idx = PROG2;
+	err = bpf_map_update_elem(map_fd, &prog_idx, &done, BPF_ANY);
+	if (CHECK(err, "bpf_map_update_elem", "failed\n"))
+		return;
+
+	link1 = attach_lookup_prog(t->prog1);
+	if (!link1)
+		return;
+	link2 = attach_lookup_prog(t->prog2);
+	if (!link2)
+		goto out_unlink1;
+
+	server_fd = make_server(SOCK_STREAM, t->listen_at.ip,
+				t->listen_at.port, NULL);
+	if (server_fd < 0)
+		goto out_unlink2;
+
+	err = update_lookup_map(t->redir_map, SERVER_A, server_fd);
+	if (err)
+		goto out_close_server;
+
+	client_fd = make_socket(SOCK_STREAM, EXT_IP4, EXT_PORT, &dst);
+	if (client_fd < 0)
+		goto out_close_server;
+
+	err = connect(client_fd, (void *)&dst, inetaddr_len(&dst));
+	if (CHECK(err && !t->expect_errno, "connect",
+		  "unexpected error %d\n", errno))
+		goto out_close_client;
+	if (CHECK(err && t->expect_errno && errno != t->expect_errno,
+		  "connect", "unexpected error %d\n", errno))
+		goto out_close_client;
+
+	done = 0;
+	prog_idx = PROG1;
+	err = bpf_map_lookup_elem(map_fd, &prog_idx, &done);
+	CHECK(err, "bpf_map_lookup_elem", "failed\n");
+	CHECK(!done, "bpf_map_lookup_elem", "PROG1 !done\n");
+
+	done = 0;
+	prog_idx = PROG2;
+	err = bpf_map_lookup_elem(map_fd, &prog_idx, &done);
+	CHECK(err, "bpf_map_lookup_elem", "failed\n");
+	CHECK(!done, "bpf_map_lookup_elem", "PROG2 !done\n");
+
+out_close_client:
+	close(client_fd);
+out_close_server:
+	close(server_fd);
+out_unlink2:
+	bpf_link__destroy(link2);
+out_unlink1:
+	bpf_link__destroy(link1);
+}
+
+static void test_multi_prog_lookup(struct test_sk_lookup_kern *skel)
+{
+	struct test_multi_prog tests[] = {
+		{
+			.desc		= "multi prog - pass, pass",
+			.prog1		= skel->progs.multi_prog_pass1,
+			.prog2		= skel->progs.multi_prog_pass2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+		},
+		{
+			.desc		= "multi prog - drop, drop",
+			.prog1		= skel->progs.multi_prog_drop1,
+			.prog2		= skel->progs.multi_prog_drop2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+			.expect_errno	= ECONNREFUSED,
+		},
+		{
+			.desc		= "multi prog - pass, drop",
+			.prog1		= skel->progs.multi_prog_pass1,
+			.prog2		= skel->progs.multi_prog_drop2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+			.expect_errno	= ECONNREFUSED,
+		},
+		{
+			.desc		= "multi prog - drop, pass",
+			.prog1		= skel->progs.multi_prog_drop1,
+			.prog2		= skel->progs.multi_prog_pass2,
+			.listen_at	= { EXT_IP4, EXT_PORT },
+			.expect_errno	= ECONNREFUSED,
+		},
+		{
+			.desc		= "multi prog - pass, redir",
+			.prog1		= skel->progs.multi_prog_pass1,
+			.prog2		= skel->progs.multi_prog_redir2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - redir, pass",
+			.prog1		= skel->progs.multi_prog_redir1,
+			.prog2		= skel->progs.multi_prog_pass2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - drop, redir",
+			.prog1		= skel->progs.multi_prog_drop1,
+			.prog2		= skel->progs.multi_prog_redir2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - redir, drop",
+			.prog1		= skel->progs.multi_prog_redir1,
+			.prog2		= skel->progs.multi_prog_drop2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+		{
+			.desc		= "multi prog - redir, redir",
+			.prog1		= skel->progs.multi_prog_redir1,
+			.prog2		= skel->progs.multi_prog_redir2,
+			.listen_at	= { INT_IP4, INT_PORT },
+		},
+	};
+	struct test_multi_prog *t;
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++) {
+		t->redir_map = skel->maps.redir_map;
+		t->run_map = skel->maps.run_map;
+		if (test__start_subtest(t->desc))
+			run_multi_prog_lookup(t);
+	}
+}
+
+static void run_tests(struct test_sk_lookup_kern *skel)
+{
+	if (test__start_subtest("query lookup prog"))
+		query_lookup_prog(skel);
+	test_redirect_lookup(skel);
+	test_drop_on_lookup(skel);
+	test_drop_on_reuseport(skel);
+	test_sk_assign_helper(skel);
+	test_multi_prog_lookup(skel);
+}
+
+static int switch_netns(void)
+{
+	static const char * const setup_script[] = {
+		"ip -6 addr add dev lo " EXT_IP6 "/128 nodad",
+		"ip -6 addr add dev lo " INT_IP6 "/128 nodad",
+		"ip link set dev lo up",
+		NULL,
+	};
+	const char * const *cmd;
+	int err;
+
+	err = unshare(CLONE_NEWNET);
+	if (CHECK(err, "unshare", "failed\n")) {
+		log_err("unshare(CLONE_NEWNET)");
+		return -1;
+	}
+
+	for (cmd = setup_script; *cmd; cmd++) {
+		err = system(*cmd);
+		if (CHECK(err, "system", "failed\n")) {
+			log_err("system(%s)", *cmd);
+			return -1;
+		}
+	}
+
+	return 0;
+}
+
+void test_sk_lookup(void)
+{
+	struct test_sk_lookup_kern *skel;
+	int err;
+
+	err = switch_netns();
+	if (err)
+		return;
+
+	skel = test_sk_lookup_kern__open_and_load();
+	if (CHECK(!skel, "skel open_and_load", "failed\n"))
+		return;
+
+	run_tests(skel);
+
+	test_sk_lookup_kern__destroy(skel);
+}
diff --git a/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c b/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
new file mode 100644
index 000000000000..b6d2ca75ac91
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
@@ -0,0 +1,639 @@
+// SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause
+// Copyright (c) 2020 Cloudflare
+
+#include <errno.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <linux/bpf.h>
+#include <linux/in.h>
+#include <sys/socket.h>
+
+#include <bpf/bpf_endian.h>
+#include <bpf/bpf_helpers.h>
+
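+/* Compose IPv4/IPv6 addresses in network byte order from host-order parts. */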
+#define IP4(a, b, c, d)					\
+	bpf_htonl((((__u32)(a) & 0xffU) << 24) |	\
+		  (((__u32)(b) & 0xffU) << 16) |	\
+		  (((__u32)(c) & 0xffU) <<  8) |	\
+		  (((__u32)(d) & 0xffU) <<  0))
+#define IP6(aaaa, bbbb, cccc, dddd)			\
+	{ bpf_htonl(aaaa), bpf_htonl(bbbb), bpf_htonl(cccc), bpf_htonl(dddd) }
+
+#define MAX_SOCKS 32
+
+struct {
+	__uint(type, BPF_MAP_TYPE_SOCKMAP);
+	__uint(max_entries, MAX_SOCKS);
+	__type(key, __u32);
+	__type(value, __u64);
+} redir_map SEC(".maps");
+
+struct {
+	__uint(type, BPF_MAP_TYPE_ARRAY);
+	__uint(max_entries, 2);
+	__type(key, int);
+	__type(value, int);
+} run_map SEC(".maps");
+
+enum {
+	PROG1 = 0,
+	PROG2,
+};
+
+enum {
+	SERVER_A = 0,
+	SERVER_B,
+};
+
+/* Addressable key/value constants for convenience */
+static const int KEY_PROG1 = PROG1;
+static const int KEY_PROG2 = PROG2;
+static const int PROG_DONE = 1;
+
+static const __u32 KEY_SERVER_A = SERVER_A;
+static const __u32 KEY_SERVER_B = SERVER_B;
+
+static const __u16 DST_PORT = 7007; /* Host byte order */
+static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
+static const __u32 DST_IP6[] = IP6(0xfd000000, 0x0, 0x0, 0x00000001);
+
+SEC("sk_lookup/lookup_pass")
+int lookup_pass(struct bpf_sk_lookup *ctx)
+{
+	return SK_PASS;
+}
+
+SEC("sk_lookup/lookup_drop")
+int lookup_drop(struct bpf_sk_lookup *ctx)
+{
+	return SK_DROP;
+}
+
+SEC("sk_reuseport/reuse_pass")
+int reuseport_pass(struct sk_reuseport_md *ctx)
+{
+	return SK_PASS;
+}
+
+SEC("sk_reuseport/reuse_drop")
+int reuseport_drop(struct sk_reuseport_md *ctx)
+{
+	return SK_DROP;
+}
+
+/* Redirect packets destined for port DST_PORT to socket at redir_map[0]. */
+SEC("sk_lookup/redir_port")
+int redir_port(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->local_port != DST_PORT)
+		return SK_PASS;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip4")
+int redir_ip4(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET)
+		return SK_PASS;
+	if (ctx->local_port != DST_PORT)
+		return SK_PASS;
+	if (ctx->local_ip4 != DST_IP4)
+		return SK_PASS;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+/* Redirect packets destined for DST_IP6 address to socket at redir_map[0]. */
+SEC("sk_lookup/redir_ip6")
+int redir_ip6(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	if (ctx->family != AF_INET6)
+		return SK_PASS;
+	if (ctx->local_port != DST_PORT)
+		return SK_PASS;
+	if (ctx->local_ip6[0] != DST_IP6[0] ||
+	    ctx->local_ip6[1] != DST_IP6[1] ||
+	    ctx->local_ip6[2] != DST_IP6[2] ||
+	    ctx->local_ip6[3] != DST_IP6[3])
+		return SK_PASS;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+SEC("sk_lookup/select_sock_a")
+int select_sock_a(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_PASS;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+SEC("sk_lookup/select_sock_a_no_reuseport")
+int select_sock_a_no_reuseport(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_DROP;
+
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_NO_REUSEPORT);
+	bpf_sk_release(sk);
+	return err ? SK_DROP : SK_PASS;
+}
+
+SEC("sk_reuseport/select_sock_b")
+int select_sock_b(struct sk_reuseport_md *ctx)
+{
+	__u32 key = KEY_SERVER_B;
+	int err;
+
+	err = bpf_sk_select_reuseport(ctx, &redir_map, &key, 0);
+	return err ? SK_DROP : SK_PASS;
+}
+
+/* Check that bpf_sk_assign() returns -EEXIST if socket already selected. */
+SEC("sk_lookup/sk_assign_eexist")
+int sk_assign_eexist(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, ret;
+
+	ret = SK_DROP;
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err)
+		goto out;
+	bpf_sk_release(sk);
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err != -EEXIST) {
+		bpf_printk("sk_assign returned %d, expected %d\n",
+			   err, -EEXIST);
+		goto out;
+	}
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+/* Check that bpf_sk_assign(BPF_SK_LOOKUP_F_REPLACE) can override selection. */
+SEC("sk_lookup/sk_assign_replace_flag")
+int sk_assign_replace_flag(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, ret;
+
+	ret = SK_DROP;
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err)
+		goto out;
+	bpf_sk_release(sk);
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE);
+	if (err) {
+		bpf_printk("sk_assign returned %d, expected 0\n", err);
+		goto out;
+	}
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+/* Check that bpf_sk_assign(sk=NULL) is accepted. */
+SEC("sk_lookup/sk_assign_null")
+int sk_assign_null(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk = NULL;
+	int err, ret;
+
+	ret = SK_DROP;
+
+	err = bpf_sk_assign(ctx, NULL, 0);
+	if (err) {
+		bpf_printk("sk_assign returned %d, expected 0\n", err);
+		goto out;
+	}
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE);
+	if (err) {
+		bpf_printk("sk_assign returned %d, expected 0\n", err);
+		goto out;
+	}
+
+	if (ctx->sk != sk)
+		goto out;
+	err = bpf_sk_assign(ctx, NULL, 0);
+	if (err != -EEXIST)
+		goto out;
+	err = bpf_sk_assign(ctx, NULL, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+	err = bpf_sk_assign(ctx, sk, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+/* Check that selected sk is accessible through context. */
+SEC("sk_lookup/access_ctx_sk")
+int access_ctx_sk(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk1 = NULL, *sk2 = NULL;
+	int err, ret;
+
+	ret = SK_DROP;
+
+	/* Try accessing unassigned (NULL) ctx->sk field */
+	if (ctx->sk && ctx->sk->family != AF_INET)
+		goto out;
+
+	/* Assign a value to ctx->sk */
+	sk1 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk1)
+		goto out;
+	err = bpf_sk_assign(ctx, sk1, 0);
+	if (err)
+		goto out;
+	if (ctx->sk != sk1)
+		goto out;
+
+	/* Access ctx->sk fields */
+	if (ctx->sk->family != AF_INET ||
+	    ctx->sk->type != SOCK_STREAM ||
+	    ctx->sk->state != BPF_TCP_LISTEN)
+		goto out;
+
+	/* Reset selection */
+	err = bpf_sk_assign(ctx, NULL, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+	if (ctx->sk)
+		goto out;
+
+	/* Assign another socket */
+	sk2 = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (!sk2)
+		goto out;
+	err = bpf_sk_assign(ctx, sk2, BPF_SK_LOOKUP_F_REPLACE);
+	if (err)
+		goto out;
+	if (ctx->sk != sk2)
+		goto out;
+
+	/* Access reassigned ctx->sk fields */
+	if (ctx->sk->family != AF_INET ||
+	    ctx->sk->type != SOCK_STREAM ||
+	    ctx->sk->state != BPF_TCP_LISTEN)
+		goto out;
+
+	ret = SK_PASS; /* Success, redirect to KEY_SERVER_B */
+out:
+	if (sk1)
+		bpf_sk_release(sk1);
+	if (sk2)
+		bpf_sk_release(sk2);
+	return ret;
+}
+
+/* Check narrow loads from ctx fields that support them */
+SEC("sk_lookup/ctx_narrow_access")
+int ctx_narrow_access(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	__u16 *half;
+	__u8 *byte;
+	bool v4;
+
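+	/* A narrow load reads just a byte or half-word slice of a wider
+	 * ctx field. The verifier rewrites it into a full-width load
+	 * plus shift & mask, which these checks exercise.
+	 */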
+	v4 = (ctx->family == AF_INET);
+
+	/* Narrow loads from family field */
+	byte = (__u8 *)&ctx->family;
+	half = (__u16 *)&ctx->family;
+	if (byte[0] != (v4 ? AF_INET : AF_INET6) ||
+	    byte[1] != 0 || byte[2] != 0 || byte[3] != 0)
+		return SK_DROP;
+	if (half[0] != (v4 ? AF_INET : AF_INET6) ||
+	    half[1] != 0)
+		return SK_DROP;
+
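+	/* Narrow loads from protocol field */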
+	byte = (__u8 *)&ctx->protocol;
+	if (byte[0] != IPPROTO_TCP ||
+	    byte[1] != 0 || byte[2] != 0 || byte[3] != 0)
+		return SK_DROP;
+	half = (__u16 *)&ctx->protocol;
+	if (half[0] != IPPROTO_TCP ||
+	    half[1] != 0)
+		return SK_DROP;
+
+	/* Narrow loads from remote_port field. Expect non-0 value. */
+	byte = (__u8 *)&ctx->remote_port;
+	if (byte[0] == 0 && byte[1] == 0 && byte[2] == 0 && byte[3] == 0)
+		return SK_DROP;
+	half = (__u16 *)&ctx->remote_port;
+	if (half[0] == 0 && half[1] == 0)
+		return SK_DROP;
+
+	/* Narrow loads from local_port field. Expect DST_PORT. */
+	byte = (__u8 *)&ctx->local_port;
+	if (byte[0] != ((DST_PORT >> 0) & 0xff) ||
+	    byte[1] != ((DST_PORT >> 8) & 0xff) ||
+	    byte[2] != 0 || byte[3] != 0)
+		return SK_DROP;
+	half = (__u16 *)&ctx->local_port;
+	if (half[0] != DST_PORT ||
+	    half[1] != 0)
+		return SK_DROP;
+
+	/* Narrow loads from IPv4 fields */
+	if (v4) {
+		/* Expect non-0.0.0.0 in remote_ip4 */
+		byte = (__u8 *)&ctx->remote_ip4;
+		if (byte[0] == 0 && byte[1] == 0 &&
+		    byte[2] == 0 && byte[3] == 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip4;
+		if (half[0] == 0 && half[1] == 0)
+			return SK_DROP;
+
+		/* Expect DST_IP4 in local_ip4 */
+		byte = (__u8 *)&ctx->local_ip4;
+		if (byte[0] != ((DST_IP4 >>  0) & 0xff) ||
+		    byte[1] != ((DST_IP4 >>  8) & 0xff) ||
+		    byte[2] != ((DST_IP4 >> 16) & 0xff) ||
+		    byte[3] != ((DST_IP4 >> 24) & 0xff))
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip4;
+		if (half[0] != ((DST_IP4 >>  0) & 0xffff) ||
+		    half[1] != ((DST_IP4 >> 16) & 0xffff))
+			return SK_DROP;
+	} else {
+		/* Expect 0.0.0.0 IPs when family != AF_INET */
+		byte = (__u8 *)&ctx->remote_ip4;
+		if (byte[0] != 0 || byte[1] != 0 ||
+		    byte[2] != 0 || byte[3] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip4;
+		if (half[0] != 0 || half[1] != 0)
+			return SK_DROP;
+
+		byte = (__u8 *)&ctx->local_ip4;
+		if (byte[0] != 0 || byte[1] != 0 ||
+		    byte[2] != 0 || byte[3] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip4;
+		if (half[0] != 0 || half[1] != 0)
+			return SK_DROP;
+	}
+
+	/* Narrow loads from IPv6 fields */
+	if (!v4) {
+		/* Expect non-:: IP in remote_ip6 */
+		byte = (__u8 *)&ctx->remote_ip6;
+		if (byte[0] == 0 && byte[1] == 0 &&
+		    byte[2] == 0 && byte[3] == 0 &&
+		    byte[4] == 0 && byte[5] == 0 &&
+		    byte[6] == 0 && byte[7] == 0 &&
+		    byte[8] == 0 && byte[9] == 0 &&
+		    byte[10] == 0 && byte[11] == 0 &&
+		    byte[12] == 0 && byte[13] == 0 &&
+		    byte[14] == 0 && byte[15] == 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip6;
+		if (half[0] == 0 && half[1] == 0 &&
+		    half[2] == 0 && half[3] == 0 &&
+		    half[4] == 0 && half[5] == 0 &&
+		    half[6] == 0 && half[7] == 0)
+			return SK_DROP;
+
+		/* Expect DST_IP6 in local_ip6 */
+		byte = (__u8 *)&ctx->local_ip6;
+		if (byte[0] != ((DST_IP6[0] >>  0) & 0xff) ||
+		    byte[1] != ((DST_IP6[0] >>  8) & 0xff) ||
+		    byte[2] != ((DST_IP6[0] >> 16) & 0xff) ||
+		    byte[3] != ((DST_IP6[0] >> 24) & 0xff) ||
+		    byte[4] != ((DST_IP6[1] >>  0) & 0xff) ||
+		    byte[5] != ((DST_IP6[1] >>  8) & 0xff) ||
+		    byte[6] != ((DST_IP6[1] >> 16) & 0xff) ||
+		    byte[7] != ((DST_IP6[1] >> 24) & 0xff) ||
+		    byte[8] != ((DST_IP6[2] >>  0) & 0xff) ||
+		    byte[9] != ((DST_IP6[2] >>  8) & 0xff) ||
+		    byte[10] != ((DST_IP6[2] >> 16) & 0xff) ||
+		    byte[11] != ((DST_IP6[2] >> 24) & 0xff) ||
+		    byte[12] != ((DST_IP6[3] >>  0) & 0xff) ||
+		    byte[13] != ((DST_IP6[3] >>  8) & 0xff) ||
+		    byte[14] != ((DST_IP6[3] >> 16) & 0xff) ||
+		    byte[15] != ((DST_IP6[3] >> 24) & 0xff))
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip6;
+		if (half[0] != ((DST_IP6[0] >>  0) & 0xffff) ||
+		    half[1] != ((DST_IP6[0] >> 16) & 0xffff) ||
+		    half[2] != ((DST_IP6[1] >>  0) & 0xffff) ||
+		    half[3] != ((DST_IP6[1] >> 16) & 0xffff) ||
+		    half[4] != ((DST_IP6[2] >>  0) & 0xffff) ||
+		    half[5] != ((DST_IP6[2] >> 16) & 0xffff) ||
+		    half[6] != ((DST_IP6[3] >>  0) & 0xffff) ||
+		    half[7] != ((DST_IP6[3] >> 16) & 0xffff))
+			return SK_DROP;
+	} else {
+		/* Expect :: IPs when family != AF_INET6 */
+		byte = (__u8 *)&ctx->remote_ip6;
+		if (byte[0] != 0 || byte[1] != 0 ||
+		    byte[2] != 0 || byte[3] != 0 ||
+		    byte[4] != 0 || byte[5] != 0 ||
+		    byte[6] != 0 || byte[7] != 0 ||
+		    byte[8] != 0 || byte[9] != 0 ||
+		    byte[10] != 0 || byte[11] != 0 ||
+		    byte[12] != 0 || byte[13] != 0 ||
+		    byte[14] != 0 || byte[15] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->remote_ip6;
+		if (half[0] != 0 || half[1] != 0 ||
+		    half[2] != 0 || half[3] != 0 ||
+		    half[4] != 0 || half[5] != 0 ||
+		    half[6] != 0 || half[7] != 0)
+			return SK_DROP;
+
+		byte = (__u8 *)&ctx->local_ip6;
+		if (byte[0] != 0 || byte[1] != 0 ||
+		    byte[2] != 0 || byte[3] != 0 ||
+		    byte[4] != 0 || byte[5] != 0 ||
+		    byte[6] != 0 || byte[7] != 0 ||
+		    byte[8] != 0 || byte[9] != 0 ||
+		    byte[10] != 0 || byte[11] != 0 ||
+		    byte[12] != 0 || byte[13] != 0 ||
+		    byte[14] != 0 || byte[15] != 0)
+			return SK_DROP;
+		half = (__u16 *)&ctx->local_ip6;
+		if (half[0] != 0 || half[1] != 0 ||
+		    half[2] != 0 || half[3] != 0 ||
+		    half[4] != 0 || half[5] != 0 ||
+		    half[6] != 0 || half[7] != 0)
+			return SK_DROP;
+	}
+
+	/* Success, redirect to KEY_SERVER_B */
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_B);
+	if (sk) {
+		bpf_sk_assign(ctx, sk, 0);
+		bpf_sk_release(sk);
+	}
+	return SK_PASS;
+}
+
+/* Check that sk_assign rejects SERVER_A socket with -ESOCKTNOSUPPORT */
+SEC("sk_lookup/sk_assign_esocknosupport")
+int sk_assign_esocknosupport(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err, ret;
+
+	ret = SK_DROP;
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		goto out;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	if (err != -ESOCKTNOSUPPORT) {
+		bpf_printk("sk_assign returned %d, expected %d\n",
+			   err, -ESOCKTNOSUPPORT);
+		goto out;
+	}
+
+	ret = SK_PASS; /* Success, pass to regular lookup */
+out:
+	if (sk)
+		bpf_sk_release(sk);
+	return ret;
+}
+
+SEC("sk_lookup/multi_prog_pass1")
+int multi_prog_pass1(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG1, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_pass2")
+int multi_prog_pass2(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG2, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_drop1")
+int multi_prog_drop1(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG1, &PROG_DONE, BPF_ANY);
+	return SK_DROP;
+}
+
+SEC("sk_lookup/multi_prog_drop2")
+int multi_prog_drop2(struct bpf_sk_lookup *ctx)
+{
+	bpf_map_update_elem(&run_map, &KEY_PROG2, &PROG_DONE, BPF_ANY);
+	return SK_DROP;
+}
+
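+/* Shared by multi_prog_redir{1,2}: select SERVER_A via bpf_sk_assign. */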
+static __always_inline int select_server_a(struct bpf_sk_lookup *ctx)
+{
+	struct bpf_sock *sk;
+	int err;
+
+	sk = bpf_map_lookup_elem(&redir_map, &KEY_SERVER_A);
+	if (!sk)
+		return SK_DROP;
+
+	err = bpf_sk_assign(ctx, sk, 0);
+	bpf_sk_release(sk);
+	if (err)
+		return SK_DROP;
+
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_redir1")
+int multi_prog_redir1(struct bpf_sk_lookup *ctx)
+{
+	select_server_a(ctx);
+	bpf_map_update_elem(&run_map, &KEY_PROG1, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+SEC("sk_lookup/multi_prog_redir2")
+int multi_prog_redir2(struct bpf_sk_lookup *ctx)
+{
+	select_server_a(ctx);
+	bpf_map_update_elem(&run_map, &KEY_PROG2, &PROG_DONE, BPF_ANY);
+	return SK_PASS;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
+__u32 _version SEC("version") = 1;
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments
  2020-07-13 17:46 ` [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
@ 2020-07-15 21:30   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-15 21:30 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Extend the BPF netns link callbacks to rebuild (grow/shrink) or update the
> prog_array at given position when link gets attached/updated/released.
>
> This let's us lift the limit of having just one link attached for the new
> attach type introduced by subsequent patch.
>
> No functional changes intended.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

LGTM.

Acked-by: Andrii Nakryiko <andriin@fb.com>

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-07-13 17:46 ` [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
@ 2020-07-16  1:41   ` Andrii Nakryiko
  2020-07-16 12:17     ` Jakub Sitnicki
  0 siblings, 1 reply; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  1:41 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Marek Majkowski

On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
> BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
> when looking up a listening socket for a new connection request for
> connection oriented protocols, or when looking up an unconnected socket for
> a packet for connection-less protocols.
>
> When called, SK_LOOKUP BPF program can select a socket that will receive
> the packet. This serves as a mechanism to overcome the limits of what
> bind() API allows to express. Two use-cases driving this work are:
>
>  (1) steer packets destined to an IP range, on fixed port to a socket
>
>      192.0.2.0/24, port 80 -> NGINX socket
>
>  (2) steer packets destined to an IP address, on any port to a socket
>
>      198.51.100.1, any port -> L7 proxy socket
>
> In its run-time context program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple. Context can be further extended to include ingress
> interface identifier.
>
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection. Transport layer then uses the selected
> socket as a result of socket lookup.
>
> This patch only enables the user to attach an SK_LOOKUP program to a
> network namespace. Subsequent patches hook it up to run on local delivery
> path in ipv4 and ipv6 stacks.
>
> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>
> Notes:
>     v4:
>     - Reintroduce narrow load support for most BPF context fields. (Yonghong)
>     - Fix null-ptr-deref in BPF context access when IPv6 address not set.
>     - Unpack v4/v6 IP address union in bpf_sk_lookup context type.
>     - Add verifier support for ARG_PTR_TO_SOCKET_OR_NULL.
>     - Allow resetting socket selection with bpf_sk_assign(ctx, NULL).
>     - Document that bpf_sk_assign accepts a NULL socket.
>
>     v3:
>     - Allow bpf_sk_assign helper to replace previously selected socket only
>       when BPF_SK_LOOKUP_F_REPLACE flag is set, as a precaution for multiple
>       programs running in series to accidentally override each other's verdict.
>     - Let BPF program decide that load-balancing within a reuseport socket group
>       should be skipped for the socket selected with bpf_sk_assign() by passing
>       BPF_SK_LOOKUP_F_NO_REUSEPORT flag. (Martin)
>     - Extend struct bpf_sk_lookup program context with an 'sk' field containing
>       the selected socket with an intention for multiple attached programs
>       running in series to see each other's choices. However, currently the
>       verifier doesn't allow checking if pointer is set.
>     - Use bpf-netns infra for link-based multi-program attachment. (Alexei)
>     - Get rid of macros in convert_ctx_access to make it easier to read.
>     - Disallow 1-,2-byte access to context fields containing IP addresses.
>
>     v2:
>     - Make bpf_sk_assign reject sockets that don't use RCU freeing.
>       Update bpf_sk_assign docs accordingly. (Martin)
>     - Change bpf_sk_assign proto to take PTR_TO_SOCKET as argument. (Martin)
>     - Fix broken build when CONFIG_INET is not selected. (Martin)
>     - Rename bpf_sk_lookup{} src_/dst_* fields remote_/local_*. (Martin)
>     - Enforce BPF_SK_LOOKUP attach point on load & attach. (Martin)
>
>  include/linux/bpf-netns.h  |   3 +
>  include/linux/bpf.h        |   1 +
>  include/linux/bpf_types.h  |   2 +
>  include/linux/filter.h     |  17 ++++
>  include/uapi/linux/bpf.h   |  77 ++++++++++++++++
>  kernel/bpf/net_namespace.c |   5 ++
>  kernel/bpf/syscall.c       |   9 ++
>  kernel/bpf/verifier.c      |  10 ++-
>  net/core/filter.c          | 179 +++++++++++++++++++++++++++++++++++++
>  scripts/bpf_helpers_doc.py |   9 +-
>  10 files changed, 308 insertions(+), 4 deletions(-)
>

Looks good, two suggestions below.

Acked-by: Andrii Nakryiko <andriin@fb.com>

[...]

> +
> +static const struct bpf_func_proto *
> +sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
> +{
> +       switch (func_id) {
> +       case BPF_FUNC_sk_assign:
> +               return &bpf_sk_lookup_assign_proto;
> +       case BPF_FUNC_sk_release:
> +               return &bpf_sk_release_proto;
> +       default:

Wouldn't it be useful to have functions like
get_current_comm/get_current_pid_tgid/perf_event_output as well?
Similar to how they were added to a bunch of other socket-related BPF
program types recently?


> +               return bpf_base_func_proto(func_id);
> +       }
> +}
> +

[...]

> +       case offsetof(struct bpf_sk_lookup, local_ip4):
> +               *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
> +                                     bpf_target_off(struct bpf_sk_lookup_kern,
> +                                                    v4.daddr, 4, target_size));
> +               break;
> +
> +       case bpf_ctx_range_till(struct bpf_sk_lookup,
> +                               remote_ip6[0], remote_ip6[3]):
> +#if IS_ENABLED(CONFIG_IPV6)

nit: if you added {} to this case block, you could have combined the
above `int off` section with this one.

> +               off = si->off;
> +               off -= offsetof(struct bpf_sk_lookup, remote_ip6[0]);
> +               off += bpf_target_off(struct in6_addr, s6_addr32[0], 4, target_size);
> +               *insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
> +                                     offsetof(struct bpf_sk_lookup_kern, v6.saddr));
> +               *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
> +               *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
> +#else
> +               *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
> +#endif
> +               break;
> +

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread
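
To make the select-from-map pattern described in the commit message above
concrete, here is a hedged sketch of a complete SK_LOOKUP program. The map
name, the port number, and the section name are made up for illustration;
the shape follows the selftests added later in this series:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_SOCKMAP);
	__uint(max_entries, 1);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("sk_lookup/redir_port_80")
int redir_port_80(struct bpf_sk_lookup *ctx)
{
	struct bpf_sock *sk;
	__u32 key = 0;
	long err;

	if (ctx->local_port != 80)
		return SK_PASS;	/* not ours, continue regular lookup */

	sk = bpf_map_lookup_elem(&redir_map, &key);
	if (!sk)
		return SK_PASS;

	err = bpf_sk_assign(ctx, sk, 0);
	bpf_sk_release(sk);	/* always release the map-held reference */
	return err ? SK_DROP : SK_PASS;
}

char _license[] SEC("license") = "Dual BSD/GPL";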

* Re: [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group
  2020-07-13 17:46 ` [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
@ 2020-07-16  1:44   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  1:44 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Prepare for calling into reuseport from __inet_lookup_listener as well.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

Acked-by: Andrii Nakryiko <andriin@fb.com>

>  net/ipv4/inet_hashtables.c | 29 ++++++++++++++++++++---------
>  1 file changed, 20 insertions(+), 9 deletions(-)
>

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type
  2020-07-13 17:46 ` [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
@ 2020-07-16  2:10   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  2:10 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Mon, Jul 13, 2020 at 10:48 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Make bpftool show human-friendly identifiers for newly introduced program
> and attach type, BPF_PROG_TYPE_SK_LOOKUP and BPF_SK_LOOKUP, respectively.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---

You should probably also update bash-completion file, but please do it
as a follow-up.

Acked-by: Andrii Nakryiko <andriin@fb.com>

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access
  2020-07-13 17:46 ` [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
@ 2020-07-16  2:13   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  2:13 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Mon, Jul 13, 2020 at 10:48 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Exercise verifier access checks for bpf_sk_lookup context fields.
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>

LGTM.

Acked-by: Andrii Nakryiko <andriin@fb.com>

> Notes:
>     v4:
>     - Bring back tests for narrow loads.
>
>     v3:
>     - Consolidate ACCEPT tests into one.
>     - Deduplicate REJECT tests and arrange them into logical groups.
>     - Add tests for out-of-bounds and unaligned access.
>     - Cover access to newly introduced 'sk' field.
>
>     v2:
>      - Adjust for fields renames in struct bpf_sk_lookup.
>
>  .../selftests/bpf/verifier/ctx_sk_lookup.c    | 471 ++++++++++++++++++
>  1 file changed, 471 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/verifier/ctx_sk_lookup.c
>

[...]

> +               /* 1-byte read from local_port field */
> +               BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port)),
> +               BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port) + 1),
> +               BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port) + 2),
> +               BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port) + 3),
> +               /* 2-byte read from local_port field */
> +               BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port)),
> +               BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port) + 2),
> +               /* 4-byte read from local_port field */
> +               BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, local_port)),
> +
> +               /* 8-byte read from sk field */
> +               BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, sk)),
> +               BPF_EXIT_INSN(),
> +       },
> +       .result = ACCEPT,
> +       .prog_type = BPF_PROG_TYPE_SK_LOOKUP,
> +       .expected_attach_type = BPF_SK_LOOKUP,
> +},

This looks like a common class of tests which can be auto-generated
just from the list of fields and their sizes. Something for someone's
wishlist, though.
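
For instance, a pair of C macros could stamp out the per-field narrow-load
instructions; this is purely an illustration of the wishlist item, not code
from this series:

/* Illustrative: generate 1-, 2-, and 4-byte reads of a 4-byte ctx field. */
#define LOAD_NARROW(SIZE, FIELD, OFF)					\
	BPF_LDX_MEM(SIZE, BPF_REG_0, BPF_REG_1,				\
		    offsetof(struct bpf_sk_lookup, FIELD) + (OFF))

#define LOAD_FIELD_4B_ALL(FIELD)					\
	LOAD_NARROW(BPF_B, FIELD, 0), LOAD_NARROW(BPF_B, FIELD, 1),	\
	LOAD_NARROW(BPF_B, FIELD, 2), LOAD_NARROW(BPF_B, FIELD, 3),	\
	LOAD_NARROW(BPF_H, FIELD, 0), LOAD_NARROW(BPF_H, FIELD, 2),	\
	LOAD_NARROW(BPF_W, FIELD, 0)

/* LOAD_FIELD_4B_ALL(local_port) would expand to the block quoted above. */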

> +/* invalid 8-byte reads from 4-byte fields in bpf_sk_lookup */
> +{
> +       "invalid 8-byte read from bpf_sk_lookup family field",
> +       .insns = {
> +               BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
> +                           offsetof(struct bpf_sk_lookup, family)),
> +               BPF_EXIT_INSN(),
> +       },
> +       .errstr = "invalid bpf_context access",
> +       .result = REJECT,
> +       .prog_type = BPF_PROG_TYPE_SK_LOOKUP,
> +       .expected_attach_type = BPF_SK_LOOKUP,
> +},

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point
  2020-07-13 17:46 ` [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
@ 2020-07-16  2:19   ` Andrii Nakryiko
  0 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  2:19 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski

On Mon, Jul 13, 2020 at 10:48 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Add tests to test_progs that exercise:
>
>  - attaching/detaching/querying programs to BPF_SK_LOOKUP hook,
>  - redirecting socket lookup to a socket selected by BPF program,
>  - failing a socket lookup on BPF program's request,
>  - error scenarios for selecting a socket from BPF program,
>  - accessing BPF program context,
>  - attaching and running multiple BPF programs.
>
> Run log:
>
>   # ./test_progs -n 69
>   #69/1 query lookup prog:OK
>   #69/2 TCP IPv4 redir port:OK

[...]

>   #69/42 multi prog - drop, redir:OK
>   #69/43 multi prog - redir, drop:OK
>   #69/44 multi prog - redir, redir:OK
>   #69 sk_lookup:OK
>   Summary: 1/44 PASSED, 0 SKIPPED, 0 FAILED
>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>

nit: It's not universal, unfortunately, but the typical naming pattern
for selftests is: progs/test_<something>.c for the BPF part, and
prog_tests/<something>.c for the user-space part. Just FYI for the future.

But looks good to me either way.

Acked-by: Andrii Nakryiko <andriin@fb.com>

> Notes:
>     v4:
>     - Remove system("bpftool ...") call left over from debugging. (Lorenz)
>     - Dedup BPF code that selects a socket. (Lorenz)
>     - Switch from CHECK_FAIL to CHECK macro. (Andrii)
>     - Extract a network_helper that wraps inet_pton.
>     - Don't restore netns now that test_progs does it.
>     - Cover bpf_sk_assign(ctx, NULL) in tests.
>     - Cover narrow loads in tests.
>     - Cover NULL ctx->sk access attempts in tests.
>     - Cover accessing IPv6 ctx fields on IPv4 lookup.
>
>     v3:
>     - Extend tests to cover new functionality in v3:
>       - multi-prog attachments (query, running, verdict precedence)
>       - socket selecting for the second time with bpf_sk_assign
>       - skipping over reuseport load-balancing
>
>     v2:
>      - Adjust for fields renames in struct bpf_sk_lookup.
>
>  tools/testing/selftests/bpf/network_helpers.c |   58 +-
>  tools/testing/selftests/bpf/network_helpers.h |    2 +
>  .../selftests/bpf/prog_tests/sk_lookup.c      | 1282 +++++++++++++++++
>  .../selftests/bpf/progs/test_sk_lookup_kern.c |  639 ++++++++
>  4 files changed, 1958 insertions(+), 23 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/sk_lookup.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_sk_lookup_kern.c
>

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup
  2020-07-13 17:46 ` [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
@ 2020-07-16  2:23   ` Andrii Nakryiko
  2020-07-16 12:32     ` Jakub Sitnicki
  0 siblings, 1 reply; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  2:23 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Marek Majkowski

On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Run a BPF program before looking up a listening socket on the receive path.
> Program selects a listening socket to yield as result of socket lookup by
> calling bpf_sk_assign() helper and returning SK_PASS code. Program can
> revert its decision by assigning a NULL socket with bpf_sk_assign().
>
> Alternatively, BPF program can also fail the lookup by returning with
> SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
> no socket has not been selected with bpf_sk_assign(). Other return values

you probably meant "no socket has been selected"?

> are treated the same as SK_DROP.


Why not enforce it instead? Check check_return_code() in verifier.c,
it's trivial to do it for SK_LOOKUP.
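
For reference, such enforcement might look roughly like the following in
check_return_code(); the exact case placement is an assumption, not code
from this series:

	case BPF_PROG_TYPE_SK_LOOKUP:
		/* only SK_DROP (0) and SK_PASS (1) are valid verdicts */
		range = tnum_range(SK_DROP, SK_PASS);
		break;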


>
> This lets the user match packets with listening sockets freely at the last
> possible point on the receive path, where we know that packets are destined
> for local delivery after undergoing policing, filtering, and routing.
>
> With BPF code selecting the socket, directing packets destined to an IP
> range or to a port range to a single socket becomes possible.
>
> In case multiple programs are attached, they are run in series in the order
> in which they were attached. The end result is determined from return codes
> of all the programs according to following rules:
>
>  1. If any program returned SK_PASS and selected a valid socket, the socket
>     is used as result of socket lookup.
>  2. If more than one program returned SK_PASS and selected a socket,
>     last selection takes effect.
>  3. If any program returned SK_DROP or an invalid return code, and no
>     program returned SK_PASS and selected a socket, socket lookup fails
>     with -ECONNREFUSED.
>  4. If all programs returned SK_PASS and none of them selected a socket,
>     socket lookup continues to htable-based lookup.
>
> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>
> Notes:
>     v4:
>     - Reduce BPF sk_lookup prog return codes to SK_PASS/SK_DROP. (Lorenz)

your description above still assumes prog can return something besides
SK_PASS and SK_DROP?

>     - Default to drop & warn on illegal return value from BPF prog. (Lorenz)
>     - Rename netns_bpf_attach_type_enable/disable to _need/unneed. (Lorenz)
>     - Export bpf_sk_lookup_enabled symbol for CONFIG_IPV6=m (kernel test robot)
>     - Invert return value from bpf_sk_lookup_run_v4 to true on skip reuseport.
>     - Move dedicated prog_array runner close to its callers in filter.h.
>
>     v3:
>     - Use a static_key to minimize the hook overhead when not used. (Alexei)
>     - Adapt for running an array of attached programs. (Alexei)
>     - Adapt for optionally skipping reuseport selection. (Martin)
>
>  include/linux/filter.h     | 102 +++++++++++++++++++++++++++++++++++++
>  kernel/bpf/net_namespace.c |  32 +++++++++++-
>  net/core/filter.c          |   3 ++
>  net/ipv4/inet_hashtables.c |  31 +++++++++++
>  4 files changed, 167 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index 380746f47fa1..b9ad0fdabca5 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1295,4 +1295,106 @@ struct bpf_sk_lookup_kern {
>         bool            no_reuseport;
>  };
>
> +extern struct static_key_false bpf_sk_lookup_enabled;
> +
> +/* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup.
> + *
> + * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and
> + * SK_DROP. Any other return value is treated as SK_DROP. Their
> + * meaning is as follows:
> + *
> + *  SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result
> + *  SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup
> + *  SK_DROP                           : terminate lookup with -ECONNREFUSED
> + *
> + * This macro aggregates return values and selected sockets from
> + * multiple BPF programs according to following rules:
> + *
> + *  1. If any program returned SK_PASS and a non-NULL ctx.selected_sk,
> + *     macro result is SK_PASS and last ctx.selected_sk is used.
> + *  2. If any program returned non-SK_PASS return value,
> + *     macro result is the last non-SK_PASS return value.
> + *  3. Otherwise result is SK_PASS and ctx.selected_sk is NULL.
> + *
> + * Caller must ensure that the prog array is non-NULL, and that the
> + * array as well as the programs it contains remain valid.
> + */
> +#define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func)                 \
> +       ({                                                              \
> +               struct bpf_sk_lookup_kern *_ctx = &(ctx);               \
> +               struct bpf_prog_array_item *_item;                      \
> +               struct sock *_selected_sk;                              \
> +               struct bpf_prog *_prog;                                 \
> +               u32 _ret, _last_ret;                                    \
> +               bool _no_reuseport;                                     \
> +                                                                       \
> +               migrate_disable();                                      \
> +               _last_ret = SK_PASS;                                    \
> +               _selected_sk = NULL;                                    \
> +               _no_reuseport = false;                                  \

these three could be moved before migrate_disable(), or even better
just initialize corresponding variables above?


> +               _item = &(array)->items[0];                             \
> +               while ((_prog = READ_ONCE(_item->prog))) {              \
> +                       /* restore most recent selection */             \
> +                       _ctx->selected_sk = _selected_sk;               \
> +                       _ctx->no_reuseport = _no_reuseport;             \
> +                                                                       \
> +                       _ret = func(_prog, _ctx);                       \
> +                       if (_ret == SK_PASS) {                          \
> +                               /* remember last non-NULL socket */     \
> +                               if (_ctx->selected_sk) {                \
> +                                       _selected_sk = _ctx->selected_sk;       \
> +                                       _no_reuseport = _ctx->no_reuseport;     \
> +                               }                                       \
> +                       } else {                                        \
> +                               /* remember last non-PASS ret code */   \
> +                               _last_ret = _ret;                       \
> +                       }                                               \
> +                       _item++;                                        \
> +               }                                                       \
> +               _ctx->selected_sk = _selected_sk;                       \
> +               _ctx->no_reuseport = _no_reuseport;                     \
> +               migrate_enable();                                       \
> +               _ctx->selected_sk ? SK_PASS : _last_ret;                \
> +        })
> +

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread
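
For readability, the aggregation loop in the macro quoted above can be
restated as a plain function. This is a sketch only: the kernel keeps the
macro form so the program-runner can be passed in as 'func', and the
migrate_disable()/migrate_enable() pair around the loop is omitted here.

static u32 sk_lookup_run_array(struct bpf_prog_array *array,
			       struct bpf_sk_lookup_kern *ctx,
			       u32 (*func)(const struct bpf_prog *,
					   struct bpf_sk_lookup_kern *))
{
	struct bpf_prog_array_item *item = &array->items[0];
	struct sock *selected_sk = NULL;
	bool no_reuseport = false;
	u32 ret, last_ret = SK_PASS;
	const struct bpf_prog *prog;

	while ((prog = READ_ONCE(item->prog))) {
		/* show this program the most recent selection */
		ctx->selected_sk = selected_sk;
		ctx->no_reuseport = no_reuseport;

		ret = func(prog, ctx);
		if (ret == SK_PASS) {
			/* remember last non-NULL socket */
			if (ctx->selected_sk) {
				selected_sk = ctx->selected_sk;
				no_reuseport = ctx->no_reuseport;
			}
		} else {
			/* remember last non-PASS return code */
			last_ret = ret;
		}
		item++;
	}
	ctx->selected_sk = selected_sk;
	ctx->no_reuseport = no_reuseport;
	return selected_sk ? SK_PASS : last_ret;
}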

* Re: [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup
  2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
                   ` (15 preceding siblings ...)
  2020-07-13 17:46 ` [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
@ 2020-07-16  2:25 ` Andrii Nakryiko
  16 siblings, 0 replies; 27+ messages in thread
From: Andrii Nakryiko @ 2020-07-16  2:25 UTC (permalink / raw)
  To: Jakub Sitnicki
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Andrii Nakryiko, Lorenz Bauer, Marek Majkowski, Martin KaFai Lau,
	Yonghong Song

On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> Dependencies
> ============
>
> This patch series depends on:
>
> 1. 'bpf-multi-prog-prep' series in 'bpf' [0]
>    (commit 951f38cf0835 ("Merge branch 'bpf-multi-prog-prep'"))
> 2. "bpf: Shift and mask loads narrower than context field size" patch
>    https://lore.kernel.org/bpf/20200710173123.427983-1-jakub@cloudflare.com/
>

[...]

>
> Overview
> ========
>
> This series proposes a new BPF program type named BPF_PROG_TYPE_SK_LOOKUP,
> or BPF sk_lookup for short.
>
> BPF sk_lookup program runs when transport layer is looking up a listening
> socket for a new connection request (TCP), or when looking up an
> unconnected socket for a packet (UDP).
>
> This serves as a mechanism to overcome the limits of what bind() API allows
> to express. Two use-cases driving this work are:
>
>  (1) steer packets destined to an IP range, fixed port to a single socket
>
>      192.0.2.0/24, port 80 -> NGINX socket
>
>  (2) steer packets destined to an IP address, any port to a single socket
>
>      198.51.100.1, any port -> L7 proxy socket
>
> In its context, program receives information about the packet that
> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
> address 4-tuple.
>
> To select a socket BPF program fetches it from a map holding socket
> references, like SOCKMAP or SOCKHASH, calls bpf_sk_assign(ctx, sk, ...)
> helper to record the selection, and returns SK_PASS code. Transport layer
> then uses the selected socket as a result of socket lookup.
>
> Alternatively, program can also fail the lookup (SK_DROP), or let the
> lookup continue as usual (SK_PASS without selecting a socket).
>
> This lets the user match packets with listening (TCP) or receiving (UDP)
> sockets freely at the last possible point on the receive path, where we
> know that packets are destined for local delivery after undergoing
> policing, filtering, and routing.
>
> Program is attached to a network namespace, similar to BPF flow_dissector.
> We add a new attach type, BPF_SK_LOOKUP, for this. Multiple programs can be
> attached at the same time, in which case their return values are aggregated
> according to the rules outlined in the description of patch #4.
>
> Series structure
> ================
>
> Patches are organized as so:
>
>  1: enables multiple link-based prog attachments for bpf-netns
>  2: introduces sk_lookup program type
>  3-4: hook up the program to run on ipv4/tcp socket lookup
>  5-6: hook up the program to run on ipv6/tcp socket lookup
>  7-8: hook up the program to run on ipv4/udp socket lookup
>  9-10: hook up the program to run on ipv6/udp socket lookup
>  11-13: libbpf & bpftool support for sk_lookup
>  14-16: verifier and selftests for sk_lookup
>
> Patches are also available on GH:
>
>   https://github.com/jsitnicki/linux/commits/bpf-inet-lookup-v4
>
> Follow-up work
> ==============
>
> I'll follow up with below items, which IMHO don't block the review:
>
> - benchmark results for udp6 small packet flood scenario,
> - user docs for new BPF prog type, Documentation/bpf/prog_sk_lookup.rst,
> - timeout for accept() in tests after extending network_helper.[ch].
>

Looks good to me overall. I've looked through networking-specific code
and didn't spot anything, but I might be missing some subtleties,
hopefully not, though.

I left a few suggestions, please take a look, and if they make sense,
apply them in the follow-up(s). Thanks!

For the series:

Acked-by: Andrii Nakryiko <andriin@fb.com>

> Thanks to the reviewers for their feedback to this patch series:
>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Cc: Andrii Nakryiko <andriin@fb.com>
> Cc: Lorenz Bauer <lmb@cloudflare.com>
> Cc: Marek Majkowski <marek@cloudflare.com>
> Cc: Martin KaFai Lau <kafai@fb.com>
> Cc: Yonghong Song <yhs@fb.com>
>
> -jkbs
>

[...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point
  2020-07-16  1:41   ` Andrii Nakryiko
@ 2020-07-16 12:17     ` Jakub Sitnicki
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-16 12:17 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Marek Majkowski

On Thu, Jul 16, 2020 at 03:41 AM CEST, Andrii Nakryiko wrote:
> On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Add a new program type BPF_PROG_TYPE_SK_LOOKUP with a dedicated attach type
>> BPF_SK_LOOKUP. The new program kind is to be invoked by the transport layer
>> when looking up a listening socket for a new connection request for
>> connection oriented protocols, or when looking up an unconnected socket for
>> a packet for connection-less protocols.
>>
>> When called, SK_LOOKUP BPF program can select a socket that will receive
>> the packet. This serves as a mechanism to overcome the limits of what
>> bind() API allows to express. Two use-cases driving this work are:
>>
>>  (1) steer packets destined to an IP range, on fixed port to a socket
>>
>>      192.0.2.0/24, port 80 -> NGINX socket
>>
>>  (2) steer packets destined to an IP address, on any port to a socket
>>
>>      198.51.100.1, any port -> L7 proxy socket
>>
>> In its run-time context program receives information about the packet that
>> triggered the socket lookup. Namely IP version, L4 protocol identifier, and
>> address 4-tuple. Context can be further extended to include ingress
>> interface identifier.
>>
>> To select a socket BPF program fetches it from a map holding socket
>> references, like SOCKMAP or SOCKHASH, and calls bpf_sk_assign(ctx, sk, ...)
>> helper to record the selection. Transport layer then uses the selected
>> socket as a result of socket lookup.
>>
>> This patch only enables the user to attach an SK_LOOKUP program to a
>> network namespace. Subsequent patches hook it up to run on local delivery
>> path in ipv4 and ipv6 stacks.
>>
>> Suggested-by: Marek Majkowski <marek@cloudflare.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>
>> Notes:
>>     v4:
>>     - Reintroduce narrow load support for most BPF context fields. (Yonghong)
>>     - Fix null-ptr-deref in BPF context access when IPv6 address not set.
>>     - Unpack v4/v6 IP address union in bpf_sk_lookup context type.
>>     - Add verifier support for ARG_PTR_TO_SOCKET_OR_NULL.
>>     - Allow resetting socket selection with bpf_sk_assign(ctx, NULL).
>>     - Document that bpf_sk_assign accepts a NULL socket.
>>
>>     v3:
>>     - Allow bpf_sk_assign helper to replace previously selected socket only
>>       when BPF_SK_LOOKUP_F_REPLACE flag is set, as a precaution for multiple
>>       programs running in series to accidentally override each other's verdict.
>>     - Let BPF program decide that load-balancing within a reuseport socket group
>>       should be skipped for the socket selected with bpf_sk_assign() by passing
>>       BPF_SK_LOOKUP_F_NO_REUSEPORT flag. (Martin)
>>     - Extend struct bpf_sk_lookup program context with an 'sk' field containing
>>       the selected socket with an intention for multiple attached programs
>>       running in series to see each other's choices. However, currently the
>>       verifier doesn't allow checking if pointer is set.
>>     - Use bpf-netns infra for link-based multi-program attachment. (Alexei)
>>     - Get rid of macros in convert_ctx_access to make it easier to read.
>>     - Disallow 1-,2-byte access to context fields containing IP addresses.
>>
>>     v2:
>>     - Make bpf_sk_assign reject sockets that don't use RCU freeing.
>>       Update bpf_sk_assign docs accordingly. (Martin)
>>     - Change bpf_sk_assign proto to take PTR_TO_SOCKET as argument. (Martin)
>>     - Fix broken build when CONFIG_INET is not selected. (Martin)
>>     - Rename bpf_sk_lookup{} src_/dst_* fields remote_/local_*. (Martin)
>>     - Enforce BPF_SK_LOOKUP attach point on load & attach. (Martin)
>>
>>  include/linux/bpf-netns.h  |   3 +
>>  include/linux/bpf.h        |   1 +
>>  include/linux/bpf_types.h  |   2 +
>>  include/linux/filter.h     |  17 ++++
>>  include/uapi/linux/bpf.h   |  77 ++++++++++++++++
>>  kernel/bpf/net_namespace.c |   5 ++
>>  kernel/bpf/syscall.c       |   9 ++
>>  kernel/bpf/verifier.c      |  10 ++-
>>  net/core/filter.c          | 179 +++++++++++++++++++++++++++++++++++++
>>  scripts/bpf_helpers_doc.py |   9 +-
>>  10 files changed, 308 insertions(+), 4 deletions(-)
>>
>
> Looks good, two suggestions below.
>
> Acked-by: Andrii Nakryiko <andriin@fb.com>
>
> [...]
>
>> +
>> +static const struct bpf_func_proto *
>> +sk_lookup_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>> +{
>> +       switch (func_id) {
>> +       case BPF_FUNC_sk_assign:
>> +               return &bpf_sk_lookup_assign_proto;
>> +       case BPF_FUNC_sk_release:
>> +               return &bpf_sk_release_proto;
>> +       default:
>
> Wouldn't it be useful to have functions like
> get_current_comm/get_current_pid_tgid/perf_event_output as well?
> Similar to how they were added to a bunch of other socket-related BPF
> program types recently?

I can certainly see value in perf_event_output as a way to log/trace
prog decisions. Less so for helpers that provide access to current task,
as the prog usually will be called in softirq context.

bpf_get_socket_cookie and bpf_get_netns_cookie have been on my mind, but
first they need to be taught to accept ARG_PTR_TO_SOCKET.

That is to say, I expected the list of allowed helpers to grow.
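
If bpf_perf_event_output were later added to sk_lookup_func_proto(), logging
a verdict might look roughly like this (hypothetical until the helper is
actually allowed for SK_LOOKUP programs):

struct {
	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
	__uint(key_size, sizeof(int));
	__uint(value_size, sizeof(__u32));
} events SEC(".maps");

struct lookup_event {
	__u32 local_ip4;
	__u32 local_port;
	__u32 verdict;
};

static __always_inline void log_verdict(struct bpf_sk_lookup *ctx, __u32 verdict)
{
	struct lookup_event ev = {
		.local_ip4  = ctx->local_ip4,
		.local_port = ctx->local_port,
		.verdict    = verdict,
	};

	/* hypothetical: not yet in the allowed helper set for SK_LOOKUP */
	bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU, &ev, sizeof(ev));
}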

>
>
>> +               return bpf_base_func_proto(func_id);
>> +       }
>> +}
>> +
>
> [...]
>
>> +       case offsetof(struct bpf_sk_lookup, local_ip4):
>> +               *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->src_reg,
>> +                                     bpf_target_off(struct bpf_sk_lookup_kern,
>> +                                                    v4.daddr, 4, target_size));
>> +               break;
>> +
>> +       case bpf_ctx_range_till(struct bpf_sk_lookup,
>> +                               remote_ip6[0], remote_ip6[3]):
>> +#if IS_ENABLED(CONFIG_IPV6)
>
> nit: if you added {} to this case block, you could have combined the
> above `int off` section with this one.

Nifty. Thanks.

>
>> +               off = si->off;
>> +               off -= offsetof(struct bpf_sk_lookup, remote_ip6[0]);
>> +               off += bpf_target_off(struct in6_addr, s6_addr32[0], 4, target_size);
>> +               *insn++ = BPF_LDX_MEM(BPF_SIZEOF(void *), si->dst_reg, si->src_reg,
>> +                                     offsetof(struct bpf_sk_lookup_kern, v6.saddr));
>> +               *insn++ = BPF_JMP_IMM(BPF_JEQ, si->dst_reg, 0, 1);
>> +               *insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg, off);
>> +#else
>> +               *insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
>> +#endif
>> +               break;
>> +
>
> [...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup
  2020-07-16  2:23   ` Andrii Nakryiko
@ 2020-07-16 12:32     ` Jakub Sitnicki
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Sitnicki @ 2020-07-16 12:32 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: bpf, Networking, kernel-team, Alexei Starovoitov,
	Daniel Borkmann, David S. Miller, Jakub Kicinski,
	Marek Majkowski

On Thu, Jul 16, 2020 at 04:23 AM CEST, Andrii Nakryiko wrote:
> On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Run a BPF program before looking up a listening socket on the receive path.
>> Program selects a listening socket to yield as result of socket lookup by
>> calling bpf_sk_assign() helper and returning SK_PASS code. Program can
>> revert its decision by assigning a NULL socket with bpf_sk_assign().
>>
>> Alternatively, BPF program can also fail the lookup by returning with
>> SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
>> no socket has not been selected with bpf_sk_assign(). Other return values
>
> you probably meant "no socket has been selected"?

Yes, a typo. Will fix.

>
>> are treated the same as SK_DROP.
>
>
> Why not enforce it instead? Check check_return_code() in verifier.c,
> it's trivial to do it for SK_LOOKUP.

That's a game changer D-: Thank you. This will simplify the prog
runners.

>
>
>>
>> This lets the user match packets with listening sockets freely at the last
>> possible point on the receive path, where we know that packets are destined
>> for local delivery after undergoing policing, filtering, and routing.
>>
>> With BPF code selecting the socket, directing packets destined to an IP
>> range or to a port range to a single socket becomes possible.
>>
>> In case multiple programs are attached, they are run in series in the order
>> in which they were attached. The end result is determined from return codes
>> of all the programs according to following rules:
>>
>>  1. If any program returned SK_PASS and selected a valid socket, the socket
>>     is used as result of socket lookup.
>>  2. If more than one program returned SK_PASS and selected a socket,
>>     last selection takes effect.
>>  3. If any program returned SK_DROP or an invalid return code, and no
>>     program returned SK_PASS and selected a socket, socket lookup fails
>>     with -ECONNREFUSED.
>>  4. If all programs returned SK_PASS and none of them selected a socket,
>>     socket lookup continues to htable-based lookup.
>>
>> Suggested-by: Marek Majkowski <marek@cloudflare.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>
>> Notes:
>>     v4:
>>     - Reduce BPF sk_lookup prog return codes to SK_PASS/SK_DROP. (Lorenz)
>
> your description above still assumes prog can return something besides
> SK_PASS and SK_DROP?

I should have written 'reduce allowed prog return codes'.

>
>>     - Default to drop & warn on illegal return value from BPF prog. (Lorenz)
>>     - Rename netns_bpf_attach_type_enable/disable to _need/unneed. (Lorenz)
>>     - Export bpf_sk_lookup_enabled symbol for CONFIG_IPV6=m (kernel test robot)
>>     - Invert return value from bpf_sk_lookup_run_v4 to true on skip reuseport.
>>     - Move dedicated prog_array runner close to its callers in filter.h.
>>
>>     v3:
>>     - Use a static_key to minimize the hook overhead when not used. (Alexei)
>>     - Adapt for running an array of attached programs. (Alexei)
>>     - Adapt for optionally skipping reuseport selection. (Martin)
>>
>>  include/linux/filter.h     | 102 +++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/net_namespace.c |  32 +++++++++++-
>>  net/core/filter.c          |   3 ++
>>  net/ipv4/inet_hashtables.c |  31 +++++++++++
>>  4 files changed, 167 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 380746f47fa1..b9ad0fdabca5 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -1295,4 +1295,106 @@ struct bpf_sk_lookup_kern {
>>         bool            no_reuseport;
>>  };
>>
>> +extern struct static_key_false bpf_sk_lookup_enabled;
>> +
>> +/* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup.
>> + *
>> + * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and
>> + * SK_DROP. Any other return value is treated as SK_DROP. Their
>> + * meaning is as follows:
>> + *
>> + *  SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result
>> + *  SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup
>> + *  SK_DROP                           : terminate lookup with -ECONNREFUSED
>> + *
>> + * This macro aggregates return values and selected sockets from
>> + * multiple BPF programs according to following rules:
>> + *
>> + *  1. If any program returned SK_PASS and a non-NULL ctx.selected_sk,
>> + *     macro result is SK_PASS and last ctx.selected_sk is used.
>> + *  2. If any program returned non-SK_PASS return value,
>> + *     macro result is the last non-SK_PASS return value.
>> + *  3. Otherwise result is SK_PASS and ctx.selected_sk is NULL.
>> + *
>> + * Caller must ensure that the prog array is non-NULL, and that the
>> + * array as well as the programs it contains remain valid.
>> + */
>> +#define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func)                 \
>> +       ({                                                              \
>> +               struct bpf_sk_lookup_kern *_ctx = &(ctx);               \
>> +               struct bpf_prog_array_item *_item;                      \
>> +               struct sock *_selected_sk;                              \
>> +               struct bpf_prog *_prog;                                 \
>> +               u32 _ret, _last_ret;                                    \
>> +               bool _no_reuseport;                                     \
>> +                                                                       \
>> +               migrate_disable();                                      \
>> +               _last_ret = SK_PASS;                                    \
>> +               _selected_sk = NULL;                                    \
>> +               _no_reuseport = false;                                  \
>
> these three could be moved before migrate_disable(), or even better
> just initialize corresponding variables above?

I was torn between keeping all info needed to read through the loop
close to it and keeping the critical section tight. I can move it up.

>
>
>> +               _item = &(array)->items[0];                             \
>> +               while ((_prog = READ_ONCE(_item->prog))) {              \
>> +                       /* restore most recent selection */             \
>> +                       _ctx->selected_sk = _selected_sk;               \
>> +                       _ctx->no_reuseport = _no_reuseport;             \
>> +                                                                       \
>> +                       _ret = func(_prog, _ctx);                       \
>> +                       if (_ret == SK_PASS) {                          \
>> +                               /* remember last non-NULL socket */     \
>> +                               if (_ctx->selected_sk) {                \
>> +                                       _selected_sk = _ctx->selected_sk;       \
>> +                                       _no_reuseport = _ctx->no_reuseport;     \
>> +                               }                                       \
>> +                       } else {                                        \
>> +                               /* remember last non-PASS ret code */   \
>> +                               _last_ret = _ret;                       \
>> +                       }                                               \
>> +                       _item++;                                        \
>> +               }                                                       \
>> +               _ctx->selected_sk = _selected_sk;                       \
>> +               _ctx->no_reuseport = _no_reuseport;                     \
>> +               migrate_enable();                                       \
>> +               _ctx->selected_sk ? SK_PASS : _last_ret;                \
>> +        })
>> +
>
> [...]

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread

Thread overview: 27+ messages
2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
2020-07-15 21:30   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-07-16  1:41   ` Andrii Nakryiko
2020-07-16 12:17     ` Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-16  1:44   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-16  2:23   ` Andrii Nakryiko
2020-07-16 12:32     ` Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 05/16] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 06/16] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 07/16] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 08/16] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 09/16] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 10/16] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 11/16] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 12/16] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
2020-07-16  2:10   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
2020-07-16  2:13   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 15/16] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
2020-07-16  2:19   ` Andrii Nakryiko
2020-07-16  2:25 ` [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Andrii Nakryiko
