bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFCv2 bpf-next 00/12] Programming socket lookup with BPF
@ 2019-08-28  7:22 Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 01/12] flow_dissector: Extract attach/detach/query helpers Jakub Sitnicki
                   ` (11 more replies)
  0 siblings, 12 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

This patch set adds a mechanism for programming mappings between the local
addresses and listening/receiving sockets with BPF.

It introduces a new per-netns BPF program type, called inet_lookup, which
runs during the socket lookup. The program is allowed to select a
listening/receiving socket from a SOCKARRAY map that the packet will be
delivered to.

BPF inet_lookup intends to be an alternative for:

* SO_BINDTOPREFIX [1] - a mechanism that provides a way to listen/receive
  on all local addresses that belong to a network prefix. An alternative to
  binding to INADDR_ANY that allows applications bound to disjoint network
  prefixes to share a port. Not generic. Never got upstreamed.

* TPROXY [2] - a powerful mechanism that allows steering packets destined
  to non-local addresses to a local socket. It also works for local
  addresses, which is a less restrictive case. Can be used to implement
  what SO_BINDTOPREFIX does, and more - in particular, all ports can be
  redirected to a single socket. Socket dispatch happens early in ingress
  path (PREROUTING hook). Versatile but comes with complexities.

Compared to the above, inet_lookup aims to be a programmatic way to map
(address, port) pairs to a socket. It runs after a routing decision for
local delivery was made, and hence is limited to local addresses only.

Being part of the socket lookup, has a desired effect that redirection is
visible to XDP programs which call bpf_sk_lookup helpers.

When it comes to use cases, we have presented them in RFCv1 [3] cover
letter and also at last Netconf [4]. To recap, they are:

1) sharing a port between two services

   Services are accepting connections on different (disjoint) IP ranges but
   same port. Requests going to 192.0.2.0/24 tcp/80 are handled by NGINX,
   while 198.51.100.0/24 tcp/80 IP range is handled by Apache server.
   Applications are running as different users, in a flat single-netns
   setup.

2) receiving traffic on all ports

   We have a proxy server that accepts connections to _any_ port [5].

A simple demo program that implements (1) could look like

#define NET1 (IP4(192,  0,   2, 0) >> 8)
#define NET2 (IP4(198, 51, 100, 0) >> 8)

#define MAX_SERVERS 2

struct {
	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
	__uint(max_entries, MAX_SERVERS);
	__type(key, __u32);
	__type(value, __u64);
} redir_map SEC(".maps");

SEC("inet_lookup/demo_two_servers")
int demo_two_http_servers(struct bpf_inet_lookup *ctx)
{
	__u32 index = 0;
	__u64 flags = 0;

        if (ctx->family != AF_INET)
                return BPF_OK;
	if (ctx->protocol != IPPROTO_TCP)
		return BPF_OK;
        if (ctx->local_port != 80)
                return BPF_OK;

        switch (bpf_ntohl(ctx->local_ip4) >> 8) {
        case NET1:
		index = 0;
		break;
        case NET2:
		index = 1;
		break;
	default:
		return BPF_OK;
        }

        return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
}

Since RFCv1, we've changed the approach from rewriting the lookup key to
map-based redirection. This has been suggested at Netconf, and is a
recurring pattern in existing BPF program types.

We're posting the 2nd version of RFC patch set to collect further feedback
and set context for the presentation and discussions at the upcoming
Network Summit at LPC '19 [6].

Patches are also available on GitHub [7].

Thanks,
Jakub

[1] https://www.spinics.net/lists/netdev/msg370789.html
[2] https://www.kernel.org/doc/Documentation/networking/tproxy.txt
[3] https://lore.kernel.org/netdev/20190618130050.8344-1-jakub@cloudflare.com/
[4] http://vger.kernel.org/netconf2019_files/Programmable%20socket%20lookup.pdf
[5] https://blog.cloudflare.com/how-we-built-spectrum/
[6] https://linuxplumbersconf.org/event/4/contributions/487/
[7] https://github.com/jsitnicki/linux/commits/bpf-inet-lookup

Changes RFCv1 -> RFCv2:

- Make socket lookup redirection map-based. BPF program now uses a
  dedicated helper and a SOCKARRAY map to select the socket to redirect to.
  A consequence of this change is that bpf_inet_lookup context is now
  read-only.

- Look for connected UDP sockets before allowing redirection from BPF.
  This makes connected UDP socket work as expected in the presence of
  inet_lookup prog.

- Share the code for BPF_PROG_{ATTACH,DETACH,QUERY} with flow_dissector,
  the only other per-netns BPF prog type.


Jakub Sitnicki (12):
  flow_dissector: Extract attach/detach/query helpers
  bpf: Introduce inet_lookup program type for redirecting socket lookup
  bpf: Add verifier tests for inet_lookup context access
  inet: Store layer 4 protocol in inet_hashinfo
  udp: Store layer 4 protocol in udp_table
  inet: Run inet_lookup bpf program on socket lookup
  inet6: Run inet_lookup bpf program on socket lookup
  udp: Run inet_lookup bpf program on socket lookup
  udp6: Run inet_lookup bpf program on socket lookup
  bpf: Sync linux/bpf.h to tools/
  libbpf: Add support for inet_lookup program type
  bpf: Test redirecting listening/receiving socket lookup

 include/linux/bpf.h                           |   8 +
 include/linux/bpf_types.h                     |   1 +
 include/linux/filter.h                        |  18 +
 include/net/inet6_hashtables.h                |  19 +
 include/net/inet_hashtables.h                 |  36 +
 include/net/net_namespace.h                   |   2 +
 include/net/udp.h                             |  10 +-
 include/uapi/linux/bpf.h                      |  58 +-
 kernel/bpf/syscall.c                          |  10 +
 kernel/bpf/verifier.c                         |   7 +-
 net/core/filter.c                             | 304 ++++++++
 net/core/flow_dissector.c                     |  65 +-
 net/dccp/proto.c                              |   2 +-
 net/ipv4/inet_hashtables.c                    |   5 +
 net/ipv4/tcp_ipv4.c                           |   2 +-
 net/ipv4/udp.c                                |  59 +-
 net/ipv4/udp_impl.h                           |   2 +-
 net/ipv4/udplite.c                            |   4 +-
 net/ipv6/inet6_hashtables.c                   |   5 +
 net/ipv6/udp.c                                |  54 +-
 net/ipv6/udp_impl.h                           |   2 +-
 net/ipv6/udplite.c                            |   2 +-
 tools/include/uapi/linux/bpf.h                |  58 +-
 tools/lib/bpf/libbpf.c                        |   4 +
 tools/lib/bpf/libbpf.h                        |   2 +
 tools/lib/bpf/libbpf.map                      |   2 +
 tools/lib/bpf/libbpf_probes.c                 |   1 +
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
 .../selftests/bpf/progs/inet_lookup_progs.c   |  78 ++
 .../testing/selftests/bpf/test_inet_lookup.c  | 522 +++++++++++++
 .../testing/selftests/bpf/test_inet_lookup.sh |  35 +
 .../selftests/bpf/verifier/ctx_inet_lookup.c  | 696 ++++++++++++++++++
 34 files changed, 1974 insertions(+), 108 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/inet_lookup_progs.c
 create mode 100644 tools/testing/selftests/bpf/test_inet_lookup.c
 create mode 100755 tools/testing/selftests/bpf/test_inet_lookup.sh
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 01/12] flow_dissector: Extract attach/detach/query helpers
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 02/12] bpf: Introduce inet_lookup program type for redirecting socket lookup Jakub Sitnicki
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Move generic parts of callbacks for querying, attaching, and detaching a
single BPF program for reuse by other BPF program types.

Subsequent patch makes use of the extracted routines.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf.h       |  8 +++++
 net/core/filter.c         | 73 +++++++++++++++++++++++++++++++++++++++
 net/core/flow_dissector.c | 65 ++++++----------------------------
 3 files changed, 92 insertions(+), 54 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 5b9d22338606..b301e0c03a8c 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -23,6 +23,7 @@ struct sock;
 struct seq_file;
 struct btf;
 struct btf_type;
+struct mutex;
 
 extern struct idr btf_idr;
 extern spinlock_t btf_idr_lock;
@@ -1145,4 +1146,11 @@ static inline u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type,
 }
 #endif /* CONFIG_INET */
 
+int bpf_prog_query_one(struct bpf_prog __rcu **pprog,
+		       const union bpf_attr *attr,
+		       union bpf_attr __user *uattr);
+int bpf_prog_attach_one(struct bpf_prog __rcu **pprog, struct mutex *lock,
+			struct bpf_prog *prog, u32 flags);
+int bpf_prog_detach_one(struct bpf_prog __rcu **pprog, struct mutex *lock);
+
 #endif /* _LINUX_BPF_H */
diff --git a/net/core/filter.c b/net/core/filter.c
index 0c1059cdad3d..a498fbaa2d50 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -8668,6 +8668,79 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *ubuf,
 	return ret;
 }
 
+int bpf_prog_query_one(struct bpf_prog __rcu **pprog,
+		       const union bpf_attr *attr,
+		       union bpf_attr __user *uattr)
+{
+	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
+	u32 prog_id, prog_cnt = 0, flags = 0;
+	struct bpf_prog *attached;
+
+	if (attr->query.query_flags)
+		return -EINVAL;
+
+	rcu_read_lock();
+	attached = rcu_dereference(*pprog);
+	if (attached) {
+		prog_cnt = 1;
+		prog_id = attached->aux->id;
+	}
+	rcu_read_unlock();
+
+	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
+		return -EFAULT;
+	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
+		return -EFAULT;
+
+	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
+		return 0;
+
+	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int bpf_prog_attach_one(struct bpf_prog __rcu **pprog, struct mutex *lock,
+			struct bpf_prog *prog, u32 flags)
+{
+	struct bpf_prog *attached;
+
+	if (flags)
+		return -EINVAL;
+
+	mutex_lock(lock);
+	attached = rcu_dereference_protected(*pprog,
+					     lockdep_is_held(lock));
+	if (attached) {
+		/* Only one BPF program can be attached at a time */
+		mutex_unlock(lock);
+		return -EEXIST;
+	}
+	rcu_assign_pointer(*pprog, prog);
+	mutex_unlock(lock);
+
+	return 0;
+}
+
+int bpf_prog_detach_one(struct bpf_prog __rcu **pprog, struct mutex *lock)
+{
+	struct bpf_prog *attached;
+
+	mutex_lock(lock);
+	attached = rcu_dereference_protected(*pprog,
+					     lockdep_is_held(lock));
+	if (!attached) {
+		mutex_unlock(lock);
+		return -ENOENT;
+	}
+	RCU_INIT_POINTER(*pprog, NULL);
+	bpf_prog_put(attached);
+	mutex_unlock(lock);
+
+	return 0;
+}
+
 #ifdef CONFIG_INET
 struct sk_reuseport_kern {
 	struct sk_buff *skb;
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 9741b593ea53..c51602158906 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -73,80 +73,37 @@ EXPORT_SYMBOL(skb_flow_dissector_init);
 int skb_flow_dissector_prog_query(const union bpf_attr *attr,
 				  union bpf_attr __user *uattr)
 {
-	__u32 __user *prog_ids = u64_to_user_ptr(attr->query.prog_ids);
-	u32 prog_id, prog_cnt = 0, flags = 0;
-	struct bpf_prog *attached;
 	struct net *net;
-
-	if (attr->query.query_flags)
-		return -EINVAL;
+	int ret;
 
 	net = get_net_ns_by_fd(attr->query.target_fd);
 	if (IS_ERR(net))
 		return PTR_ERR(net);
 
-	rcu_read_lock();
-	attached = rcu_dereference(net->flow_dissector_prog);
-	if (attached) {
-		prog_cnt = 1;
-		prog_id = attached->aux->id;
-	}
-	rcu_read_unlock();
+	ret = bpf_prog_query_one(&net->flow_dissector_prog, attr, uattr);
 
 	put_net(net);
-
-	if (copy_to_user(&uattr->query.attach_flags, &flags, sizeof(flags)))
-		return -EFAULT;
-	if (copy_to_user(&uattr->query.prog_cnt, &prog_cnt, sizeof(prog_cnt)))
-		return -EFAULT;
-
-	if (!attr->query.prog_cnt || !prog_ids || !prog_cnt)
-		return 0;
-
-	if (copy_to_user(prog_ids, &prog_id, sizeof(u32)))
-		return -EFAULT;
-
-	return 0;
+	return ret;
 }
 
 int skb_flow_dissector_bpf_prog_attach(const union bpf_attr *attr,
 				       struct bpf_prog *prog)
 {
-	struct bpf_prog *attached;
-	struct net *net;
+	struct net *net = current->nsproxy->net_ns;
 
-	net = current->nsproxy->net_ns;
-	mutex_lock(&flow_dissector_mutex);
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (attached) {
-		/* Only one BPF program can be attached at a time */
-		mutex_unlock(&flow_dissector_mutex);
-		return -EEXIST;
-	}
-	rcu_assign_pointer(net->flow_dissector_prog, prog);
-	mutex_unlock(&flow_dissector_mutex);
-	return 0;
+	return bpf_prog_attach_one(&net->flow_dissector_prog,
+				   &flow_dissector_mutex, prog,
+				   attr->attach_flags);
 }
 
 int skb_flow_dissector_bpf_prog_detach(const union bpf_attr *attr)
 {
-	struct bpf_prog *attached;
-	struct net *net;
+	struct net *net = current->nsproxy->net_ns;
 
-	net = current->nsproxy->net_ns;
-	mutex_lock(&flow_dissector_mutex);
-	attached = rcu_dereference_protected(net->flow_dissector_prog,
-					     lockdep_is_held(&flow_dissector_mutex));
-	if (!attached) {
-		mutex_unlock(&flow_dissector_mutex);
-		return -ENOENT;
-	}
-	bpf_prog_put(attached);
-	RCU_INIT_POINTER(net->flow_dissector_prog, NULL);
-	mutex_unlock(&flow_dissector_mutex);
-	return 0;
+	return bpf_prog_detach_one(&net->flow_dissector_prog,
+				   &flow_dissector_mutex);
 }
+
 /**
  * skb_flow_get_be16 - extract be16 entity
  * @skb: sk_buff to extract from
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 02/12] bpf: Introduce inet_lookup program type for redirecting socket lookup
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 01/12] flow_dissector: Extract attach/detach/query helpers Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 03/12] bpf: Add verifier tests for inet_lookup context access Jakub Sitnicki
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Add a new program type for redirecting the listening/bound socket lookup
from BPF. The program attaches to a network namespace. It is allowed to
select a socket from a SOCKARRAY, which will be used as a result of socket
lookup.

This provides a mechanism for programming the mapping between
local (address, port) pairs and listening/receiving sockets.

The program receives the 4-tuple, as well as the IP version and L4
protocol, of the packet that triggered the lookup as its context for making
a decision.

The netns-attached program is not called anywhere yet. Following patches
hook it up to ipv4 and ipv6 stacks.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/linux/bpf_types.h   |   1 +
 include/linux/filter.h      |  18 +++
 include/net/net_namespace.h |   2 +
 include/uapi/linux/bpf.h    |  58 ++++++++-
 kernel/bpf/syscall.c        |  10 ++
 kernel/bpf/verifier.c       |   7 +-
 net/core/filter.c           | 231 ++++++++++++++++++++++++++++++++++++
 7 files changed, 325 insertions(+), 2 deletions(-)

diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 36a9c2325176..cc5c4ece748a 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -37,6 +37,7 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LIRC_MODE2, lirc_mode2)
 #endif
 #ifdef CONFIG_INET
 BPF_PROG_TYPE(BPF_PROG_TYPE_SK_REUSEPORT, sk_reuseport)
+BPF_PROG_TYPE(BPF_PROG_TYPE_INET_LOOKUP, inet_lookup)
 #endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index 92c6e31fb008..5b1b3b754c28 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1229,4 +1229,22 @@ struct bpf_sockopt_kern {
 	s32		retval;
 };
 
+struct bpf_inet_lookup_kern {
+	unsigned short	family;
+	u8		protocol;
+	__be32		saddr;
+	struct in6_addr	saddr6;
+	__be16		sport;
+	__be32		daddr;
+	struct in6_addr	daddr6;
+	unsigned short	hnum;
+	struct sock	*redir_sk;
+};
+
+int inet_lookup_bpf_prog_attach(const union bpf_attr *attr,
+				struct bpf_prog *prog);
+int inet_lookup_bpf_prog_detach(const union bpf_attr *attr);
+int inet_lookup_bpf_prog_query(const union bpf_attr *attr,
+			       union bpf_attr __user *uattr);
+
 #endif /* __LINUX_FILTER_H__ */
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index 4a9da951a794..bd01147cc064 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -171,6 +171,8 @@ struct net {
 #ifdef CONFIG_XDP_SOCKETS
 	struct netns_xdp	xdp;
 #endif
+	struct bpf_prog __rcu	*inet_lookup_prog;
+
 	struct sock		*diag_nlsk;
 	atomic_t		fnhe_genid;
 } __randomize_layout;
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index b5889257cc33..639abfa96779 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -173,6 +173,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SYSCTL,
 	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 	BPF_PROG_TYPE_CGROUP_SOCKOPT,
+	BPF_PROG_TYPE_INET_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -199,6 +200,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP6_RECVMSG,
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
+	BPF_INET_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2747,6 +2749,33 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_redirect_lookup(struct bpf_inet_lookup *ctx, struct bpf_map *sockarray, void *key, u64 flags)
+ *	Description
+ *		Select a socket referenced by *map* (of type
+ *		**BPF_MAP_TYPE_REUSEPORT_SOCKARRAY**) at index *key* to use as a
+ *		result of listening (TCP) or bound (UDP) socket lookup.
+ *
+ *		The IP family and L4 protocol in *ctx* object, populated from
+ *		the packet that triggered the lookup, must match the selected
+ *		socket's family and protocol. IP6_V6ONLY socket option is
+ *		honored.
+ *
+ *		To be used by **BPF_INET_LOOKUP** programs attached to the
+ *		network namespace. Program needs to return **BPF_REDIRECT**, the
+ *		helper's success return value, for the selected socket to be
+ *		actually used.
+ *
+ *	Return
+ *		**BPF_REDIRECT** on success, if the socket at index *key* was selected.
+ *
+ *		**-EINVAL** if *flags* are invalid (not zero).
+ *
+ *		**-ENOENT** if there is no socket at index *key*.
+ *
+ *		**-EPROTOTYPE** if *ctx->protocol* does not match the socket protocol.
+ *
+ *		**-EAFNOSUPPORT** if socket does not accept IP version in *ctx->family*.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2859,7 +2888,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(redirect_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3116,6 +3146,32 @@ struct bpf_tcp_sock {
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
 };
 
+/* User accessible data for inet_lookup programs.
+ * New fields must be added at the end.
+ */
+struct bpf_inet_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPROTO_TCP, IPPROTO_UDP */
+	__u32 remote_ip4;	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_ip6[4];	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip6[4];	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_port;	/* Allows 4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_port;	/* Allows 4-byte read and write.
+				 * Stored in host byte order.
+				 */
+};
+
 struct bpf_sock_tuple {
 	union {
 		struct {
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index c0f62fd67c6b..763f2352ff7f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1935,6 +1935,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_CGROUP_SETSOCKOPT:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
 		break;
+	case BPF_INET_LOOKUP:
+		ptype = BPF_PROG_TYPE_INET_LOOKUP;
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -1959,6 +1962,9 @@ static int bpf_prog_attach(const union bpf_attr *attr)
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 		ret = skb_flow_dissector_bpf_prog_attach(attr, prog);
 		break;
+	case BPF_PROG_TYPE_INET_LOOKUP:
+		ret = inet_lookup_bpf_prog_attach(attr, prog);
+		break;
 	default:
 		ret = cgroup_bpf_prog_attach(attr, ptype, prog);
 	}
@@ -2022,6 +2028,8 @@ static int bpf_prog_detach(const union bpf_attr *attr)
 	case BPF_CGROUP_SETSOCKOPT:
 		ptype = BPF_PROG_TYPE_CGROUP_SOCKOPT;
 		break;
+	case BPF_INET_LOOKUP:
+		return inet_lookup_bpf_prog_detach(attr);
 	default:
 		return -EINVAL;
 	}
@@ -2065,6 +2073,8 @@ static int bpf_prog_query(const union bpf_attr *attr,
 		return lirc_prog_query(attr, uattr);
 	case BPF_FLOW_DISSECTOR:
 		return skb_flow_dissector_prog_query(attr, uattr);
+	case BPF_INET_LOOKUP:
+		return inet_lookup_bpf_prog_query(attr, uattr);
 	default:
 		return -EINVAL;
 	}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 10c0ff93f52b..5717dd10cc4d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3494,7 +3494,8 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 			goto error;
 		break;
 	case BPF_MAP_TYPE_REUSEPORT_SOCKARRAY:
-		if (func_id != BPF_FUNC_sk_select_reuseport)
+		if (func_id != BPF_FUNC_sk_select_reuseport &&
+		    func_id != BPF_FUNC_redirect_lookup)
 			goto error;
 		break;
 	case BPF_MAP_TYPE_QUEUE:
@@ -3578,6 +3579,10 @@ static int check_map_func_compatibility(struct bpf_verifier_env *env,
 		if (map->map_type != BPF_MAP_TYPE_SK_STORAGE)
 			goto error;
 		break;
+	case BPF_FUNC_redirect_lookup:
+		if (map->map_type != BPF_MAP_TYPE_REUSEPORT_SOCKARRAY)
+			goto error;
+		break;
 	default:
 		break;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index a498fbaa2d50..d9375a7e60f5 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -9007,4 +9007,235 @@ const struct bpf_verifier_ops sk_reuseport_verifier_ops = {
 
 const struct bpf_prog_ops sk_reuseport_prog_ops = {
 };
+
 #endif /* CONFIG_INET */
+
+static DEFINE_MUTEX(inet_lookup_prog_mutex);
+
+BPF_CALL_4(redirect_lookup, struct bpf_inet_lookup_kern *, ctx,
+	   struct bpf_map *, map, void *, key, u64, flags)
+{
+	struct sock_reuseport *reuse;
+	struct sock *redir_sk;
+
+	if (unlikely(flags))
+		return -EINVAL;
+
+	/* Lookup socket in the map */
+	redir_sk = map->ops->map_lookup_elem(map, key);
+	if (!redir_sk)
+		return -ENOENT;
+
+	/* Check if socket got unhashed from sockets table, e.g. by
+	 * close(), after the above map_lookup_elem(). Treat it as
+	 * removed from the map.
+	 */
+	reuse = rcu_dereference(redir_sk->sk_reuseport_cb);
+	if (!reuse)
+		return -ENOENT;
+
+	/* Check protocol & family are a match */
+	if (ctx->protocol != redir_sk->sk_protocol)
+		return -EPROTOTYPE;
+	if (ctx->family != redir_sk->sk_family &&
+	    (redir_sk->sk_family == AF_INET || ipv6_only_sock(redir_sk)))
+		return -EAFNOSUPPORT;
+
+	/* Store socket in context */
+	ctx->redir_sk = redir_sk;
+
+	/* Signal redirect action */
+	return BPF_REDIRECT;
+}
+
+static const struct bpf_func_proto bpf_redirect_lookup_proto = {
+	.func		= redirect_lookup,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+	.arg2_type	= ARG_CONST_MAP_PTR,
+	.arg3_type	= ARG_PTR_TO_MAP_KEY,
+	.arg4_type	= ARG_ANYTHING,
+};
+
+static const struct bpf_func_proto *
+inet_lookup_func_proto(enum bpf_func_id func_id,
+		       const struct bpf_prog *prog)
+{
+	switch (func_id) {
+	case BPF_FUNC_redirect_lookup:
+		return &bpf_redirect_lookup_proto;
+	default:
+		return bpf_base_func_proto(func_id);
+	}
+}
+
+int inet_lookup_bpf_prog_attach(const union bpf_attr *attr,
+				struct bpf_prog *prog)
+{
+	struct net *net = current->nsproxy->net_ns;
+
+	return bpf_prog_attach_one(&net->inet_lookup_prog,
+				   &inet_lookup_prog_mutex, prog,
+				   attr->attach_flags);
+}
+
+int inet_lookup_bpf_prog_detach(const union bpf_attr *attr)
+{
+	struct net *net = current->nsproxy->net_ns;
+
+	return bpf_prog_detach_one(&net->inet_lookup_prog,
+				   &inet_lookup_prog_mutex);
+}
+
+int inet_lookup_bpf_prog_query(const union bpf_attr *attr,
+			       union bpf_attr __user *uattr)
+{
+	struct net *net;
+	int ret;
+
+	net = get_net_ns_by_fd(attr->query.target_fd);
+	if (IS_ERR(net))
+		return PTR_ERR(net);
+
+	ret = bpf_prog_query_one(&net->inet_lookup_prog, attr, uattr);
+
+	put_net(net);
+	return ret;
+}
+
+static bool inet_lookup_is_valid_access(int off, int size,
+					enum bpf_access_type type,
+					const struct bpf_prog *prog,
+					struct bpf_insn_access_aux *info)
+{
+	const int size_default = sizeof(__u32);
+
+	if (off < 0 || off >= sizeof(struct bpf_inet_lookup))
+		return false;
+	if (off % size != 0)
+		return false;
+	if (type != BPF_READ)
+		return false;
+
+	switch (off) {
+	case bpf_ctx_range(struct bpf_inet_lookup, remote_ip4):
+	case bpf_ctx_range(struct bpf_inet_lookup, local_ip4):
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				remote_ip6[0], remote_ip6[3]):
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				local_ip6[0], local_ip6[3]):
+		if (!bpf_ctx_narrow_access_ok(off, size, size_default))
+			return false;
+		bpf_ctx_record_field_size(info, size_default);
+		break;
+
+	case bpf_ctx_range(struct bpf_inet_lookup, family):
+	case bpf_ctx_range(struct bpf_inet_lookup, protocol):
+	case bpf_ctx_range(struct bpf_inet_lookup, remote_port):
+	case bpf_ctx_range(struct bpf_inet_lookup, local_port):
+		if (size != size_default)
+			return false;
+		break;
+
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+#define LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, OFF) ({			\
+	*insn++ = BPF_LDX_MEM(SIZE, si->dst_reg, si->src_reg,		\
+			      bpf_target_off(TYPE, FIELD,		\
+					     FIELD_SIZEOF(TYPE, FIELD),	\
+					     target_size) + (OFF));	\
+})
+
+#define LOAD_FIELD_SIZE(TYPE, FIELD, SIZE) \
+	LOAD_FIELD_SIZE_OFF(TYPE, FIELD, SIZE, 0)
+
+#define LOAD_FIELD(TYPE, FIELD) \
+	LOAD_FIELD_SIZE(TYPE, FIELD, BPF_FIELD_SIZEOF(TYPE, FIELD))
+
+static u32 inet_lookup_convert_ctx_access(enum bpf_access_type type,
+					  const struct bpf_insn *si,
+					  struct bpf_insn *insn_buf,
+					  struct bpf_prog *prog,
+					  u32 *target_size)
+{
+	struct bpf_insn *insn = insn_buf;
+	int off;
+
+	switch (si->off) {
+	case offsetof(struct bpf_inet_lookup, family):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, family);
+		break;
+
+	case offsetof(struct bpf_inet_lookup, protocol):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, protocol);
+		break;
+
+	case offsetof(struct bpf_inet_lookup, remote_ip4):
+		LOAD_FIELD_SIZE(struct bpf_inet_lookup_kern, saddr,
+				BPF_SIZE(si->code));
+		break;
+
+	case offsetof(struct bpf_inet_lookup, local_ip4):
+		LOAD_FIELD_SIZE(struct bpf_inet_lookup_kern, daddr,
+				BPF_SIZE(si->code));
+
+		break;
+
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				remote_ip6[0], remote_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_inet_lookup, remote_ip6[0]);
+
+		LOAD_FIELD_SIZE_OFF(struct bpf_inet_lookup_kern,
+				    saddr6.s6_addr32[0],
+				    BPF_SIZE(si->code), off);
+#else
+		(void)off;
+
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case bpf_ctx_range_till(struct bpf_inet_lookup,
+				local_ip6[0], local_ip6[3]):
+#if IS_ENABLED(CONFIG_IPV6)
+		off = si->off;
+		off -= offsetof(struct bpf_inet_lookup, local_ip6[0]);
+
+		LOAD_FIELD_SIZE_OFF(struct bpf_inet_lookup_kern,
+				    daddr6.s6_addr32[0],
+				    BPF_SIZE(si->code), off);
+#else
+		(void)off;
+
+		*insn++ = BPF_MOV32_IMM(si->dst_reg, 0);
+#endif
+		break;
+
+	case offsetof(struct bpf_inet_lookup, remote_port):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, sport);
+		break;
+
+	case offsetof(struct bpf_inet_lookup, local_port):
+		LOAD_FIELD(struct bpf_inet_lookup_kern, hnum);
+		break;
+	}
+
+	return insn - insn_buf;
+}
+
+const struct bpf_prog_ops inet_lookup_prog_ops = {
+};
+
+const struct bpf_verifier_ops inet_lookup_verifier_ops = {
+	.get_func_proto		= inet_lookup_func_proto,
+	.is_valid_access	= inet_lookup_is_valid_access,
+	.convert_ctx_access	= inet_lookup_convert_ctx_access,
+};
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 03/12] bpf: Add verifier tests for inet_lookup context access
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 01/12] flow_dissector: Extract attach/detach/query helpers Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 02/12] bpf: Introduce inet_lookup program type for redirecting socket lookup Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 04/12] inet: Store layer 4 protocol in inet_hashinfo Jakub Sitnicki
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Exercise verifier access checks for bpf_inet_lookup context object fields.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 .../selftests/bpf/verifier/ctx_inet_lookup.c  | 696 ++++++++++++++++++
 1 file changed, 696 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c

diff --git a/tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c b/tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c
new file mode 100644
index 000000000000..665984d8179d
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/ctx_inet_lookup.c
@@ -0,0 +1,696 @@
+{
+	"valid 1,2,4-byte read bpf_inet_lookup remote_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup remote_ip4",
+	.insns = {
+		/* 8-byte read */
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 4-byte write */
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 4-byte write */
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 2-byte write */
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup remote_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		/* 1-byte write */
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_inet_lookup local_ip4",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4) + 3),
+		BPF_MOV64_IMM(BPF_REG_0, 0),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup local_ip4",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x7f000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip4)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_inet_lookup remote_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup,
+				     remote_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup,
+				     remote_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup remote_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 1,2,4-byte read bpf_inet_lookup local_ip6",
+	.insns = {
+		/* 4-byte read */
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[3])),
+		/* 2-byte read */
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[3]) + 2),
+		/* 1-byte read */
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[3]) + 3),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup local_ip6",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 0x00000001U),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_ip6[0])),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup remote_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, remote_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup local_port",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup local_port",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, local_port)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup family",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup family",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, family)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"valid 4-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.result = ACCEPT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte read bpf_inet_lookup protocol",
+	.insns = {
+		BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_1,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 8-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 4-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 2-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_H, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
+{
+	"invalid 1-byte write bpf_inet_lookup protocol",
+	.insns = {
+		BPF_MOV64_IMM(BPF_REG_0, 1234),
+		BPF_STX_MEM(BPF_B, BPF_REG_1, BPF_REG_0,
+			    offsetof(struct bpf_inet_lookup, protocol)),
+		BPF_EXIT_INSN(),
+	},
+	.errstr = "invalid bpf_context access",
+	.result = REJECT,
+	.prog_type = BPF_PROG_TYPE_INET_LOOKUP,
+},
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 04/12] inet: Store layer 4 protocol in inet_hashinfo
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (2 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 03/12] bpf: Add verifier tests for inet_lookup context access Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 05/12] udp: Store layer 4 protocol in udp_table Jakub Sitnicki
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Make it possible to identify the protocol of the sockets stored in hashinfo
without looking up one.

Subsequent patches make use the new field at the socket lookup time to
enforce that the BPF program selects only sockets with matching protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 3 +++
 net/dccp/proto.c              | 2 +-
 net/ipv4/tcp_ipv4.c           | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index af2b4c065a04..b2d43ee72dc1 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -138,6 +138,9 @@ struct inet_hashinfo {
 	unsigned int			lhash2_mask;
 	struct inet_listen_hashbucket	*lhash2;
 
+	/* Layer 4 protocol of the stored sockets */
+	int				protocol;
+
 	/* All the above members are written once at bootup and
 	 * never written again _or_ are predominantly read-access.
 	 *
diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 5bad08dc4316..805eee1b4fb0 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -45,7 +45,7 @@ EXPORT_SYMBOL_GPL(dccp_statistics);
 struct percpu_counter dccp_orphan_count;
 EXPORT_SYMBOL_GPL(dccp_orphan_count);
 
-struct inet_hashinfo dccp_hashinfo;
+struct inet_hashinfo dccp_hashinfo = { .protocol = IPPROTO_DCCP };
 EXPORT_SYMBOL_GPL(dccp_hashinfo);
 
 /* the maximum queue length for tx in packets. 0 is no limit */
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index fd394ad179a0..5d2afbcc45cc 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -87,7 +87,7 @@ static int tcp_v4_md5_hash_hdr(char *md5_hash, const struct tcp_md5sig_key *key,
 			       __be32 daddr, __be32 saddr, const struct tcphdr *th);
 #endif
 
-struct inet_hashinfo tcp_hashinfo;
+struct inet_hashinfo tcp_hashinfo = { .protocol = IPPROTO_TCP };
 EXPORT_SYMBOL(tcp_hashinfo);
 
 static u32 tcp_v4_init_seq(const struct sk_buff *skb)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 05/12] udp: Store layer 4 protocol in udp_table
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (3 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 04/12] inet: Store layer 4 protocol in inet_hashinfo Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 06/12] inet: Run inet_lookup bpf program on socket lookup Jakub Sitnicki
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Because UDP and UDP-Lite share code we have to pass the L4 protocol
identifier alongside the socket table to down call sites where
distinguishing between the two is needed.

There is currently only one such call site, which by itself is not reason
enough for the change.

However, subsequent patches will also make use the new udp_table field
inside the socket lookup routine to enforce that the BPF program selects
only sockets with matching protocol.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/udp.h   | 10 ++++++----
 net/ipv4/udp.c      | 15 +++++++--------
 net/ipv4/udp_impl.h |  2 +-
 net/ipv4/udplite.c  |  4 ++--
 net/ipv6/udp.c      | 12 ++++++------
 net/ipv6/udp_impl.h |  2 +-
 net/ipv6/udplite.c  |  2 +-
 7 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/net/udp.h b/include/net/udp.h
index 79d141d2103b..97778976c5ec 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -63,16 +63,18 @@ struct udp_hslot {
 /**
  *	struct udp_table - UDP table
  *
- *	@hash:	hash table, sockets are hashed on (local port)
- *	@hash2:	hash table, sockets are hashed on (local port, local address)
- *	@mask:	number of slots in hash tables, minus 1
- *	@log:	log2(number of slots in hash table)
+ *	@hash:		hash table, sockets are hashed on (local port)
+ *	@hash2:		hash table, sockets are hashed on (local port, local address)
+ *	@mask:		number of slots in hash tables, minus 1
+ *	@log:		log2(number of slots in hash table)
+ *	@protocol:	layer 4 protocol of the stored sockets
  */
 struct udp_table {
 	struct udp_hslot	*hash;
 	struct udp_hslot	*hash2;
 	unsigned int		mask;
 	unsigned int		log;
+	int			protocol;
 };
 extern struct udp_table udp_table;
 void udp_table_init(struct udp_table *, const char *);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d88821c794fb..9fffe9e9eec6 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -113,7 +113,7 @@
 #include <net/addrconf.h>
 #include <net/udp_tunnel.h>
 
-struct udp_table udp_table __read_mostly;
+struct udp_table udp_table __read_mostly = { .protocol = IPPROTO_UDP };
 EXPORT_SYMBOL(udp_table);
 
 long sysctl_udp_mem[3] __read_mostly;
@@ -2106,8 +2106,7 @@ EXPORT_SYMBOL(udp_sk_rx_dst_set);
 static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 				    struct udphdr  *uh,
 				    __be32 saddr, __be32 daddr,
-				    struct udp_table *udptable,
-				    int proto)
+				    struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	unsigned short hnum = ntohs(uh->dest);
@@ -2163,7 +2162,7 @@ static int __udp4_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				proto == IPPROTO_UDPLITE);
+				udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -2240,8 +2239,7 @@ static int udp_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
  *	All we need to do is get the socket, and then do a checksum.
  */
 
-int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	struct sock *sk;
 	struct udphdr *uh;
@@ -2249,6 +2247,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	struct rtable *rt = skb_rtable(skb);
 	__be32 saddr, daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 
 	/*
 	 *  Validate the packet.
@@ -2289,7 +2288,7 @@ int __udp4_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 
 	if (rt->rt_flags & (RTCF_BROADCAST|RTCF_MULTICAST))
 		return __udp4_lib_mcast_deliver(net, skb, uh,
-						saddr, daddr, udptable, proto);
+						saddr, daddr, udptable);
 
 	sk = __udp4_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
 	if (sk)
@@ -2463,7 +2462,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
 
 int udp_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp4_lib_rcv(skb, &udp_table);
 }
 
 void udp_destroy_sock(struct sock *sk)
diff --git a/net/ipv4/udp_impl.h b/net/ipv4/udp_impl.h
index 6b2fa77eeb1c..7013535f9084 100644
--- a/net/ipv4/udp_impl.h
+++ b/net/ipv4/udp_impl.h
@@ -6,7 +6,7 @@
 #include <net/protocol.h>
 #include <net/inet_common.h>
 
-int __udp4_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp4_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp4_lib_err(struct sk_buff *, u32, struct udp_table *);
 
 int udp_v4_get_port(struct sock *sk, unsigned short snum);
diff --git a/net/ipv4/udplite.c b/net/ipv4/udplite.c
index 5936d66d1ce2..4e4e85de95b2 100644
--- a/net/ipv4/udplite.c
+++ b/net/ipv4/udplite.c
@@ -14,12 +14,12 @@
 #include <linux/proc_fs.h>
 #include "udp_impl.h"
 
-struct udp_table 	udplite_table __read_mostly;
+struct udp_table udplite_table __read_mostly = { .protocol = IPPROTO_UDPLITE };
 EXPORT_SYMBOL(udplite_table);
 
 static int udplite_rcv(struct sk_buff *skb)
 {
-	return __udp4_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp4_lib_rcv(skb, &udplite_table);
 }
 
 static int udplite_err(struct sk_buff *skb, u32 info)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 827fe7385078..16ef2303bd8d 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -741,7 +741,7 @@ static void udp6_csum_zero_error(struct sk_buff *skb)
  */
 static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 		const struct in6_addr *saddr, const struct in6_addr *daddr,
-		struct udp_table *udptable, int proto)
+		struct udp_table *udptable)
 {
 	struct sock *sk, *first = NULL;
 	const struct udphdr *uh = udp_hdr(skb);
@@ -803,7 +803,7 @@ static int __udp6_lib_mcast_deliver(struct net *net, struct sk_buff *skb,
 	} else {
 		kfree_skb(skb);
 		__UDP6_INC_STATS(net, UDP_MIB_IGNOREDMULTI,
-				 proto == IPPROTO_UDPLITE);
+				 udptable->protocol == IPPROTO_UDPLITE);
 	}
 	return 0;
 }
@@ -836,11 +836,11 @@ static int udp6_unicast_rcv_skb(struct sock *sk, struct sk_buff *skb,
 	return 0;
 }
 
-int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
-		   int proto)
+int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable)
 {
 	const struct in6_addr *saddr, *daddr;
 	struct net *net = dev_net(skb->dev);
+	int proto = udptable->protocol;
 	struct udphdr *uh;
 	struct sock *sk;
 	u32 ulen = 0;
@@ -902,7 +902,7 @@ int __udp6_lib_rcv(struct sk_buff *skb, struct udp_table *udptable,
 	 */
 	if (ipv6_addr_is_multicast(daddr))
 		return __udp6_lib_mcast_deliver(net, skb,
-				saddr, daddr, udptable, proto);
+				saddr, daddr, udptable);
 
 	/* Unicast */
 	sk = __udp6_lib_lookup_skb(skb, uh->source, uh->dest, udptable);
@@ -1011,7 +1011,7 @@ INDIRECT_CALLABLE_SCOPE void udp_v6_early_demux(struct sk_buff *skb)
 
 INDIRECT_CALLABLE_SCOPE int udpv6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udp_table, IPPROTO_UDP);
+	return __udp6_lib_rcv(skb, &udp_table);
 }
 
 /*
diff --git a/net/ipv6/udp_impl.h b/net/ipv6/udp_impl.h
index 20e324b6f358..acd5a942c633 100644
--- a/net/ipv6/udp_impl.h
+++ b/net/ipv6/udp_impl.h
@@ -8,7 +8,7 @@
 #include <net/inet_common.h>
 #include <net/transp_v6.h>
 
-int __udp6_lib_rcv(struct sk_buff *, struct udp_table *, int);
+int __udp6_lib_rcv(struct sk_buff *, struct udp_table *);
 int __udp6_lib_err(struct sk_buff *, struct inet6_skb_parm *, u8, u8, int,
 		   __be32, struct udp_table *);
 
diff --git a/net/ipv6/udplite.c b/net/ipv6/udplite.c
index bf7a7acd39b1..f442ed595e6f 100644
--- a/net/ipv6/udplite.c
+++ b/net/ipv6/udplite.c
@@ -14,7 +14,7 @@
 
 static int udplitev6_rcv(struct sk_buff *skb)
 {
-	return __udp6_lib_rcv(skb, &udplite_table, IPPROTO_UDPLITE);
+	return __udp6_lib_rcv(skb, &udplite_table);
 }
 
 static int udplitev6_err(struct sk_buff *skb,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 06/12] inet: Run inet_lookup bpf program on socket lookup
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (4 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 05/12] udp: Store layer 4 protocol in udp_table Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 07/12] inet6: " Jakub Sitnicki
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Run a BPF program before looking up the listening socket. The program can
redirect the skb to a listening socket of its choice, providing it calls
bpf_redirect_lookup() helper and returns BPF_REDIRECT.

This lets the user-space program mappings between packet 4-tuple and
listening sockets. With the possibility to override the socket lookup from
BPF, applications don't need to bind sockets to every addresses they
receive on, or resort to listening on all addresses with INADDR_ANY.

Also port sharing conflicts become a non-issue. Application can listen on
any free port and still receive traffic destined to its assigned service
port.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet_hashtables.h | 33 +++++++++++++++++++++++++++++++++
 net/ipv4/inet_hashtables.c    |  5 +++++
 2 files changed, 38 insertions(+)

diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
index b2d43ee72dc1..c9c7efb961cb 100644
--- a/include/net/inet_hashtables.h
+++ b/include/net/inet_hashtables.h
@@ -417,4 +417,37 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
 
 int inet_hash_connect(struct inet_timewait_death_row *death_row,
 		      struct sock *sk);
+
+static inline struct sock *__inet_lookup_run_bpf(const struct net *net,
+						 struct bpf_inet_lookup_kern *ctx)
+{
+	struct bpf_prog *prog;
+	int ret = BPF_OK;
+
+	rcu_read_lock();
+	prog = rcu_dereference(net->inet_lookup_prog);
+	if (prog)
+		ret = BPF_PROG_RUN(prog, ctx);
+	rcu_read_unlock();
+
+	return ret == BPF_REDIRECT ? ctx->redir_sk : NULL;
+}
+
+static inline struct sock *inet_lookup_run_bpf(const struct net *net, u8 proto,
+					       __be32 saddr, __be16 sport,
+					       __be32 daddr,
+					       unsigned short hnum)
+{
+	struct bpf_inet_lookup_kern ctx = {
+		.family		= AF_INET,
+		.protocol	= proto,
+		.saddr		= saddr,
+		.sport		= sport,
+		.daddr		= daddr,
+		.hnum		= hnum,
+	};
+
+	return __inet_lookup_run_bpf(net, &ctx);
+}
+
 #endif /* _INET_HASHTABLES_H */
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 97824864e40d..ab6d89c27c94 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -299,6 +299,11 @@ struct sock *__inet_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	result = inet_lookup_run_bpf(net, hashinfo->protocol,
+				     saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
+
 	hash2 = ipv4_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 07/12] inet6: Run inet_lookup bpf program on socket lookup
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (5 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 06/12] inet: Run inet_lookup bpf program on socket lookup Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 08/12] udp: " Jakub Sitnicki
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Following the ipv4 changes, run a BPF program attached to netns in context
of which we're doing the socket lookup so that it can redirect the skb to a
socket of its choice. The program runs before the listening socket lookup.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 include/net/inet6_hashtables.h | 19 +++++++++++++++++++
 net/ipv6/inet6_hashtables.c    |  5 +++++
 2 files changed, 24 insertions(+)

diff --git a/include/net/inet6_hashtables.h b/include/net/inet6_hashtables.h
index fe96bf247aac..c2393d148d8d 100644
--- a/include/net/inet6_hashtables.h
+++ b/include/net/inet6_hashtables.h
@@ -104,6 +104,25 @@ struct sock *inet6_lookup(struct net *net, struct inet_hashinfo *hashinfo,
 			  const int dif);
 
 int inet6_hash(struct sock *sk);
+
+static inline struct sock *inet6_lookup_run_bpf(struct net *net, u8 proto,
+						const struct in6_addr *saddr,
+						__be16 sport,
+						const struct in6_addr *daddr,
+						unsigned short hnum)
+{
+	struct bpf_inet_lookup_kern ctx = {
+		.family		= AF_INET6,
+		.protocol	= proto,
+		.saddr6		= *saddr,
+		.sport		= sport,
+		.daddr6		= *daddr,
+		.hnum		= hnum,
+	};
+
+	return __inet_lookup_run_bpf(net, &ctx);
+}
+
 #endif /* IS_ENABLED(CONFIG_IPV6) */
 
 #define INET6_MATCH(__sk, __net, __saddr, __daddr, __ports, __dif, __sdif) \
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index cf60fae9533b..40dd0a3d80ed 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -157,6 +157,11 @@ struct sock *inet6_lookup_listener(struct net *net,
 	struct sock *result = NULL;
 	unsigned int hash2;
 
+	result = inet6_lookup_run_bpf(net, hashinfo->protocol,
+				      saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
+
 	hash2 = ipv6_portaddr_hash(net, daddr, hnum);
 	ilb2 = inet_lhash2_bucket(hashinfo, hash2);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 08/12] udp: Run inet_lookup bpf program on socket lookup
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (6 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 07/12] inet6: " Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 09/12] udp6: " Jakub Sitnicki
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Following the TCP socket lookup changes, allow selecting the receiving
socket from BPF before searching for bound socket by destination address
and port.

As connected and bound but non-connected socket lookup currently happens in
one step, we split the lookup in two phases to run BPF only after a lookup
for a connected socket was a miss. Hence making sure connected UDP sockets
continue to work as expected in presence of a BPF inet_lookup program.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c | 44 ++++++++++++++++++++++++++++++++------------
 1 file changed, 32 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9fffe9e9eec6..3a4b98f89249 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -353,7 +353,7 @@ int udp_v4_get_port(struct sock *sk, unsigned short snum)
 static int compute_score(struct sock *sk, struct net *net,
 			 __be32 saddr, __be16 sport,
 			 __be32 daddr, unsigned short hnum,
-			 int dif, int sdif)
+			 int dif, int sdif, unsigned char state)
 {
 	int score;
 	struct inet_sock *inet;
@@ -364,6 +364,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	    ipv6_only_sock(sk))
 		return -1;
 
+	if (state && sk->sk_state != state)
+		return -1;
+
 	if (sk->sk_rcv_saddr != daddr)
 		return -1;
 
@@ -411,7 +414,8 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 				     __be32 daddr, unsigned int hnum,
 				     int dif, int sdif,
 				     struct udp_hslot *hslot2,
-				     struct sk_buff *skb)
+				     struct sk_buff *skb,
+				     unsigned char state)
 {
 	struct sock *sk, *result;
 	int score, badness;
@@ -421,7 +425,7 @@ static struct sock *udp4_lib_lookup2(struct net *net,
 	badness = 0;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
 		score = compute_score(sk, net, saddr, sport,
-				      daddr, hnum, dif, sdif);
+				      daddr, hnum, dif, sdif, state);
 		if (score > badness) {
 			if (sk->sk_reuseport) {
 				hash = udp_ehashfn(net, daddr, hnum,
@@ -454,18 +458,34 @@ struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr,
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected sockets */
 	result = udp4_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
-				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
-		slot2 = hash2 & udptable->mask;
-		hslot2 = &udptable->hash2[slot2];
+				  hslot2, skb, TCP_ESTABLISHED);
+	if (result)
+		goto done;
 
-		result = udp4_lib_lookup2(net, saddr, sport,
-					  htonl(INADDR_ANY), hnum, dif, sdif,
-					  hslot2, skb);
-	}
+	/* Lookup redirect from BPF */
+	result = inet_lookup_run_bpf(net, udptable->protocol,
+				     saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
+
+	/* Lookup bound sockets */
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  daddr, hnum, dif, sdif,
+				  hslot2, skb, 0);
+	if (result)
+		goto done;
+
+	hash2 = ipv4_portaddr_hash(net, htonl(INADDR_ANY), hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp4_lib_lookup2(net, saddr, sport,
+				  htonl(INADDR_ANY), hnum, dif, sdif,
+				  hslot2, skb, 0);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 09/12] udp6: Run inet_lookup bpf program on socket lookup
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (7 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 08/12] udp: " Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 10/12] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Same as for udp, split the socket lookup into two phases and let the BPF
inet_lookup program select the receiving socket.

Suggested-by: Marek Majkowski <marek@cloudflare.com>
Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv6/udp.c | 42 ++++++++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 16ef2303bd8d..7380cf57e88c 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -101,7 +101,7 @@ void udp_v6_rehash(struct sock *sk)
 static int compute_score(struct sock *sk, struct net *net,
 			 const struct in6_addr *saddr, __be16 sport,
 			 const struct in6_addr *daddr, unsigned short hnum,
-			 int dif, int sdif)
+			 int dif, int sdif, unsigned char state)
 {
 	int score;
 	struct inet_sock *inet;
@@ -112,6 +112,9 @@ static int compute_score(struct sock *sk, struct net *net,
 	    sk->sk_family != PF_INET6)
 		return -1;
 
+	if (state && sk->sk_state != state)
+		return -1;
+
 	if (!ipv6_addr_equal(&sk->sk_v6_rcv_saddr, daddr))
 		return -1;
 
@@ -146,7 +149,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 		const struct in6_addr *saddr, __be16 sport,
 		const struct in6_addr *daddr, unsigned int hnum,
 		int dif, int sdif, struct udp_hslot *hslot2,
-		struct sk_buff *skb)
+		struct sk_buff *skb, unsigned char state)
 {
 	struct sock *sk, *result;
 	int score, badness;
@@ -156,7 +159,7 @@ static struct sock *udp6_lib_lookup2(struct net *net,
 	badness = -1;
 	udp_portaddr_for_each_entry_rcu(sk, &hslot2->head) {
 		score = compute_score(sk, net, saddr, sport,
-				      daddr, hnum, dif, sdif);
+				      daddr, hnum, dif, sdif, state);
 		if (score > badness) {
 			if (sk->sk_reuseport) {
 				hash = udp6_ehashfn(net, daddr, hnum,
@@ -190,19 +193,34 @@ struct sock *__udp6_lib_lookup(struct net *net,
 	slot2 = hash2 & udptable->mask;
 	hslot2 = &udptable->hash2[slot2];
 
+	/* Lookup connected sockets */
 	result = udp6_lib_lookup2(net, saddr, sport,
 				  daddr, hnum, dif, sdif,
-				  hslot2, skb);
-	if (!result) {
-		hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
-		slot2 = hash2 & udptable->mask;
+				  hslot2, skb, TCP_ESTABLISHED);
+	if (result)
+		goto done;
 
-		hslot2 = &udptable->hash2[slot2];
+	/* Lookup redirect from BPF */
+	result = inet6_lookup_run_bpf(net, udptable->protocol,
+				      saddr, sport, daddr, hnum);
+	if (result)
+		goto done;
 
-		result = udp6_lib_lookup2(net, saddr, sport,
-					  &in6addr_any, hnum, dif, sdif,
-					  hslot2, skb);
-	}
+	/* Lookup bound sockets */
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  daddr, hnum, dif, sdif,
+				  hslot2, skb, 0);
+	if (result)
+		goto done;
+
+	hash2 = ipv6_portaddr_hash(net, &in6addr_any, hnum);
+	slot2 = hash2 & udptable->mask;
+	hslot2 = &udptable->hash2[slot2];
+
+	result = udp6_lib_lookup2(net, saddr, sport,
+				  &in6addr_any, hnum, dif, sdif,
+				  hslot2, skb, 0);
+done:
 	if (IS_ERR(result))
 		return NULL;
 	return result;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 10/12] bpf: Sync linux/bpf.h to tools/
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (8 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 09/12] udp6: " Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 11/12] libbpf: Add support for inet_lookup program type Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 12/12] bpf: Test redirecting listening/receiving socket lookup Jakub Sitnicki
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Newly added program, context type and helper is used by tests in a
subsequent patch. Synchronize the header file.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/include/uapi/linux/bpf.h | 58 +++++++++++++++++++++++++++++++++-
 1 file changed, 57 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index b5889257cc33..58ee3d24a430 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -173,6 +173,7 @@ enum bpf_prog_type {
 	BPF_PROG_TYPE_CGROUP_SYSCTL,
 	BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE,
 	BPF_PROG_TYPE_CGROUP_SOCKOPT,
+	BPF_PROG_TYPE_INET_LOOKUP,
 };
 
 enum bpf_attach_type {
@@ -199,6 +200,7 @@ enum bpf_attach_type {
 	BPF_CGROUP_UDP6_RECVMSG,
 	BPF_CGROUP_GETSOCKOPT,
 	BPF_CGROUP_SETSOCKOPT,
+	BPF_INET_LOOKUP,
 	__MAX_BPF_ATTACH_TYPE
 };
 
@@ -2747,6 +2749,33 @@ union bpf_attr {
  *		**-EOPNOTSUPP** kernel configuration does not enable SYN cookies
  *
  *		**-EPROTONOSUPPORT** IP packet version is not 4 or 6
+ *
+ * int bpf_redirect_lookup(struct bpf_inet_lookup_kern *ctx, struct bpf_map *sockarray, void *key, u64 flags)
+ *	Description
+ *		Select a socket referenced by *map* (of type
+ *		**BPF_MAP_TYPE_REUSEPORT_SOCKARRAY**) at index *key* to use as a
+ *		result of listening (TCP) or bound (UDP) socket lookup.
+ *
+ *		The IP family and L4 protocol in *ctx* object, populated from
+ *		the packet that triggered the lookup, must match the selected
+ *		socket's family and protocol. IP6_V6ONLY socket option is
+ *		honored.
+ *
+ *		To be used by **BPF_INET_LOOKUP** programs attached to the
+ *		network namespace. Program needs to return **BPF_REDIRECT**, the
+ *		helper's success return value, for the selected socket to be
+ *		actually used.
+ *
+ *	Return
+ *		**BPF_REDIRECT** on success, if the socket at index *key* was selected.
+ *
+ *		**-EINVAL** if *flags* are invalid (not zero).
+ *
+ *		**-ENOENT** if there is no socket at index *key*.
+ *
+ *		**-EPROTOTYPE** if *ctx->protocol* does not match the socket protocol.
+ *
+ *		**-EAFNOSUPPORT** if socket does not accept IP version in *ctx->family*.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -2859,7 +2888,8 @@ union bpf_attr {
 	FN(sk_storage_get),		\
 	FN(sk_storage_delete),		\
 	FN(send_signal),		\
-	FN(tcp_gen_syncookie),
+	FN(tcp_gen_syncookie),		\
+	FN(redirect_lookup),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -3116,6 +3146,32 @@ struct bpf_tcp_sock {
 	__u32 icsk_retransmits;	/* Number of unrecovered [RTO] timeouts */
 };
 
+/* User accessible data for inet_lookup programs.
+ * New fields must be added at the end.
+ */
+struct bpf_inet_lookup {
+	__u32 family;		/* AF_INET, AF_INET6 */
+	__u32 protocol;		/* IPROTO_TCP, IPPROTO_UDP */
+	__u32 remote_ip4;	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip4;	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_ip6[4];	/* Allows 1,2,4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_ip6[4];	/* Allows 1,2,4-byte read and 4-byte write.
+				 * Stored in network byte order.
+				 */
+	__u32 remote_port;	/* Allows 4-byte read but no write.
+				 * Stored in network byte order.
+				 */
+	__u32 local_port;	/* Allows 4-byte read and write.
+				 * Stored in host byte order.
+				 */
+};
+
 struct bpf_sock_tuple {
 	union {
 		struct {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 11/12] libbpf: Add support for inet_lookup program type
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (9 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 10/12] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  2019-08-28  7:22 ` [RFCv2 bpf-next 12/12] bpf: Test redirecting listening/receiving socket lookup Jakub Sitnicki
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Make libbpf aware of the newly added program type. Reserve a section name
for it.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/lib/bpf/libbpf.c        | 4 ++++
 tools/lib/bpf/libbpf.h        | 2 ++
 tools/lib/bpf/libbpf.map      | 2 ++
 tools/lib/bpf/libbpf_probes.c | 1 +
 4 files changed, 9 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 2233f919dd88..addb9762e965 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -3580,6 +3580,7 @@ static bool bpf_prog_type__needs_kver(enum bpf_prog_type type)
 	case BPF_PROG_TYPE_PERF_EVENT:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_INET_LOOKUP:
 		return false;
 	case BPF_PROG_TYPE_KPROBE:
 	default:
@@ -4447,6 +4448,7 @@ BPF_PROG_TYPE_FNS(tracepoint, BPF_PROG_TYPE_TRACEPOINT);
 BPF_PROG_TYPE_FNS(raw_tracepoint, BPF_PROG_TYPE_RAW_TRACEPOINT);
 BPF_PROG_TYPE_FNS(xdp, BPF_PROG_TYPE_XDP);
 BPF_PROG_TYPE_FNS(perf_event, BPF_PROG_TYPE_PERF_EVENT);
+BPF_PROG_TYPE_FNS(inet_lookup, BPF_PROG_TYPE_INET_LOOKUP);
 
 void bpf_program__set_expected_attach_type(struct bpf_program *prog,
 					   enum bpf_attach_type type)
@@ -4542,6 +4544,8 @@ static const struct {
 						BPF_CGROUP_GETSOCKOPT),
 	BPF_EAPROG_SEC("cgroup/setsockopt",	BPF_PROG_TYPE_CGROUP_SOCKOPT,
 						BPF_CGROUP_SETSOCKOPT),
+	BPF_EAPROG_SEC("inet_lookup",		BPF_PROG_TYPE_INET_LOOKUP,
+						BPF_INET_LOOKUP),
 };
 
 #undef BPF_PROG_SEC_IMPL
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index e8f70977d137..937d6da9430a 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -262,6 +262,7 @@ LIBBPF_API int bpf_program__set_sched_cls(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_sched_act(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_xdp(struct bpf_program *prog);
 LIBBPF_API int bpf_program__set_perf_event(struct bpf_program *prog);
+LIBBPF_API int bpf_program__set_inet_lookup(struct bpf_program *prog);
 LIBBPF_API void bpf_program__set_type(struct bpf_program *prog,
 				      enum bpf_prog_type type);
 LIBBPF_API void
@@ -276,6 +277,7 @@ LIBBPF_API bool bpf_program__is_sched_cls(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_sched_act(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_xdp(const struct bpf_program *prog);
 LIBBPF_API bool bpf_program__is_perf_event(const struct bpf_program *prog);
+LIBBPF_API bool bpf_program__is_inet_lookup(const struct bpf_program *prog);
 
 /*
  * No need for __attribute__((packed)), all members of 'bpf_map_def'
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 664ce8e7a60e..57564ad458ba 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -67,6 +67,7 @@ LIBBPF_0.0.1 {
 		bpf_prog_test_run;
 		bpf_prog_test_run_xattr;
 		bpf_program__fd;
+		bpf_program__is_inet_lookup;
 		bpf_program__is_kprobe;
 		bpf_program__is_perf_event;
 		bpf_program__is_raw_tracepoint;
@@ -84,6 +85,7 @@ LIBBPF_0.0.1 {
 		bpf_program__priv;
 		bpf_program__set_expected_attach_type;
 		bpf_program__set_ifindex;
+		bpf_program__set_inet_lookup;
 		bpf_program__set_kprobe;
 		bpf_program__set_perf_event;
 		bpf_program__set_prep;
diff --git a/tools/lib/bpf/libbpf_probes.c b/tools/lib/bpf/libbpf_probes.c
index 4b0b0364f5fc..c365223a2d1e 100644
--- a/tools/lib/bpf/libbpf_probes.c
+++ b/tools/lib/bpf/libbpf_probes.c
@@ -102,6 +102,7 @@ probe_load(enum bpf_prog_type prog_type, const struct bpf_insn *insns,
 	case BPF_PROG_TYPE_FLOW_DISSECTOR:
 	case BPF_PROG_TYPE_CGROUP_SYSCTL:
 	case BPF_PROG_TYPE_CGROUP_SOCKOPT:
+	case BPF_PROG_TYPE_INET_LOOKUP:
 	default:
 		break;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFCv2 bpf-next 12/12] bpf: Test redirecting listening/receiving socket lookup
  2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
                   ` (10 preceding siblings ...)
  2019-08-28  7:22 ` [RFCv2 bpf-next 11/12] libbpf: Add support for inet_lookup program type Jakub Sitnicki
@ 2019-08-28  7:22 ` Jakub Sitnicki
  11 siblings, 0 replies; 13+ messages in thread
From: Jakub Sitnicki @ 2019-08-28  7:22 UTC (permalink / raw)
  To: bpf, netdev; +Cc: kernel-team, Lorenz Bauer, Marek Majkowski

Check that steering the packets targeted at a local (address, port) that is
different than the server's bind() address with a BPF inet_lookup program
works as expected for TCP or UDP over either IPv4 or IPv6. Make sure that
it is possible to redirect IPv4 packets to IPv6 sockets that are not
V6-only.

Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 tools/testing/selftests/bpf/.gitignore        |   1 +
 tools/testing/selftests/bpf/Makefile          |   5 +-
 tools/testing/selftests/bpf/bpf_helpers.h     |   3 +
 .../selftests/bpf/progs/inet_lookup_progs.c   |  78 +++
 .../testing/selftests/bpf/test_inet_lookup.c  | 522 ++++++++++++++++++
 .../testing/selftests/bpf/test_inet_lookup.sh |  35 ++
 6 files changed, 642 insertions(+), 2 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/inet_lookup_progs.c
 create mode 100644 tools/testing/selftests/bpf/test_inet_lookup.c
 create mode 100755 tools/testing/selftests/bpf/test_inet_lookup.sh

diff --git a/tools/testing/selftests/bpf/.gitignore b/tools/testing/selftests/bpf/.gitignore
index 60c9338cd9b4..7442bd9166c7 100644
--- a/tools/testing/selftests/bpf/.gitignore
+++ b/tools/testing/selftests/bpf/.gitignore
@@ -44,3 +44,4 @@ test_sockopt_sk
 test_sockopt_multi
 test_sockopt_inherit
 test_tcp_rtt
+test_inet_lookup
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index 7a23d94fe6a9..89dbbc032c8f 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -65,7 +65,8 @@ TEST_PROGS := test_kmod.sh \
 	test_tcp_check_syncookie.sh \
 	test_tc_tunnel.sh \
 	test_tc_edt.sh \
-	test_xdping.sh
+	test_xdping.sh \
+	test_inet_lookup.sh
 
 TEST_PROGS_EXTENDED := with_addr.sh \
 	with_tunnels.sh \
@@ -75,7 +76,7 @@ TEST_PROGS_EXTENDED := with_addr.sh \
 # Compile but not part of 'make run_tests'
 TEST_GEN_PROGS_EXTENDED = test_libbpf_open test_sock_addr test_skb_cgroup_id_user \
 	flow_dissector_load test_flow_dissector test_tcp_check_syncookie_user \
-	test_lirc_mode2_user
+	test_lirc_mode2_user test_inet_lookup
 
 include ../lib.mk
 
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index 6c4930bc6e2e..dda00609098a 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -231,6 +231,9 @@ static int (*bpf_send_signal)(unsigned sig) = (void *)BPF_FUNC_send_signal;
 static long long (*bpf_tcp_gen_syncookie)(struct bpf_sock *sk, void *ip,
 					  int ip_len, void *tcp, int tcp_len) =
 	(void *) BPF_FUNC_tcp_gen_syncookie;
+static int (*bpf_redirect_lookup)(void *ctx, void *map, void *key,
+				  __u64 flags) =
+	(void *) BPF_FUNC_redirect_lookup;
 
 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
diff --git a/tools/testing/selftests/bpf/progs/inet_lookup_progs.c b/tools/testing/selftests/bpf/progs/inet_lookup_progs.c
new file mode 100644
index 000000000000..16b1b2e241e4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/inet_lookup_progs.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <sys/socket.h>
+
+#include "bpf_endian.h"
+#include "bpf_helpers.h"
+
+#define IP4(a, b, c, d)	((__u32)(		\
+	((__u32)((a) & (__u32)0xffUL) << 24) |	\
+	((__u32)((b) & (__u32)0xffUL) << 16) |	\
+	((__u32)((c) & (__u32)0xffUL) <<  8) |	\
+	((__u32)((d) & (__u32)0xffUL) <<  0)))
+
+#define REUSEPORT_ARRAY_SIZE 32
+
+struct {
+	__uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
+	__uint(max_entries, REUSEPORT_ARRAY_SIZE);
+	__type(key, __u32);
+	__type(value, __u64);
+} redir_map SEC(".maps");
+
+static const __u32 DST_PORT = 7007;
+static const __u32 DST_IP4 = IP4(127, 0, 0, 1);
+static const __u32 DST_IP6[] = { 0xfd000000, 0x0, 0x0, 0x00000001 };
+
+/* Redirect packets destined for port DST_PORT to socket at redir_map[0]. */
+SEC("inet_lookup/redir_port")
+int redir_port(struct bpf_inet_lookup *ctx)
+{
+	__u32 index = 0;
+	__u64 flags = 0;
+
+	if (ctx->local_port != DST_PORT)
+		return BPF_OK;
+
+	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
+}
+
+/* Redirect packets destined for DST_IP4 address to socket at redir_map[0]. */
+SEC("inet_lookup/redir_ip4")
+int redir_ip4(struct bpf_inet_lookup *ctx)
+{
+	__u32 index = 0;
+	__u64 flags = 0;
+
+	if (ctx->family != AF_INET)
+		return BPF_OK;
+	if (ctx->local_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->local_ip4 != bpf_htonl(DST_IP4))
+		return BPF_OK;
+
+	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
+}
+
+/* Redirect packets destined for DST_IP6 address to socket at redir_map[0]. */
+SEC("inet_lookup/redir_ip6")
+int redir_ip6(struct bpf_inet_lookup *ctx)
+{
+	__u32 index = 0;
+	__u64 flags = 0;
+
+	if (ctx->family != AF_INET6)
+		return BPF_OK;
+	if (ctx->local_port != DST_PORT)
+		return BPF_OK;
+	if (ctx->local_ip6[0] != bpf_htonl(DST_IP6[0]) ||
+	    ctx->local_ip6[1] != bpf_htonl(DST_IP6[1]) ||
+	    ctx->local_ip6[2] != bpf_htonl(DST_IP6[2]) ||
+	    ctx->local_ip6[3] != bpf_htonl(DST_IP6[3]))
+		return BPF_OK;
+
+	return bpf_redirect_lookup(ctx, &redir_map, &index, flags);
+}
+
+char _license[] SEC("license") = "GPL";
+__u32 _version SEC("version") = 1;
diff --git a/tools/testing/selftests/bpf/test_inet_lookup.c b/tools/testing/selftests/bpf/test_inet_lookup.c
new file mode 100644
index 000000000000..7e222488514c
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_inet_lookup.c
@@ -0,0 +1,522 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * L7 echo tests with the server listening/receiving at a different
+ * (address, port) than the client sends packets to.
+ *
+ * Traffic is steered to the server socket by redirecting the socket
+ * lookup with an eBPF inet_lookup program. The inet_lookup program,
+ * selected a target listening/bound socket from SOCKARRAY map based
+ * on the packet's 4-tuple.
+ */
+
+#include <arpa/inet.h>
+#include <assert.h>
+#include <errno.h>
+#include <error.h>
+#include <fcntl.h>
+#include <stdio.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <unistd.h>
+
+#include <bpf/libbpf.h>
+#include <bpf/bpf.h>
+
+#include "bpf_rlimit.h"
+#include "bpf_util.h"
+
+#define BPF_FILE	"./inet_lookup_progs.o"
+#define MAX_ERROR_LEN	256
+
+/* External (address, port) pairs the client sends packets to. */
+#define EXT_IP4		"127.0.0.1"
+#define EXT_IP6		"fd00::1"
+#define EXT_PORT	7007
+
+/* Internal (address, port) pairs the server listens/receives at. */
+#define INT_IP4		"127.0.0.2"
+#define INT_IP4_V6	"::ffff:127.0.0.2"
+#define INT_IP6		"fd00::2"
+#define INT_PORT	8008
+
+#define REUSEPORT_ARRAY_SIZE 32
+
+struct inet_addr {
+	const char *ip;
+	unsigned short port;
+};
+
+struct test {
+	const char *desc;
+	const char *bpf_prog;
+
+	int socket_type;
+
+	struct inet_addr send_to;
+	struct inet_addr recv_at;
+};
+
+static const struct test tests[] = {
+	{
+		.desc		= "TCP IPv4 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { EXT_IP4, INT_PORT },
+	},
+	{
+		.desc		= "TCP IPv4 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip4",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { INT_IP4, EXT_PORT },
+	},
+	{
+		.desc		= "TCP IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { EXT_IP6, INT_PORT },
+	},
+	{
+		.desc		= "TCP IPv6 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip6",
+		.socket_type	= SOCK_STREAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { INT_IP6, EXT_PORT },
+	},
+	{
+		.desc		= "TCP IPv4->IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_STREAM,
+		.recv_at	= { INT_IP4_V6, INT_PORT },
+		.send_to	= { EXT_IP4, EXT_PORT },
+	},
+	{
+		.desc		= "UDP IPv4 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { EXT_IP4, INT_PORT },
+	},
+	{
+		.desc		= "UDP IPv4 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip4",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP4, EXT_PORT },
+		.recv_at	= { INT_IP4, EXT_PORT },
+	},
+	{
+		.desc		= "UDP IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { EXT_IP6, INT_PORT },
+	},
+	{
+		.desc		= "UDP IPv6 redir addr",
+		.bpf_prog	= "inet_lookup/redir_ip6",
+		.socket_type	= SOCK_DGRAM,
+		.send_to	= { EXT_IP6, EXT_PORT },
+		.recv_at	= { INT_IP6, EXT_PORT },
+	},
+	{
+		.desc		= "UDP IPv4->IPv6 redir port",
+		.bpf_prog	= "inet_lookup/redir_port",
+		.socket_type	= SOCK_DGRAM,
+		.recv_at	= { INT_IP4_V6, INT_PORT },
+		.send_to	= { EXT_IP4, EXT_PORT },
+	},
+};
+
+static bool is_ipv6_addr(const char *ip)
+{
+	return !!strchr(ip, ':');
+}
+
+static void make_addr(int family, const char *ip, int port,
+		      struct sockaddr_storage *ss, int *sz)
+{
+	struct sockaddr_in *addr4;
+	struct sockaddr_in6 *addr6;
+
+	switch (family) {
+	case AF_INET:
+		addr4 = (struct sockaddr_in *)ss;
+		addr4->sin_family = AF_INET;
+		addr4->sin_port = htons(port);
+		if (!inet_pton(AF_INET, ip, &addr4->sin_addr))
+			error(1, errno, "inet_pton failed: %s", ip);
+		*sz = sizeof(*addr4);
+		break;
+	case AF_INET6:
+		addr6 = (struct sockaddr_in6 *)ss;
+		addr6->sin6_family = AF_INET6;
+		addr6->sin6_port = htons(port);
+		if (!inet_pton(AF_INET6, ip, &addr6->sin6_addr))
+			error(1, errno, "inet_pton failed: %s", ip);
+		*sz = sizeof(*addr6);
+		break;
+	default:
+		error(1, 0, "unsupported family %d", family);
+	}
+}
+
+static int make_server(int type, const char *ip, int port)
+{
+	struct sockaddr_storage ss = {0};
+	int fd, opt, sz;
+	int family;
+
+	family = is_ipv6_addr(ip) ? AF_INET6 : AF_INET;
+	make_addr(family, ip, port, &ss, &sz);
+
+	fd = socket(family, type, 0);
+	if (fd < 0)
+		error(1, errno, "failed to create listen socket");
+
+	opt = 1;
+	if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &opt, sizeof(opt)))
+		error(1, errno, "failed to set SO_REUSEPORT");
+	if (type == SOCK_DGRAM) {
+		if (setsockopt(fd, SOL_IP, IP_RECVORIGDSTADDR,
+			       &opt, sizeof(opt)))
+			error(1, errno, "failed to set IP_RECVORIGDSTADDR");
+	}
+	if (family == AF_INET6 && type == SOCK_DGRAM) {
+		if (setsockopt(fd, SOL_IPV6, IPV6_RECVORIGDSTADDR,
+			       &opt, sizeof(opt)))
+			error(1, errno, "failed to set IPV6_RECVORIGDSTADDR");
+	}
+
+	if (bind(fd, (struct sockaddr *)&ss, sz))
+		error(1, errno, "failed to bind listen socket");
+
+	if (type == SOCK_STREAM && listen(fd, 1))
+		error(1, errno, "failed to listen on port %d", port);
+
+	return fd;
+}
+
+static int make_client(int type, const char *ip, int port)
+{
+	struct sockaddr_storage ss = {0};
+	struct sockaddr *sa;
+	int family;
+	int fd, sz;
+
+	family = is_ipv6_addr(ip) ? AF_INET6 : AF_INET;
+	make_addr(family, ip, port, &ss, &sz);
+	sa = (struct sockaddr *)&ss;
+
+	fd = socket(family, type, 0);
+	if (fd < 0)
+		error(1, errno, "failed to create socket");
+
+	if (connect(fd, sa, sz))
+		error(1, errno, "failed to connect socket");
+
+	return fd;
+}
+
+static void send_byte(int fd)
+{
+	if (send(fd, "a", 1, 0) < 1)
+		error(1, errno, "failed to send message");
+}
+
+static void recv_byte(int fd)
+{
+	char buf[1];
+
+	if (recv(fd, buf, sizeof(buf), 0) < 1)
+		error(1, errno, "failed to receive message");
+}
+
+static void tcp_recv_send(int server_fd)
+{
+	char buf[1];
+	size_t len;
+	ssize_t n;
+	int fd;
+
+	fd = accept(server_fd, NULL, NULL);
+	if (fd < 0)
+		error(1, errno, "failed to accept");
+
+	len = sizeof(buf);
+	n = recv(fd, buf, len, 0);
+	if (n < 0)
+		error(1, errno, "failed to receive");
+	if (n < len)
+		error(1, 0, "partial receive");
+
+	n = send(fd, buf, len, 0);
+	if (n < 0)
+		error(1, errno, "failed to send");
+	if (n < len)
+		error(1, 0, "partial send");
+
+	close(fd);
+}
+
+static void udp_recv_send(int server_fd)
+{
+	char cmsg_buf[CMSG_SPACE(sizeof(struct sockaddr_storage))];
+	struct sockaddr_storage _src_addr = { 0 };
+	struct sockaddr_storage _dst_addr = { 0 };
+	struct sockaddr_storage *src_addr = &_src_addr;
+	struct sockaddr_storage *dst_addr = NULL;
+	struct msghdr msg = { 0 };
+	struct iovec iov = { 0 };
+	struct cmsghdr *cm;
+	char buf[1];
+	ssize_t n;
+	int fd;
+
+	iov.iov_base = buf;
+	iov.iov_len = sizeof(buf);
+
+	msg.msg_name = src_addr;
+	msg.msg_namelen = sizeof(*src_addr);
+	msg.msg_iov = &iov;
+	msg.msg_iovlen = 1;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	n = recvmsg(server_fd, &msg, 0);
+	if (n < 0)
+		error(1, errno, "failed to receive");
+	if (n < sizeof(buf))
+		error(1, 0, "partial receive");
+	if (msg.msg_flags & MSG_CTRUNC)
+		error(1, errno, "truncated cmsg");
+
+	for (cm = CMSG_FIRSTHDR(&msg); cm; cm = CMSG_NXTHDR(&msg, cm)) {
+		if ((cm->cmsg_level == SOL_IP &&
+		     cm->cmsg_type == IP_ORIGDSTADDR) ||
+		    (cm->cmsg_level == SOL_IPV6 &&
+		     cm->cmsg_type == IPV6_ORIGDSTADDR)) {
+			dst_addr = (struct sockaddr_storage *)CMSG_DATA(cm);
+			break;
+		}
+		error(0, 0, "ignored cmsg at level %d type %d",
+		      cm->cmsg_level, cm->cmsg_type);
+	}
+	if (!dst_addr)
+		error(1, 0, "failed to get destination address");
+
+	/* Server bound to IPv4-mapped IPv6 address */
+	if (src_addr->ss_family != dst_addr->ss_family) {
+		assert(dst_addr->ss_family == AF_INET);
+
+		struct sockaddr_in *dst4 = (void *)dst_addr;
+		struct sockaddr_in6 *dst6 = (void *)&_dst_addr;
+
+		dst6->sin6_family = AF_INET6;
+		dst6->sin6_port = dst4->sin_port;
+
+		dst6->sin6_addr.s6_addr[10] = 0xff;
+		dst6->sin6_addr.s6_addr[11] = 0xff;
+		memcpy(&dst6->sin6_addr.s6_addr[12],
+		       &dst4->sin_addr.s_addr, sizeof(dst4->sin_addr.s_addr));
+
+		dst_addr = (void *)dst6;
+	}
+
+	/* Reply from original destination address. */
+	fd = socket(dst_addr->ss_family, SOCK_DGRAM, 0);
+	if (fd < 0)
+		error(1, errno, "failed to create socket");
+
+	if (bind(fd, (struct sockaddr *)dst_addr, sizeof(*dst_addr)))
+		error(1, errno, "failed to bind socket");
+
+	msg.msg_control = NULL;
+	msg.msg_controllen = 0;
+	n = sendmsg(fd, &msg, 0);
+	if (n < 0)
+		error(1, errno, "failed to send");
+	if (n < sizeof(buf))
+		error(1, 0, "partial send");
+
+	close(fd);
+}
+
+static void tcp_echo(int client_fd, int server_fd)
+{
+	send_byte(client_fd);
+	tcp_recv_send(server_fd);
+	recv_byte(client_fd);
+}
+
+static void udp_echo(int client_fd, int server_fd)
+{
+	send_byte(client_fd);
+	udp_recv_send(server_fd);
+	recv_byte(client_fd);
+}
+
+static struct bpf_object *load_prog(void)
+{
+	char buf[MAX_ERROR_LEN];
+	struct bpf_object *obj;
+	int prog_fd;
+	int err;
+
+	err = bpf_prog_load(BPF_FILE, BPF_PROG_TYPE_UNSPEC, &obj, &prog_fd);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to open bpf file '%s': %s", BPF_FILE, buf);
+	}
+
+	return obj;
+}
+
+static void attach_prog(struct bpf_object *obj, const char *sec)
+{
+	enum bpf_attach_type attach_type;
+	struct bpf_program *prog;
+	char buf[MAX_ERROR_LEN];
+	int target_fd = -1;
+	int prog_fd;
+	int err;
+
+	prog = bpf_object__find_program_by_title(obj, sec);
+	err = libbpf_get_error(prog);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to find section \"%s\": %s", sec, buf);
+	}
+
+	err = libbpf_attach_type_by_name(sec, &attach_type);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to identify attach type: %s", buf);
+	}
+
+	prog_fd = bpf_program__fd(prog);
+	if (prog_fd < 0)
+		error(1, errno, "failed to get prog fd");
+
+	err = bpf_prog_attach(prog_fd, target_fd, attach_type, 0);
+	if (err)
+		error(1, -err, "failed to attach prog");
+}
+
+static void detach_prog(const char *sec)
+{
+	enum bpf_attach_type attach_type;
+	char buf[MAX_ERROR_LEN];
+	int target_fd = -1;
+	int err;
+
+	err = libbpf_attach_type_by_name(sec, &attach_type);
+	if (err) {
+		libbpf_strerror(err, buf, ARRAY_SIZE(buf));
+		error(1, 0, "failed to identify attach type: %s", buf);
+	}
+
+	err = bpf_prog_detach(target_fd, attach_type);
+	if (err && err != -EPERM)
+		error(1, -err, "failed to detach prog");
+}
+
+static void update_redir_map(int map_fd, int index, int sock_fd)
+{
+	uint64_t value;
+	int err;
+
+	value = (uint64_t)sock_fd;
+	err = bpf_map_update_elem(map_fd, &index, &value, BPF_NOEXIST);
+	if (err)
+		error(1, errno, "failed to update redir_map @ %d", index);
+}
+
+static void test_prog_query(void)
+{
+	__u32 attach_flags = 0;
+	__u32 prog_ids[1] = { 0 };
+	__u32 prog_cnt = 1;
+	int fd, err;
+
+	fd = open("/proc/self/ns/net", O_RDONLY);
+	if (fd < 0)
+		error(1, errno, "failed to open /proc/self/ns/net");
+
+	err = bpf_prog_query(fd, BPF_INET_LOOKUP, 0,
+			     &attach_flags, prog_ids, &prog_cnt);
+	if (err)
+		error(1, errno, "failed to query BPF_INET_LOOKUP prog");
+
+	assert(attach_flags == 0);
+	assert(prog_cnt == 1);
+	assert(prog_ids[0] != 0);
+
+	close(fd);
+}
+
+static void run_test(const struct test *t, struct bpf_object *obj,
+		     int redir_map)
+{
+	int client_fd, server_fd;
+
+	fprintf(stderr, "test %s\n", t->desc);
+
+	/* Clean up after any previous failed test runs */
+	detach_prog(t->bpf_prog);
+
+	attach_prog(obj, t->bpf_prog);
+	test_prog_query();
+
+	server_fd = make_server(t->socket_type,
+				t->recv_at.ip, t->recv_at.port);
+	update_redir_map(redir_map, 0, server_fd);
+
+	client_fd = make_client(t->socket_type,
+				t->send_to.ip, t->send_to.port);
+
+	if (t->socket_type == SOCK_STREAM)
+		tcp_echo(client_fd, server_fd);
+	else
+		udp_echo(client_fd, server_fd);
+
+	close(client_fd);
+	close(server_fd);
+
+	detach_prog(t->bpf_prog);
+}
+
+static int find_redir_map(struct bpf_object *obj)
+{
+	struct bpf_map *map;
+	int fd;
+
+	map = bpf_object__find_map_by_name(obj, "redir_map");
+	if (!map)
+		error(1, 0, "failed to find 'redir_map'");
+	fd = bpf_map__fd(map);
+	if (fd < 0)
+		error(1, 0, "failed to get 'redir_map' fd");
+
+	return fd;
+}
+
+int main(void)
+{
+	struct bpf_object *obj;
+	const struct test *t;
+	int redir_map;
+
+	obj = load_prog();
+	redir_map = find_redir_map(obj);
+
+	for (t = tests; t < tests + ARRAY_SIZE(tests); t++)
+		run_test(t, obj, redir_map);
+
+	close(redir_map);
+	bpf_object__unload(obj);
+
+	fprintf(stderr, "PASS\n");
+	return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_inet_lookup.sh b/tools/testing/selftests/bpf/test_inet_lookup.sh
new file mode 100755
index 000000000000..5efb42fbdf59
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_inet_lookup.sh
@@ -0,0 +1,35 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+if [[ $EUID -ne 0 ]]; then
+        echo "This script must be run as root"
+        echo "FAIL"
+        exit 1
+fi
+
+# Run the script in a dedicated network namespace.
+if [[ -z $(ip netns identify $$) ]]; then
+        ../net/in_netns.sh "$0" "$@"
+        exit $?
+fi
+
+readonly IP6_1="fd00::1"
+readonly IP6_2="fd00::2"
+
+setup()
+{
+        ip -6 addr add ${IP6_1}/128 dev lo
+        ip -6 addr add ${IP6_2}/128 dev lo
+}
+
+cleanup()
+{
+        ip -6 addr del ${IP6_1}/128 dev lo
+        ip -6 addr del ${IP6_2}/128 dev lo
+}
+
+trap cleanup EXIT
+setup
+
+./test_inet_lookup
+exit $?
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-08-28  7:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-28  7:22 [RFCv2 bpf-next 00/12] Programming socket lookup with BPF Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 01/12] flow_dissector: Extract attach/detach/query helpers Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 02/12] bpf: Introduce inet_lookup program type for redirecting socket lookup Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 03/12] bpf: Add verifier tests for inet_lookup context access Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 04/12] inet: Store layer 4 protocol in inet_hashinfo Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 05/12] udp: Store layer 4 protocol in udp_table Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 06/12] inet: Run inet_lookup bpf program on socket lookup Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 07/12] inet6: " Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 08/12] udp: " Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 09/12] udp6: " Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 10/12] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 11/12] libbpf: Add support for inet_lookup program type Jakub Sitnicki
2019-08-28  7:22 ` [RFCv2 bpf-next 12/12] bpf: Test redirecting listening/receiving socket lookup Jakub Sitnicki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).