From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Jakub Sitnicki <jakub@cloudflare.com>
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org,
	dccp@vger.kernel.org, kernel-team@cloudflare.com,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Gerrit Renker <gerrit@erg.abdn.ac.uk>,
	Jakub Kicinski <kuba@kernel.org>,
	Andrii Nakryiko <andrii.nakryiko@gmail.com>,
	Martin KaFai Lau <kafai@fb.com>,
	Marek Majkowski <marek@cloudflare.com>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [PATCH bpf-next v2 05/17] inet: Run SK_LOOKUP BPF program on socket lookup
Date: Mon, 11 May 2020 13:44:45 -0700
Message-ID: <20200511204445.i7sessmtszox36xd@ast-mbp>
In-Reply-To: <20200511185218.1422406-6-jakub@cloudflare.com>

On Mon, May 11, 2020 at 08:52:06PM +0200, Jakub Sitnicki wrote:
> Run a BPF program before looking up a listening socket on the receive path.
> The program selects a listening socket to yield as the result of socket lookup
> by calling the bpf_sk_assign() helper and returning the BPF_REDIRECT code.
> 
> Alternatively, the program can fail the lookup by returning BPF_DROP,
> or let the lookup continue as usual by returning BPF_OK.
> 
> This lets the user match packets with listening sockets freely at the last
> possible point on the receive path, where we know that packets are destined
> for local delivery after undergoing policing, filtering, and routing.
> 
> With BPF code selecting the socket, directing packets destined to an IP
> range or to a port range to a single socket becomes possible.
> 
> Suggested-by: Marek Majkowski <marek@cloudflare.com>
> Reviewed-by: Lorenz Bauer <lmb@cloudflare.com>
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> ---
>  include/net/inet_hashtables.h | 36 +++++++++++++++++++++++++++++++++++
>  net/ipv4/inet_hashtables.c    | 15 ++++++++++++++-
>  2 files changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/inet_hashtables.h b/include/net/inet_hashtables.h
> index 6072dfbd1078..3fcbc8f66f88 100644
> --- a/include/net/inet_hashtables.h
> +++ b/include/net/inet_hashtables.h
> @@ -422,4 +422,40 @@ int __inet_hash_connect(struct inet_timewait_death_row *death_row,
>  
>  int inet_hash_connect(struct inet_timewait_death_row *death_row,
>  		      struct sock *sk);
> +
> +static inline struct sock *bpf_sk_lookup_run(struct net *net,
> +					     struct bpf_sk_lookup_kern *ctx)
> +{
> +	struct bpf_prog *prog;
> +	int ret = BPF_OK;
> +
> +	rcu_read_lock();
> +	prog = rcu_dereference(net->sk_lookup_prog);
> +	if (prog)
> +		ret = BPF_PROG_RUN(prog, ctx);
> +	rcu_read_unlock();
> +
> +	if (ret == BPF_DROP)
> +		return ERR_PTR(-ECONNREFUSED);
> +	if (ret == BPF_REDIRECT)
> +		return ctx->selected_sk;
> +	return NULL;
> +}
> +
> +static inline struct sock *inet_lookup_run_bpf(struct net *net, u8 protocol,
> +					       __be32 saddr, __be16 sport,
> +					       __be32 daddr, u16 dport)
> +{
> +	struct bpf_sk_lookup_kern ctx = {
> +		.family		= AF_INET,
> +		.protocol	= protocol,
> +		.v4.saddr	= saddr,
> +		.v4.daddr	= daddr,
> +		.sport		= sport,
> +		.dport		= dport,
> +	};
> +
> +	return bpf_sk_lookup_run(net, &ctx);
> +}
> +
>  #endif /* _INET_HASHTABLES_H */
> diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
> index ab64834837c8..f4d07285591a 100644
> --- a/net/ipv4/inet_hashtables.c
> +++ b/net/ipv4/inet_hashtables.c
> @@ -307,9 +307,22 @@ struct sock *__inet_lookup_listener(struct net *net,
>  				    const int dif, const int sdif)
>  {
>  	struct inet_listen_hashbucket *ilb2;
> -	struct sock *result = NULL;
> +	struct sock *result, *reuse_sk;
>  	unsigned int hash2;
>  
> +	/* Lookup redirect from BPF */
> +	result = inet_lookup_run_bpf(net, hashinfo->protocol,
> +				     saddr, sport, daddr, hnum);
> +	if (IS_ERR(result))
> +		return NULL;
> +	if (result) {
> +		reuse_sk = lookup_reuseport(net, result, skb, doff,
> +					    saddr, sport, daddr, hnum);
> +		if (reuse_sk)
> +			result = reuse_sk;
> +		goto done;
> +	}
> +

The overhead is too high to do this on every listener lookup.
The feature has to be static_key-ed, so that the fast path is untouched
while no program is attached.
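
To make the static-key request concrete, here is a minimal user-space
sketch of the pattern. In the kernel the guard would be a key defined with
DEFINE_STATIC_KEY_FALSE() and tested with static_branch_unlikely(), bumped
with static_branch_inc()/static_branch_dec() on attach/detach (all from
<linux/jump_label.h>). Those cannot run outside the kernel, so the sketch
models the key with a plain counter; every name in it (model_static_key,
lookup_listener, etc.) is hypothetical, and it shows only the control flow,
not the runtime code patching that makes a real static key nearly free.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model of a static key (hypothetical names). A real kernel
 * static key patches the branch into a NOP while disabled; this counter
 * only reproduces the control flow, not that zero-cost patching.
 */
struct model_static_key {
	int users;	/* number of attached programs */
};

static struct model_static_key sk_lookup_key = { 0 };

/* Called on program attach; mirrors static_branch_inc(). */
static void model_key_inc(struct model_static_key *key)
{
	key->users++;
}

/* Called on program detach; mirrors static_branch_dec(). */
static void model_key_dec(struct model_static_key *key)
{
	assert(key->users > 0);
	key->users--;
}

/* Mirrors static_branch_unlikely(): true only while someone is attached. */
static bool model_branch_unlikely(const struct model_static_key *key)
{
	return key->users > 0;
}

/* How many times the (costly) BPF hook ran; used to observe the guard. */
static unsigned long bpf_hook_runs;

static void run_sk_lookup_hook(void)
{
	bpf_hook_runs++;	/* stands in for building ctx + BPF_PROG_RUN */
}

/* Hypothetical listener lookup: the hook is skipped while the key is off. */
static void lookup_listener(void)
{
	if (model_branch_unlikely(&sk_lookup_key))
		run_sk_lookup_hook();
	/* ... regular listening-socket hash lookup continues here ... */
}
```

With no program attached, lookup_listener() never enters the hook;
the first attach (model_key_inc) enables it for every lookup, and the
last detach disables it again.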

Also please add multi-prog support. Adding it later will cause
all sorts of compatibility issues. The semantics of running multiple
progs need to be thought through right now.
For example, BPF_DROP or BPF_REDIRECT could terminate the prog_run_array
sequence of progs, while BPF_OK could continue to the next prog.
It's not ideal, but it's better than nothing.
Another option could be to execute all attached progs regardless
of return code, but not let a second prog blindly override selected_sk.
bpf_sk_assign() could get smarter about that.
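
The two multi-prog options above can be sketched in user-space C.
Everything here is hypothetical: RET_OK/RET_DROP/RET_REDIRECT stand in for
BPF_OK/BPF_DROP/BPF_REDIRECT, an array of function pointers stands in for
the prog array, and model_sk_assign() is one assumed way a "smarter"
bpf_sk_assign() could refuse to override an earlier selection (returning
-EEXIST is an assumption, not an existing helper contract).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-ins for BPF_OK / BPF_DROP / BPF_REDIRECT (hypothetical). */
enum verdict { RET_OK, RET_DROP, RET_REDIRECT };

struct sock {
	int id;
};

struct lookup_ctx {
	struct sock *selected_sk;	/* set via model_sk_assign() */
};

typedef enum verdict (*lookup_prog_t)(struct lookup_ctx *ctx);

/*
 * Option 2: a "smarter" assign helper that refuses to blindly replace
 * a socket already selected by an earlier prog.
 */
static int model_sk_assign(struct lookup_ctx *ctx, struct sock *sk)
{
	if (ctx->selected_sk && ctx->selected_sk != sk)
		return -EEXIST;
	ctx->selected_sk = sk;
	return 0;
}

/*
 * Option 1: run progs in attach order; DROP or REDIRECT ends the
 * sequence, OK falls through to the next prog.
 */
static enum verdict run_progs(lookup_prog_t *progs, size_t n,
			      struct lookup_ctx *ctx)
{
	for (size_t i = 0; i < n; i++) {
		enum verdict v = progs[i](ctx);

		if (v == RET_DROP || v == RET_REDIRECT)
			return v;
	}
	return RET_OK;	/* no prog decided: continue with the regular lookup */
}

/* Example progs (hypothetical) used to exercise both options. */
static struct sock sk_a = { 1 }, sk_b = { 2 };

static enum verdict prog_pass(struct lookup_ctx *ctx)
{
	(void)ctx;
	return RET_OK;
}

static enum verdict prog_drop(struct lookup_ctx *ctx)
{
	(void)ctx;
	return RET_DROP;
}

static enum verdict prog_pick_a(struct lookup_ctx *ctx)
{
	return model_sk_assign(ctx, &sk_a) == 0 ? RET_REDIRECT : RET_OK;
}

static enum verdict prog_pick_b(struct lookup_ctx *ctx)
{
	return model_sk_assign(ctx, &sk_b) == 0 ? RET_REDIRECT : RET_OK;
}
```

Under option 1 the first prog returning DROP or REDIRECT decides; under
option 2 a later prog calling the assign helper cannot silently replace an
earlier choice. How verdicts would combine if all progs always run is left
open here, as in the review.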

Also please switch to the bpf_link way of attaching. All system-wide
attachments should be visible and easily debuggable via 'bpftool link show'.
We're currently converting the tc and xdp hooks to bpf_link. This new hook
should have it from the beginning.


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200511204445.i7sessmtszox36xd@ast-mbp \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=dccp@vger.kernel.org \
    --cc=edumazet@google.com \
    --cc=gerrit@erg.abdn.ac.uk \
    --cc=jakub@cloudflare.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.