All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Marek Majkowski <marek@cloudflare.com>
Subject: Re: [PATCH bpf-next v3 04/16] inet: Run SK_LOOKUP BPF program on socket lookup
Date: Thu, 02 Jul 2020 14:46:41 +0200	[thread overview]
Message-ID: <87mu4inky6.fsf@cloudflare.com> (raw)
In-Reply-To: <CACAyw9-6OCPqg3eoPOPeKYy=kBNVJT8-qbLh6QOo=8aEV6pWjw@mail.gmail.com>

On Thu, Jul 02, 2020 at 12:27 PM CEST, Lorenz Bauer wrote:
> On Thu, 2 Jul 2020 at 10:24, Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Run a BPF program before looking up a listening socket on the receive path.
>> Program selects a listening socket to yield as result of socket lookup by
>> calling bpf_sk_assign() helper and returning BPF_REDIRECT (7) code.
>>
>> Alternatively, program can also fail the lookup by returning with
>> BPF_DROP (1), or let the lookup continue as usual with BPF_OK (0) on
>> return. Other return values are treated the same as BPF_OK.
>
> I'd prefer if other values were treated as BPF_DROP, with other semantics
> unchanged. Otherwise we won't be able to introduce new semantics
> without potentially breaking user code.

That might be surprising or even risky. If you attach a badly written
program that say returns a negative value, it will drop all TCP SYNs and
UDP traffic.

>
>>
>> This lets the user match packets with listening sockets freely at the last
>> possible point on the receive path, where we know that packets are destined
>> for local delivery after undergoing policing, filtering, and routing.
>>
>> With BPF code selecting the socket, directing packets destined to an IP
>> range or to a port range to a single socket becomes possible.
>>
>> In case multiple programs are attached, they are run in series in the order
>> in which they were attached. The end result gets determined from return
>> code from each program according to following rules.
>>
>>  1. If any program returned BPF_REDIRECT and selected a valid socket, this
>>     socket will be used as result of the lookup.
>>  2. If more than one program returned BPF_REDIRECT and selected a socket,
>>     last selection takes effect.
>>  3. If any program returned BPF_DROP and none returned BPF_REDIRECT, the
>>     socket lookup will fail with -ECONNREFUSED.
>>  4. If no program returned neither BPF_DROP nor BPF_REDIRECT, socket lookup
>>     continues to htable-based lookup.
>>
>> Suggested-by: Marek Majkowski <marek@cloudflare.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>
>> Notes:
>>     v3:
>>     - Use a static_key to minimize the hook overhead when not used. (Alexei)
>>     - Adapt for running an array of attached programs. (Alexei)
>>     - Adapt for optionally skipping reuseport selection. (Martin)
>>
>>  include/linux/bpf.h        | 29 ++++++++++++++++++++++++++++
>>  include/linux/filter.h     | 39 ++++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/net_namespace.c | 32 ++++++++++++++++++++++++++++++-
>>  net/core/filter.c          |  2 ++
>>  net/ipv4/inet_hashtables.c | 31 ++++++++++++++++++++++++++++++
>>  5 files changed, 132 insertions(+), 1 deletion(-)
>>

[...]

>> diff --git a/kernel/bpf/net_namespace.c b/kernel/bpf/net_namespace.c
>> index 090166824ca4..a7768feb3ade 100644
>> --- a/kernel/bpf/net_namespace.c
>> +++ b/kernel/bpf/net_namespace.c
>> @@ -25,6 +25,28 @@ struct bpf_netns_link {
>>  /* Protects updates to netns_bpf */
>>  DEFINE_MUTEX(netns_bpf_mutex);
>>
>> +static void netns_bpf_attach_type_disable(enum netns_bpf_attach_type type)
>
> Nit: maybe netns_bpf_attach_type_dec()? Disable sounds like it happens
> unconditionally.

attach_type_dec()/_inc() seems a bit cryptic, since it's not the attach
type we are incrementing/decrementing.

But I was considering _need()/_unneed(), which would follow an existing
example, if you think that improves things.

>
>> +{
>> +       switch (type) {
>> +       case NETNS_BPF_SK_LOOKUP:
>> +               static_branch_dec(&bpf_sk_lookup_enabled);
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +}
>> +
>> +static void netns_bpf_attach_type_enable(enum netns_bpf_attach_type type)
>> +{
>> +       switch (type) {
>> +       case NETNS_BPF_SK_LOOKUP:
>> +               static_branch_inc(&bpf_sk_lookup_enabled);
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +}
>> +
>>  /* Must be called with netns_bpf_mutex held. */
>>  static void netns_bpf_run_array_detach(struct net *net,
>>                                        enum netns_bpf_attach_type type)
>> @@ -93,6 +115,9 @@ static void bpf_netns_link_release(struct bpf_link *link)
>>         if (!net)
>>                 goto out_unlock;
>>
>> +       /* Mark attach point as unused */
>> +       netns_bpf_attach_type_disable(type);
>> +
>>         /* Remember link position in case of safe delete */
>>         idx = link_index(net, type, net_link);
>>         list_del(&net_link->node);
>> @@ -416,6 +441,9 @@ static int netns_bpf_link_attach(struct net *net, struct bpf_link *link,
>>                                         lockdep_is_held(&netns_bpf_mutex));
>>         bpf_prog_array_free(run_array);
>>
>> +       /* Mark attach point as used */
>> +       netns_bpf_attach_type_enable(type);
>> +
>>  out_unlock:
>>         mutex_unlock(&netns_bpf_mutex);
>>         return err;
>> @@ -491,8 +519,10 @@ static void __net_exit netns_bpf_pernet_pre_exit(struct net *net)
>>         mutex_lock(&netns_bpf_mutex);
>>         for (type = 0; type < MAX_NETNS_BPF_ATTACH_TYPE; type++) {
>>                 netns_bpf_run_array_detach(net, type);
>> -               list_for_each_entry(net_link, &net->bpf.links[type], node)
>> +               list_for_each_entry(net_link, &net->bpf.links[type], node) {
>>                         net_link->net = NULL; /* auto-detach link */
>> +                       netns_bpf_attach_type_disable(type);
>> +               }
>>                 if (net->bpf.progs[type])
>>                         bpf_prog_put(net->bpf.progs[type]);
>>         }
>> diff --git a/net/core/filter.c b/net/core/filter.c

[...]

  reply	other threads:[~2020-07-02 12:46 UTC|newest]

Thread overview: 51+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-02  9:24 [PATCH bpf-next v3 00/16] Run a BPF program on socket lookup Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
2020-07-09  3:44   ` Andrii Nakryiko
2020-07-09 12:49     ` Jakub Sitnicki
2020-07-09 22:02       ` Andrii Nakryiko
2020-07-10 19:23         ` Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-07-04 18:42   ` Yonghong Song
2020-07-06 11:44     ` Jakub Sitnicki
2020-07-05  9:20   ` kernel test robot
2020-07-05  9:20     ` kernel test robot
2020-07-05  9:20   ` [RFC PATCH] bpf: sk_lookup_prog_ops can be static kernel test robot
2020-07-05  9:20     ` kernel test robot
2020-07-07  9:21   ` [PATCH bpf-next v3 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-07-09  4:08   ` Andrii Nakryiko
2020-07-09 13:25     ` Jakub Sitnicki
2020-07-09 23:09       ` Andrii Nakryiko
2020-07-10  8:55         ` Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 03/16] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 04/16] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-02 10:27   ` Lorenz Bauer
2020-07-02 12:46     ` Jakub Sitnicki [this message]
2020-07-02 13:19       ` Lorenz Bauer
2020-07-06 11:24         ` Jakub Sitnicki
2020-07-06 12:06   ` Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 05/16] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 06/16] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 07/16] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 08/16] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 09/16] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 10/16] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-02 14:51   ` kernel test robot
2020-07-02 14:51     ` kernel test robot
2020-07-03 13:04     ` Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 11/16] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 12/16] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
2020-07-09  4:23   ` Andrii Nakryiko
2020-07-09 15:51     ` Jakub Sitnicki
2020-07-09 23:13       ` Andrii Nakryiko
2020-07-10  8:37         ` Jakub Sitnicki
2020-07-10 18:55           ` Andrii Nakryiko
2020-07-10 19:24             ` Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 15/16] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
2020-07-02  9:24 ` [PATCH bpf-next v3 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
2020-07-02 11:01   ` Lorenz Bauer
2020-07-02 12:59     ` Jakub Sitnicki
2020-07-09  4:28       ` Andrii Nakryiko
2020-07-09 15:54         ` Jakub Sitnicki
2020-07-02 11:05 ` [PATCH bpf-next v3 00/16] Run a BPF program on socket lookup Lorenz Bauer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87mu4inky6.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=lmb@cloudflare.com \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.