All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Sitnicki <jakub@cloudflare.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf <bpf@vger.kernel.org>, Networking <netdev@vger.kernel.org>,
	kernel-team <kernel-team@cloudflare.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Marek Majkowski <marek@cloudflare.com>
Subject: Re: [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup
Date: Thu, 16 Jul 2020 14:32:45 +0200	[thread overview]
Message-ID: <875zany70y.fsf@cloudflare.com> (raw)
In-Reply-To: <CAEf4BzY0Gc_FH=KUWY3xz6qG8yk+0U0mjXcAx7+39tWt_kQnGQ@mail.gmail.com>

On Thu, Jul 16, 2020 at 04:23 AM CEST, Andrii Nakryiko wrote:
> On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> Run a BPF program before looking up a listening socket on the receive path.
>> Program selects a listening socket to yield as result of socket lookup by
>> calling bpf_sk_assign() helper and returning SK_PASS code. Program can
>> revert its decision by assigning a NULL socket with bpf_sk_assign().
>>
>> Alternatively, BPF program can also fail the lookup by returning with
>> SK_DROP, or let the lookup continue as usual with SK_PASS on return, when
>> no socket has not been selected with bpf_sk_assign(). Other return values
>
> you probably meant "no socket has been selected"?

Yes, a typo. Will fix.

>
>> are treated the same as SK_DROP.
>
>
> Why not enforce it instead? Check check_return_code() in verifier.c,
> it's trivial to do it for SK_LOOKUP.

That's a game changer D-: Thank you. This will simplify the prog
runners.

>
>
>>
>> This lets the user match packets with listening sockets freely at the last
>> possible point on the receive path, where we know that packets are destined
>> for local delivery after undergoing policing, filtering, and routing.
>>
>> With BPF code selecting the socket, directing packets destined to an IP
>> range or to a port range to a single socket becomes possible.
>>
>> In case multiple programs are attached, they are run in series in the order
>> in which they were attached. The end result is determined from return codes
>> of all the programs according to following rules:
>>
>>  1. If any program returned SK_PASS and selected a valid socket, the socket
>>     is used as result of socket lookup.
>>  2. If more than one program returned SK_PASS and selected a socket,
>>     last selection takes effect.
>>  3. If any program returned SK_DROP or an invalid return code, and no
>>     program returned SK_PASS and selected a socket, socket lookup fails
>>     with -ECONNREFUSED.
>>  4. If all programs returned SK_PASS and none of them selected a socket,
>>     socket lookup continues to htable-based lookup.
>>
>> Suggested-by: Marek Majkowski <marek@cloudflare.com>
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>> ---
>>
>> Notes:
>>     v4:
>>     - Reduce BPF sk_lookup prog return codes to SK_PASS/SK_DROP. (Lorenz)
>
> your description above still assumes prog can return something besides
> SK_PASS and SK_DROP?

I should have written 'reduce allowed prog return codes'.

>
>>     - Default to drop & warn on illegal return value from BPF prog. (Lorenz)
>>     - Rename netns_bpf_attach_type_enable/disable to _need/unneed. (Lorenz)
>>     - Export bpf_sk_lookup_enabled symbol for CONFIG_IPV6=m (kernel test robot)
>>     - Invert return value from bpf_sk_lookup_run_v4 to true on skip reuseport.
>>     - Move dedicated prog_array runner close to its callers in filter.h.
>>
>>     v3:
>>     - Use a static_key to minimize the hook overhead when not used. (Alexei)
>>     - Adapt for running an array of attached programs. (Alexei)
>>     - Adapt for optionally skipping reuseport selection. (Martin)
>>
>>  include/linux/filter.h     | 102 +++++++++++++++++++++++++++++++++++++
>>  kernel/bpf/net_namespace.c |  32 +++++++++++-
>>  net/core/filter.c          |   3 ++
>>  net/ipv4/inet_hashtables.c |  31 +++++++++++
>>  4 files changed, 167 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/filter.h b/include/linux/filter.h
>> index 380746f47fa1..b9ad0fdabca5 100644
>> --- a/include/linux/filter.h
>> +++ b/include/linux/filter.h
>> @@ -1295,4 +1295,106 @@ struct bpf_sk_lookup_kern {
>>         bool            no_reuseport;
>>  };
>>
>> +extern struct static_key_false bpf_sk_lookup_enabled;
>> +
>> +/* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup.
>> + *
>> + * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and
>> + * SK_DROP. Any other return value is treated as SK_DROP. Their
>> + * meaning is as follows:
>> + *
>> + *  SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result
>> + *  SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup
>> + *  SK_DROP                           : terminate lookup with -ECONNREFUSED
>> + *
>> + * This macro aggregates return values and selected sockets from
>> + * multiple BPF programs according to following rules:
>> + *
>> + *  1. If any program returned SK_PASS and a non-NULL ctx.selected_sk,
>> + *     macro result is SK_PASS and last ctx.selected_sk is used.
>> + *  2. If any program returned non-SK_PASS return value,
>> + *     macro result is the last non-SK_PASS return value.
>> + *  3. Otherwise result is SK_PASS and ctx.selected_sk is NULL.
>> + *
>> + * Caller must ensure that the prog array is non-NULL, and that the
>> + * array as well as the programs it contains remain valid.
>> + */
>> +#define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func)                 \
>> +       ({                                                              \
>> +               struct bpf_sk_lookup_kern *_ctx = &(ctx);               \
>> +               struct bpf_prog_array_item *_item;                      \
>> +               struct sock *_selected_sk;                              \
>> +               struct bpf_prog *_prog;                                 \
>> +               u32 _ret, _last_ret;                                    \
>> +               bool _no_reuseport;                                     \
>> +                                                                       \
>> +               migrate_disable();                                      \
>> +               _last_ret = SK_PASS;                                    \
>> +               _selected_sk = NULL;                                    \
>> +               _no_reuseport = false;                                  \
>
> these three could be moved before migrate_disable(), or even better
> just initialize corresponding variables above?

I was torn between keeping all info needed to read through the loop
close to it and keeping the critical section tight. I can move it up.

>
>
>> +               _item = &(array)->items[0];                             \
>> +               while ((_prog = READ_ONCE(_item->prog))) {              \
>> +                       /* restore most recent selection */             \
>> +                       _ctx->selected_sk = _selected_sk;               \
>> +                       _ctx->no_reuseport = _no_reuseport;             \
>> +                                                                       \
>> +                       _ret = func(_prog, _ctx);                       \
>> +                       if (_ret == SK_PASS) {                          \
>> +                               /* remember last non-NULL socket */     \
>> +                               if (_ctx->selected_sk) {                \
>> +                                       _selected_sk = _ctx->selected_sk;       \
>> +                                       _no_reuseport = _ctx->no_reuseport;     \
>> +                               }                                       \
>> +                       } else {                                        \
>> +                               /* remember last non-PASS ret code */   \
>> +                               _last_ret = _ret;                       \
>> +                       }                                               \
>> +                       _item++;                                        \
>> +               }                                                       \
>> +               _ctx->selected_sk = _selected_sk;                       \
>> +               _ctx->no_reuseport = _no_reuseport;                     \
>> +               migrate_enable();                                       \
>> +               _ctx->selected_sk ? SK_PASS : _last_ret;                \
>> +        })
>> +
>
> [...]

  reply	other threads:[~2020-07-16 12:32 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-13 17:46 [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 01/16] bpf, netns: Handle multiple link attachments Jakub Sitnicki
2020-07-15 21:30   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 02/16] bpf: Introduce SK_LOOKUP program type with a dedicated attach point Jakub Sitnicki
2020-07-16  1:41   ` Andrii Nakryiko
2020-07-16 12:17     ` Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 03/16] inet: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-16  1:44   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-16  2:23   ` Andrii Nakryiko
2020-07-16 12:32     ` Jakub Sitnicki [this message]
2020-07-13 17:46 ` [PATCH bpf-next v4 05/16] inet6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 06/16] inet6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 07/16] udp: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 08/16] udp: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 09/16] udp6: Extract helper for selecting socket from reuseport group Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 10/16] udp6: Run SK_LOOKUP BPF program on socket lookup Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 11/16] bpf: Sync linux/bpf.h to tools/ Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 12/16] libbpf: Add support for SK_LOOKUP program type Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 13/16] tools/bpftool: Add name mappings for SK_LOOKUP prog and attach type Jakub Sitnicki
2020-07-16  2:10   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 14/16] selftests/bpf: Add verifier tests for bpf_sk_lookup context access Jakub Sitnicki
2020-07-16  2:13   ` Andrii Nakryiko
2020-07-13 17:46 ` [PATCH bpf-next v4 15/16] selftests/bpf: Rename test_sk_lookup_kern.c to test_ref_track_kern.c Jakub Sitnicki
2020-07-13 17:46 ` [PATCH bpf-next v4 16/16] selftests/bpf: Tests for BPF_SK_LOOKUP attach point Jakub Sitnicki
2020-07-16  2:19   ` Andrii Nakryiko
2020-07-16  2:25 ` [PATCH bpf-next v4 00/16] Run a BPF program on socket lookup Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875zany70y.fsf@cloudflare.com \
    --to=jakub@cloudflare.com \
    --cc=andrii.nakryiko@gmail.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=kernel-team@cloudflare.com \
    --cc=kuba@kernel.org \
    --cc=marek@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.