From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.1 required=3.0 tests=BAYES_00,DKIMWL_WL_HIGH, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_PATCH,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 181E7C433E2 for ; Thu, 16 Jul 2020 12:32:51 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D6CA82065F for ; Thu, 16 Jul 2020 12:32:50 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=cloudflare.com header.i=@cloudflare.com header.b="q5JqW/m2" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728188AbgGPMcu (ORCPT ); Thu, 16 Jul 2020 08:32:50 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:57744 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727990AbgGPMct (ORCPT ); Thu, 16 Jul 2020 08:32:49 -0400 Received: from mail-lj1-x242.google.com (mail-lj1-x242.google.com [IPv6:2a00:1450:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 53BBDC08C5C0 for ; Thu, 16 Jul 2020 05:32:49 -0700 (PDT) Received: by mail-lj1-x242.google.com with SMTP id z24so6911090ljn.8 for ; Thu, 16 Jul 2020 05:32:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cloudflare.com; s=google; h=references:user-agent:from:to:cc:subject:in-reply-to:date :message-id:mime-version; bh=21RA+e4Ong/tEdD67nzo1deHFqPTX09b755P7erQrVw=; b=q5JqW/m2Mm0DQZPnCTMK4hceDumfvX36K/sUPOlIJ3Ztu0krN7gT98EVc9kWyZFlgi WcRA30uFbFfvJ4keeMcS54Qf7G6rHBEO000FyaVf+9QGPcu+Q6GmsLDHTY3rRVQtqpem 6NcCwGBjVNgV1I48rk4K+7z5oQRxWXTZU/Swo= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:references:user-agent:from:to:cc:subject :in-reply-to:date:message-id:mime-version; bh=21RA+e4Ong/tEdD67nzo1deHFqPTX09b755P7erQrVw=; b=jlpKIzQRIyGPUHnbACtlOpAlas84r9h9weX0OatGH9b6Xmt+n7M9o0xngQeKiXi1MV MUOEJasY/KemoCmzMgBFMCwjNZQ2/fa2NfwPMEPD3dApTZeDcw8/JLZLqrr+KDkUFed/ CTjgR9sCPrwmmr/YtOXqFZ9keNYD+zJMI79wcaxhzHd9N7C9kykzTeuZNzgPxG6SJw2M stc3fEZFloVODXmRZECufk83ATMc+6KaTpZlain8gcI1HZTGqFi6o50JxGiarA3iO+SE KSDyy1DZ4TXh+TAt/Bt2sW9JWahGnby4HsS5UvoRwGMigrSw3H8Iv4qA0KBwr/WWoniz Wq6g== X-Gm-Message-State: AOAM530lgAbbF7Otw27gnnLyfindI1rizk5IUNnLisiSy/jDLsdwI5v/ 1Wc0GOdo3TFs7zVcoVFlnciT/g== X-Google-Smtp-Source: ABdhPJwQVq+YukWG7I51hGixTG2HbtBNWskpJWU/vmOkgH8wJskFdoy3gEPPbm8BLGClETOJ/bWeFQ== X-Received: by 2002:a2e:9f4d:: with SMTP id v13mr1872162ljk.122.1594902767506; Thu, 16 Jul 2020 05:32:47 -0700 (PDT) Received: from cloudflare.com ([2a02:a310:c262:aa00:b35e:8938:2c2a:ba8b]) by smtp.gmail.com with ESMTPSA id i10sm1017855ljg.80.2020.07.16.05.32.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 16 Jul 2020 05:32:46 -0700 (PDT) References: <20200713174654.642628-1-jakub@cloudflare.com> <20200713174654.642628-5-jakub@cloudflare.com> User-agent: mu4e 1.1.0; emacs 26.3 From: Jakub Sitnicki To: Andrii Nakryiko Cc: bpf , Networking , kernel-team , Alexei Starovoitov , Daniel Borkmann , "David S. Miller" , Jakub Kicinski , Marek Majkowski Subject: Re: [PATCH bpf-next v4 04/16] inet: Run SK_LOOKUP BPF program on socket lookup In-reply-to: Date: Thu, 16 Jul 2020 14:32:45 +0200 Message-ID: <875zany70y.fsf@cloudflare.com> MIME-Version: 1.0 Content-Type: text/plain Sender: bpf-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: bpf@vger.kernel.org On Thu, Jul 16, 2020 at 04:23 AM CEST, Andrii Nakryiko wrote: > On Mon, Jul 13, 2020 at 10:47 AM Jakub Sitnicki wrote: >> >> Run a BPF program before looking up a listening socket on the receive path. >> Program selects a listening socket to yield as result of socket lookup by >> calling bpf_sk_assign() helper and returning SK_PASS code. Program can >> revert its decision by assigning a NULL socket with bpf_sk_assign(). >> >> Alternatively, BPF program can also fail the lookup by returning with >> SK_DROP, or let the lookup continue as usual with SK_PASS on return, when >> no socket has not been selected with bpf_sk_assign(). Other return values > > you probably meant "no socket has been selected"? Yes, a typo. Will fix. > >> are treated the same as SK_DROP. > > > Why not enforce it instead? Check check_return_code() in verifier.c, > it's trivial to do it for SK_LOOKUP. That's a game changer D-: Thank you. This will simplify the prog runners. > > >> >> This lets the user match packets with listening sockets freely at the last >> possible point on the receive path, where we know that packets are destined >> for local delivery after undergoing policing, filtering, and routing. >> >> With BPF code selecting the socket, directing packets destined to an IP >> range or to a port range to a single socket becomes possible. >> >> In case multiple programs are attached, they are run in series in the order >> in which they were attached. The end result is determined from return codes >> of all the programs according to following rules: >> >> 1. If any program returned SK_PASS and selected a valid socket, the socket >> is used as result of socket lookup. >> 2. If more than one program returned SK_PASS and selected a socket, >> last selection takes effect. >> 3. If any program returned SK_DROP or an invalid return code, and no >> program returned SK_PASS and selected a socket, socket lookup fails >> with -ECONNREFUSED. >> 4. If all programs returned SK_PASS and none of them selected a socket, >> socket lookup continues to htable-based lookup. >> >> Suggested-by: Marek Majkowski >> Signed-off-by: Jakub Sitnicki >> --- >> >> Notes: >> v4: >> - Reduce BPF sk_lookup prog return codes to SK_PASS/SK_DROP. (Lorenz) > > your description above still assumes prog can return something besides > SK_PASS and SK_DROP? I should have written 'reduce allowed prog return codes'. > >> - Default to drop & warn on illegal return value from BPF prog. (Lorenz) >> - Rename netns_bpf_attach_type_enable/disable to _need/unneed. (Lorenz) >> - Export bpf_sk_lookup_enabled symbol for CONFIG_IPV6=m (kernel test robot) >> - Invert return value from bpf_sk_lookup_run_v4 to true on skip reuseport. >> - Move dedicated prog_array runner close to its callers in filter.h. >> >> v3: >> - Use a static_key to minimize the hook overhead when not used. (Alexei) >> - Adapt for running an array of attached programs. (Alexei) >> - Adapt for optionally skipping reuseport selection. (Martin) >> >> include/linux/filter.h | 102 +++++++++++++++++++++++++++++++++++++ >> kernel/bpf/net_namespace.c | 32 +++++++++++- >> net/core/filter.c | 3 ++ >> net/ipv4/inet_hashtables.c | 31 +++++++++++ >> 4 files changed, 167 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/filter.h b/include/linux/filter.h >> index 380746f47fa1..b9ad0fdabca5 100644 >> --- a/include/linux/filter.h >> +++ b/include/linux/filter.h >> @@ -1295,4 +1295,106 @@ struct bpf_sk_lookup_kern { >> bool no_reuseport; >> }; >> >> +extern struct static_key_false bpf_sk_lookup_enabled; >> + >> +/* Runners for BPF_SK_LOOKUP programs to invoke on socket lookup. >> + * >> + * Allowed return values for a BPF SK_LOOKUP program are SK_PASS and >> + * SK_DROP. Any other return value is treated as SK_DROP. Their >> + * meaning is as follows: >> + * >> + * SK_PASS && ctx.selected_sk != NULL: use selected_sk as lookup result >> + * SK_PASS && ctx.selected_sk == NULL: continue to htable-based socket lookup >> + * SK_DROP : terminate lookup with -ECONNREFUSED >> + * >> + * This macro aggregates return values and selected sockets from >> + * multiple BPF programs according to following rules: >> + * >> + * 1. If any program returned SK_PASS and a non-NULL ctx.selected_sk, >> + * macro result is SK_PASS and last ctx.selected_sk is used. >> + * 2. If any program returned non-SK_PASS return value, >> + * macro result is the last non-SK_PASS return value. >> + * 3. Otherwise result is SK_PASS and ctx.selected_sk is NULL. >> + * >> + * Caller must ensure that the prog array is non-NULL, and that the >> + * array as well as the programs it contains remain valid. >> + */ >> +#define BPF_PROG_SK_LOOKUP_RUN_ARRAY(array, ctx, func) \ >> + ({ \ >> + struct bpf_sk_lookup_kern *_ctx = &(ctx); \ >> + struct bpf_prog_array_item *_item; \ >> + struct sock *_selected_sk; \ >> + struct bpf_prog *_prog; \ >> + u32 _ret, _last_ret; \ >> + bool _no_reuseport; \ >> + \ >> + migrate_disable(); \ >> + _last_ret = SK_PASS; \ >> + _selected_sk = NULL; \ >> + _no_reuseport = false; \ > > these three could be moved before migrate_disable(), or even better > just initialize corresponding variables above? I was torn between keeping all info needed to read through the loop close to it and keeping the critical section tight. I can move it up. > > >> + _item = &(array)->items[0]; \ >> + while ((_prog = READ_ONCE(_item->prog))) { \ >> + /* restore most recent selection */ \ >> + _ctx->selected_sk = _selected_sk; \ >> + _ctx->no_reuseport = _no_reuseport; \ >> + \ >> + _ret = func(_prog, _ctx); \ >> + if (_ret == SK_PASS) { \ >> + /* remember last non-NULL socket */ \ >> + if (_ctx->selected_sk) { \ >> + _selected_sk = _ctx->selected_sk; \ >> + _no_reuseport = _ctx->no_reuseport; \ >> + } \ >> + } else { \ >> + /* remember last non-PASS ret code */ \ >> + _last_ret = _ret; \ >> + } \ >> + _item++; \ >> + } \ >> + _ctx->selected_sk = _selected_sk; \ >> + _ctx->no_reuseport = _no_reuseport; \ >> + migrate_enable(); \ >> + _ctx->selected_sk ? SK_PASS : _last_ret; \ >> + }) >> + > > [...]