From mboxrd@z Thu Jan 1 00:00:00 1970 From: Joe Stringer Subject: Re: [RFC bpf-next 07/11] bpf: Add helper to retrieve socket in BPF Date: Fri, 11 May 2018 17:54:33 -0700 Message-ID: References: <20180509210709.7201-1-joe@wand.net.nz> <20180509210709.7201-8-joe@wand.net.nz> <20180511045722.p7r4tbog66omohs6@kafai-mbp.dhcp.thefacebook.com> <20180511214111.hi6ax6qoe37al37q@kafai-mbp.dhcp.thefacebook.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Cc: Joe Stringer , daniel@iogearbox.net, netdev , ast@kernel.org, john fastabend To: Martin KaFai Lau Return-path: Received: from mail-qk0-f194.google.com ([209.85.220.194]:43845 "EHLO mail-qk0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750776AbeELAyy (ORCPT ); Fri, 11 May 2018 20:54:54 -0400 Received: by mail-qk0-f194.google.com with SMTP id h19-v6so5793548qkj.10 for ; Fri, 11 May 2018 17:54:54 -0700 (PDT) In-Reply-To: <20180511214111.hi6ax6qoe37al37q@kafai-mbp.dhcp.thefacebook.com> Sender: netdev-owner@vger.kernel.org List-ID: On 11 May 2018 at 14:41, Martin KaFai Lau wrote: > On Fri, May 11, 2018 at 02:08:01PM -0700, Joe Stringer wrote: >> On 10 May 2018 at 22:00, Martin KaFai Lau wrote: >> > On Wed, May 09, 2018 at 02:07:05PM -0700, Joe Stringer wrote: >> >> This patch adds a new BPF helper function, sk_lookup() which allows BPF >> >> programs to find out if there is a socket listening on this host, and >> >> returns a socket pointer which the BPF program can then access to >> >> determine, for instance, whether to forward or drop traffic. sk_lookup() >> >> takes a reference on the socket, so when a BPF program makes use of this >> >> function, it must subsequently pass the returned pointer into the newly >> >> added sk_release() to return the reference. >> >> >> >> By way of example, the following pseudocode would filter inbound >> >> connections at XDP if there is no corresponding service listening for >> >> the traffic: >> >> >> >> struct bpf_sock_tuple tuple; >> >> struct bpf_sock_ops *sk; >> >> >> >> populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet >> >> sk = bpf_sk_lookup(ctx, &tuple, sizeof tuple, netns, 0); >> >> if (!sk) { >> >> // Couldn't find a socket listening for this traffic. Drop. >> >> return TC_ACT_SHOT; >> >> } >> >> bpf_sk_release(sk, 0); >> >> return TC_ACT_OK; >> >> >> >> Signed-off-by: Joe Stringer >> >> --- >> >> ... >> >> >> @@ -4032,6 +4036,96 @@ static const struct bpf_func_proto bpf_skb_get_xfrm_state_proto = { >> >> }; >> >> #endif >> >> >> >> +struct sock * >> >> +sk_lookup(struct net *net, struct bpf_sock_tuple *tuple) { >> > Would it be possible to have another version that >> > returns a sk without taking its refcnt? >> > It may have performance benefit. >> >> Not really. The sockets are not RCU-protected, and established sockets >> may be torn down without notice. If we don't take a reference, there's >> no guarantee that the socket will continue to exist for the duration >> of running the BPF program. >> >> From what I follow, the comment below has a hidden implication which >> is that sockets without SOCK_RCU_FREE, eg established sockets, may be >> directly freed regardless of RCU. > Right, SOCK_RCU_FREE sk is the one I am concern about. > For example, TCP_LISTEN socket does not require taking a refcnt > now. Doing a bpf_sk_lookup() may have a rather big > impact on handling TCP syn flood. or the usual intention > is to redirect instead of passing it up to the stack? I see, if you're only interested in listen sockets then probably this series could be extended with a new flag, eg something like BPF_F_SK_FIND_LISTENERS which restricts the set of possible sockets found to only listen sockets, then the implementation would call into __inet_lookup_listener() instead of inet_lookup(). The presence of that flag in the relevant register during CALL instruction would show that the verifier should not reference-track the result, then there'd need to be a check on the release to ensure that this unreferenced socket is never released. Just a thought, completely untested and I could still be missing some detail.. That said, I don't completely follow how you would expect to handle the traffic for sockets that are already established - the helper would no longer find those sockets, so you wouldn't know whether to pass the traffic up the stack for established traffic or not.