netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joe Stringer <joe@isovalent.com>
To: Martin Lau <kafai@fb.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	bpf@vger.kernel.org,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH bpf] bpf: Check sk_fullsock() before returning from bpf_sk_lookup()
Date: Sat, 18 May 2019 18:52:48 -0700	[thread overview]
Message-ID: <CADa=RyxfhK+XhAwf_C_an=+RnsQCPCXV23Qrwk-3OC1oLdHM=A@mail.gmail.com> (raw)
In-Reply-To: <20190518190520.53mrvat4c4y6cnbf@kafai-mbp>

On Sat, May 18, 2019, 09:05 Martin Lau <kafai@fb.com> wrote:
>
> On Sat, May 18, 2019 at 08:38:46AM -1000, Joe Stringer wrote:
> > On Fri, May 17, 2019, 12:02 Martin Lau <kafai@fb.com> wrote:
> >
> > > On Fri, May 17, 2019 at 02:51:48PM -0700, Eric Dumazet wrote:
> > > >
> > > >
> > > > On 5/17/19 2:21 PM, Martin KaFai Lau wrote:
> > > > > The BPF_FUNC_sk_lookup_xxx helpers return RET_PTR_TO_SOCKET_OR_NULL.
> > > > > Meaning a fullsock ptr and its fullsock's fields in bpf_sock can be
> > > > > accessed, e.g. type, protocol, mark and priority.
> > > > > Some new helper, like bpf_sk_storage_get(), also expects
> > > > > ARG_PTR_TO_SOCKET is a fullsock.
> > > > >
> > > > > bpf_sk_lookup() currently calls sk_to_full_sk() before returning.
> > > > > However, the ptr returned from sk_to_full_sk() is not guaranteed
> > > > > to be a fullsock.  For example, it cannot get a fullsock if sk
> > > > > is in TCP_TIME_WAIT.
> > > > >
> > > > > This patch checks for sk_fullsock() before returning. If it is not
> > > > > a fullsock, sock_gen_put() is called if needed and then returns NULL.
> > > > >
> > > > > Fixes: 6acc9b432e67 ("bpf: Add helper to retrieve socket in BPF")
> > > > > Cc: Joe Stringer <joe@isovalent.com>
> > > > > Signed-off-by: Martin KaFai Lau <kafai@fb.com>
> > > > > ---
> > > > >  net/core/filter.c | 16 ++++++++++++++--
> > > > >  1 file changed, 14 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > > index 55bfc941d17a..85def5a20aaf 100644
> > > > > --- a/net/core/filter.c
> > > > > +++ b/net/core/filter.c
> > > > > @@ -5337,8 +5337,14 @@ __bpf_sk_lookup(struct sk_buff *skb, struct
> > > bpf_sock_tuple *tuple, u32 len,
> > > > >     struct sock *sk = __bpf_skc_lookup(skb, tuple, len, caller_net,
> > > > >                                        ifindex, proto, netns_id,
> > > flags);
> > > > >
> > > > > -   if (sk)
> > > > > +   if (sk) {
> > > > >             sk = sk_to_full_sk(sk);
> > > > > +           if (!sk_fullsock(sk)) {
> > > > > +                   if (!sock_flag(sk, SOCK_RCU_FREE))
> > > > > +                           sock_gen_put(sk);
> > > >
> > > > This looks a bit convoluted/weird.
> > > >
> > > > What about telling/asking __bpf_skc_lookup() to not return a non
> > > fullsock instead ?
> > > It is becausee some other helpers, like BPF_FUNC_skc_lookup_tcp,
> > > can return non fullsock
> > >
> >
> > FYI this is necessary for finding a transparently proxied socket for a
> > non-local connection (tproxy use case).
> You meant it is necessary to return a non fullsock from the
> BPF_FUNC_sk_lookup_xxx helpers?

Yes, that's what I want to associate with the skb so that the delivery
to the SO_TRANSPARENT is received properly.

For the first packet of a connection, we look up the socket using the
tproxy socket port as the destination, and deliver the packet there.
The SO_TRANSPARENT logic then kicks in and sends back the ack and
creates the non-full sock for the connection tuple, which can be
entirely unrelated to local addresses or ports.

For the second forward-direction packet, (ie ACK in 3-way handshake)
then we must deliver the packet to this non-full sock as that's what
is negotiating the proxied connection. If you look up using the packet
tuple then get the full sock from it, it will go back to the
SO_TRANSPARENT parent socket. Delivering the ACK there will result in
a RST being sent back, because the SO_TRANSPARENT socket is just there
to accept new connections for connections to be proxied. So this is
the case where I need the non-full sock.

(In practice, the lookup logic attempts the packet tuple first then if
that fails, uses the tproxy port for lookup to achieve the above).

  reply	other threads:[~2019-05-19  1:56 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-17 21:21 [PATCH bpf] bpf: Check sk_fullsock() before returning from bpf_sk_lookup() Martin KaFai Lau
2019-05-17 21:51 ` Eric Dumazet
2019-05-17 22:01   ` Martin Lau
     [not found]     ` <CADa=RyxisbcVeXL7yq6o02XOgWd87QCzq-6zDXRnm9RoD2WM=A@mail.gmail.com>
2019-05-18 19:05       ` Martin Lau
2019-05-19  1:52         ` Joe Stringer [this message]
2019-05-19  2:07           ` Martin Lau
2019-05-20 18:38             ` Martin Lau
2019-05-20 18:56             ` Joe Stringer
2019-05-20 18:57 ` Joe Stringer
2019-05-21 14:48 ` Daniel Borkmann

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CADa=RyxfhK+XhAwf_C_an=+RnsQCPCXV23Qrwk-3OC1oLdHM=A@mail.gmail.com' \
    --to=joe@isovalent.com \
    --cc=Kernel-team@fb.com \
    --cc=ast@fb.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).