All of lore.kernel.org
 help / color / mirror / Atom feed
From: Joe Stringer <joe@wand.net.nz>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Joe Stringer <joe@wand.net.nz>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>, Martin Lau <kafai@fb.com>
Subject: Re: [PATCHv2 bpf-next 4/5] bpf: Don't refcount LISTEN sockets in sk_assign()
Date: Thu, 26 Mar 2020 14:37:06 -0700	[thread overview]
Message-ID: <CAOftzPiEkMqz5Kkps4AEO-ZsdW-ogK2xxoj3iytKwjr25_hw1w@mail.gmail.com> (raw)
In-Reply-To: <CACAyw99SDN0U+VWi=WqS0V-M+riGehXfj3frTzSa6YcvOgWJtQ@mail.gmail.com>

On Thu, Mar 26, 2020 at 3:21 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Wed, 25 Mar 2020 at 20:47, Joe Stringer <joe@wand.net.nz> wrote:
> >
> > On Wed, Mar 25, 2020 at 3:29 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
> > >
> > > On Wed, 25 Mar 2020 at 05:58, Joe Stringer <joe@wand.net.nz> wrote:
> > > >
> > > > Avoid taking a reference on listen sockets by checking the socket type
> > > > in the sk_assign and in the corresponding skb_steal_sock() code in the
> > > > the transport layer, and by ensuring that the prefetch free (sock_pfree)
> > > > function uses the same logic to check whether the socket is refcounted.
> > > >
> > > > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > > > Signed-off-by: Joe Stringer <joe@wand.net.nz>
> > > > ---
> > > > v2: Initial version
> > > > ---
> > > >  include/net/sock.h | 25 +++++++++++++++++--------
> > > >  net/core/filter.c  |  6 +++---
> > > >  net/core/sock.c    |  3 ++-
> > > >  3 files changed, 22 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/include/net/sock.h b/include/net/sock.h
> > > > index 1ca2e808cb8e..3ec1865f173e 100644
> > > > --- a/include/net/sock.h
> > > > +++ b/include/net/sock.h
> > > > @@ -2533,6 +2533,21 @@ skb_sk_is_prefetched(struct sk_buff *skb)
> > > >         return skb->destructor == sock_pfree;
> > > >  }
> > > >
> > > > +/* This helper checks if a socket is a full socket,
> > > > + * ie _not_ a timewait or request socket.
> > > > + */
> > > > +static inline bool sk_fullsock(const struct sock *sk)
> > > > +{
> > > > +       return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > > > +}
> > > > +
> > > > +static inline bool
> > > > +sk_is_refcounted(struct sock *sk)
> > > > +{
> > > > +       /* Only full sockets have sk->sk_flags. */
> > > > +       return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
> > > > +}
> > > > +
> > > >  /**
> > > >   * skb_steal_sock
> > > >   * @skb to steal the socket from
> > > > @@ -2545,6 +2560,8 @@ skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> > > >                 struct sock *sk = skb->sk;
> > > >
> > > >                 *refcounted = true;
> > > > +               if (skb_sk_is_prefetched(skb))
> > > > +                       *refcounted = sk_is_refcounted(sk);
> > > >                 skb->destructor = NULL;
> > > >                 skb->sk = NULL;
> > > >                 return sk;
> > > > @@ -2553,14 +2570,6 @@ skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> > > >         return NULL;
> > > >  }
> > > >
> > > > -/* This helper checks if a socket is a full socket,
> > > > - * ie _not_ a timewait or request socket.
> > > > - */
> > > > -static inline bool sk_fullsock(const struct sock *sk)
> > > > -{
> > > > -       return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > > > -}
> > > > -
> > > >  /* Checks if this SKB belongs to an HW offloaded socket
> > > >   * and whether any SW fallbacks are required based on dev.
> > > >   * Check decrypted mark in case skb_orphan() cleared socket.
> > > > diff --git a/net/core/filter.c b/net/core/filter.c
> > > > index 0fada7fe9b75..997b8606167e 100644
> > > > --- a/net/core/filter.c
> > > > +++ b/net/core/filter.c
> > > > @@ -5343,8 +5343,7 @@ static const struct bpf_func_proto bpf_sk_lookup_udp_proto = {
> > > >
> > > >  BPF_CALL_1(bpf_sk_release, struct sock *, sk)
> > > >  {
> > > > -       /* Only full sockets have sk->sk_flags. */
> > > > -       if (!sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE))
> > > > +       if (sk_is_refcounted(sk))
> > > >                 sock_gen_put(sk);
> > > >         return 0;
> > > >  }
> > > > @@ -5870,7 +5869,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
> > > >                 return -ESOCKTNOSUPPORT;
> > > >         if (unlikely(dev_net(skb->dev) != sock_net(sk)))
> > > >                 return -ENETUNREACH;
> > > > -       if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > > > +       if (sk_is_refcounted(sk) &&
> > > > +           unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > > >                 return -ENOENT;
> > > >
> > > >         skb_orphan(skb);
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index cfaf60267360..a2ab79446f59 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -2076,7 +2076,8 @@ EXPORT_SYMBOL(sock_efree);
> > > >   */
> > > >  void sock_pfree(struct sk_buff *skb)
> > > >  {
> > > > -       sock_edemux(skb);
> > > > +       if (sk_is_refcounted(skb->sk))
> > > > +               sock_edemux(skb);
> > >
> > > sock_edemux calls sock_gen_put, which is also called by
> > > bpf_sk_release. Is it worth teaching sock_gen_put about
> > > sk_fullsock, and dropping the other helpers? I was considering this
> > > when fixing up sk_release, but then forgot
> > > about it.
> >
> > I like the idea, but I'm concerned about breaking things outside the
> > focus of this new helper if the skb_sk_is_prefetched() function from
> > patch 1 is allowed to return true for sockets other than the ones
> > assigned from the bpf_sk_assign() helper. At a glance there's users of
> > sock_efree (which sock_edemux can be defined to) like netem_enqueue()
> > which may inadvertently trigger unexpected paths here. I think it's
> > more explicit so more obviously correct if the destructor pointer used
> > in this series is unique compared to other paths, even if the
> > underlying code is the same.
>
> Sorry, I didn't mean to get rid of sock_pfree, I was referring to
> sk_fullsock and
> sk_is_refcounted. My point was that it's weird that sock_gen_put isn't
> actually generic because it doesn't properly handle SOCK_RCU_FREE.

I briefly looked into this, and I'm still not confident that all
existing usage of sock_gen_put() is handling SOCK_RCU_FREE in a way
that would be consistent with pushing this check into sock_gen_put().
I think it's something worth digging into, but I'd prefer to dig into
it separately from this series.

  reply	other threads:[~2020-03-26 21:37 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-25  5:57 [PATCHv2 bpf-next 0/5] Add bpf_sk_assign eBPF helper Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 1/5] bpf: Add socket assign support Joe Stringer
2020-03-26  6:23   ` Martin KaFai Lau
2020-03-26  6:31     ` Joe Stringer
2020-03-26 10:24   ` Lorenz Bauer
2020-03-26 22:52     ` Joe Stringer
2020-03-27  2:40       ` Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 2/5] bpf: Prefetch established socket destinations Joe Stringer
2020-03-26 21:11   ` Alexei Starovoitov
2020-03-26 21:45     ` Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 3/5] net: Track socket refcounts in skb_steal_sock() Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 4/5] bpf: Don't refcount LISTEN sockets in sk_assign() Joe Stringer
2020-03-25 10:29   ` Lorenz Bauer
2020-03-25 20:46     ` Joe Stringer
2020-03-26 10:20       ` Lorenz Bauer
2020-03-26 21:37         ` Joe Stringer [this message]
2020-03-25  5:57 ` [PATCHv2 bpf-next 5/5] selftests: bpf: add test for sk_assign Joe Stringer
2020-03-25 10:35   ` Lorenz Bauer
2020-03-25 20:55     ` Joe Stringer
2020-03-26  6:25       ` Martin KaFai Lau
2020-03-26  6:38         ` Joe Stringer
2020-03-26 23:39           ` Joe Stringer
2020-03-25 18:17   ` Yonghong Song
2020-03-25 21:20     ` Joe Stringer
2020-03-25 22:00       ` Yonghong Song
2020-03-25 23:07         ` Joe Stringer
2020-03-26 10:13     ` Lorenz Bauer
2020-03-26 21:07       ` call for bpf progs. " Alexei Starovoitov
2020-03-26 23:14         ` Yonghong Song
2020-03-27 10:02         ` Lorenz Bauer
2020-03-27 16:08           ` Alexei Starovoitov
2020-03-27 19:06         ` Joe Stringer
2020-03-27 20:16           ` Daniel Borkmann
2020-03-27 22:24             ` Alexei Starovoitov
2020-03-28  0:17           ` Andrii Nakryiko
2020-03-26  2:04   ` Andrii Nakryiko
2020-03-26  2:16   ` Andrii Nakryiko
2020-03-26  5:28     ` Joe Stringer
2020-03-26  6:31       ` Martin KaFai Lau
2020-03-26 19:36       ` Andrii Nakryiko
2020-03-26 21:38         ` Joe Stringer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOftzPiEkMqz5Kkps4AEO-ZsdW-ogK2xxoj3iytKwjr25_hw1w@mail.gmail.com \
    --to=joe@wand.net.nz \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kafai@fb.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.