netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joe Stringer <joe@wand.net.nz>
To: Lorenz Bauer <lmb@cloudflare.com>
Cc: Joe Stringer <joe@wand.net.nz>, bpf <bpf@vger.kernel.org>,
	Networking <netdev@vger.kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alexei Starovoitov <ast@kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>, Martin Lau <kafai@fb.com>
Subject: Re: [PATCHv2 bpf-next 4/5] bpf: Don't refcount LISTEN sockets in sk_assign()
Date: Wed, 25 Mar 2020 13:46:43 -0700	[thread overview]
Message-ID: <CAOftzPipEjfy1p_98V+JmV3p_WJPzhE-_KfqC3UE3d-TYYxyww@mail.gmail.com> (raw)
In-Reply-To: <CACAyw9_17E3TNCFsnXzQ4K2zSmwn8J+BcZqbjiK==WQH=zNzvg@mail.gmail.com>

On Wed, Mar 25, 2020 at 3:29 AM Lorenz Bauer <lmb@cloudflare.com> wrote:
>
> On Wed, 25 Mar 2020 at 05:58, Joe Stringer <joe@wand.net.nz> wrote:
> >
> > Avoid taking a reference on listen sockets by checking the socket type
> > in the sk_assign and in the corresponding skb_steal_sock() code in the
> > the transport layer, and by ensuring that the prefetch free (sock_pfree)
> > function uses the same logic to check whether the socket is refcounted.
> >
> > Suggested-by: Martin KaFai Lau <kafai@fb.com>
> > Signed-off-by: Joe Stringer <joe@wand.net.nz>
> > ---
> > v2: Initial version
> > ---
> >  include/net/sock.h | 25 +++++++++++++++++--------
> >  net/core/filter.c  |  6 +++---
> >  net/core/sock.c    |  3 ++-
> >  3 files changed, 22 insertions(+), 12 deletions(-)
> >
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 1ca2e808cb8e..3ec1865f173e 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2533,6 +2533,21 @@ skb_sk_is_prefetched(struct sk_buff *skb)
> >         return skb->destructor == sock_pfree;
> >  }
> >
> > +/* This helper checks if a socket is a full socket,
> > + * ie _not_ a timewait or request socket.
> > + */
> > +static inline bool sk_fullsock(const struct sock *sk)
> > +{
> > +       return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > +}
> > +
> > +static inline bool
> > +sk_is_refcounted(struct sock *sk)
> > +{
> > +       /* Only full sockets have sk->sk_flags. */
> > +       return !sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE);
> > +}
> > +
> >  /**
> >   * skb_steal_sock
> >   * @skb to steal the socket from
> > @@ -2545,6 +2560,8 @@ skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> >                 struct sock *sk = skb->sk;
> >
> >                 *refcounted = true;
> > +               if (skb_sk_is_prefetched(skb))
> > +                       *refcounted = sk_is_refcounted(sk);
> >                 skb->destructor = NULL;
> >                 skb->sk = NULL;
> >                 return sk;
> > @@ -2553,14 +2570,6 @@ skb_steal_sock(struct sk_buff *skb, bool *refcounted)
> >         return NULL;
> >  }
> >
> > -/* This helper checks if a socket is a full socket,
> > - * ie _not_ a timewait or request socket.
> > - */
> > -static inline bool sk_fullsock(const struct sock *sk)
> > -{
> > -       return (1 << sk->sk_state) & ~(TCPF_TIME_WAIT | TCPF_NEW_SYN_RECV);
> > -}
> > -
> >  /* Checks if this SKB belongs to an HW offloaded socket
> >   * and whether any SW fallbacks are required based on dev.
> >   * Check decrypted mark in case skb_orphan() cleared socket.
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 0fada7fe9b75..997b8606167e 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -5343,8 +5343,7 @@ static const struct bpf_func_proto bpf_sk_lookup_udp_proto = {
> >
> >  BPF_CALL_1(bpf_sk_release, struct sock *, sk)
> >  {
> > -       /* Only full sockets have sk->sk_flags. */
> > -       if (!sk_fullsock(sk) || !sock_flag(sk, SOCK_RCU_FREE))
> > +       if (sk_is_refcounted(sk))
> >                 sock_gen_put(sk);
> >         return 0;
> >  }
> > @@ -5870,7 +5869,8 @@ BPF_CALL_3(bpf_sk_assign, struct sk_buff *, skb, struct sock *, sk, u64, flags)
> >                 return -ESOCKTNOSUPPORT;
> >         if (unlikely(dev_net(skb->dev) != sock_net(sk)))
> >                 return -ENETUNREACH;
> > -       if (unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> > +       if (sk_is_refcounted(sk) &&
> > +           unlikely(!refcount_inc_not_zero(&sk->sk_refcnt)))
> >                 return -ENOENT;
> >
> >         skb_orphan(skb);
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index cfaf60267360..a2ab79446f59 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2076,7 +2076,8 @@ EXPORT_SYMBOL(sock_efree);
> >   */
> >  void sock_pfree(struct sk_buff *skb)
> >  {
> > -       sock_edemux(skb);
> > +       if (sk_is_refcounted(skb->sk))
> > +               sock_edemux(skb);
>
> sock_edemux calls sock_gen_put, which is also called by
> bpf_sk_release. Is it worth teaching sock_gen_put about
> sk_fullsock, and dropping the other helpers? I was considering this
> when fixing up sk_release, but then forgot
> about it.

I like the idea, but I'm concerned about breaking things outside the
focus of this new helper if the skb_sk_is_prefetched() function from
patch 1 is allowed to return true for sockets other than the ones
assigned from the bpf_sk_assign() helper. At a glance there's users of
sock_efree (which sock_edemux can be defined to) like netem_enqueue()
which may inadvertently trigger unexpected paths here. I think it's
more explicit so more obviously correct if the destructor pointer used
in this series is unique compared to other paths, even if the
underlying code is the same.

  reply	other threads:[~2020-03-25 20:47 UTC|newest]

Thread overview: 41+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-25  5:57 [PATCHv2 bpf-next 0/5] Add bpf_sk_assign eBPF helper Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 1/5] bpf: Add socket assign support Joe Stringer
2020-03-26  6:23   ` Martin KaFai Lau
2020-03-26  6:31     ` Joe Stringer
2020-03-26 10:24   ` Lorenz Bauer
2020-03-26 22:52     ` Joe Stringer
2020-03-27  2:40       ` Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 2/5] bpf: Prefetch established socket destinations Joe Stringer
2020-03-26 21:11   ` Alexei Starovoitov
2020-03-26 21:45     ` Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 3/5] net: Track socket refcounts in skb_steal_sock() Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 4/5] bpf: Don't refcount LISTEN sockets in sk_assign() Joe Stringer
2020-03-25 10:29   ` Lorenz Bauer
2020-03-25 20:46     ` Joe Stringer [this message]
2020-03-26 10:20       ` Lorenz Bauer
2020-03-26 21:37         ` Joe Stringer
2020-03-25  5:57 ` [PATCHv2 bpf-next 5/5] selftests: bpf: add test for sk_assign Joe Stringer
2020-03-25 10:35   ` Lorenz Bauer
2020-03-25 20:55     ` Joe Stringer
2020-03-26  6:25       ` Martin KaFai Lau
2020-03-26  6:38         ` Joe Stringer
2020-03-26 23:39           ` Joe Stringer
2020-03-25 18:17   ` Yonghong Song
2020-03-25 21:20     ` Joe Stringer
2020-03-25 22:00       ` Yonghong Song
2020-03-25 23:07         ` Joe Stringer
2020-03-26 10:13     ` Lorenz Bauer
2020-03-26 21:07       ` call for bpf progs. " Alexei Starovoitov
2020-03-26 23:14         ` Yonghong Song
2020-03-27 10:02         ` Lorenz Bauer
2020-03-27 16:08           ` Alexei Starovoitov
2020-03-27 19:06         ` Joe Stringer
2020-03-27 20:16           ` Daniel Borkmann
2020-03-27 22:24             ` Alexei Starovoitov
2020-03-28  0:17           ` Andrii Nakryiko
2020-03-26  2:04   ` Andrii Nakryiko
2020-03-26  2:16   ` Andrii Nakryiko
2020-03-26  5:28     ` Joe Stringer
2020-03-26  6:31       ` Martin KaFai Lau
2020-03-26 19:36       ` Andrii Nakryiko
2020-03-26 21:38         ` Joe Stringer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAOftzPipEjfy1p_98V+JmV3p_WJPzhE-_KfqC3UE3d-TYYxyww@mail.gmail.com \
    --to=joe@wand.net.nz \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=eric.dumazet@gmail.com \
    --cc=kafai@fb.com \
    --cc=lmb@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).