Netdev Archive on lore.kernel.org
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next RFC] ipv6: elide flowlabel check if no exclusive leases exist
Date: Fri, 17 May 2019 17:51:58 -0400
Message-ID: <CA+FuTSe62ZxhDAfiuPvF7k53WOj1Mzi-3iYUjQA_JFM_LNUvCQ@mail.gmail.com>
In-Reply-To: <d7502d42-207b-177e-8f2b-f6645feff051@gmail.com>

On Fri, May 17, 2019 at 4:32 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> On 5/17/19 8:56 AM, Willem de Bruijn wrote:
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
> > If not set, by default an autogenerated flowlabel is selected.
> >
> > Explicit flowlabels require a control operation per label plus a
> > datapath check on every connection (every datagram if unconnected).
> >
> > This is particularly expensive on unconnected sockets with many
> > connections, such as QUIC.
> >
> > In the common case, where no lease is exclusive, the check can be
> > safely elided, as both lease request and check trivially succeed.
> > Indeed, autoflowlabel does the same (even with exclusive leases).
> >
> > Elide the check if no process has requested an exclusive lease.
> >
> > This is an optimization. Robust applications still have to revert to
> > requesting leases if the fast path fails due to an exclusive lease.
> >
> > This is decidedly an RFC patch:
> > - need to update all fl6_sock_lookup callers, not just udp
> > - behavior should be per-netns isolated
> >
> > Other approaches considered:
> > - a single "get all flowlabels, non-exclusive" flowlabel get request
> >   if set, elide fl6_sock_lookup and fail exclusive lease requests
> >
> > - sysctls (only useful if on by default, with static_branch)
> >   A) "non-exclusive mode", failing all exclusive lease requests:
> >      processes already have to be robust against lease failure
> >   B) just bypass check in fl6_sock_lookup, like autoflowlabel
> >
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
> > ---
> >  include/net/ipv6.h       | 11 +++++++++++
> >  net/ipv6/ip6_flowlabel.c |  6 ++++++
> >  net/ipv6/udp.c           |  8 ++++----
> >  3 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/net/ipv6.h b/include/net/ipv6.h
> > index daf80863d3a50..8881cee572410 100644
> > --- a/include/net/ipv6.h
> > +++ b/include/net/ipv6.h
> > @@ -17,6 +17,7 @@
> >  #include <linux/hardirq.h>
> >  #include <linux/jhash.h>
> >  #include <linux/refcount.h>
> > +#include <linux/jump_label.h>
> >  #include <net/if_inet6.h>
> >  #include <net/ndisc.h>
> >  #include <net/flow.h>
> > @@ -343,7 +344,17 @@ static inline void txopt_put(struct ipv6_txoptions *opt)
> >               kfree_rcu(opt, rcu);
> >  }
> >
> > +extern struct static_key_false ipv6_flowlabel_exclusive;
> >  struct ip6_flowlabel *fl6_sock_lookup(struct sock *sk, __be32 label);
> > +static inline struct ip6_flowlabel *fl6_sock_verify(struct sock *sk,
> > +                                                 __be32 label)
> > +{
> > +     if (static_branch_unlikely(&ipv6_flowlabel_exclusive))
> > +             return fl6_sock_lookup(sk, label) ? : ERR_PTR(-ENOENT);
> > +
> > +     return NULL;
> > +}
> > +
> >  struct ipv6_txoptions *fl6_merge_options(struct ipv6_txoptions *opt_space,
> >                                        struct ip6_flowlabel *fl,
> >                                        struct ipv6_txoptions *fopt);
> > diff --git a/net/ipv6/ip6_flowlabel.c b/net/ipv6/ip6_flowlabel.c
> > index be5f3d7ceb966..d5f4233b04e0c 100644
> > --- a/net/ipv6/ip6_flowlabel.c
> > +++ b/net/ipv6/ip6_flowlabel.c
> > @@ -57,6 +57,8 @@ static DEFINE_SPINLOCK(ip6_fl_lock);
> >
> >  static DEFINE_SPINLOCK(ip6_sk_fl_lock);
> >
> > +DEFINE_STATIC_KEY_FALSE(ipv6_flowlabel_exclusive);
> > +
> >  #define for_each_fl_rcu(hash, fl)                            \
> >       for (fl = rcu_dereference_bh(fl_ht[(hash)]);            \
> >            fl != NULL;                                        \
> > @@ -98,6 +100,8 @@ static void fl_free_rcu(struct rcu_head *head)
> >  {
> >       struct ip6_flowlabel *fl = container_of(head, struct ip6_flowlabel, rcu);
> >
> > +     if (fl->share != IPV6_FL_S_NONE && fl->share != IPV6_FL_S_ANY)
> > +             static_branch_dec(&ipv6_flowlabel_exclusive);
>
> static_branch_dec() cannot be invoked from an RCU callback.
>
> >       if (fl->share == IPV6_FL_S_PROCESS)
> >               put_pid(fl->owner.pid);
> >       kfree(fl->opt);
> > @@ -423,6 +427,8 @@ fl_create(struct net *net, struct sock *sk, struct in6_flowlabel_req *freq,
> >       }
> >       fl->dst = freq->flr_dst;
> >       atomic_set(&fl->users, 1);
> > +     if (fl->share != IPV6_FL_S_ANY)
> > +             static_branch_inc(&ipv6_flowlabel_exclusive);
>
> Can this be used by unprivileged users?
>
> If yes, then you want to use static_key_false_deferred instead.

Ah of course. Yes, any user can exercise this API. Thanks, Eric. I'll
take a look at both points.
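
For readers following along, the fast path under discussion can be modeled
in plain userspace C. This is only a sketch of the idea, not the kernel
code: the names exclusive_leases, lease_create(), lease_release(),
slow_lookup() and label_verify() are hypothetical stand-ins, and a C11
atomic counter stands in for the static key. It does not model the two
points raised above (static_branch_dec() may sleep, so it does not belong
in an RCU callback, and an unprivileged user can toggle the key), which are
about kernel context rather than the check itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Userspace model only: every name here is hypothetical, not kernel API. */

enum share_mode { SHARE_ANY, SHARE_EXCL };	/* stands in for IPV6_FL_S_* */

/* Number of live exclusive leases; models the static key's enable count. */
static atomic_int exclusive_leases;

/* Models the hook in fl_create(): count every non-shared lease. */
static void lease_create(enum share_mode share)
{
	if (share != SHARE_ANY)
		atomic_fetch_add(&exclusive_leases, 1);
}

/* Models the release side: drop the count taken in lease_create(). */
static void lease_release(enum share_mode share)
{
	if (share != SHARE_ANY) {
		assert(atomic_load(&exclusive_leases) > 0);
		atomic_fetch_sub(&exclusive_leases, 1);
	}
}

/* Stub for the per-socket lookup: nonzero if the socket holds the lease. */
static int slow_lookup(int socket_holds_lease)
{
	return socket_holds_lease;
}

/* Models fl6_sock_verify(): 0 if the send may proceed, -ENOENT otherwise. */
static int label_verify(int socket_holds_lease)
{
	/* Fast path: no exclusive lease exists, so the check is elided. */
	if (atomic_load(&exclusive_leases) == 0)
		return 0;

	/* Slow path: fall back to the per-socket lease lookup. */
	return slow_lookup(socket_holds_lease) ? 0 : -ENOENT;
}
```

With no exclusive lease outstanding, label_verify() succeeds even for a
socket that never requested a lease. Once any exclusive lease is created,
such a socket gets -ENOENT and has to fall back to requesting a lease,
which is the robustness requirement stated in the commit message.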


Thread overview: 3+ messages
2019-05-17 15:56 Willem de Bruijn
2019-05-17 20:32 ` Eric Dumazet
2019-05-17 21:51   ` Willem de Bruijn [this message]
