BPF Archive on lore.kernel.org
From: Joe Stringer <joe@wand.net.nz>
To: Martin KaFai Lau <kafai@fb.com>, Alexei Starovoitov <ast@kernel.org>
Cc: Joe Stringer <joe@wand.net.nz>, bpf <bpf@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Lorenz Bauer <lmb@cloudflare.com>
Subject: Re: [PATCHv3 bpf-next 0/5] Add bpf_sk_assign eBPF helper
Date: Fri, 27 Mar 2020 14:05:05 -0700
Message-ID: <CAOftzPjv8rcP7Ge59fc4rhy=BR2Ym1=G3n3fvi402nx61zLf-Q@mail.gmail.com>
In-Reply-To: <20200327184621.67324727o5rtu42p@kafai-mbp>

On Fri, Mar 27, 2020 at 11:46 AM Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Thu, Mar 26, 2020 at 09:25:51PM -0700, Joe Stringer wrote:
> > Introduce a new helper that allows assigning a previously-found socket
> > to the skb as the packet is received towards the stack, to cause the
> > stack to guide the packet towards that socket subject to local routing
> > configuration. The intention is to support TProxy use cases more
> > directly from eBPF programs attached at TC ingress, to simplify and
> > streamline Linux stack configuration in scale environments with Cilium.
> >
> > Normally in ip{,6}_rcv_core(), the skb will be orphaned, dropping any
> > existing socket reference associated with the skb. Existing tproxy
> > implementations in netfilter get around this restriction by running the
> > tproxy logic after ip_rcv_core() in the PREROUTING table. However, this
> > is not an option for TC-based logic (including eBPF programs attached at
> > TC ingress).
> >
> > This series introduces the BPF helper bpf_sk_assign() to associate the
> > socket with the skb on the ingress path as the packet is passed up the
> > stack. The initial patch in the series simply takes a reference on the
> > socket to ensure safety, but later patches relax this for listen
> > sockets.
> >
> > To ensure delivery to the relevant socket, we still consult the routing
> > table. For full examples of how to configure this, see the tests in
> > patch #5; the simplest form of the route would look like this:
> >
> >   $ ip route add local default dev lo
> >
> > This series is laid out as follows:
> > * Patch 1 extends the eBPF API to add sk_assign() and defines a new
> >   socket free function to allow the later paths to understand when the
> >   socket associated with the skb should be kept through receive.
> > * Patches 2-3 optimize the receive path to avoid taking a reference on
> >   listener sockets during receive.
> > * Patches 4-5 extend the selftests with examples of the new
> >   functionality and validation of correct behaviour.
> >
> > Changes since v2:
> > * Add selftests for UDP socket redirection
> > * Drop the early demux optimization patch (defer for more testing)
> > * Fix check for orphaning after TC act return
> > * Tidy up the tests to clean up properly and be less noisy.
> >
> > Changes since v1:
> > * Replace the metadata_dst approach with using the skb->destructor to
> >   determine whether the socket has been prefetched. This is much
> >   simpler.
> > * Avoid taking a reference on listener sockets during receive
> > * Restrict assigning sockets across namespaces
> > * Restrict assigning SO_REUSEPORT sockets
> > * Fix cookie usage for socket dst check
> > * Rebase the tests against test_progs infrastructure
> > * Tidy up commit messages
> lgtm.
>
> Acked-by: Martin KaFai Lau <kafai@fb.com>

Thanks for the reviews!

I've rolled the current nits + acks into the branch below, pending
any further feedback. Alexei, happy to respin this on the mailing list
at some point if that's easier for you.

https://github.com/joestringer/linux/tree/submit/bpf-sk-assign-v3+
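
For readers following the quoted cover letter, here is a minimal
userspace sketch of the skb->destructor idea mentioned under "Changes
since v1": a dedicated socket free function doubles as the marker that
a socket was prefetched, and the receive path only orphans the skb when
that marker is absent. Every type and function name below (the *_model
names) is a hypothetical stand-in invented for illustration, not the
kernel code from the series; refcounting is reduced to a plain integer,
and the refcounted flag loosely models the listener case from patch 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for struct sock / struct sk_buff. */
struct sock_model {
	int  refcnt;      /* reference count held on the socket */
	bool refcounted;  /* false models a listener that is not refcounted */
};

struct skb_model {
	struct sock_model *sk;
	void (*destructor)(struct skb_model *skb);
};

/* Free function installed only by the assign step below. */
void sock_pfree_model(struct skb_model *skb)
{
	if (skb->sk->refcounted)
		skb->sk->refcnt--;
}

/* Any other destructor, standing in for the usual socket free functions. */
void sock_other_free_model(struct skb_model *skb)
{
	skb->sk->refcnt--;
}

/* The destructor pointer itself marks the socket as prefetched. */
bool skb_sk_is_prefetched_model(const struct skb_model *skb)
{
	return skb->destructor == sock_pfree_model;
}

/* Drop whatever socket the skb currently carries. */
void skb_orphan_model(struct skb_model *skb)
{
	/* A destructor without a socket would be a bug in this model. */
	assert(!skb->destructor || skb->sk);
	if (skb->destructor)
		skb->destructor(skb);
	skb->destructor = NULL;
	skb->sk = NULL;
}

/* Models the assign step: attach sk and tag the skb via the destructor. */
void sk_assign_model(struct skb_model *skb, struct sock_model *sk)
{
	skb_orphan_model(skb);
	if (sk->refcounted)
		sk->refcnt++;
	skb->sk = sk;
	skb->destructor = sock_pfree_model;
}

/* Models the receive-path check: keep prefetched sockets, orphan the rest. */
void rcv_core_model(struct skb_model *skb)
{
	if (!skb_sk_is_prefetched_model(skb))
		skb_orphan_model(skb);
}
```

In this model, calling sk_assign_model() and then rcv_core_model()
leaves skb->sk in place, while an skb carrying any other destructor is
orphaned as before. No extra per-packet state is needed to tell the two
cases apart, which is the simplification the v1 changelog entry
describes.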

Thread overview: 21+ messages
2020-03-27  4:25 Joe Stringer
2020-03-27  4:25 ` [PATCHv3 bpf-next 1/5] bpf: Add socket assign support Joe Stringer
2020-03-27 18:44   ` Martin KaFai Lau
2020-03-27  4:25 ` [PATCHv3 bpf-next 2/5] net: Track socket refcounts in skb_steal_sock() Joe Stringer
2020-03-27 18:45   ` Martin KaFai Lau
2020-03-27  4:25 ` [PATCHv3 bpf-next 3/5] bpf: Don't refcount LISTEN sockets in sk_assign() Joe Stringer
2020-03-27 14:26   ` Jamal Hadi Salim
2020-03-27 17:38     ` Joe Stringer
2020-03-27 18:29       ` Jamal Hadi Salim
2020-03-27  4:25 ` [PATCHv3 bpf-next 4/5] selftests: bpf: add test for sk_assign Joe Stringer
2020-03-27  4:25 ` [PATCHv3 bpf-next 5/5] selftests: bpf: Extend sk_assign tests for UDP Joe Stringer
     [not found]   ` <CACAyw9-GOw5tkR8n6p7Kct9-wq4B-9ka-X8R2V8uZv8VWUY5UQ@mail.gmail.com>
2020-03-27 19:37     ` Joe Stringer
2020-03-27  5:02 ` [PATCHv3 bpf-next 0/5] Add bpf_sk_assign eBPF helper Alexei Starovoitov
2020-03-27  5:42   ` Eric Dumazet
2020-03-27 14:13 ` Jamal Hadi Salim
2020-03-27 17:43   ` Joe Stringer
2020-03-27 18:34     ` Jamal Hadi Salim
2020-03-27 18:46 ` Martin KaFai Lau
2020-03-27 21:05   ` Joe Stringer [this message]
2020-03-28 17:25     ` Daniel Borkmann
2020-03-28 17:42       ` Joe Stringer
