bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Martin Lau <kafai@fb.com>, Jakub Sitnicki <jakub@cloudflare.com>
Cc: "bpf@vger.kernel.org" <bpf@vger.kernel.org>,
	John Fastabend <john.fastabend@gmail.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"kernel-team@cloudflare.com" <kernel-team@cloudflare.com>
Subject: Re: [RFC bpf-next 0/5] Extend SOCKMAP to store listening sockets
Date: Mon, 28 Oct 2019 14:05:24 -0700	[thread overview]
Message-ID: <5db758142fac5_6642abc699aa5c4fd@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <20191028204255.jmkraj3xlp346xz4@kafai-mbp.dhcp.thefacebook.com>

Martin Lau wrote:
> On Mon, Oct 28, 2019 at 01:35:26PM +0100, Jakub Sitnicki wrote:
> > On Mon, Oct 28, 2019 at 06:52 AM CET, Martin Lau wrote:
> > > On Tue, Oct 22, 2019 at 01:37:25PM +0200, Jakub Sitnicki wrote:
> > >> This patch set is a follow up on a suggestion from LPC '19 discussions to
> > >> make SOCKMAP (or a new map type derived from it) a generic type for storing
> > >> established as well as listening sockets.
> > >>
> > >> We found ourselves in need of a map type that keeps references to listening
> > >> sockets when working on making the socket lookup programmable, aka BPF
> > >> inet_lookup [1].  Initially we repurposed REUSEPORT_SOCKARRAY but found it
> > >> problematic to extend due to being tightly coupled with reuseport
> > >> logic (see slides [2]).
> > >> So we've turned our attention to SOCKMAP instead.
> > >>
> > >> As it turns out the changes needed to make SOCKMAP suitable for storing
> > >> listening sockets are self-contained and have use outside of programming
> > >> the socket lookup. Hence this patch set.
> > >>
> > >> With these patches SOCKMAP can be used in SK_REUSEPORT BPF programs as a
> > >> drop-in replacement for REUSEPORT_SOCKARRAY for TCP. This can hopefully
> > >> lead to code consolidation between the two map types in the future.
> > > What is the plan for UDP support in sockmap?
> > 
> > It's on our road-map because without SOCKMAP support for UDP we won't be
> > able to move away from TPROXY [1] and custom SO_BINDTOPREFIX extension
> > [2] for steering new UDP flows to receiving sockets. Also we would like
> > to look into using SOCKMAP for connected UDP socket splicing in the
> > future [3].
> > 
> > I was planning to split work as follows:
> > 
> > 1. SOCKMAP support for listening sockets (this series)
> > 2. programmable socket lookup for TCP (cut-down version of [4])
> > 3. SOCKMAP support for UDP (work not started)
> hmm...It is hard to comment how the full UDP sockmap may
> work out without a code attempt because I am not fluent in
> sock_map ;)
> 
> From a quick look, it seems there are quite a few things to do.
> For example, the TCP_SKB_CB(skb) usage and how that may look
> like in UDP.  "struct udp_skb_cb" is 28 bytes while "struct napi_gro_cb"
> seems to be 48 bytes already which may need a closer look.

The extra bits sockmap needs are used for redirecting between
between sockets. These will fit in the udp cb area with some
extra room to spare. If that is paticularly challenging we can
also create a program attach type which would preclude using
those bits in the sk_reuseport bpf program types. We already
have types for rx, tx, nop progs, so one more should be fine.

So at least that paticular concern is not difficult to fix.

> 
> > 4. programmable socket lookup for UDP (rest of [4])
> > 
> > I'm open to suggestions on how to organize it.
> > 
> > >> Having said that, the main intention here is to lay groundwork for using
> > >> SOCKMAP in the next iteration of programmable socket lookup patches.
> > > What may be the minimal to get only lookup work for UDP sockmap?
> > > .close() and .unhash()?
> > 
> > John would know better. I haven't tried doing it yet.
> > 
> > From just reading the code - override the two proto ops you mentioned,
> > close and unhash, and adapt the socket checks in SOCKMAP.
> Do your use cases need bpf prog attached to sock_map?

Perhaps not specifically sock_map but they do need to be consolidated
into a map somewhere IMO this has proven to be the most versatile. We
can add sockets from the various BPF hooks or from user space and have
the ability to use the existing map tools, etc.

> 
> If not, would it be cleaner to delicate another map_type
> for lookup-only use case to have both TCP and UDP support.

But we (Cilium project and above splicing use case is also interested)
will need UDP support so it will be supported regardless of the
SK_REUSEPORT_BPF so I think it makes sense to consolidate all these
use cases on to the existing sockmap.

Also sockmap supports inserting sockets from BPF and from userspace
which actually requires a bit of logic to track state, etc. Its been
in use and been beat on by various automated test tools so I think
at minimum this needs to be reused. Re-implementing this logic seems
a waste of time and it wasn't exactly trivial and took some work.

Being able to insert the sockets from XDP (support coming soon) and
from sock_ops programs turns out to be fairly powerful.

So in short I think it makes most sense to consolidate on sock_map
because

  (a) we need and will add udp support regardless,
  (b) we already handle the tricky parts inerting/removing live sockets
  (c) from this series it looks like its fairly straight forward
  (d) we get lots of shared code

Thanks,
John


> 
> > 
> > -Jakub
> > 
> > [1] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_how-2Dwe-2Dbuilt-2Dspectrum_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=DZgW8-2Xl1P8NU59ji4ieQLzwWpx4t3gGq_tqB0l3Bo&e= 
> > [2] https://lore.kernel.org/netdev/1458699966-3752-1-git-send-email-gilberto.bertin@gmail.com/
> > [3] https://lore.kernel.org/bpf/20190828072250.29828-1-jakub@cloudflare.com/
> > [4] https://urldefense.proofpoint.com/v2/url?u=https-3A__blog.cloudflare.com_sockmap-2Dtcp-2Dsplicing-2Dof-2Dthe-2Dfuture_&d=DwIBAg&c=5VD0RTtNlTh3ycd41b3MUw&r=VQnoQ7LvghIj0gVEaiQSUw&m=lSo-FsOeNl_8znZZ07H8I6ZYAinPKTR5C3Cn_Ol3QYQ&s=NerUqb4j7IsGBTcni6Yxk40wf6kTkckHXn3Nx5i4mCU&e= 



  reply	other threads:[~2019-10-28 21:05 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-22 11:37 Jakub Sitnicki
2019-10-22 11:37 ` [RFC bpf-next 1/5] bpf, sockmap: Let BPF helpers use lookup operation on SOCKMAP Jakub Sitnicki
2019-10-24 16:59   ` John Fastabend
2019-10-22 11:37 ` [RFC bpf-next 2/5] bpf, sockmap: Allow inserting listening TCP sockets into SOCKMAP Jakub Sitnicki
2019-10-24 17:06   ` John Fastabend
2019-10-25  9:41     ` Jakub Sitnicki
2019-10-22 11:37 ` [RFC bpf-next 3/5] bpf, sockmap: Don't let child socket inherit psock or its ops on copy Jakub Sitnicki
2019-10-22 11:37 ` [RFC bpf-next 4/5] bpf: Allow selecting reuseport socket from a SOCKMAP Jakub Sitnicki
2019-10-22 11:37 ` [RFC bpf-next 5/5] selftests/bpf: Extend SK_REUSEPORT tests to cover SOCKMAP Jakub Sitnicki
2019-10-24 16:12 ` [RFC bpf-next 0/5] Extend SOCKMAP to store listening sockets Alexei Starovoitov
2019-10-24 16:56 ` John Fastabend
2019-10-25  9:26   ` Jakub Sitnicki
2019-10-25 14:18     ` John Fastabend
2019-10-28  5:52 ` Martin Lau
2019-10-28 12:35   ` Jakub Sitnicki
2019-10-28 19:04     ` John Fastabend
2019-10-29  8:56       ` Jakub Sitnicki
2019-10-28 20:42     ` Martin Lau
2019-10-28 21:05       ` John Fastabend [this message]
2019-10-28 21:38         ` Martin Lau
2019-10-29  8:52           ` Jakub Sitnicki

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5db758142fac5_6642abc699aa5c4fd@john-XPS-13-9370.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=jakub@cloudflare.com \
    --cc=kafai@fb.com \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    --subject='Re: [RFC bpf-next 0/5] Extend SOCKMAP to store listening sockets' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).