bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lorenz Bauer <lmb@cloudflare.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team <kernel-team@cloudflare.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Jakub Sitnicki <jakub@cloudflare.com>
Subject: Re: [PATCH 0/5] Return fds from privileged sockhash/sockmap lookup
Date: Tue, 17 Mar 2020 09:55:56 +0000	[thread overview]
Message-ID: <CACAyw99HC70=wYBzZAiQVyUi56y_0x-6saGkp_KHBpjQuva1KA@mail.gmail.com> (raw)
In-Reply-To: <20200314025832.3ffdgkva65dseoec@ast-mbp.dhcp.thefacebook.com>

On Sat, 14 Mar 2020 at 02:58, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> I'm not following. There is skb->sk. Why do you need to lookup sk ? Because
> your hook is before demux and skb->sk is not set? Then move your hook to after?
> I think we're arguing in circles because in this thread I haven't seen the
> explanation of the problem you're trying to solve. We argued about your
> proposed solution and got stuck. Can we restart from the beginning with all
> details?

Yes, that's a good idea. I mentioned this in passing in my cover
letter, but should
have provided more context.

Jakub is working on a patch series to add a BPF hook to socket dispatch [1] aka
the inet_lookup function. The core idea is to control skb->sk via a BPF program.
Hence, we can't use skb->sk.

Introducing this hook poses another problem: we need to get the struct sk from
somewhere. The canonical way in BPF is to use the lookup_sk helpers. Of course
that doesn't work, since our hook would invoke itself. So we need a
data structure
that can hold sockets, to be used by programs attached on the new hook.

Jakub's RFC patch set used REUSEPORT_SOCKARRAY for this. During LPC '19
we got feedback that sockmap is probably the better choice. As a
result, Jakub started
working on extending sockmap TCP support and after a while I joined to add UDP.

Now, we are looking at what our control plane could look like. Based
on the inet-tool
work that Marek Majkowski has done [2], we currently have the following set up:

* An LPM map that goes from IP prefix and port to an index in a sockmap
* A sockmap that holds sockets
* A BPF program that performs the business logic

inet-tool is used to update the two maps to add and remove mappings on the fly.
Essentially, services donate their sockets either via fork+exec or SCM_RIGHTS on
a Unix socket.

Once we have inserted a socket in the sockmap, it's not possible to
retrieve it again.
This makes it impossible to change the position of a socket in the
map, to resize the
map, etc. with our current design.

One way to work around this is to add a persistent component to our
control plane:
a process can hold on to the sockets and re-build the map when necessary. The
downsides are that upgrading the service is non-trivial (since we need
to pass the
socket fds) and that a failure of this service is catastrophic. Once
it happens, we
probably have to reboot the machine to get it into a workable state again.

We'd like to avoid a persistent service if we can. By allowing to look
up fds from the
sockmap, we could make this part of our control plane more robust.

1: https://www.youtube.com/watch?v=qRDoUpqvYjY
2: https://github.com/majek/inet-tool

I hope this explanation helps, sorry for not being more thorough in the original
cover letter!


Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK


  reply	other threads:[~2020-03-17  9:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 17:47 [PATCH 0/5] Return fds from privileged sockhash/sockmap lookup Lorenz Bauer
2020-03-10 17:47 ` [PATCH 1/5] bpf: add map_copy_value hook Lorenz Bauer
2020-03-10 17:47 ` [PATCH 2/5] bpf: convert queue and stack map to map_copy_value Lorenz Bauer
2020-03-11 14:00   ` Jakub Sitnicki
2020-03-11 22:31     ` John Fastabend
2020-03-10 17:47 ` [PATCH 3/5] bpf: convert sock map and hash " Lorenz Bauer
2020-03-11 13:55   ` Jakub Sitnicki
2020-03-10 17:47 ` [PATCH 4/5] bpf: sockmap, sockhash: return file descriptors from privileged lookup Lorenz Bauer
2020-03-11 23:27   ` John Fastabend
2020-03-17 10:17     ` Lorenz Bauer
2020-03-17 15:18   ` Jakub Sitnicki
2020-03-17 18:16     ` John Fastabend
2020-03-10 17:47 ` [PATCH 5/5] bpf: sockmap, sockhash: test looking up fds Lorenz Bauer
2020-03-11 13:52   ` Jakub Sitnicki
2020-03-11 17:24     ` Lorenz Bauer
2020-03-11 13:44 ` [PATCH 0/5] Return fds from privileged sockhash/sockmap lookup Jakub Sitnicki
2020-03-11 22:40   ` John Fastabend
2020-03-12  1:58 ` Alexei Starovoitov
2020-03-12  9:16   ` Lorenz Bauer
2020-03-12 17:58     ` Alexei Starovoitov
2020-03-12 19:32       ` John Fastabend
2020-03-13 11:03         ` Lorenz Bauer
2020-03-13 10:48       ` Lorenz Bauer
2020-03-14  2:58         ` Alexei Starovoitov
2020-03-17  9:55           ` Lorenz Bauer [this message]
2020-03-17 19:05             ` John Fastabend
2020-03-20 15:12               ` Lorenz Bauer
2020-04-07  3:08                 ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACAyw99HC70=wYBzZAiQVyUi56y_0x-6saGkp_KHBpjQuva1KA@mail.gmail.com' \
    --to=lmb@cloudflare.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jakub@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \


* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).