All of lore.kernel.org
 help / color / mirror / Atom feed
From: Lorenz Bauer <lmb@cloudflare.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team <kernel-team@cloudflare.com>,
	Networking <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Jakub Sitnicki <jakub@cloudflare.com>
Subject: Re: [PATCH 0/5] Return fds from privileged sockhash/sockmap lookup
Date: Tue, 17 Mar 2020 09:55:56 +0000	[thread overview]
Message-ID: <CACAyw99HC70=wYBzZAiQVyUi56y_0x-6saGkp_KHBpjQuva1KA@mail.gmail.com> (raw)
In-Reply-To: <20200314025832.3ffdgkva65dseoec@ast-mbp.dhcp.thefacebook.com>

On Sat, 14 Mar 2020 at 02:58, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> I'm not following. There is skb->sk. Why do you need to lookup sk ? Because
> your hook is before demux and skb->sk is not set? Then move your hook to after?
>
> I think we're arguing in circles because in this thread I haven't seen the
> explanation of the problem you're trying to solve. We argued about your
> proposed solution and got stuck. Can we restart from the beginning with all
> details?

Yes, that's a good idea. I mentioned this in passing in my cover
letter, but should
have provided more context.

Jakub is working on a patch series to add a BPF hook to socket dispatch [1] aka
the inet_lookup function. The core idea is to control skb->sk via a BPF program.
Hence, we can't use skb->sk.

Introducing this hook poses another problem: we need to get the struct sk from
somewhere. The canonical way in BPF is to use the lookup_sk helpers. Of course
that doesn't work, since our hook would invoke itself. So we need a
data structure
that can hold sockets, to be used by programs attached on the new hook.

Jakub's RFC patch set used REUSEPORT_SOCKARRAY for this. During LPC '19
we got feedback that sockmap is probably the better choice. As a
result, Jakub started
working on extending sockmap TCP support and after a while I joined to add UDP.

Now, we are looking at what our control plane could look like. Based
on the inet-tool
work that Marek Majkowski has done [2], we currently have the following set up:

* An LPM map that goes from IP prefix and port to an index in a sockmap
* A sockmap that holds sockets
* A BPF program that performs the business logic

inet-tool is used to update the two maps to add and remove mappings on the fly.
Essentially, services donate their sockets either via fork+exec or SCM_RIGHTS on
a Unix socket.

Once we have inserted a socket in the sockmap, it's not possible to
retrieve it again.
This makes it impossible to change the position of a socket in the
map, to resize the
map, etc. with our current design.

One way to work around this is to add a persistent component to our
control plane:
a process can hold on to the sockets and re-build the map when necessary. The
downsides are that upgrading the service is non-trivial (since we need
to pass the
socket fds) and that a failure of this service is catastrophic. Once
it happens, we
probably have to reboot the machine to get it into a workable state again.

We'd like to avoid a persistent service if we can. By allowing to look
up fds from the
sockmap, we could make this part of our control plane more robust.

1: https://www.youtube.com/watch?v=qRDoUpqvYjY
2: https://github.com/majek/inet-tool

I hope this explanation helps, sorry for not being more thorough in the original
cover letter!

Lorenz

-- 
Lorenz Bauer  |  Systems Engineer
6th Floor, County Hall/The Riverside Building, SE1 7PB, UK

www.cloudflare.com

  reply	other threads:[~2020-03-17  9:56 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-10 17:47 [PATCH 0/5] Return fds from privileged sockhash/sockmap lookup Lorenz Bauer
2020-03-10 17:47 ` [PATCH 1/5] bpf: add map_copy_value hook Lorenz Bauer
2020-03-10 17:47 ` [PATCH 2/5] bpf: convert queue and stack map to map_copy_value Lorenz Bauer
2020-03-11 14:00   ` Jakub Sitnicki
2020-03-11 22:31     ` John Fastabend
2020-03-10 17:47 ` [PATCH 3/5] bpf: convert sock map and hash " Lorenz Bauer
2020-03-11 13:55   ` Jakub Sitnicki
2020-03-10 17:47 ` [PATCH 4/5] bpf: sockmap, sockhash: return file descriptors from privileged lookup Lorenz Bauer
2020-03-11 23:27   ` John Fastabend
2020-03-17 10:17     ` Lorenz Bauer
2020-03-17 15:18   ` Jakub Sitnicki
2020-03-17 18:16     ` John Fastabend
2020-03-10 17:47 ` [PATCH 5/5] bpf: sockmap, sockhash: test looking up fds Lorenz Bauer
2020-03-11 13:52   ` Jakub Sitnicki
2020-03-11 17:24     ` Lorenz Bauer
2020-03-11 13:44 ` [PATCH 0/5] Return fds from privileged sockhash/sockmap lookup Jakub Sitnicki
2020-03-11 22:40   ` John Fastabend
2020-03-12  1:58 ` Alexei Starovoitov
2020-03-12  9:16   ` Lorenz Bauer
2020-03-12 17:58     ` Alexei Starovoitov
2020-03-12 19:32       ` John Fastabend
2020-03-13 11:03         ` Lorenz Bauer
2020-03-13 10:48       ` Lorenz Bauer
2020-03-14  2:58         ` Alexei Starovoitov
2020-03-17  9:55           ` Lorenz Bauer [this message]
2020-03-17 19:05             ` John Fastabend
2020-03-20 15:12               ` Lorenz Bauer
2020-04-07  3:08                 ` Alexei Starovoitov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CACAyw99HC70=wYBzZAiQVyUi56y_0x-6saGkp_KHBpjQuva1KA@mail.gmail.com' \
    --to=lmb@cloudflare.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=jakub@cloudflare.com \
    --cc=kernel-team@cloudflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.