bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Andy Lutomirski" <luto@kernel.org>
To: "Andrii Nakryiko" <andrii.nakryiko@gmail.com>,
	"Maryam Tahhan" <mtahhan@redhat.com>
Cc: "Andrii Nakryiko" <andrii@kernel.org>,
	bpf@vger.kernel.org, linux-security-module@vger.kernel.org,
	"Kees Cook" <keescook@chromium.org>,
	"Christian Brauner" <brauner@kernel.org>,
	lennart@poettering.net, cyphar@cyphar.com, kernel-team@meta.com
Subject: Re: [PATCH v2 bpf-next 00/18] BPF token
Date: Fri, 23 Jun 2023 08:10:32 -0700	[thread overview]
Message-ID: <9b0e9227-4cf4-4acb-ba88-52f65b099709@app.fastmail.com> (raw)
In-Reply-To: <82b79e57-a0ad-4559-abc9-858e0f51fbba@app.fastmail.com>



On Thu, Jun 22, 2023, at 6:02 PM, Andy Lutomirski wrote:
> On Thu, Jun 22, 2023, at 11:40 AM, Andrii Nakryiko wrote:
>
>> Hopefully you can see where I'm going with this. And this is just one
>> random tiny example. We can think up tons of other cases to prove BPF
>> is not isolatable to any sort of "container".
>
> No.  You have not come up with an example of why BPF is not isolatable 
> to a container.  You have come up with an example of why binding to a 
> sched_switch raw tracepoint does not make sense in a container without 
> additional mechanisms to give it well defined functionality and 
> appropriate security.

Thinking about this some more:

Suppose the goal is to allow a workload in a container to monitor itself by attaching to a tracepoint (something in the scheduler, for example).  The workload is in the container.  The tracepoint is global.  Kernel memory is global unless something that is trusted and understands the containers is doing the reading.  And proxying BPF is a mess.

So here are a couple of possible solutions:

(a) Improve BPF maps a bit so that BPF maps work well in containers.  It should be possible to create a map and share it (the file descriptor!) between the outside and the container without running into various snags.  (IIRC my patch series was a decent step in this direction,)  Now load the BPF program and attach it to the tracepoint outside the container but have it write its gathered data to the map that's in the container.  So you end up with a daemon outside the container that gets a request like "help me monitor such-and-such by running BPF program such-and-such (where the BPF program code presumably comes from a library outside the container", and the daemon arranges for the requesting container to have access to the map it needs to get the data.

(b) Make a way to pass a pre-approved program into a container.  So a daemon outside loads the program and does some new magic to say "make an fd that can be used to attach this particular program to this particular tracepoint" and pass that into the container.

I think (a) is better.  In particular, if you have a workload with many containers, and they all want to monitor the same tracepoint as it relates to their container, you will get much better performance if a single BPF program does the monitoring and sends the data out to each container as needed instead of having one copy of the program per container.

For what it's worth, BPF tokens seem like they'll have the same performance problem -- without coordination, you can end up with N containers generating N hooks all targeting the same global resource, resulting in overhead that scales linearly with the number of containers.

And, again, I'm not an XDP expert, but if you have one NIC, and you attach N XDP programs to it, and each one is inspecting packets and sending some to one particular container's AF_XDP socket, you are not going to get good performance.  You want *one* XDP program fanning the packets out to the relevant containers.

If this is hard right now, perhaps you could add new kernel mechanisms as needed to improve the situation.

--Andy

  reply	other threads:[~2023-06-23 15:10 UTC|newest]

Thread overview: 72+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-07 23:53 [PATCH v2 bpf-next 00/18] BPF token Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 01/18] bpf: introduce BPF token object Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 02/18] libbpf: add bpf_token_create() API Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 03/18] selftests/bpf: add BPF_TOKEN_CREATE test Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 04/18] bpf: move unprivileged checks into map_create() and bpf_prog_load() Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 05/18] bpf: inline map creation logic in map_create() function Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 06/18] bpf: centralize permissions checks for all BPF map types Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 07/18] bpf: add BPF token support to BPF_MAP_CREATE command Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 08/18] libbpf: add BPF token support to bpf_map_create() API Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 09/18] selftests/bpf: add BPF token-enabled test for BPF_MAP_CREATE command Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 10/18] bpf: add BPF token support to BPF_BTF_LOAD command Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 11/18] libbpf: add BPF token support to bpf_btf_load() API Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 12/18] selftests/bpf: add BPF token-enabled BPF_BTF_LOAD selftest Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 13/18] bpf: keep BPF_PROG_LOAD permission checks clear of validations Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 14/18] bpf: add BPF token support to BPF_PROG_LOAD command Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 15/18] bpf: take into account BPF token when fetching helper protos Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 16/18] bpf: consistenly use BPF token throughout BPF verifier logic Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 17/18] libbpf: add BPF token support to bpf_prog_load() API Andrii Nakryiko
2023-06-07 23:53 ` [PATCH v2 bpf-next 18/18] selftests/bpf: add BPF token-enabled BPF_PROG_LOAD tests Andrii Nakryiko
2023-06-08 18:49 ` [PATCH v2 bpf-next 00/18] BPF token Stanislav Fomichev
2023-06-08 22:17   ` Andrii Nakryiko
2023-06-09 11:17 ` Toke Høiland-Jørgensen
2023-06-09 18:21   ` Andrii Nakryiko
2023-06-09 21:21     ` Toke Høiland-Jørgensen
2023-06-09 22:03       ` Andrii Nakryiko
2023-06-12 10:49         ` Toke Høiland-Jørgensen
2023-06-12 22:08           ` Andrii Nakryiko
2023-06-13 21:48             ` Hao Luo
2023-06-14 12:06             ` Toke Høiland-Jørgensen
2023-06-15 22:55               ` Andrii Nakryiko
2023-06-09 18:32 ` Andy Lutomirski
2023-06-09 19:08   ` Andrii Nakryiko
2023-06-19 17:40     ` Andy Lutomirski
2023-06-21 23:48       ` Andrii Nakryiko
2023-06-22  8:22         ` Maryam Tahhan
2023-06-22 16:49           ` Andy Lutomirski
     [not found]             ` <5a75d1f0-4ed9-399c-4851-2df0755de9b5@redhat.com>
2023-06-22 18:40               ` Andrii Nakryiko
2023-06-22 21:04                 ` Maryam Tahhan
2023-06-22 23:35                   ` Andrii Nakryiko
2023-06-23  1:02                 ` Andy Lutomirski
2023-06-23 15:10                   ` Andy Lutomirski [this message]
2023-06-23 23:23                     ` Daniel Borkmann
2023-06-24 13:59                       ` Andy Lutomirski
2023-06-24 15:28                         ` Andy Lutomirski
2023-06-26 15:23                           ` Daniel Borkmann
2023-07-04 20:48                             ` Andy Lutomirski
2023-07-04 21:06                               ` Andy Lutomirski
2023-06-27 10:22                           ` Djalal Harouni
2023-06-26 22:31                         ` Andrii Nakryiko
2023-06-26 22:08                   ` Andrii Nakryiko
2023-06-22 19:05             ` Andrii Nakryiko
2023-06-23  3:28               ` Andy Lutomirski
2023-06-23 16:13                 ` Casey Schaufler
2023-06-26 22:08                 ` Andrii Nakryiko
2023-06-22 18:20           ` Andrii Nakryiko
2023-06-23 23:07             ` Toke Høiland-Jørgensen
2023-06-26 22:08               ` Andrii Nakryiko
2023-07-04 21:05                 ` Andy Lutomirski
2023-06-09 22:29 ` Djalal Harouni
2023-06-09 22:57   ` Andrii Nakryiko
2023-06-12 12:02     ` Djalal Harouni
2023-06-12 14:31       ` Djalal Harouni
2023-06-12 22:27       ` Andrii Nakryiko
2023-06-14  0:23         ` Djalal Harouni
2023-06-14  9:39           ` Christian Brauner
2023-06-15 22:48             ` Andrii Nakryiko
2023-06-23 22:18               ` Daniel Borkmann
2023-06-26 22:08                 ` Andrii Nakryiko
2023-06-15 22:47           ` Andrii Nakryiko
2023-06-12 12:44 ` Dave Tucker
2023-06-12 15:52   ` Djalal Harouni
2023-06-12 23:04   ` Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9b0e9227-4cf4-4acb-ba88-52f65b099709@app.fastmail.com \
    --to=luto@kernel.org \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=cyphar@cyphar.com \
    --cc=keescook@chromium.org \
    --cc=kernel-team@meta.com \
    --cc=lennart@poettering.net \
    --cc=linux-security-module@vger.kernel.org \
    --cc=mtahhan@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).