From: Yafang Shao <laoar.shao@gmail.com>
To: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
john.fastabend@gmail.com, kpsingh@kernel.org, sdf@google.com,
haoluo@google.com, jolsa@kernel.org
Cc: bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
Yafang Shao <laoar.shao@gmail.com>
Subject: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace
Date: Sun, 26 Mar 2023 09:21:55 +0000
Message-ID: <20230326092208.13613-1-laoar.shao@gmail.com>
Currently only CAP_SYS_ADMIN can iterate BPF object IDs and convert IDs
to FDs. That is intended by BPF's security model[1]. Not only does it
prevent unprivileged users from getting other users' bpf programs, but
it also prevents a user from iterating his own bpf objects.
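As a rough illustration of the restriction described above, the sketch
below walks BPF program IDs with the raw bpf(2) syscall. The helper names
sys_bpf() and count_bpf_prog_ids() are made up for this example and are
not part of the patchset; only the BPF_PROG_GET_NEXT_ID command and the
start_id/next_id fields come from the existing UAPI. On a kernel without
this series, a caller lacking CAP_SYS_ADMIN is expected to get -EPERM.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper (hypothetical name): glibc does not export bpf(). */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/*
 * Walk all BPF program IDs with BPF_PROG_GET_NEXT_ID.
 * Returns the number of IDs seen, or -errno on failure
 * (for example -EPERM when the caller lacks CAP_SYS_ADMIN).
 */
static int count_bpf_prog_ids(void)
{
	union bpf_attr attr;
	unsigned int id = 0;
	int count = 0;

	for (;;) {
		memset(&attr, 0, sizeof(attr));
		attr.start_id = id;
		if (sys_bpf(BPF_PROG_GET_NEXT_ID, &attr, sizeof(attr)) < 0) {
			if (errno == ENOENT)
				return count;	/* no more IDs */
			return -errno;
		}
		id = attr.next_id;
		count++;
	}
}
```

Calling count_bpf_prog_ids() as root should return a non-negative count,
while an unprivileged caller should see a negative errno value.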
In container environments, some users want to run bpf programs in their
containers. These users can run their bpf programs under CAP_BPF and
some other specific CAPs, but they can't inspect their bpf programs in a
generic way. For example, bpftool can't be used because it requires
CAP_SYS_ADMIN. That is very inconvenient.
Without CAP_SYS_ADMIN, the only way to get information about a bpf object
that was not created by the process itself is SCM_RIGHTS. That requires
every process that creates a bpf object to implement a unix domain
socket to share the fd of the bpf object with other processes, which is
tedious and troublesome.
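To give a sense of the boilerplate involved, here is a minimal sketch of
passing one file descriptor over a connected unix domain socket with
SCM_RIGHTS. The helper names send_fd() and recv_fd() are illustrative
only; the fd could be a bpf map, prog or link fd, but nothing in the
code is specific to BPF.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Send one file descriptor over a connected unix domain socket. */
static int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* carry at least one byte of real data */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;	/* forces correct alignment */
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on error. */
static int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union {
		char buf[CMSG_SPACE(sizeof(int))];
		struct cmsghdr align;
	} u;
	struct msghdr msg = { 0 };
	struct cmsghdr *cmsg;
	int fd;

	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;

	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;

	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

Every process that owns bpf objects would need the sending side plus a
listening socket, and every inspecting tool would need the receiving
side, which is the overhead the paragraph above refers to.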
Hence we need a better mechanism to get bpf object info without
CAP_SYS_ADMIN.
BPF namespace is introduced in this patchset in an attempt to remove
the CAP_SYS_ADMIN requirement. A user can create bpf maps, progs and
links in a specific bpf namespace, and these bpf objects will then not be
visible to users in a different bpf namespace. They are, however,
visible to the parent bpf namespace, so the sys admin can still iterate
and inspect them.
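The visibility rule described above can be modelled in a few lines of
userspace C. This is only a toy model under the assumption that
visibility follows the namespace hierarchy the same way it does for PID
namespaces; struct toy_bpfns, struct toy_bpf_obj and toy_obj_visible()
are hypothetical names and do not reflect the kernel data structures in
this series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; not the kernel's data structures. */
struct toy_bpfns {
	const struct toy_bpfns *parent;	/* NULL for the initial namespace */
};

struct toy_bpf_obj {
	const struct toy_bpfns *owner;	/* namespace the object was created in */
};

/*
 * Assumed rule: an object is visible from the namespace it was created
 * in and from every ancestor of that namespace, but not from siblings
 * or descendants.
 */
static bool toy_obj_visible(const struct toy_bpf_obj *obj,
			    const struct toy_bpfns *viewer)
{
	const struct toy_bpfns *ns;

	for (ns = obj->owner; ns; ns = ns->parent)
		if (ns == viewer)
			return true;
	return false;
}
```

With an initial namespace and two child namespaces A and B, an object
created in A is visible from A and from the initial namespace, but not
from B.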
BPF namespace is similar to PID namespace, and bpf objects are
similar to tasks, so BPF namespace is easy to understand. This
patchset only implements BPF namespace for bpf map, prog and link. In the
future we may extend it to other bpf objects such as btf, bpffs, etc.
For example, we can allow some of the BTF objects to be used in a
non-init bpf namespace, so that the container user can only trace the
processes running in his container, but can't get information about
tasks running in other containers.
A simple example of how to use the bpf namespace is added to
selftests/bpf.
Putting bpf map, prog and link into bpf namespace is the first step.
Let's start with it.
[1]. https://lore.kernel.org/bpf/20200513230355.7858-1-alexei.starovoitov@gmail.com/
Yafang Shao (13):
fork: New clone3 flag for BPF namespace
proc_ns: Extend the field type in struct proc_ns_operations to long
bpf: Implement bpf namespace
bpf: No need to check if id is 0
bpf: Make bpf objects id have the same alloc and free pattern
bpf: Helpers to alloc and free object id in bpf namespace
bpf: Add bpf helper to get bpf object id
bpf: Alloc and free bpf_map id in bpf namespace
bpf: Alloc and free bpf_prog id in bpf namespace
bpf: Alloc and free bpf_link id in bpf namespace
bpf: Allow iterating bpf objects with CAP_BPF in bpf namespace
bpf: Use bpf_idr_lock array instead
selftests/bpf: Add selftest for bpf namespace
fs/proc/namespaces.c | 4 +
include/linux/bpf.h | 9 +-
include/linux/bpf_namespace.h | 88 ++++++++++
include/linux/nsproxy.h | 4 +
include/linux/proc_ns.h | 3 +-
include/linux/user_namespace.h | 1 +
include/uapi/linux/bpf.h | 7 +
include/uapi/linux/sched.h | 1 +
kernel/bpf/Makefile | 1 +
kernel/bpf/bpf_namespace.c | 283 ++++++++++++++++++++++++++++++
kernel/bpf/offload.c | 16 +-
kernel/bpf/syscall.c | 262 ++++++++++-----------------
kernel/bpf/task_iter.c | 12 ++
kernel/fork.c | 5 +-
kernel/nsproxy.c | 19 +-
kernel/trace/bpf_trace.c | 2 +
kernel/ucount.c | 1 +
tools/bpf/bpftool/skeleton/pid_iter.bpf.c | 13 +-
tools/include/uapi/linux/bpf.h | 7 +
tools/testing/selftests/bpf/Makefile | 3 +-
tools/testing/selftests/bpf/test_bpfns.c | 76 ++++++++
21 files changed, 637 insertions(+), 180 deletions(-)
create mode 100644 include/linux/bpf_namespace.h
create mode 100644 kernel/bpf/bpf_namespace.c
create mode 100644 tools/testing/selftests/bpf/test_bpfns.c
--
1.8.3.1