bpf.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Stanislav Fomichev <sdf@google.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: ast@kernel.org, daniel@iogearbox.net, andrii@kernel.org,
	kafai@fb.com, songliubraving@fb.com, yhs@fb.com,
	john.fastabend@gmail.com, kpsingh@kernel.org, haoluo@google.com,
	jolsa@kernel.org, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace
Date: Wed, 29 Mar 2023 13:50:45 -0700	[thread overview]
Message-ID: <CAKH8qBuRS5mp5akkT-sQHHdnnjWZYKehppgO5D1vE5rhYPuo8Q@mail.gmail.com> (raw)
In-Reply-To: <CALOAHbCc843AGvVftCumnNgM87NzqAYpPcPEj0i+aQ5QEOUW7Q@mail.gmail.com>

On Tue, Mar 28, 2023 at 8:03 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Wed, Mar 29, 2023 at 1:15 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On 03/28, Yafang Shao wrote:
> > > On Tue, Mar 28, 2023 at 1:28 AM Stanislav Fomichev <sdf@google.com> wrote:
> > > >
> > > > On 03/26, Yafang Shao wrote:
> > > > > Currently only CAP_SYS_ADMIN can iterate BPF object IDs and convert
> > > IDs
> > > > > to FDs, that's intended for BPF's security model[1]. Not only does it
> > > > > prevent non-privilidged users from getting other users' bpf program,
> > > but
> > > > > also it prevents the user from iterating his own bpf objects.
> > > >
> > > > > In container environment, some users want to run bpf programs in their
> > > > > containers. These users can run their bpf programs under CAP_BPF and
> > > > > some other specific CAPs, but they can't inspect their bpf programs
> > > in a
> > > > > generic way. For example, the bpftool can't be used as it requires
> > > > > CAP_SYS_ADMIN. That is very inconvenient.
> > > >
> > > > > Without CAP_SYS_ADMIN, the only way to get the information of a bpf
> > > object
> > > > > which is not created by the process itself is with SCM_RIGHTS, that
> > > > > requires each processes which created bpf object has to implement a
> > > unix
> > > > > domain socket to share the fd of a bpf object between different
> > > > > processes, that is really trivial and troublesome.
> > > >
> > > > > Hence we need a better mechanism to get bpf object info without
> > > > > CAP_SYS_ADMIN.
> > > >
> > > > [..]
> > > >
> > > > > BPF namespace is introduced in this patchset with an attempt to remove
> > > > > the CAP_SYS_ADMIN requirement. The user can create bpf map, prog and
> > > > > link in a specific bpf namespace, then these bpf objects will not be
> > > > > visible to the users in a different bpf namespace. But these bpf
> > > > > objects are visible to its parent bpf namespace, so the sys admin can
> > > > > still iterate and inspect them.
> > > >
> > > > Does it essentially mean unpriv bpf?
> >
> > > Right. With CAP_BPF and some other CAPs enabled.
> >
> > > > Can I, as a non-root, create
> > > > a new bpf namespace and start loading/attaching progs?
> >
> > > No, you can't create a new bpf namespace as a non-root, see also
> > > copy_namespaces().
> > > In the container environment, new namespaces are always created by
> > > containered, which is started by root.
> >
> > Are you talking about "if (!ns_capable(user_ns, CAP_SYS_ADMIN))" part
> > from copy_namespaces? Isn't it trivially bypassed with a new user
> > namespace?
> >
> > IIUC, I can create a new user namespace which gives me CAP_SYS_ADMIN
> > in this particular user-ns. Then I can go on and create a new bpf
> > namespace (with CAP_BPF) and go wild? I won't see anything from the
> > other namespaces, but I'll be able to load/attach bpf programs?
> >
>
> I don't think so. If you create a new userspace, and give the process
> the CAP_BPF or CAP_SYS_ADMIN in this new user namespace but not the
> initial namespace, you can't do that. Because currently only CAP_BPF
> or CAP_SYS_ADMIN in the init user namespace can load/attach bpf
> programs.
>
> > > > Maybe add a paragraph about now vs whatever you're proposing.
> >
> > > What I'm proposing in this patchset is to put bpf objects (map, prog,
> > > link, and btf) into the bpf namespace. Next step I will put bpffs into
> > > the bpf namespace as well.
> > > That said, I'm trying to put  all the objects created in bpf into the
> > > bpf namespace. Below is a simple paragraph to illustrate it.
> >
> > > Regarding the unpriv user with CAP_BPF enabled,
> > >                                                               Now | Future
> > > ------------------------------------------------------------------------
> > > Iterate his BPF IDs                                | N   |  Y  |
> > > Iterate others' BPF IDs                          | N   |  N  |
> > > Convert his BPF IDs to FDs                  | N   |  Y  |
> > > Convert others' BPF IDs to FDs            | N   |  N  |
> > > Get others' object info from pinned file  | Y(*) | N  |
> > > ------------------------------------------------------------------------
> >
> > > (*) It can be improved by,
> > >       1). Different containers has different bpffs
> > >       2). Setting file permission
> > >       That's not perfect, for example, if one single user has two bpf
> > > instances, but we don't want them to inspect each other.
> >
> > I think the question here is what happens to the existing
> > capable(CAP_BPF) checks? Do they become ns_capable(CAP_BPF) eventually?
> >
>
> They won't become ns_capable(CAP_BPF). If it becomes
> ns_capable(CAP_BPF), it will really go wild then.
>
> > And if not, I don't think it integrates well with the user namespaces?
> >
>
> IIUC, it is the CAP_BPF which doesn't integrate with the user
> namespaces, right?

Yeah. And it's probably fine if we don't, I just wanted to see some
explanation on the long-term plan.
If the purpose is to have a bpf namespace and use it for pure
isolation purposes, let's state it clearly in the cover letter.
Otherwise it's not clear whether it's only about isolation or
potentially allowing CAP_BPF in user namespaces.
Maybe clone(CLONE_NEWBPF|CLONE_NEWUSER) should return an explicit
error? (or maybe it already does, haven't looked at the patches)

One other question I have is: should init bpf namespace see
everything? Otherwise it might be hard to chase down the namespaces
that loaded some BPF program...




> --
> Regards
> Yafang

  reply	other threads:[~2023-03-29 20:51 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-26  9:21 [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace Yafang Shao
2023-03-26  9:21 ` [RFC PATCH bpf-next 01/13] fork: New clone3 flag for " Yafang Shao
2023-03-26  9:21 ` [RFC PATCH bpf-next 02/13] proc_ns: Extend the field type in struct proc_ns_operations to long Yafang Shao
2023-03-26  9:21 ` [RFC PATCH bpf-next 03/13] bpf: Implement bpf namespace Yafang Shao
2023-03-26  9:21 ` [RFC PATCH bpf-next 04/13] bpf: No need to check if id is 0 Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 05/13] bpf: Make bpf objects id have the same alloc and free pattern Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 06/13] bpf: Helpers to alloc and free object id in bpf namespace Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 07/13] bpf: Add bpf helper to get bpf object id Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 08/13] bpf: Alloc and free bpf_map id in bpf namespace Yafang Shao
2023-03-26 10:50   ` Toke Høiland-Jørgensen
2023-03-27  2:44     ` Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 09/13] bpf: Alloc and free bpf_prog " Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 10/13] bpf: Alloc and free bpf_link " Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 11/13] bpf: Allow iterating bpf objects with CAP_BPF " Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 12/13] bpf: Use bpf_idr_lock array instead Yafang Shao
2023-03-26  9:22 ` [RFC PATCH bpf-next 13/13] selftests/bpf: Add selftest for bpf namespace Yafang Shao
2023-03-26 10:49 ` [RFC PATCH bpf-next 00/13] bpf: Introduce BPF namespace Toke Høiland-Jørgensen
2023-03-27  3:07   ` Yafang Shao
2023-03-27 20:51     ` Toke Høiland-Jørgensen
2023-03-28  3:48       ` Yafang Shao
2023-03-27 17:28 ` Stanislav Fomichev
2023-03-28  3:42   ` Yafang Shao
2023-03-28 17:15     ` Stanislav Fomichev
2023-03-29  3:02       ` Yafang Shao
2023-03-29 20:50         ` Stanislav Fomichev [this message]
2023-03-30  2:40           ` Yafang Shao
2023-03-27 19:03 ` Song Liu
2023-03-28  3:47   ` Yafang Shao
2023-04-02 23:37     ` Alexei Starovoitov
2023-04-03  3:05       ` Yafang Shao
2023-04-03 22:50         ` Alexei Starovoitov
2023-04-04  2:59           ` Yafang Shao
2023-04-06  2:06             ` Alexei Starovoitov
2023-04-06  2:54               ` Yafang Shao
2023-04-06  3:05                 ` Alexei Starovoitov
2023-04-06  3:22                   ` Yafang Shao
2023-04-06  4:24                     ` Alexei Starovoitov
2023-04-06  5:43                       ` Yafang Shao
2023-04-06 20:22                         ` Andrii Nakryiko
2023-04-07  1:43                           ` Alexei Starovoitov
2023-04-07  4:33                             ` Yafang Shao
2023-04-07 15:32                               ` Alexei Starovoitov
2023-04-07 15:59                             ` Andrii Nakryiko
2023-04-07 16:05                               ` Alexei Starovoitov
2023-04-07 16:21                                 ` Yafang Shao
2023-04-07 16:31                                   ` Alexei Starovoitov
2023-04-07 16:35                                     ` Yafang Shao
2023-03-31  5:52 ` Hao Luo
2023-04-01 16:32   ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKH8qBuRS5mp5akkT-sQHHdnnjWZYKehppgO5D1vE5rhYPuo8Q@mail.gmail.com \
    --to=sdf@google.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=jolsa@kernel.org \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).