linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: Hao Luo <haoluo@google.com>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>, KP Singh <kpsingh@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	Joe Burton <jevburton.kernel@gmail.com>,
	Tejun Heo <tj@kernel.org>,
	joshdon@google.com, sdf@google.com, bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH bpf-next v1 1/9] bpf: Add mkdir, rmdir, unlink syscalls for prog_bpf_syscall
Date: Wed, 2 Mar 2022 11:34:11 -0800	[thread overview]
Message-ID: <20220302193411.ieooguqoa6tpraoe@ast-mbp.dhcp.thefacebook.com> (raw)
In-Reply-To: <CA+khW7g4mLw9W+CY651FaE-2SF0XBeaGKa5Le7ZnTBTK7eD30Q@mail.gmail.com>

On Mon, Feb 28, 2022 at 02:10:39PM -0800, Hao Luo wrote:
> Hi Kumar,
> 
> On Sat, Feb 26, 2022 at 9:18 PM Kumar Kartikeya Dwivedi
> <memxor@gmail.com> wrote:
> >
> > On Sat, Feb 26, 2022 at 05:13:31AM IST, Hao Luo wrote:
> > > This patch allows bpf_syscall prog to perform some basic filesystem
> > > operations: create, remove directories and unlink files. Three bpf
> > > helpers are added for this purpose. When combined with the following
> > > patches that allow pinning and getting bpf objects from bpf prog,
> > > this feature can be used to create directory hierarchy in bpffs that
> > > help manage bpf objects purely using bpf progs.
> > >
> > > The added helpers subject to the same permission checks as their syscall
> > > version. For example, one can not write to a read-only file system;
> > > The identity of the current process is checked to see whether it has
> > > sufficient permission to perform the operations.
> > >
> > > Only directories and files in bpffs can be created or removed by these
> > > helpers. But it won't be too hard to allow these helpers to operate
> > > on files in other filesystems, if we want.
> > >
> > > Signed-off-by: Hao Luo <haoluo@google.com>
> > > ---
> > > + *
> > > + * long bpf_mkdir(const char *pathname, int pathname_sz, u32 mode)
> > > + *   Description
> > > + *           Attempts to create a directory name *pathname*. The argument
> > > + *           *pathname_sz* specifies the length of the string *pathname*.
> > > + *           The argument *mode* specifies the mode for the new directory. It
> > > + *           is modified by the process's umask. It has the same semantic as
> > > + *           the syscall mkdir(2).
> > > + *   Return
> > > + *           0 on success, or a negative error in case of failure.
> > > + *
> > > + * long bpf_rmdir(const char *pathname, int pathname_sz)
> > > + *   Description
> > > + *           Deletes a directory, which must be empty.
> > > + *   Return
> > > + *           0 on sucess, or a negative error in case of failure.
> > > + *
> > > + * long bpf_unlink(const char *pathname, int pathname_sz)
> > > + *   Description
> > > + *           Deletes a name and possibly the file it refers to. It has the
> > > + *           same semantic as the syscall unlink(2).
> > > + *   Return
> > > + *           0 on success, or a negative error in case of failure.
> > >   */
> > >
> >
> > How about only introducing bpf_sys_mkdirat and bpf_sys_unlinkat? That would be
> > more useful for other cases in future, and when AT_FDCWD is passed, has the same
> > functionality as these, but when openat/fget is supported, it would work
> > relative to other dirfds as well. It can also allow using dirfd of the process
> > calling read for a iterator (e.g. if it sets the fd number using skel->bss).
> > unlinkat's AT_REMOVEDIR flag also removes the need for a bpf_rmdir.
> >
> > WDYT?
> >
> 
> The idea sounds good to me, more flexible. But I don't have a real use
> case for using the added 'dirfd' at this moment. For all the use cases
> I can think of, absolute paths will suffice, I think. Unless other
> reviewers have opposition, I will try switching to mkdirat and
> unlinkat in v2.

I'm surprised you don't need "at" variants.
I thought your production setup has a top level cgroup controller and
then inner tasks inside containers manage cgroups on their own.
Since containers are involved they likely run inside their own mountns.
cgroupfs mount is single. So you probably don't even need to bind mount it
inside containers, but bpffs is not a single mount. You need
to bind mount top bpffs inside containers for tasks to access it.
Now for cgroupfs the abs path is not an issue, but for bpffs
the AT_FDCWD becomes a problem. AT_FDCWD is using current mount ns.
Inside container that will be different. Unless you bind mount into exact
same path the full path has different meanings inside and outside of the container.
It seems to me the bpf progs attached to cgroup sleepable events should
be using FD of bpffs. Then when these tracepoints are triggered from
different containers in different mountns they will get the right dir prefix.
What am I missing?

I think non-AT variants are not needed. The prog can always pass AT_FDCWD
if it's really the intent, but passing actual FD seems more error-proof.

  reply	other threads:[~2022-03-02 19:34 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-25 23:43 [PATCH bpf-next v1 0/9] Extend cgroup interface with bpf Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 1/9] bpf: Add mkdir, rmdir, unlink syscalls for prog_bpf_syscall Hao Luo
2022-02-27  5:18   ` Kumar Kartikeya Dwivedi
2022-02-28 22:10     ` Hao Luo
2022-03-02 19:34       ` Alexei Starovoitov [this message]
2022-03-03 18:50         ` Hao Luo
2022-03-04 18:37           ` Hao Luo
2022-03-05 23:47             ` Alexei Starovoitov
2022-03-08 21:08               ` Hao Luo
2022-03-02 20:55   ` Yonghong Song
2022-03-03 18:56     ` Hao Luo
2022-03-03 19:13       ` Yonghong Song
2022-03-03 19:15         ` Hao Luo
2022-03-12  3:46   ` Al Viro
2022-03-14 17:07     ` Hao Luo
2022-03-14 23:10       ` Al Viro
2022-03-15 17:27         ` Hao Luo
2022-03-15 18:59           ` Alexei Starovoitov
2022-03-15 19:03             ` Alexei Starovoitov
2022-03-15 19:00           ` Al Viro
2022-03-15 19:47             ` Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 2/9] bpf: Add BPF_OBJ_PIN and BPF_OBJ_GET in the bpf_sys_bpf helper Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 3/9] selftests/bpf: tests mkdir, rmdir, unlink and pin in syscall Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 4/9] bpf: Introduce sleepable tracepoints Hao Luo
2022-03-02 19:41   ` Alexei Starovoitov
2022-03-03 19:37     ` Hao Luo
2022-03-03 19:59       ` Alexei Starovoitov
2022-03-02 21:23   ` Yonghong Song
2022-03-02 21:30     ` Alexei Starovoitov
2022-03-03  1:08       ` Yonghong Song
2022-03-03  2:29         ` Alexei Starovoitov
2022-03-03 19:43           ` Hao Luo
2022-03-03 20:02             ` Alexei Starovoitov
2022-03-03 20:04               ` Alexei Starovoitov
2022-03-03 22:06                 ` Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 5/9] cgroup: Sleepable cgroup tracepoints Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 6/9] libbpf: Add sleepable tp_btf Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 7/9] bpf: Lift permission check in __sys_bpf when called from kernel Hao Luo
2022-03-02 20:01   ` Alexei Starovoitov
2022-03-03 19:14     ` Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 8/9] bpf: Introduce cgroup iter Hao Luo
2022-02-26  2:32   ` kernel test robot
2022-02-26  2:32   ` kernel test robot
2022-02-26  2:53   ` kernel test robot
2022-03-02 21:59   ` Yonghong Song
2022-03-03 20:02     ` Hao Luo
2022-03-02 22:45   ` Kumar Kartikeya Dwivedi
2022-03-03  2:03     ` Yonghong Song
2022-03-03  3:03       ` Kumar Kartikeya Dwivedi
2022-03-03  4:00         ` Alexei Starovoitov
2022-03-03  7:33         ` Yonghong Song
2022-03-03  8:13           ` Kumar Kartikeya Dwivedi
2022-03-03 21:52           ` Hao Luo
2022-02-25 23:43 ` [PATCH bpf-next v1 9/9] selftests/bpf: Tests using sleepable tracepoints to monitor cgroup events Hao Luo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220302193411.ieooguqoa6tpraoe@ast-mbp.dhcp.thefacebook.com \
    --to=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=haoluo@google.com \
    --cc=jevburton.kernel@gmail.com \
    --cc=joshdon@google.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=memxor@gmail.com \
    --cc=sdf@google.com \
    --cc=shakeelb@google.com \
    --cc=songliubraving@fb.com \
    --cc=tj@kernel.org \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).