Re: [PATCH] bpf: add access control for map

From: Menglong Dong <menglong8.dong@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	Menglong Dong <imagedong@tencent.com>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Alan Maguire <alan.maguire@oracle.com>
Subject: Re: [PATCH] bpf: add access control for map
Date: Wed, 18 May 2022 11:31:37 +0800	[thread overview]
Message-ID: <CADxym3bLXhgCK-BO6JL3OVbVdC9Tsc0k_LqMRBbXA7eOpUtdYQ@mail.gmail.com> (raw)
In-Reply-To: <CAADnVQLRmv107zFL-dgB07Mf8NmR0TCAC9eG9aZ1O4DG3=ityw@mail.gmail.com>

On Wed, May 18, 2022 at 12:58 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Mon, May 16, 2022 at 8:44 PM <menglong8.dong@gmail.com> wrote:
> >
> > From: Menglong Dong <imagedong@tencent.com>
> >
> > Hello,
> >
> > I have a idea about the access control of eBPF map, could you help
> > to see if it works?
> >
> > For now, only users with the CAP_SYS_ADMIN or CAP_BPF have the right
> > to access the data in eBPF maps. So I'm thinking, are there any way
> > to control the access to the maps, just like what we do to files?
>
> The bpf objects pinned in bpffs should always be accessible
> as files regardless of sysctl or cap-s.
>
> > Therefore, we can decide who have the right to read the map and who
> > can write.
> >
> > I think it is useful in some case. For example, I made a eBPF-based
> > network statistics program, and the information is stored in an array
> > map. And I want all users can read the information in the map, without
> > changing the capacity of them. As the information is iunsensitive,
> > normal users can read it. This make publish-consume mode possible,
> > the eBPF program is publisher and the user space program is consumer.
>
> Right. It is a choice of the bpf prog which data expose in the map.
>
> > So this aim can be achieve, if we can control the access of maps as a
> > file. There are many ways I thought, and I choosed one to implement:
> >
> > While pining the map, add the inode that is created to a list on the
> > map. root can change the permission of the inode through the pin path.
> > Therefore, we can try to find the inode corresponding to current user
> > namespace in the list, and check whether user have permission to read
> > or write.
> >
> > The steps can be:
> >
> > 1. create the map with BPF_F_UMODE flags, which imply that enable
> >    access control in this map.
> > 2. load and pin the map on /sys/fs/bpf/xxx.
> > 3. change the umode of /sys/fs/bpf/xxx with 'chmod 744 /sys/fs/bpf/xxx',
> >    therefor all user can read the map.
>
> This behavior should be available by default.
> Only sysctl was preventing it. It's being fixed by
> the following patch. Please take a look at:
> https://patchwork.kernel.org/project/netdevbpf/patch/1652788780-25520-2-git-send-email-alan.maguire@oracle.com/
>
> Does it solve your use case?

Yeah, it seems this patch gives another way: give users all access to
bpf commands (except map_create and prog_load). Therefore, users
that have the access to the pin path have corresponding r/w of the
eBPF object. This patch can cover my case.

However, this seems to give users too much permission (or the
access check is not enough?) I have do a test:

1. load a ebpf program of type cgroup and pin it on
   /sys/fs/bpf/post_bind as root.
2. give users access to read /sys/fs/bpf/post_bind
3. Now, all users can attach or detach the eBPF program
   to /sys/fs/cgroup/, who have only read access to the ebpf
   and the cgroup.

I think there are many such cases. Is this fine?

>
> > @@ -542,14 +557,26 @@ int bpf_obj_get_user(const char __user *pathname, int flags)
> >         if (IS_ERR(raw))
> >                 return PTR_ERR(raw);
> >
> > -       if (type == BPF_TYPE_PROG)
> > +       if (type != BPF_TYPE_MAP && !bpf_capable())
> > +               return -EPERM;
>
> obj_get already implements normal ACL style access to files.
> Let's not fragment this security model with extra cap checks.
>

Yeah, my way is too rough.

> > +
> > +       switch (type) {
> > +       case BPF_TYPE_PROG:
> >                 ret = bpf_prog_new_fd(raw);
> > -       else if (type == BPF_TYPE_MAP)
> > +               break;
> > +       case BPF_TYPE_MAP:
> > +               if (bpf_map_permission(raw, f_flags)) {
> > +                       bpf_any_put(raw, type);
> > +                       return -EPERM;
> > +               }
>
> bpf_obj_do_get() already does such check.
>

With the patch you mentioned above, now bpf_obj_do_get()
can do this job, as normal users can also get there too.

> > +int bpf_map_permission(struct bpf_map *map, int flags)
> > +{
> > +       struct bpf_map_inode *map_inode;
> > +       struct user_namespace *ns;
> > +
> > +       if (capable(CAP_SYS_ADMIN))
> > +               return 0;
> > +
> > +       if (!(map->map_flags & BPF_F_UMODE))
> > +               return -1;
> > +
> > +       rcu_read_lock();
> > +       list_for_each_entry_rcu(map_inode, &map->inode_list, list) {
> > +               ns = map_inode->inode->i_sb->s_user_ns;
> > +               if (ns == current_user_ns())
> > +                       goto found;
> > +       }
> > +       rcu_read_unlock();
> > +       return -1;
> > +found:
> > +       rcu_read_unlock();
> > +       return inode_permission(ns, map_inode->inode, ACC_MODE(flags));
> > +}
>
> See path_permission() in bpf_obj_do_get().
>
> >  static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
> > @@ -3720,9 +3757,6 @@ static int bpf_map_get_fd_by_id(const union bpf_attr *attr)
> >             attr->open_flags & ~BPF_OBJ_FLAG_MASK)
> >                 return -EINVAL;
> >
> > -       if (!capable(CAP_SYS_ADMIN))
> > -               return -EPERM;
> > -
>
> This part we cannot relax.
> What you're trying to do is to bypass path checks
> by pointing at a map with its ID only.
> That contradicts to your official goal in the cover letter.
>
> bpf_map_get_fd_by_id() has to stay cap_sys_admin only.
> Exactly for the reason that bpf subsystem has file ACL style.
> fd_by_id is a debug interface used by tools like bpftool and
> root admin that needs to see the system as a whole.
> Normal tasks/processes need to use bpffs and pin files with
> correct permissions to pass maps from one process to another.
> Or use FD passing kernel facilities.

Yeah, this part is not necessary for me either. Without this
part, the current code already can do what I wanted.

Thanks
Menglong Dong