All of lore.kernel.org
 help / color / mirror / Atom feed
* Closing the BPF map permission loophole
@ 2022-09-15 10:30 Lorenz Bauer
  2022-09-16  3:03 ` Kumar Kartikeya Dwivedi
                   ` (4 more replies)
  0 siblings, 5 replies; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-15 10:30 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

Hi list,

Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF map permission loophole", with slides at [0].

Problem #1: Read-only fds can be modified via a BPF program

1. Craft a BPF program that executes bpf_map_update_elem(read-only fd, ...)
2. Load the program & execute it

The reason is that the verifier only checks bpf_map->map_flags in resolve_pseudo_ldimm64, but ignores fd->f_mode.

Fixing this problem is complicated by the fact that a user may use several distinct fds with differing permissions to refer to the same map, but that the verifier internally only tracks unique struct bpf_map. See [1].

Problem #2: Read-only fds can be "transmuted" into read-write fds via map in map

1. BPF_MAP_UPDATE_ELEM(map in map fd, read-only fd)
2. BPF_MAP_LOOKUP_ELEM(map in map fd) = read-write fd

This was pointed out by Stanislav Fomichev during the LPC session. I've not yet tried this myself.

Problem #3: Read-only fds can be transmuted into read-write fds via object pinning

1. BPF_OBJ_PIN(read-only fd, /sys/fs/bpf/foo)
2. BPF_OBJ_GET(/sys/fs/bpf/foo) = read-write fd

The problem is with BPF_OBJ_PIN semantics: regardless of fd->f_mode, pinning creates an inode that is owned by the current user, with mode o=rw. Even if we made the inode o=r, a user / attacker can still use chmod(2) to change it back to o=rw.

On older kernels, this requires either unprivileged BPF or CAP_BPF, but recently BPF_OBJ_PIN has been made available without CAP_BPF.

This problem also applies to other BPF objects: links, programs, maybe iterators? Since we don't have BPF_F_RDONLY semantics for those the issue is maybe less urgent, but see [2] for some more fun.

A number of ideas were explored during the session:

* In OBJ_PIN, create the inode owned by the user that executed MAP_CREATE, not the user that
  invoked OBJ_PIN. This would allow unprivileged users to create files as another user, which
  seems like a bad idea.
* In OBJ_GET, refuse a read-write fd if the fd passed to OBJ_PIN wasn't read-write. This is not
  possible since we store struct bpf_map * in the inode, so we don't have access to fd->f_mode
  anymore.
* In OBJ_PIN, adjust the mode of the created inode to match fd->f_mode, and later refuse attempts
  to chmod(2). After a cursory glance at the source code it seems like there are no hooks for
  filesystems to influence chmod.

My gut feeling is that the root of the problem is that OBJ_PIN is too permissive. Once an inode exists that is owned by the current user the cat is out of the box.

BPF_F_RDONLY and BPF_F_WRONLY were introduced in 4.15 [3]. If we want to fix this properly, aka relying on BPF_R_RDONLY doesn't introduce a gaping hole, we'll have to do quite a bit of backporting.

I plan on submitting a sledgehammer approach fix for #1 and #2 as discussed with Daniel after my presentation.

#3 is in sore need of further discussion and creativity. One avenue I want to explore is whether we can refuse OBJ_PIN if:
- the current user is not the map creator
- and the fd is not r/w
- and the current user has no CAP_DAC_OVERRIDE (or similar)

Thanks
Lorenz

0: https://lpc.events/event/16/contributions/1372/attachments/977/2059/Plumbers%2022%20Closing%20the%20BPF%20map%20permission%20loophole.pdf
1: https://elixir.bootlin.com/linux/v6.0-rc5/source/kernel/bpf/verifier.c#L12839
2: https://lore.kernel.org/bpf/20210618105526.265003-1-zenczykowski@gmail.com/
3: https://github.com/torvalds/linux/commit/6e71b04a8224

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-15 10:30 Closing the BPF map permission loophole Lorenz Bauer
@ 2022-09-16  3:03 ` Kumar Kartikeya Dwivedi
  2022-09-22 14:57   ` Lorenz Bauer
  2022-09-17 15:42 ` Stanislav Fomichev
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 45+ messages in thread
From: Kumar Kartikeya Dwivedi @ 2022-09-16  3:03 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, luto

On Thu, 15 Sept 2022 at 12:41, Lorenz Bauer <oss@lmb.io> wrote:
>
> Hi list,
>
> Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF map permission loophole", with slides at [0].
>
> Problem #1: Read-only fds can be modified via a BPF program
>
> 1. Craft a BPF program that executes bpf_map_update_elem(read-only fd, ...)
> 2. Load the program & execute it
>
> The reason is that the verifier only checks bpf_map->map_flags in resolve_pseudo_ldimm64, but ignores fd->f_mode.
>
> Fixing this problem is complicated by the fact that a user may use several distinct fds with differing permissions to refer to the same map, but that the verifier internally only tracks unique struct bpf_map. See [1].
>
> Problem #2: Read-only fds can be "transmuted" into read-write fds via map in map
>
> 1. BPF_MAP_UPDATE_ELEM(map in map fd, read-only fd)
> 2. BPF_MAP_LOOKUP_ELEM(map in map fd) = read-write fd
>
> This was pointed out by Stanislav Fomichev during the LPC session. I've not yet tried this myself.
>
> Problem #3: Read-only fds can be transmuted into read-write fds via object pinning
>
> 1. BPF_OBJ_PIN(read-only fd, /sys/fs/bpf/foo)
> 2. BPF_OBJ_GET(/sys/fs/bpf/foo) = read-write fd
>

Just for completeness, this was also pointed out by Andy back in 2019:
https://lore.kernel.org/bpf/CALCETrVtPs8gY-H4gmzSqPboid3CB++n50SvYd6RU9YVde_-Ow@mail.gmail.com

> The problem is with BPF_OBJ_PIN semantics: regardless of fd->f_mode, pinning creates an inode that is owned by the current user, with mode o=rw. Even if we made the inode o=r, a user / attacker can still use chmod(2) to change it back to o=rw.
>
> On older kernels, this requires either unprivileged BPF or CAP_BPF, but recently BPF_OBJ_PIN has been made available without CAP_BPF.
>
> This problem also applies to other BPF objects: links, programs, maybe iterators? Since we don't have BPF_F_RDONLY semantics for those the issue is maybe less urgent, but see [2] for some more fun.
>
> A number of ideas were explored during the session:
>
> * In OBJ_PIN, create the inode owned by the user that executed MAP_CREATE, not the user that
>   invoked OBJ_PIN. This would allow unprivileged users to create files as another user, which
>   seems like a bad idea.
> * In OBJ_GET, refuse a read-write fd if the fd passed to OBJ_PIN wasn't read-write. This is not
>   possible since we store struct bpf_map * in the inode, so we don't have access to fd->f_mode
>   anymore.
> * In OBJ_PIN, adjust the mode of the created inode to match fd->f_mode, and later refuse attempts
>   to chmod(2). After a cursory glance at the source code it seems like there are no hooks for
>   filesystems to influence chmod.
>
> My gut feeling is that the root of the problem is that OBJ_PIN is too permissive. Once an inode exists that is owned by the current user the cat is out of the box.
>
> BPF_F_RDONLY and BPF_F_WRONLY were introduced in 4.15 [3]. If we want to fix this properly, aka relying on BPF_R_RDONLY doesn't introduce a gaping hole, we'll have to do quite a bit of backporting.
>
> I plan on submitting a sledgehammer approach fix for #1 and #2 as discussed with Daniel after my presentation.
>
> #3 is in sore need of further discussion and creativity. One avenue I want to explore is whether we can refuse OBJ_PIN if:
> - the current user is not the map creator
> - and the fd is not r/w
> - and the current user has no CAP_DAC_OVERRIDE (or similar)
>
> Thanks
> Lorenz
>
> 0: https://lpc.events/event/16/contributions/1372/attachments/977/2059/Plumbers%2022%20Closing%20the%20BPF%20map%20permission%20loophole.pdf
> 1: https://elixir.bootlin.com/linux/v6.0-rc5/source/kernel/bpf/verifier.c#L12839
> 2: https://lore.kernel.org/bpf/20210618105526.265003-1-zenczykowski@gmail.com/
> 3: https://github.com/torvalds/linux/commit/6e71b04a8224

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-15 10:30 Closing the BPF map permission loophole Lorenz Bauer
  2022-09-16  3:03 ` Kumar Kartikeya Dwivedi
@ 2022-09-17 15:42 ` Stanislav Fomichev
  2022-09-21 16:32 ` Roberto Sassu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 45+ messages in thread
From: Stanislav Fomichev @ 2022-09-17 15:42 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, bpf

On Thu, Sep 15, 2022 at 3:31 AM Lorenz Bauer <oss@lmb.io> wrote:
>
> Hi list,
>
> Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF map permission loophole", with slides at [0].
>
> Problem #1: Read-only fds can be modified via a BPF program
>
> 1. Craft a BPF program that executes bpf_map_update_elem(read-only fd, ...)
> 2. Load the program & execute it
>
> The reason is that the verifier only checks bpf_map->map_flags in resolve_pseudo_ldimm64, but ignores fd->f_mode.
>
> Fixing this problem is complicated by the fact that a user may use several distinct fds with differing permissions to refer to the same map, but that the verifier internally only tracks unique struct bpf_map. See [1].
>
> Problem #2: Read-only fds can be "transmuted" into read-write fds via map in map
>
> 1. BPF_MAP_UPDATE_ELEM(map in map fd, read-only fd)
> 2. BPF_MAP_LOOKUP_ELEM(map in map fd) = read-write fd
>
> This was pointed out by Stanislav Fomichev during the LPC session. I've not yet tried this myself.
>
> Problem #3: Read-only fds can be transmuted into read-write fds via object pinning
>
> 1. BPF_OBJ_PIN(read-only fd, /sys/fs/bpf/foo)
> 2. BPF_OBJ_GET(/sys/fs/bpf/foo) = read-write fd
>
> The problem is with BPF_OBJ_PIN semantics: regardless of fd->f_mode, pinning creates an inode that is owned by the current user, with mode o=rw. Even if we made the inode o=r, a user / attacker can still use chmod(2) to change it back to o=rw.
>
> On older kernels, this requires either unprivileged BPF or CAP_BPF, but recently BPF_OBJ_PIN has been made available without CAP_BPF.
>
> This problem also applies to other BPF objects: links, programs, maybe iterators? Since we don't have BPF_F_RDONLY semantics for those the issue is maybe less urgent, but see [2] for some more fun.
>
> A number of ideas were explored during the session:
>
> * In OBJ_PIN, create the inode owned by the user that executed MAP_CREATE, not the user that
>   invoked OBJ_PIN. This would allow unprivileged users to create files as another user, which
>   seems like a bad idea.
> * In OBJ_GET, refuse a read-write fd if the fd passed to OBJ_PIN wasn't read-write. This is not
>   possible since we store struct bpf_map * in the inode, so we don't have access to fd->f_mode
>   anymore.
> * In OBJ_PIN, adjust the mode of the created inode to match fd->f_mode, and later refuse attempts
>   to chmod(2). After a cursory glance at the source code it seems like there are no hooks for
>   filesystems to influence chmod.
>
> My gut feeling is that the root of the problem is that OBJ_PIN is too permissive. Once an inode exists that is owned by the current user the cat is out of the box.
>
> BPF_F_RDONLY and BPF_F_WRONLY were introduced in 4.15 [3]. If we want to fix this properly, aka relying on BPF_R_RDONLY doesn't introduce a gaping hole, we'll have to do quite a bit of backporting.
>
> I plan on submitting a sledgehammer approach fix for #1 and #2 as discussed with Daniel after my presentation.
>

[..]

> #3 is in sore need of further discussion and creativity. One avenue I want to explore is whether we can refuse OBJ_PIN if:
> - the current user is not the map creator
> - and the fd is not r/w
> - and the current user has no CAP_DAC_OVERRIDE (or similar)

Thank you, Lorenz, for a nice summary!

We might start by plugging the hole by requiring CAP_BPF for OBJ_PIN
and then discussing a better way forward (unless somebody has better
ideas). I'm traveling so I don't have time to think this through yet
:-(

Our use-case for unpriv so far is for some CAP_BPF process to pin a
read-only map and chmod it to 0755 for unpriv programs to read.

> Thanks
> Lorenz
>
> 0: https://lpc.events/event/16/contributions/1372/attachments/977/2059/Plumbers%2022%20Closing%20the%20BPF%20map%20permission%20loophole.pdf
> 1: https://elixir.bootlin.com/linux/v6.0-rc5/source/kernel/bpf/verifier.c#L12839
> 2: https://lore.kernel.org/bpf/20210618105526.265003-1-zenczykowski@gmail.com/
> 3: https://github.com/torvalds/linux/commit/6e71b04a8224

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-15 10:30 Closing the BPF map permission loophole Lorenz Bauer
  2022-09-16  3:03 ` Kumar Kartikeya Dwivedi
  2022-09-17 15:42 ` Stanislav Fomichev
@ 2022-09-21 16:32 ` Roberto Sassu
  2022-09-22 14:47   ` Lorenz Bauer
  2022-09-22 16:29 ` Lorenz Bauer
  2022-09-28  8:54 ` Lorenz Bauer
  4 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-21 16:32 UTC (permalink / raw)
  To: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Thu, 2022-09-15 at 11:30 +0100, Lorenz Bauer wrote:
> Hi list,
> 
> Here is a summary of the talk I gave at LPC '22 titled "Closing the
> BPF map permission loophole", with slides at [0].
> 
> Problem #1: Read-only fds can be modified via a BPF program
> 
> 1. Craft a BPF program that executes bpf_map_update_elem(read-only
> fd, ...)
> 2. Load the program & execute it
> 
> The reason is that the verifier only checks bpf_map->map_flags in
> resolve_pseudo_ldimm64, but ignores fd->f_mode.
> 
> Fixing this problem is complicated by the fact that a user may use
> several distinct fds with differing permissions to refer to the same
> map, but that the verifier internally only tracks unique struct
> bpf_map. See [1].

Hi Lorenz

thanks for the detailed report. Unfortunately, I could not attend your
presentation at Linux Plumbers, and the recording is not yet available.
So, I rely on this and your slides.

I saw your fix #2, even if I didn't fully understand it. What do you
think instead about converting the fd modes to map flags (e.g.
BPF_F_RDONLY -> BPF_RDONLY_PROG), and we rely on the existing verifier
behavior for the _PROG counterparts? In this way, it will be the
verifier enforcing the decision made by security_bpf_map().

> Problem #2: Read-only fds can be "transmuted" into read-write fds via
> map in map
> 
> 1. BPF_MAP_UPDATE_ELEM(map in map fd, read-only fd)
> 2. BPF_MAP_LOOKUP_ELEM(map in map fd) = read-write fd
> 
> This was pointed out by Stanislav Fomichev during the LPC session.
> I've not yet tried this myself.

Didn't look into it yet.

> Problem #3: Read-only fds can be transmuted into read-write fds via
> object pinning
> 
> 1. BPF_OBJ_PIN(read-only fd, /sys/fs/bpf/foo)
> 2. BPF_OBJ_GET(/sys/fs/bpf/foo) = read-write fd
> 
> The problem is with BPF_OBJ_PIN semantics: regardless of fd->f_mode,
> pinning creates an inode that is owned by the current user, with mode
> o=rw. Even if we made the inode o=r, a user / attacker can still use
> chmod(2) to change it back to o=rw.

Maybe I'm missing something, but I consider pinning more like adding a
new reference to an eBPF object (like the ID).

You are still subject to access control decision by security_bpf_map(),
as for BPF_MAP_GET_FD_BY_ID().

Now, the security model is limited to two permissions (read, write). If
we want to add a new one to decide whether or not a new reference can
be added, we could revisit this.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-21 16:32 ` Roberto Sassu
@ 2022-09-22 14:47   ` Lorenz Bauer
  2022-09-22 15:39     ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-22 14:47 UTC (permalink / raw)
  To: Roberto Sassu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Wed, 21 Sep 2022, at 17:32, Roberto Sassu wrote:
>
> I saw your fix #2, even if I didn't fully understand it. What do you
> think instead about converting the fd modes to map flags (e.g.
> BPF_F_RDONLY -> BPF_RDONLY_PROG), and we rely on the existing verifier
> behavior for the _PROG counterparts? In this way, it will be the
> verifier enforcing the decision made by security_bpf_map().

Thanks for that idea, I think something like it was floated during the discussion after my talk as well but I forgot about it. I gave it a shot, and it turns out okay actually. The biggest draw back is that this approach requires commit 591fe9888d78 ("bpf: add program side {rd, wr}only support for maps") which appeared after BPF_F_RDONLY.

>> Problem #3: Read-only fds can be transmuted into read-write fds via
>> object pinning
>
> Maybe I'm missing something, but I consider pinning more like adding a
> new reference to an eBPF object (like the ID).
>
> You are still subject to access control decision by security_bpf_map(),
> as for BPF_MAP_GET_FD_BY_ID().

I had a look, security_bpf_map() is only called from bpf_map_new_fd(), which in turn is invoked from GET_FD_BY_ID, MAP_CREATE and OBJ_GET. Checking at this point is too late, since OBJ_PIN + chmod allow escalating privileges. Can you explain your idea some more?

> Now, the security model is limited to two permissions (read, write). If
> we want to add a new one to decide whether or not a new reference can
> be added, we could revisit this.

Maybe, but that would preclude back porting any fixes. I'll write up another summary in a bit that shows that this problem goes back all the way to the introduction of BPF_F_RDONLY, etc.

Thanks
Lorenz

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-16  3:03 ` Kumar Kartikeya Dwivedi
@ 2022-09-22 14:57   ` Lorenz Bauer
  0 siblings, 0 replies; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-22 14:57 UTC (permalink / raw)
  To: Kumar Kartikeya Dwivedi
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, luto

On Fri, 16 Sep 2022, at 04:03, Kumar Kartikeya Dwivedi wrote:
> Just for completeness, this was also pointed out by Andy back in 2019:
> https://lore.kernel.org/bpf/CALCETrVtPs8gY-H4gmzSqPboid3CB++n50SvYd6RU9YVde_-Ow@mail.gmail.com

Thanks for that link, it's sad that the discussion got bogged down, it would've been great to fix this back then. For reference, Andy came up with the same proposed fix to require R/W when pinning:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=bpf/perms&id=3a5ddc41c25e00e8590f6da414bcf0ed7626ffe1

The following also seems relevant / desirable: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=bpf/perms&id=3bb110117c983f781f545e69ce35d4fcdd0c543b

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-22 14:47   ` Lorenz Bauer
@ 2022-09-22 15:39     ` Roberto Sassu
  2022-09-22 16:21       ` Lorenz Bauer
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-22 15:39 UTC (permalink / raw)
  To: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Thu, 2022-09-22 at 15:47 +0100, Lorenz Bauer wrote:
> On Wed, 21 Sep 2022, at 17:32, Roberto Sassu wrote:
> > I saw your fix #2, even if I didn't fully understand it. What do
> > you
> > think instead about converting the fd modes to map flags (e.g.
> > BPF_F_RDONLY -> BPF_RDONLY_PROG), and we rely on the existing
> > verifier
> > behavior for the _PROG counterparts? In this way, it will be the
> > verifier enforcing the decision made by security_bpf_map().
> 
> Thanks for that idea, I think something like it was floated during
> the discussion after my talk as well but I forgot about it. I gave it
> a shot, and it turns out okay actually. The biggest draw back is that
> this approach requires commit 591fe9888d78 ("bpf: add program side
> {rd, wr}only support for maps") which appeared after BPF_F_RDONLY.

Yes, true. Not sure if it makes sense to backport it to stable versions
(probably not). Or alternatively, for older versions we could ensure
that the fd is for read/write, even if as you said, there is a risk of
breakage of existing applications. It seems unlikely that this could
happen, as libbpf still does not support requesting a read-only fd,
although one could create an ad-hoc function to set the appropriate
parameters for the bpf() system call.

Actually, if it may help, I could send my version of the fix I
developed to validate the idea. I also wrote the tests.

> Problem #3: Read-only fds can be transmuted into read-write fds
> > > via
> > > object pinning
> > 
> > Maybe I'm missing something, but I consider pinning more like
> > adding a
> > new reference to an eBPF object (like the ID).
> > 
> > You are still subject to access control decision by
> > security_bpf_map(),
> > as for BPF_MAP_GET_FD_BY_ID().
> 
> I had a look, security_bpf_map() is only called from
> bpf_map_new_fd(), which in turn is invoked from GET_FD_BY_ID,
> MAP_CREATE and OBJ_GET. Checking at this point is too late, since
> OBJ_PIN + chmod allow escalating privileges. Can you explain your
> idea some more?

The ability to access the path of a pinned map does not give you the
ability to access the map itself. You still need to pass
security_bpf_map(). With SELinux, you would need a rule like:

allow <ctx of subject accessing the pinned map> <ctx of the map
creator>: bpf { map_read map_write };

The inode is not passed to security_bpf_map(), so likely it is not
taken into account for the security decision.

What you say, I think it applies to map iterators. The ability to
access the path of an iterator gives you the ability to make changes to
the map without further checks.

My further question, we could discuss it separately, is: by updating
the verifier to check fd modes, and by checking the fd modes for map
iterators, are we able to intercept any possible attempt from user
space to modify a map?

> Now, the security model is limited to two permissions (read,
> > write). If
> > we want to add a new one to decide whether or not a new reference
> > can
> > be added, we could revisit this.
> 
> Maybe, but that would preclude back porting any fixes. I'll write up
> another summary in a bit that shows that this problem goes back all
> the way to the introduction of BPF_F_RDONLY, etc.

Definitely, if possible, the fix should be backportable.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-22 15:39     ` Roberto Sassu
@ 2022-09-22 16:21       ` Lorenz Bauer
  2022-09-23  7:39         ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-22 16:21 UTC (permalink / raw)
  To: Roberto Sassu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Thu, 22 Sep 2022, at 16:39, Roberto Sassu wrote:
>
> Yes, true. Not sure if it makes sense to backport it to stable versions
> (probably not). Or alternatively, for older versions we could ensure
> that the fd is for read/write, even if as you said, there is a risk of
> breakage of existing applications. It seems unlikely that this could
> happen, as libbpf still does not support requesting a read-only fd,
> although one could create an ad-hoc function to set the appropriate
> parameters for the bpf() system call.

You can create a r-o fd via bpf_map_create if you pass map_flags=BPF_F_RDONLY unfortunately. Were you thinking of OBJ_GET when you refer to libbpf?

> Actually, if it may help, I could send my version of the fix I
> developed to validate the idea. I also wrote the tests.

Yes please, I have also have a WIP patch that seems to work, but I'm curious what you came up with. Tests would also be great, mine are kinda janky.

> The ability to access the path of a pinned map does not give you the
> ability to access the map itself.

I think there is a subtlety here I don't get. BPF_OBJ_GET(/sys/fs/bpf/foo) gives me an fd I can modify via syscall, no? Are you making a distinction between the inode and the bpf_map itself?

> You still need to pass
> security_bpf_map(). With SELinux, you would need a rule like:
>
> allow <ctx of subject accessing the pinned map> <ctx of the map
> creator>: bpf { map_read map_write };
>
> The inode is not passed to security_bpf_map(), so likely it is not
> taken into account for the security decision.

Ok, you're saying that a user can prevent the escalation via SELinux?

> What you say, I think it applies to map iterators. The ability to
> access the path of an iterator gives you the ability to make changes to
> the map without further checks.

How? An example would really help me, I don't know much about iterators besides that I can pin them and that open() triggers the program.

Lorenz

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-15 10:30 Closing the BPF map permission loophole Lorenz Bauer
                   ` (2 preceding siblings ...)
  2022-09-21 16:32 ` Roberto Sassu
@ 2022-09-22 16:29 ` Lorenz Bauer
  2022-09-22 23:21   ` Hao Luo
  2022-09-28  8:54 ` Lorenz Bauer
  4 siblings, 1 reply; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-22 16:29 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> Hi list,
>
> Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF 
> map permission loophole", with slides at [0].

A timeline of the most important commits in this sorry affair. TL;DR: >= 4.15 is really broken.

v4.4 commit b2197755b263 ("bpf: add support for persistent maps/progs") 
https://github.com/torvalds/linux/commit/b2197755b263

Introduces BPF_OBJ_PIN and OBJ_GET. Map fds are always read-write and all is fine.

v4.12 commit 56f668dfe00d ("bpf: Add array of maps support")
https://github.com/torvalds/linux/commit/56f668dfe00d

Map fds are always read-write. Inner map map_flags are compared to the outer map via bpf_map_meta_equal, which prevents stripping BPF_F_RDONLY_PROG, etc. later on.

v4.15 commit 6e71b04a8224 ("bpf: Add file mode configuration into bpf maps")
https://github.com/torvalds/linux/commit/6e71b04a8224

Introduces BPF_F_RDONLY / WRONLY. Doesn't take into account that BPF programs can modify maps (unprivileged BPF is enabled since v4.3). Also doesn't take into account that BPF_OBJ_PIN can be used to turn r/o into r/w maps. This is the start of our problems, and IMO it's completely broken.

v5.2 commit 591fe9888d78 ("bpf: add program side {rd, wr}only support for maps")
https://github.com/torvalds/linux/commit/591fe9888d78

Introduces BPF_F_RDONLY_PROG, etc. Using sourcegraph.com I found a BPF ELF that creates a map with BPF_F_RDONLY | BPF_F_RDONLY_PROG flags [0]. This would be broken by refusing non-RW fds in the verifier, which is my preferred brute fix. Those flags came about because of a misunderstanding of what BPF_F_RDONLY does, which makes me worry might be other cases out there.

Based on my research, we can do the following:

> Problem #1: Read-only fds can be modified via a BPF program

Up until 591fe9888d78 BPF_F_RDONLY_PROG we'd probably be OK with just refusing !rw fds. The only kernel.org version that this applies to is 4.19. 5.4, 5.10, 5.15 need the fix Roberto suggested.

> Problem #2: Read-only fds can be "transmuted" into read-write fds via map in map

This is not as big a problem as I initially thought. First, lookup in a nested map from userspace returns an ID, not a fd. Going from ID to fd requires CAP_SYS_ADMIN. On the program side, it's possible to retrieve the inner map and modify it.

I think it's possible to fix this by refusing to insert !rw inner maps, by patching bpf_map_fd_get_ptr(). That function hasn't changed in a long time either. 4.19, 5.4, 5.10, 5.15.

> Problem #3: Read-only fds can be transmuted into read-write fds via 
> object pinning
>
> 1. BPF_OBJ_PIN(read-only fd, /sys/fs/bpf/foo)
> 2. BPF_OBJ_GET(/sys/fs/bpf/foo) = read-write fd

bpf_map doesn't have a concept of an owning user, so the only solution I can come up with is to refuse OBJ_PIN if !rw and !capable(CAP_DAC_OVERRIDE). Replace CAP_DAC_OVERRIDE with another capability of your choice, the idea being that this allows programs that run as root (probably a lot?) to continue functioning.

Again 4.19, 5.4, 5.10, 5.15.

Daniel, Alexei, Andrii: any thoughts? This is a pretty deep rabbit hole, and I don't want to waste my time on the wrong approach.

0: https://github.com/willfindlay/bpfcontain-rs/blob/eb2cd826b609e165d63d784c0f562b7a278171d2/src/bpf/include/allocator.h#L16

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-22 16:29 ` Lorenz Bauer
@ 2022-09-22 23:21   ` Hao Luo
  2022-09-26 15:41     ` Lorenz Bauer
  0 siblings, 1 reply; 45+ messages in thread
From: Hao Luo @ 2022-09-22 23:21 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf

Hi Lorenz,

Thanks for your nice summary.

On Thu, Sep 22, 2022 at 9:29 AM Lorenz Bauer <oss@lmb.io> wrote:
>
> On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> > Hi list,
> >
> > Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF
> > map permission loophole", with slides at [0].
[...]
> > Problem #1: Read-only fds can be modified via a BPF program
>
> Up until 591fe9888d78 BPF_F_RDONLY_PROG we'd probably be OK with just refusing !rw fds. The only kernel.org version that this applies to is 4.19. 5.4, 5.10, 5.15 need the fix Roberto suggested.

For this problem, I'm thinking of a fix purely in the verifier:
passing along the file permission up to the site where the map is
used.

For maps that are not passed from outside, this permission is rw by default.

For maps that are obtained from fdget(), we can do the following:

- associate the env->used_maps with an array of permissions.
- then extend bpf_reg_state and bpf_call_arg_meta to include the
permission information.
- then in record_func_map(), we can reject the program if the
permission doesn't allow the map operation.

Not the simplest solution, but this is the first solution that comes
up in my head.

> > Problem #2: Read-only fds can be "transmuted" into read-write fds via map in map
>
> This is not as big a problem as I initially thought. First, lookup in a nested map from userspace returns an ID, not a fd. Going from ID to fd requires CAP_SYS_ADMIN. On the program side, it's possible to retrieve the inner map and modify it.
>
> I think it's possible to fix this by refusing to insert !rw inner maps, by patching bpf_map_fd_get_ptr(). That function hasn't changed in a long time either. 4.19, 5.4, 5.10, 5.15.

With the permission info associated with map_ptr in verifier that is
established for program #1, a fix seems relatively easy.

> > Problem #3: Read-only fds can be transmuted into read-write fds via
> > object pinning
> >
> > 1. BPF_OBJ_PIN(read-only fd, /sys/fs/bpf/foo)
> > 2. BPF_OBJ_GET(/sys/fs/bpf/foo) = read-write fd
>
> bpf_map doesn't have a concept of an owning user, so the only solution I can come up with is to refuse OBJ_PIN if !rw and !capable(CAP_DAC_OVERRIDE). Replace CAP_DAC_OVERRIDE with another capability of your choice, the idea being that this allows programs that run as root (probably a lot?) to continue functioning.

For an idea mentioned in the summary,

> In OBJ_GET, refuse a read-write fd if the fd passed to OBJ_PIN wasn't read-write.

This sounds reasonable to me. Can we extend the object type referenced
by inode to include the permission?

> Again 4.19, 5.4, 5.10, 5.15.
>
> Daniel, Alexei, Andrii: any thoughts? This is a pretty deep rabbit hole, and I don't want to waste my time on the wrong approach.
>
> 0: https://github.com/willfindlay/bpfcontain-rs/blob/eb2cd826b609e165d63d784c0f562b7a278171d2/src/bpf/include/allocator.h#L16

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-22 16:21       ` Lorenz Bauer
@ 2022-09-23  7:39         ` Roberto Sassu
  2022-09-26 15:35           ` Lorenz Bauer
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-23  7:39 UTC (permalink / raw)
  To: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Thu, 2022-09-22 at 17:21 +0100, Lorenz Bauer wrote:
> On Thu, 22 Sep 2022, at 16:39, Roberto Sassu wrote:
> > Yes, true. Not sure if it makes sense to backport it to stable
> > versions
> > (probably not). Or alternatively, for older versions we could
> > ensure
> > that the fd is for read/write, even if as you said, there is a risk
> > of
> > breakage of existing applications. It seems unlikely that this
> > could
> > happen, as libbpf still does not support requesting a read-only fd,
> > although one could create an ad-hoc function to set the appropriate
> > parameters for the bpf() system call.
> 
> You can create a r-o fd via bpf_map_create if you pass
> map_flags=BPF_F_RDONLY unfortunately. Were you thinking of OBJ_GET
> when you refer to libbpf?

Oh, right.

To be more precise, bpf_map_get_fd_by_id() does not support yet getting
a read-only fd. bpf_obj_get_opts() was recently introduced for this
purpose.

> Actually, if it may help, I could send my version of the fix I
> > developed to validate the idea. I also wrote the tests.
> 
> Yes please, I have also have a WIP patch that seems to work, but I'm
> curious what you came up with. Tests would also be great, mine are
> kinda janky.

Ok.

> The ability to access the path of a pinned map does not give you
> > the
> > ability to access the map itself.
> 
> I think there is a subtlety here I don't get.
> BPF_OBJ_GET(/sys/fs/bpf/foo) gives me an fd I can modify via syscall,
> no? Are you making a distinction between the inode and the bpf_map
> itself?

If you have write access to /sys/fs/bpf/foo, it does not mean that you
will have write access to the map, when you call OBJ_GET(). I don't
know if you could add more modes after getting a fd.

> You still need to pass
> > security_bpf_map(). With SELinux, you would need a rule like:
> > 
> > allow <ctx of subject accessing the pinned map> <ctx of the map
> > creator>: bpf { map_read map_write };
> > 
> > The inode is not passed to security_bpf_map(), so likely it is not
> > taken into account for the security decision.
> 
> Ok, you're saying that a user can prevent the escalation via SELinux?

Not only with SELinux, but with an eBPF program too (BPF LSM). What I
wanted to do is to deny write access to anyone except the eBPF program
that declares the map.

> What you say, I think it applies to map iterators. The ability to
> > access the path of an iterator gives you the ability to make
> > changes to
> > the map without further checks.
> 
> How? An example would really help me, I don't know much about
> iterators besides that I can pin them and that open() triggers the
> program.

I wrote an example here:

https://lore.kernel.org/bpf/8d7a713e500b5e3fce93e4c5c7b8841eb6dd28e4.camel@huaweicloud.com/

It shows that you can actually write to a map, despite SELinux gives
you only read permission.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-23  7:39         ` Roberto Sassu
@ 2022-09-26 15:35           ` Lorenz Bauer
  2022-09-26 16:18             ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-26 15:35 UTC (permalink / raw)
  To: Roberto Sassu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf



On Fri, 23 Sep 2022, at 08:39, Roberto Sassu wrote:
>> Yes please, I have also have a WIP patch that seems to work, but I'm
>> curious what you came up with. Tests would also be great, mine are
>> kinda janky.
>
> Ok.

Hi Roberto,

Did you get around to putting your patches somewhere?

> If you have write access to /sys/fs/bpf/foo, it does not mean that you
> will have write access to the map, when you call OBJ_GET(). I don't
> know if you could add more modes after getting a fd.

Well, that's kind of how it works at the moment. Write on the file gives write on the FD, and BPF_F_RDONLY, etc. can be passed to OBJ_GET to change that. You're proposing to change that?
 
>> Ok, you're saying that a user can prevent the escalation via SELinux?
>
> Not only with SELinux, but with an eBPF program too (BPF LSM). What I
> wanted to do is to deny write access to anyone except the eBPF program
> that declares the map.

There is no "ownership" of a map as far as I can tell. How would you figure out which program declared the map?

Maybe it's best not to focus on SELinux / LSM too much: this stuff should work correctly out of the box, without needing workarounds from the user.

> I wrote an example here:
>
> https://lore.kernel.org/bpf/8d7a713e500b5e3fce93e4c5c7b8841eb6dd28e4.camel@huaweicloud.com/

That was very helpful. Yes, map iterators semantics are also broken, just like program fds and link fds. I started with maps since that seemed easier to tackle than everything all at once.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-22 23:21   ` Hao Luo
@ 2022-09-26 15:41     ` Lorenz Bauer
  2022-09-26 18:39       ` Hao Luo
  0 siblings, 1 reply; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-26 15:41 UTC (permalink / raw)
  To: Hao Luo
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf

On Fri, 23 Sep 2022, at 00:21, Hao Luo wrote:
> For this problem, I'm thinking of a fix purely in the verifier:
> passing along the file permission up to the site where the map is
> used.
>
> For maps that are not passed from outside, this permission is rw by default.
>
> For maps that are obtained from fdget(), we can do the following:
>
> - associate the env->used_maps with an array of permissions.
> - then extend bpf_reg_state and bpf_call_arg_meta to include the
> permission information.
> - then in record_func_map(), we can reject the program if the
> permission doesn't allow the map operation.
>
> Not the simplest solution, but this is the first solution that comes
> up in my head.

That's roughly what I have as well!

> For an idea mentioned in the summary,
>
>> In OBJ_GET, refuse a read-write fd if the fd passed to OBJ_PIN wasn't read-write.
>
> This sounds reasonable to me. Can we extend the object type referenced
> by inode to include the permission?

You're saying, add a layer of indirection? Instead of inode => bpf_map we have something like inode => bpf_perm => bpf_map.

I think this is less user friendly than refusing !rw pin, since we decouple what you can do with a pinned file from the state that is observable via ls. Put another way, there should be a way to introspect bpf_perm if we end up going this way.

I also think that this tries to plug the hole in the wrong place: it's not the caller of OBJ_GET that is escalating privs, is OBJ_PIN.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-26 15:35           ` Lorenz Bauer
@ 2022-09-26 16:18             ` Roberto Sassu
  2022-09-28  8:52               ` Lorenz Bauer
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-26 16:18 UTC (permalink / raw)
  To: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Mon, 2022-09-26 at 16:35 +0100, Lorenz Bauer wrote:
> 
> On Fri, 23 Sep 2022, at 08:39, Roberto Sassu wrote:
> > > Yes please, I have also have a WIP patch that seems to work, but
> > > I'm
> > > curious what you came up with. Tests would also be great, mine
> > > are
> > > kinda janky.
> > 
> > Ok.
> 
> Hi Roberto,
> 
> Did you get around to putting your patches somewhere?
> 
> > If you have write access to /sys/fs/bpf/foo, it does not mean that
> > you
> > will have write access to the map, when you call OBJ_GET(). I don't
> > know if you could add more modes after getting a fd.
> 
> Well, that's kind of how it works at the moment. Write on the file
> gives write on the FD, and BPF_F_RDONLY, etc. can be passed to
> OBJ_GET to change that. You're proposing to change that?

Could it be that file permissions and fd permissions from OBJ_GET() are
independent?

You could decide to ask for any permission with bpf_obj_get_opts(). By
default, it is read/write.

> > > Ok, you're saying that a user can prevent the escalation via
> > > SELinux?
> > 
> > Not only with SELinux, but with an eBPF program too (BPF LSM). What
> > I
> > wanted to do is to deny write access to anyone except the eBPF
> > program
> > that declares the map.
> 
> There is no "ownership" of a map as far as I can tell. How would you
> figure out which program declared the map?

At least in SELinux, the label of the map is the label of the process
creating it:

static int selinux_bpf_map_alloc(struct bpf_map *map)
{
[...]
	bpfsec->sid = current_sid();

Accessing a map requires a rule allowing a subject to access maps with
that label.

> Maybe it's best not to focus on SELinux / LSM too much: this stuff
> should work correctly out of the box, without needing workarounds
> from the user.

Uhm, if I get what you mean, you would like to add DAC controls to the
pinned map to decide if you can get a fd and with which modes.

The problem I see is that a map exists regardless of the pinned path
(just by ID). DAC information should be rather added to the map object
itself.

> > I wrote an example here:
> > 
> > https://lore.kernel.org/bpf/8d7a713e500b5e3fce93e4c5c7b8841eb6dd28e4.camel@huaweicloud.com/
> 
> That was very helpful. Yes, map iterators semantics are also broken,
> just like program fds and link fds. I started with maps since that
> seemed easier to tackle than everything all at once.

Great!

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-26 15:41     ` Lorenz Bauer
@ 2022-09-26 18:39       ` Hao Luo
  0 siblings, 0 replies; 45+ messages in thread
From: Hao Luo @ 2022-09-26 18:39 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf

On Mon, Sep 26, 2022 at 8:41 AM Lorenz Bauer <oss@lmb.io> wrote:
>
> On Fri, 23 Sep 2022, at 00:21, Hao Luo wrote:
[...]
>
> > For an idea mentioned in the summary,
> >
> >> In OBJ_GET, refuse a read-write fd if the fd passed to OBJ_PIN wasn't read-write.
> >
> > This sounds reasonable to me. Can we extend the object type referenced
> > by inode to include the permission?
>
> You're saying, add a layer of indirection? Instead of inode => bpf_map we have something like inode => bpf_perm => bpf_map.

Yes, that's what I mean.

> I think this is less user friendly than refusing !rw pin, since we decouple what you can do with a pinned file from the state that is observable via ls. Put another way, there should be a way to introspect bpf_perm if we end up going this way.
>
> I also think that this tries to plug the hole in the wrong place: it's not the caller of OBJ_GET that is escalating privs, is OBJ_PIN.

I actually think the problem is at OBJ_GET. I feel OBJ_GET is too
permissive, which can turn any pinned object into rw object.

I see what you meant by refusing !rw pin being more user-friendly. It
makes sense. We don't yet know how the introspection of bpf_perm will
look like. But on the other hand, I think introducing bpf_perm opens
up more potential use cases, because one can pin read-only objects and
allow others to get the object with less permissions. This is not
possible if refusing !rw pin in the first place.

I don't have strong opinions on the solution of this problem. Refusing
!rw pin is a simple solution, we can go for it and see how it works.
Thanks Lorenz for the ideas and discussions, let me know if there is
anything I can help with, this is interesting. :)

Hao

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-26 16:18             ` Roberto Sassu
@ 2022-09-28  8:52               ` Lorenz Bauer
  2022-09-28  9:42                 ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-28  8:52 UTC (permalink / raw)
  To: Roberto Sassu, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
>
> Uhm, if I get what you mean, you would like to add DAC controls to the
> pinned map to decide if you can get a fd and with which modes.
>
> The problem I see is that a map exists regardless of the pinned path
> (just by ID).

Can you spell this out for me? I imagine you're talking about MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not great maybe, but no gaping hole IMO.

> DAC information should be rather added to the map object
> itself.

There is a form of DAC on the map, BPF_F_RDONLY_PROG and friends. You just can't stuff BPF_F_RDONLY in there since multiple fds may refer to the same map with different permissions.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-15 10:30 Closing the BPF map permission loophole Lorenz Bauer
                   ` (3 preceding siblings ...)
  2022-09-22 16:29 ` Lorenz Bauer
@ 2022-09-28  8:54 ` Lorenz Bauer
  2022-10-06  7:15   ` Roberto Sassu
  2022-10-27 16:54   ` Andrii Nakryiko
  4 siblings, 2 replies; 45+ messages in thread
From: Lorenz Bauer @ 2022-09-28  8:54 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf

On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> Hi list,
>
> Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF 
> map permission loophole", with slides at [0].

I've put this topic on the agenda of the 2022-10-06 BPF office hours to get some maintainer attention. Details are here: https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit

Best

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-28  8:52               ` Lorenz Bauer
@ 2022-09-28  9:42                 ` Roberto Sassu
  2022-09-28 10:33                   ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-28  9:42 UTC (permalink / raw)
  To: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf, linux-security-module

On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
> On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
> > Uhm, if I get what you mean, you would like to add DAC controls to
> > the
> > pinned map to decide if you can get a fd and with which modes.
> > 
> > The problem I see is that a map exists regardless of the pinned
> > path
> > (just by ID).
> 
> Can you spell this out for me? I imagine you're talking about
> MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not great
> maybe, but no gaping hole IMO.

+linux-security-module ML (they could be interested in this topic as
well)

Good to know! I didn't realize it before.

I figured out better what you mean by escalating privileges.

Pin a read-only fd, get a read-write fd from the pinned path.

What you want to do is, if I pin a read-only fd, I should get read-only 
fds too, right?

I think here there could be different views. From my perspective,
pinning is just creating a new link to an existing object. Accessing
the link does not imply being able to access the object itself (the
same happens for files).

I understand what you want to achieve. If I have to choose a solution,
that would be doing something similar to files, i.e. add owner and mode
information to the bpf_map structure (m_uid, m_gid, m_mode). We could
add the MAP_CHMOD and MAP_CHOWN operations to the bpf() system call to
modify the new fields.

When you pin the map, the inode will get the owner and mode from
bpf_map. bpf_obj_get() will then do DAC-style verification similar to
MAC-style verification (with security_bpf_map()).

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-28  9:42                 ` Roberto Sassu
@ 2022-09-28 10:33                   ` Toke Høiland-Jørgensen
  2022-09-28 11:23                     ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Toke Høiland-Jørgensen @ 2022-09-28 10:33 UTC (permalink / raw)
  To: Roberto Sassu, Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf, linux-security-module

Roberto Sassu <roberto.sassu@huaweicloud.com> writes:

> On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
>> On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
>> > Uhm, if I get what you mean, you would like to add DAC controls to
>> > the
>> > pinned map to decide if you can get a fd and with which modes.
>> > 
>> > The problem I see is that a map exists regardless of the pinned
>> > path
>> > (just by ID).
>> 
>> Can you spell this out for me? I imagine you're talking about
>> MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not great
>> maybe, but no gaping hole IMO.
>
> +linux-security-module ML (they could be interested in this topic as
> well)
>
> Good to know! I didn't realize it before.
>
> I figured out better what you mean by escalating privileges.
>
> Pin a read-only fd, get a read-write fd from the pinned path.
>
> What you want to do is, if I pin a read-only fd, I should get read-only 
> fds too, right?
>
> I think here there could be different views. From my perspective,
> pinning is just creating a new link to an existing object. Accessing
> the link does not imply being able to access the object itself (the
> same happens for files).
>
> I understand what you want to achieve. If I have to choose a solution,
> that would be doing something similar to files, i.e. add owner and mode
> information to the bpf_map structure (m_uid, m_gid, m_mode). We could
> add the MAP_CHMOD and MAP_CHOWN operations to the bpf() system call to
> modify the new fields.
>
> When you pin the map, the inode will get the owner and mode from
> bpf_map. bpf_obj_get() will then do DAC-style verification similar to
> MAC-style verification (with security_bpf_map()).

As someone pointed out during the discussing at LPC, this will
effectively allow a user to create files owned by someone else, which is
probably not a good idea either from a security PoV. (I.e., user A pins
map owned by user B, so A creates a file owned by B).

-Toke

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-28 10:33                   ` Toke Høiland-Jørgensen
@ 2022-09-28 11:23                     ` Roberto Sassu
  2022-09-29  0:24                       ` Paul Moore
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-28 11:23 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf, linux-security-module

On Wed, 2022-09-28 at 12:33 +0200, Toke Høiland-Jørgensen wrote:
> Roberto Sassu <roberto.sassu@huaweicloud.com> writes:
> 
> > On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
> > > On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
> > > > Uhm, if I get what you mean, you would like to add DAC controls
> > > > to
> > > > the
> > > > pinned map to decide if you can get a fd and with which modes.
> > > > 
> > > > The problem I see is that a map exists regardless of the pinned
> > > > path
> > > > (just by ID).
> > > 
> > > Can you spell this out for me? I imagine you're talking about
> > > MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not
> > > great
> > > maybe, but no gaping hole IMO.
> > 
> > +linux-security-module ML (they could be interested in this topic
> > as
> > well)
> > 
> > Good to know! I didn't realize it before.
> > 
> > I figured out better what you mean by escalating privileges.
> > 
> > Pin a read-only fd, get a read-write fd from the pinned path.
> > 
> > What you want to do is, if I pin a read-only fd, I should get read-
> > only 
> > fds too, right?
> > 
> > I think here there could be different views. From my perspective,
> > pinning is just creating a new link to an existing object.
> > Accessing
> > the link does not imply being able to access the object itself (the
> > same happens for files).
> > 
> > I understand what you want to achieve. If I have to choose a
> > solution,
> > that would be doing something similar to files, i.e. add owner and
> > mode
> > information to the bpf_map structure (m_uid, m_gid, m_mode). We
> > could
> > add the MAP_CHMOD and MAP_CHOWN operations to the bpf() system call
> > to
> > modify the new fields.
> > 
> > When you pin the map, the inode will get the owner and mode from
> > bpf_map. bpf_obj_get() will then do DAC-style verification similar
> > to
> > MAC-style verification (with security_bpf_map()).
> 
> As someone pointed out during the discussing at LPC, this will
> effectively allow a user to create files owned by someone else, which
> is
> probably not a good idea either from a security PoV. (I.e., user A
> pins
> map owned by user B, so A creates a file owned by B).

Uhm, I see what you mean. Right, it is not a good idea, the owner of
the file should the one that pinned the map.

Other than that, DAC verification on the map would be still correct, as
it would be independent from the DAC verification of the file.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-28 11:23                     ` Roberto Sassu
@ 2022-09-29  0:24                       ` Paul Moore
  2022-09-29  7:54                         ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul Moore @ 2022-09-29  0:24 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Wed, Sep 28, 2022 at 7:24 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote
> On Wed, 2022-09-28 at 12:33 +0200, Toke Høiland-Jørgensen wrote:
> > Roberto Sassu <roberto.sassu@huaweicloud.com> writes:
> >
> > > On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
> > > > On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
> > > > > Uhm, if I get what you mean, you would like to add DAC controls
> > > > > to
> > > > > the
> > > > > pinned map to decide if you can get a fd and with which modes.
> > > > >
> > > > > The problem I see is that a map exists regardless of the pinned
> > > > > path
> > > > > (just by ID).
> > > >
> > > > Can you spell this out for me? I imagine you're talking about
> > > > MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not
> > > > great
> > > > maybe, but no gaping hole IMO.
> > >
> > > +linux-security-module ML (they could be interested in this topic
> > > as
> > > well)
> > >
> > > Good to know! I didn't realize it before.
> > >
> > > I figured out better what you mean by escalating privileges.
> > >
> > > Pin a read-only fd, get a read-write fd from the pinned path.
> > >
> > > What you want to do is, if I pin a read-only fd, I should get read-
> > > only
> > > fds too, right?
> > >
> > > I think here there could be different views. From my perspective,
> > > pinning is just creating a new link to an existing object.
> > > Accessing
> > > the link does not imply being able to access the object itself (the
> > > same happens for files).
> > >
> > > I understand what you want to achieve. If I have to choose a
> > > solution,
> > > that would be doing something similar to files, i.e. add owner and
> > > mode
> > > information to the bpf_map structure (m_uid, m_gid, m_mode). We
> > > could
> > > add the MAP_CHMOD and MAP_CHOWN operations to the bpf() system call
> > > to
> > > modify the new fields.
> > >
> > > When you pin the map, the inode will get the owner and mode from
> > > bpf_map. bpf_obj_get() will then do DAC-style verification similar
> > > to
> > > MAC-style verification (with security_bpf_map()).
> >
> > As someone pointed out during the discussing at LPC, this will
> > effectively allow a user to create files owned by someone else, which
> > is
> > probably not a good idea either from a security PoV. (I.e., user A
> > pins
> > map owned by user B, so A creates a file owned by B).
>
> Uhm, I see what you mean. Right, it is not a good idea, the owner of
> the file should the one that pinned the map.
>
> Other than that, DAC verification on the map would be still correct, as
> it would be independent from the DAC verification of the file.

I only became aware of this when the LSM list was CC'd so I'm a little
behind on what is going on here ... looking quickly through the
mailing list archive it looks like there is an issue with BPF map
permissions not matching well with their associated fd permissions,
yes?  From a LSM perspective, there are a couple of hooks that
currently use the fd's permissions (read/write) to determine the
appropriate access control check.

Is the plan to ensure that the map and fd permissions are correct at
the core BPF level, or do we need to do some additional checks in the
LSMs (currently only SELinux)?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-29  0:24                       ` Paul Moore
@ 2022-09-29  7:54                         ` Roberto Sassu
  2022-09-29 15:27                           ` Casey Schaufler
  2022-09-29 22:30                           ` Paul Moore
  0 siblings, 2 replies; 45+ messages in thread
From: Roberto Sassu @ 2022-09-29  7:54 UTC (permalink / raw)
  To: Paul Moore
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
> On Wed, Sep 28, 2022 at 7:24 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote
> > On Wed, 2022-09-28 at 12:33 +0200, Toke Høiland-Jørgensen wrote:
> > > Roberto Sassu <roberto.sassu@huaweicloud.com> writes:
> > > 
> > > > On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
> > > > > On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
> > > > > > Uhm, if I get what you mean, you would like to add DAC
> > > > > > controls
> > > > > > to
> > > > > > the
> > > > > > pinned map to decide if you can get a fd and with which
> > > > > > modes.
> > > > > > 
> > > > > > The problem I see is that a map exists regardless of the
> > > > > > pinned
> > > > > > path
> > > > > > (just by ID).
> > > > > 
> > > > > Can you spell this out for me? I imagine you're talking about
> > > > > MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not
> > > > > great
> > > > > maybe, but no gaping hole IMO.
> > > > 
> > > > +linux-security-module ML (they could be interested in this
> > > > topic
> > > > as
> > > > well)
> > > > 
> > > > Good to know! I didn't realize it before.
> > > > 
> > > > I figured out better what you mean by escalating privileges.
> > > > 
> > > > Pin a read-only fd, get a read-write fd from the pinned path.
> > > > 
> > > > What you want to do is, if I pin a read-only fd, I should get
> > > > read-
> > > > only
> > > > fds too, right?
> > > > 
> > > > I think here there could be different views. From my
> > > > perspective,
> > > > pinning is just creating a new link to an existing object.
> > > > Accessing
> > > > the link does not imply being able to access the object itself
> > > > (the
> > > > same happens for files).
> > > > 
> > > > I understand what you want to achieve. If I have to choose a
> > > > solution,
> > > > that would be doing something similar to files, i.e. add owner
> > > > and
> > > > mode
> > > > information to the bpf_map structure (m_uid, m_gid, m_mode). We
> > > > could
> > > > add the MAP_CHMOD and MAP_CHOWN operations to the bpf() system
> > > > call
> > > > to
> > > > modify the new fields.
> > > > 
> > > > When you pin the map, the inode will get the owner and mode
> > > > from
> > > > bpf_map. bpf_obj_get() will then do DAC-style verification
> > > > similar
> > > > to
> > > > MAC-style verification (with security_bpf_map()).
> > > 
> > > As someone pointed out during the discussing at LPC, this will
> > > effectively allow a user to create files owned by someone else,
> > > which
> > > is
> > > probably not a good idea either from a security PoV. (I.e., user
> > > A
> > > pins
> > > map owned by user B, so A creates a file owned by B).
> > 
> > Uhm, I see what you mean. Right, it is not a good idea, the owner
> > of
> > the file should the one that pinned the map.
> > 
> > Other than that, DAC verification on the map would be still
> > correct, as
> > it would be independent from the DAC verification of the file.
> 
> I only became aware of this when the LSM list was CC'd so I'm a
> little
> behind on what is going on here ... looking quickly through the
> mailing list archive it looks like there is an issue with BPF map
> permissions not matching well with their associated fd permissions,
> yes?  From a LSM perspective, there are a couple of hooks that
> currently use the fd's permissions (read/write) to determine the
> appropriate access control check.

From what I understood, access control on maps is done in two steps.
First, whenever someone attempts to get a fd to a map
security_bpf_map() is called. LSM implementations could check access if
the current process has the right to access the map (whose label can be
assigned at map creation time with security_bpf_map_alloc()).

Second, whenever the holder of the obtained fd wants to do an operation
on the map (lookup, update, delete, ...), eBPF checks if the fd modes
are compatible with the operation to perform (e.g. lookup requires
FMODE_CAN_READ).

One problem is that the second part is missing for some operations
dealing with the map fd:

Map iterators:
https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/

Map fd directly used by eBPF programs without system call:
https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/

Another problem is that there is no DAC, only MAC (work in progress). I
don't know exactly the status of enabling unprivileged eBPF.

Apart from this, now the discussion is focusing on the following
problem. A map (kernel object) can be referenced in two ways: by ID or
by path. By ID requires CAP_ADMIN, so we can consider by path for now.

Given a map fd, the holder of that fd can create a new reference
(pinning) to the map in the bpf filesystem (a new file whose private
data contains the address of the kernel object).

Pinning a map does not have a corresponding permission. Any fd mode is
sufficient to do the operation. Furthermore, subsequent requests to
obtain a map fd by path could result in receiving a read-write fd,
while at the time of pinning the fd was read-only.

While this does not seem to me a concern from MAC perspective, as
attempts to get a map fd still have to pass through security_bpf_map(),
in general this should be fixed without relying on LSMs.

> Is the plan to ensure that the map and fd permissions are correct at
> the core BPF level, or do we need to do some additional checks in the
> LSMs (currently only SELinux)?

Should we add a new map_pin permission in SELinux?

Should we have DAC to restrict pinnning without LSMs?

Thanks

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-29  7:54                         ` Roberto Sassu
@ 2022-09-29 15:27                           ` Casey Schaufler
  2022-09-30  7:42                             ` Roberto Sassu
  2022-09-29 22:30                           ` Paul Moore
  1 sibling, 1 reply; 45+ messages in thread
From: Casey Schaufler @ 2022-09-29 15:27 UTC (permalink / raw)
  To: Roberto Sassu, Paul Moore
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module, casey

On 9/29/2022 12:54 AM, Roberto Sassu wrote:
> On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
>> On Wed, Sep 28, 2022 at 7:24 AM Roberto Sassu
>> <roberto.sassu@huaweicloud.com> wrote
>>> On Wed, 2022-09-28 at 12:33 +0200, Toke Høiland-Jørgensen wrote:
>>>> Roberto Sassu <roberto.sassu@huaweicloud.com> writes:
>>>>
>>>>> On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
>>>>>> On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
>>>>>>> Uhm, if I get what you mean, you would like to add DAC
>>>>>>> controls
>>>>>>> to
>>>>>>> the
>>>>>>> pinned map to decide if you can get a fd and with which
>>>>>>> modes.
>>>>>>>
>>>>>>> The problem I see is that a map exists regardless of the
>>>>>>> pinned
>>>>>>> path
>>>>>>> (just by ID).
>>>>>> Can you spell this out for me? I imagine you're talking about
>>>>>> MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right? Not
>>>>>> great
>>>>>> maybe, but no gaping hole IMO.
>>>>> +linux-security-module ML (they could be interested in this
>>>>> topic
>>>>> as
>>>>> well)
>>>>>
>>>>> Good to know! I didn't realize it before.
>>>>>
>>>>> I figured out better what you mean by escalating privileges.
>>>>>
>>>>> Pin a read-only fd, get a read-write fd from the pinned path.
>>>>>
>>>>> What you want to do is, if I pin a read-only fd, I should get
>>>>> read-
>>>>> only
>>>>> fds too, right?
>>>>>
>>>>> I think here there could be different views. From my
>>>>> perspective,
>>>>> pinning is just creating a new link to an existing object.
>>>>> Accessing
>>>>> the link does not imply being able to access the object itself
>>>>> (the
>>>>> same happens for files).
>>>>>
>>>>> I understand what you want to achieve. If I have to choose a
>>>>> solution,
>>>>> that would be doing something similar to files, i.e. add owner
>>>>> and
>>>>> mode
>>>>> information to the bpf_map structure (m_uid, m_gid, m_mode). We
>>>>> could
>>>>> add the MAP_CHMOD and MAP_CHOWN operations to the bpf() system
>>>>> call
>>>>> to
>>>>> modify the new fields.
>>>>>
>>>>> When you pin the map, the inode will get the owner and mode
>>>>> from
>>>>> bpf_map. bpf_obj_get() will then do DAC-style verification
>>>>> similar
>>>>> to
>>>>> MAC-style verification (with security_bpf_map()).
>>>> As someone pointed out during the discussing at LPC, this will
>>>> effectively allow a user to create files owned by someone else,
>>>> which
>>>> is
>>>> probably not a good idea either from a security PoV. (I.e., user
>>>> A
>>>> pins
>>>> map owned by user B, so A creates a file owned by B).
>>> Uhm, I see what you mean. Right, it is not a good idea, the owner
>>> of
>>> the file should the one that pinned the map.
>>>
>>> Other than that, DAC verification on the map would be still
>>> correct, as
>>> it would be independent from the DAC verification of the file.
>> I only became aware of this when the LSM list was CC'd so I'm a
>> little
>> behind on what is going on here ... looking quickly through the
>> mailing list archive it looks like there is an issue with BPF map
>> permissions not matching well with their associated fd permissions,
>> yes?  From a LSM perspective, there are a couple of hooks that
>> currently use the fd's permissions (read/write) to determine the
>> appropriate access control check.
> >From what I understood, access control on maps is done in two steps.
> First, whenever someone attempts to get a fd to a map
> security_bpf_map() is called. LSM implementations could check access if
> the current process has the right to access the map (whose label can be
> assigned at map creation time with security_bpf_map_alloc()).
>
> Second, whenever the holder of the obtained fd wants to do an operation
> on the map (lookup, update, delete, ...), eBPF checks if the fd modes
> are compatible with the operation to perform (e.g. lookup requires
> FMODE_CAN_READ).
>
> One problem is that the second part is missing for some operations
> dealing with the map fd:
>
> Map iterators:
> https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/
>
> Map fd directly used by eBPF programs without system call:
> https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/
>
> Another problem is that there is no DAC, only MAC (work in progress). I
> don't know exactly the status of enabling unprivileged eBPF.
>
> Apart from this, now the discussion is focusing on the following
> problem. A map (kernel object) can be referenced in two ways: by ID or
> by path. By ID requires CAP_ADMIN, so we can consider by path for now.
>
> Given a map fd, the holder of that fd can create a new reference
> (pinning) to the map in the bpf filesystem (a new file whose private
> data contains the address of the kernel object).
>
> Pinning a map does not have a corresponding permission. Any fd mode is
> sufficient to do the operation. Furthermore, subsequent requests to
> obtain a map fd by path could result in receiving a read-write fd,
> while at the time of pinning the fd was read-only.
>
> While this does not seem to me a concern from MAC perspective, as
> attempts to get a map fd still have to pass through security_bpf_map(),
> in general this should be fixed without relying on LSMs.
>
>> Is the plan to ensure that the map and fd permissions are correct at
>> the core BPF level, or do we need to do some additional checks in the
>> LSMs (currently only SELinux)?
> Should we add a new map_pin permission in SELinux?
>
> Should we have DAC to restrict pinnning without LSMs?

As you've hinted above, DAC hasn't been an issue because there isn't
unprivileged eBPF. Even with privileged eBPF I expect that there are
going to be cases where not having DAC controls will surprise someone.
The less BPF looks like low level kernel internals and the more it looks
like general userspace code, the more likely this is to be an issue.

Or ...

If you are treating maps as kernel internal data structures you don't
need DAC. If you are treating them as user accessible named objects you
do need DAC. Security modules that implement MAC may chose to control
kernel internal data access (e.g. SElinux) in addition to named objects,
so you may want to accommodate that as well. If you do decide that maps
are named objects Smack (and possibly AppArmor) needs significant work.
Probably audit and IMA, too.

>
> Thanks
>
> Roberto
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-29  7:54                         ` Roberto Sassu
  2022-09-29 15:27                           ` Casey Schaufler
@ 2022-09-29 22:30                           ` Paul Moore
  2022-09-30  9:56                             ` Roberto Sassu
  1 sibling, 1 reply; 45+ messages in thread
From: Paul Moore @ 2022-09-29 22:30 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Thu, Sep 29, 2022 at 3:55 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
> On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
> > I only became aware of this when the LSM list was CC'd so I'm a
> > little
> > behind on what is going on here ... looking quickly through the
> > mailing list archive it looks like there is an issue with BPF map
> > permissions not matching well with their associated fd permissions,
> > yes?  From a LSM perspective, there are a couple of hooks that
> > currently use the fd's permissions (read/write) to determine the
> > appropriate access control check.
>
> From what I understood, access control on maps is done in two steps.
> First, whenever someone attempts to get a fd to a map
> security_bpf_map() is called. LSM implementations could check access if
> the current process has the right to access the map (whose label can be
> assigned at map creation time with security_bpf_map_alloc()).

[NOTE: SELinux is currently the only LSM which provides BPF access
controls, so they are going to be my example here and in the rest of
this email.]

In the case of SELinux, security_bpf_map() does check the access
between the current task and the BPF map itself (which inherits its
security label from its creator), with the actual permission requested
being determined by the fmode_t parameter passed to the LSM hook.
Looking at the current BPF code, the callers seem to take that from
various different places (bpf_attr:{file_flags,map_flags,open_flags}).
This could be due solely to the different operations being done by the
callers, which would make me believe everything is correct, but given
this thread it seems reasonable for someone with a better
understanding of BPF than me to double check this.  Can you help
verify that everything is okay here?

> Second, whenever the holder of the obtained fd wants to do an operation
> on the map (lookup, update, delete, ...), eBPF checks if the fd modes
> are compatible with the operation to perform (e.g. lookup requires
> FMODE_CAN_READ).

To be clear, from what I can see, it looks like the LSM is not
checking the fd modes, but rather the modes stored in the bpf_attr,
which I get the impression do not always match the fd modes.  Yes?
No?

There is also LSM/SELinux code which checks the permissions when a BPF
map is passed between tasks via a fd.  Currently the SELinux check
only looks at the file:f_mode to get the permissions to check, but if
the f_mode bits are not the authoritative record of what is allowed in
the BPF map, perhaps we need to change that to use one of the bpf_attr
mode bits (my gut feeling is bpf_attr:open_flags)?

> One problem is that the second part is missing for some operations
> dealing with the map fd:
>
> Map iterators:
> https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/

You'll need to treat me like an idiot when it comes to BPF maps ;)

I did a very quick read on them right now and it looks like a BPF map
iterator would just be a combination of BPF read and execute ("bpf {
map_read prog_run }" in SELinux policy terms).  Would it make more
sense to just use the existing security_bpf_map() and
security_bpf_prog() hooks here?

> Map fd directly used by eBPF programs without system call:
> https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/

Another instance of "can you please explain this use case?" ;)

I'm not going to hazard too much of a guess here, but if the map is
being passed between tasks and a fd is generated from that map, we may
be able to cover this with logic similar
security/selinux/hooks.c:bpf_fd_pass() ... but I'm really stretching
my weak understanding of BPF here.

> Another problem is that there is no DAC, only MAC (work in progress). I
> don't know exactly the status of enabling unprivileged eBPF.

It is my opinion that we need to ensure both DAC and MAC are present
in the code.  This thread makes me worry that some eBPF DAC controls
are being ignored because one can currently say "we're okay because
you need privilege!".  That may be true today, but I can imagine a
time in the not too distant future where unpriv eBPF is allowed and we
suddenly have to bolt on a lot of capable() checks ... which is a
great recipe for privilege escalation bugs.

> Apart from this, now the discussion is focusing on the following
> problem. A map (kernel object) can be referenced in two ways: by ID or
> by path. By ID requires CAP_ADMIN, so we can consider by path for now.
>
> Given a map fd, the holder of that fd can create a new reference
> (pinning) to the map in the bpf filesystem (a new file whose private
> data contains the address of the kernel object).
>
> Pinning a map does not have a corresponding permission. Any fd mode is
> sufficient to do the operation. Furthermore, subsequent requests to
> obtain a map fd by path could result in receiving a read-write fd,
> while at the time of pinning the fd was read-only.

Since the maps carry their own label I think we are mostly okay, even
if the map is passed between tasks by some non-fd related mechanism.
However, I am now slightly worried that if a fd is obtained with perms
that don't match the underlying map and that fd is then passed to
another task the access control check on the fd passing would not be
correct.  Operations on the map from a SELinux perspective should
still be okay (the map has its own label), but still.

I'm wondering if we do want to move the SELinux BPF fd passing code to
check the bpf_attr:open_flags perms.  Thoughts?

> While this does not seem to me a concern from MAC perspective, as
> attempts to get a map fd still have to pass through security_bpf_map(),
> in general this should be fixed without relying on LSMs.

Agreed.  The access controls need to work both for DAC and DAC+LSM.

> > Is the plan to ensure that the map and fd permissions are correct at
> > the core BPF level, or do we need to do some additional checks in the
> > LSMs (currently only SELinux)?
>
> Should we add a new map_pin permission in SELinux?

Maybe?  Maybe not?  I don't know, can you help me understand map
pinning a bit more first?

> Should we have DAC to restrict pinnning without LSMs?

Similar to above.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-29 15:27                           ` Casey Schaufler
@ 2022-09-30  7:42                             ` Roberto Sassu
  0 siblings, 0 replies; 45+ messages in thread
From: Roberto Sassu @ 2022-09-30  7:42 UTC (permalink / raw)
  To: Casey Schaufler, Paul Moore
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Thu, 2022-09-29 at 08:27 -0700, Casey Schaufler wrote:
> On 9/29/2022 12:54 AM, Roberto Sassu wrote:
> > On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
> > > On Wed, Sep 28, 2022 at 7:24 AM Roberto Sassu
> > > <roberto.sassu@huaweicloud.com> wrote
> > > > On Wed, 2022-09-28 at 12:33 +0200, Toke Høiland-Jørgensen
> > > > wrote:
> > > > > Roberto Sassu <roberto.sassu@huaweicloud.com> writes:
> > > > > 
> > > > > > On Wed, 2022-09-28 at 09:52 +0100, Lorenz Bauer wrote:
> > > > > > > On Mon, 26 Sep 2022, at 17:18, Roberto Sassu wrote:
> > > > > > > > Uhm, if I get what you mean, you would like to add DAC
> > > > > > > > controls
> > > > > > > > to
> > > > > > > > the
> > > > > > > > pinned map to decide if you can get a fd and with which
> > > > > > > > modes.
> > > > > > > > 
> > > > > > > > The problem I see is that a map exists regardless of
> > > > > > > > the
> > > > > > > > pinned
> > > > > > > > path
> > > > > > > > (just by ID).
> > > > > > > Can you spell this out for me? I imagine you're talking
> > > > > > > about
> > > > > > > MAP_GET_FD_BY_ID, but that is CAP_SYS_ADMIN only, right?
> > > > > > > Not
> > > > > > > great
> > > > > > > maybe, but no gaping hole IMO.
> > > > > > +linux-security-module ML (they could be interested in this
> > > > > > topic
> > > > > > as
> > > > > > well)
> > > > > > 
> > > > > > Good to know! I didn't realize it before.
> > > > > > 
> > > > > > I figured out better what you mean by escalating
> > > > > > privileges.
> > > > > > 
> > > > > > Pin a read-only fd, get a read-write fd from the pinned
> > > > > > path.
> > > > > > 
> > > > > > What you want to do is, if I pin a read-only fd, I should
> > > > > > get
> > > > > > read-
> > > > > > only
> > > > > > fds too, right?
> > > > > > 
> > > > > > I think here there could be different views. From my
> > > > > > perspective,
> > > > > > pinning is just creating a new link to an existing object.
> > > > > > Accessing
> > > > > > the link does not imply being able to access the object
> > > > > > itself
> > > > > > (the
> > > > > > same happens for files).
> > > > > > 
> > > > > > I understand what you want to achieve. If I have to choose
> > > > > > a
> > > > > > solution,
> > > > > > that would be doing something similar to files, i.e. add
> > > > > > owner
> > > > > > and
> > > > > > mode
> > > > > > information to the bpf_map structure (m_uid, m_gid,
> > > > > > m_mode). We
> > > > > > could
> > > > > > add the MAP_CHMOD and MAP_CHOWN operations to the bpf()
> > > > > > system
> > > > > > call
> > > > > > to
> > > > > > modify the new fields.
> > > > > > 
> > > > > > When you pin the map, the inode will get the owner and mode
> > > > > > from
> > > > > > bpf_map. bpf_obj_get() will then do DAC-style verification
> > > > > > similar
> > > > > > to
> > > > > > MAC-style verification (with security_bpf_map()).
> > > > > As someone pointed out during the discussing at LPC, this
> > > > > will
> > > > > effectively allow a user to create files owned by someone
> > > > > else,
> > > > > which
> > > > > is
> > > > > probably not a good idea either from a security PoV. (I.e.,
> > > > > user
> > > > > A
> > > > > pins
> > > > > map owned by user B, so A creates a file owned by B).
> > > > Uhm, I see what you mean. Right, it is not a good idea, the
> > > > owner
> > > > of
> > > > the file should the one that pinned the map.
> > > > 
> > > > Other than that, DAC verification on the map would be still
> > > > correct, as
> > > > it would be independent from the DAC verification of the file.
> > > I only became aware of this when the LSM list was CC'd so I'm a
> > > little
> > > behind on what is going on here ... looking quickly through the
> > > mailing list archive it looks like there is an issue with BPF map
> > > permissions not matching well with their associated fd
> > > permissions,
> > > yes?  From a LSM perspective, there are a couple of hooks that
> > > currently use the fd's permissions (read/write) to determine the
> > > appropriate access control check.
> > > From what I understood, access control on maps is done in two
> > > steps.
> > First, whenever someone attempts to get a fd to a map
> > security_bpf_map() is called. LSM implementations could check
> > access if
> > the current process has the right to access the map (whose label
> > can be
> > assigned at map creation time with security_bpf_map_alloc()).
> > 
> > Second, whenever the holder of the obtained fd wants to do an
> > operation
> > on the map (lookup, update, delete, ...), eBPF checks if the fd
> > modes
> > are compatible with the operation to perform (e.g. lookup requires
> > FMODE_CAN_READ).
> > 
> > One problem is that the second part is missing for some operations
> > dealing with the map fd:
> > 
> > Map iterators:
> > https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/
> > 
> > Map fd directly used by eBPF programs without system call:
> > https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/
> > 
> > Another problem is that there is no DAC, only MAC (work in
> > progress). I
> > don't know exactly the status of enabling unprivileged eBPF.
> > 
> > Apart from this, now the discussion is focusing on the following
> > problem. A map (kernel object) can be referenced in two ways: by ID
> > or
> > by path. By ID requires CAP_ADMIN, so we can consider by path for
> > now.
> > 
> > Given a map fd, the holder of that fd can create a new reference
> > (pinning) to the map in the bpf filesystem (a new file whose
> > private
> > data contains the address of the kernel object).
> > 
> > Pinning a map does not have a corresponding permission. Any fd mode
> > is
> > sufficient to do the operation. Furthermore, subsequent requests to
> > obtain a map fd by path could result in receiving a read-write fd,
> > while at the time of pinning the fd was read-only.
> > 
> > While this does not seem to me a concern from MAC perspective, as
> > attempts to get a map fd still have to pass through
> > security_bpf_map(),
> > in general this should be fixed without relying on LSMs.
> > 
> > > Is the plan to ensure that the map and fd permissions are correct
> > > at
> > > the core BPF level, or do we need to do some additional checks in
> > > the
> > > LSMs (currently only SELinux)?
> > Should we add a new map_pin permission in SELinux?
> > 
> > Should we have DAC to restrict pinnning without LSMs?
> 
> As you've hinted above, DAC hasn't been an issue because there isn't
> unprivileged eBPF. Even with privileged eBPF I expect that there are
> going to be cases where not having DAC controls will surprise
> someone.
> The less BPF looks like low level kernel internals and the more it
> looks
> like general userspace code, the more likely this is to be an issue.
> 
> Or ...
> 
> If you are treating maps as kernel internal data structures you don't
> need DAC. If you are treating them as user accessible named objects
> you
> do need DAC. Security modules that implement MAC may chose to control
> kernel internal data access (e.g. SElinux) in addition to named
> objects,
> so you may want to accommodate that as well. If you do decide that
> maps
> are named objects Smack (and possibly AppArmor) needs significant
> work.
> Probably audit and IMA, too.

To me a map seems more than just a kernel object. User space gets a
reference to it through a fd, similarly to files. It seems a named
object because a map can be accessed by user space through a path
(pinned map) or by ID (user space can provide the ID to get a map fd
through a system call).

I'm more familiar with SELinux and Smack. But following what has been
done for SELinux, doing the same for Smack seems straightforward (even
easier, as you have more simple permissions).

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-29 22:30                           ` Paul Moore
@ 2022-09-30  9:56                             ` Roberto Sassu
  2022-09-30 20:43                               ` Paul Moore
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-09-30  9:56 UTC (permalink / raw)
  To: Paul Moore
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Thu, 2022-09-29 at 18:30 -0400, Paul Moore wrote:
> On Thu, Sep 29, 2022 at 3:55 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
> > > I only became aware of this when the LSM list was CC'd so I'm a
> > > little
> > > behind on what is going on here ... looking quickly through the
> > > mailing list archive it looks like there is an issue with BPF map
> > > permissions not matching well with their associated fd
> > > permissions,
> > > yes?  From a LSM perspective, there are a couple of hooks that
> > > currently use the fd's permissions (read/write) to determine the
> > > appropriate access control check.
> > 
> > From what I understood, access control on maps is done in two
> > steps.
> > First, whenever someone attempts to get a fd to a map
> > security_bpf_map() is called. LSM implementations could check
> > access if
> > the current process has the right to access the map (whose label
> > can be
> > assigned at map creation time with security_bpf_map_alloc()).
> 
> [NOTE: SELinux is currently the only LSM which provides BPF access
> controls, so they are going to be my example here and in the rest of
> this email.]
> 
> In the case of SELinux, security_bpf_map() does check the access
> between the current task and the BPF map itself (which inherits its
> security label from its creator), with the actual permission
> requested
> being determined by the fmode_t parameter passed to the LSM hook.
> Looking at the current BPF code, the callers seem to take that from
> various different places
> (bpf_attr:{file_flags,map_flags,open_flags}).
> This could be due solely to the different operations being done by
> the
> callers, which would make me believe everything is correct, but given
> this thread it seems reasonable for someone with a better
> understanding of BPF than me to double check this.  Can you help
> verify that everything is okay here?

I also don't have concerns on this part. eBPF maintainers can help to
confirm.

> Second, whenever the holder of the obtained fd wants to do an
> > operation
> > on the map (lookup, update, delete, ...), eBPF checks if the fd
> > modes
> > are compatible with the operation to perform (e.g. lookup requires
> > FMODE_CAN_READ).
> 
> To be clear, from what I can see, it looks like the LSM is not
> checking the fd modes, but rather the modes stored in the bpf_attr,
> which I get the impression do not always match the fd modes.  Yes?
> No?

Correct, there is no revalidation. Fd modes represent what LSMs granted
to the fd holder. eBPF allows operations on the object behind the fd
depending on the stored fd modes (for maps, not sure about the other
object types).

> There is also LSM/SELinux code which checks the permissions when a
> BPF
> map is passed between tasks via a fd.  Currently the SELinux check
> only looks at the file:f_mode to get the permissions to check, but if
> the f_mode bits are not the authoritative record of what is allowed
> in
> the BPF map, perhaps we need to change that to use one of the
> bpf_attr
> mode bits (my gut feeling is bpf_attr:open_flags)?
> 
> > One problem is that the second part is missing for some operations
> > dealing with the map fd:
> > 
> > Map iterators:
> > https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/
> 
> You'll need to treat me like an idiot when it comes to BPF maps ;)

Argh, sorry!

> I did a very quick read on them right now and it looks like a BPF map
> iterator would just be a combination of BPF read and execute ("bpf {
> map_read prog_run }" in SELinux policy terms).  Would it make more
> sense to just use the existing security_bpf_map() and
> security_bpf_prog() hooks here?

If I can make an example, if you have a linked list, a map iterator is
like calling a function for each element of the list, allowing the
function to read and write the element.

For now I didn't think about programs. Lorenz, which reported this
issue in the first place, is planning on rethinking access control on
all eBPF objects.

When you create a map iterator, you pass the program you want to run
and a map reference to bpftool (user space). bpftool first retrieves
the fds for the program and the map (thus, security_bpf_prog() and
security_bpf_map() are called). The fds are then passed to the kernel
with another system call to create the map iterator. The map iterator
is a named kernel object, with a corresponding file in the bpf
filesystem. Reading that file causes the iteration to start, by running
the program for each map element.

The only missing part is checking the fd modes granted by LSMs at the
time the map iterator is created. If the map iterator allows read and
write, both FMODE_CAN_READ and FMODE_CAN_WRITE should be set.

> > Map fd directly used by eBPF programs without system call:
> > https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/
> 
> Another instance of "can you please explain this use case?" ;)

Lorenz discovered that. Despite LSMs check which permissions user space
requested for a map fd, there is no corresponding check of the fd
modes. Normally, a map fd is passed in a system call to perform a map
operation. In this case, it is not. It is set in the program code, and
eBPF transforms the map fd into a map address suitable to be passed to
kernel helpers which directly access the kernel object.

> I'm not going to hazard too much of a guess here, but if the map is
> being passed between tasks and a fd is generated from that map, we
> may
> be able to cover this with logic similar
> security/selinux/hooks.c:bpf_fd_pass() ... but I'm really stretching
> my weak understanding of BPF here.
> 
> > Another problem is that there is no DAC, only MAC (work in
> > progress). I
> > don't know exactly the status of enabling unprivileged eBPF.
> 
> It is my opinion that we need to ensure both DAC and MAC are present
> in the code.  This thread makes me worry that some eBPF DAC controls
> are being ignored because one can currently say "we're okay because
> you need privilege!".  That may be true today, but I can imagine a
> time in the not too distant future where unpriv eBPF is allowed and
> we
> suddenly have to bolt on a lot of capable() checks ... which is a
> great recipe for privilege escalation bugs.
> 
> > Apart from this, now the discussion is focusing on the following
> > problem. A map (kernel object) can be referenced in two ways: by ID
> > or
> > by path. By ID requires CAP_ADMIN, so we can consider by path for
> > now.
> > 
> > Given a map fd, the holder of that fd can create a new reference
> > (pinning) to the map in the bpf filesystem (a new file whose
> > private
> > data contains the address of the kernel object).
> > 
> > Pinning a map does not have a corresponding permission. Any fd mode
> > is
> > sufficient to do the operation. Furthermore, subsequent requests to
> > obtain a map fd by path could result in receiving a read-write fd,
> > while at the time of pinning the fd was read-only.
> 
> Since the maps carry their own label I think we are mostly okay, even
> if the map is passed between tasks by some non-fd related mechanism.
> However, I am now slightly worried that if a fd is obtained with
> perms
> that don't match the underlying map and that fd is then passed to
> another task the access control check on the fd passing would not be
> correct.  Operations on the map from a SELinux perspective should
> still be okay (the map has its own label), but still.
> 
> I'm wondering if we do want to move the SELinux BPF fd passing code
> to
> check the bpf_attr:open_flags perms.  Thoughts?

Seems correct as it is, but I'm not completely familiar with this work.
As long as you ensure that the fd holder has the right to access the
map, then it will be still responsibility of eBPF to just check the fd
modes.

> While this does not seem to me a concern from MAC perspective, as
> > attempts to get a map fd still have to pass through
> > security_bpf_map(),
> > in general this should be fixed without relying on LSMs.
> 
> Agreed.  The access controls need to work both for DAC and DAC+LSM.

@all, what do you think?

> Is the plan to ensure that the map and fd permissions are correct
> > > at
> > > the core BPF level, or do we need to do some additional checks in
> > > the
> > > LSMs (currently only SELinux)?
> > 
> > Should we add a new map_pin permission in SELinux?
> 
> Maybe?  Maybe not?  I don't know, can you help me understand map
> pinning a bit more first?

I'm not completely sure that this is correct. Pinning a map seems to me
like creating a new dentry for the inode.

> Should we have DAC to restrict pinnning without LSMs?
> 
> Similar to above.

If we had DAC, even without restricting pinning, fds have to be
obtained if the DAC check passed (if we don't want to rely exclusively

on MAC). Even if pinning was done with a read-only fd, a new read-write fd can be obtained if the owner allowed you to do so.

Restricting pinning might be risky to break compatibility with existing
applications.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-30  9:56                             ` Roberto Sassu
@ 2022-09-30 20:43                               ` Paul Moore
  2022-10-04  8:03                                 ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul Moore @ 2022-09-30 20:43 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Fri, Sep 30, 2022 at 5:57 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
> On Thu, 2022-09-29 at 18:30 -0400, Paul Moore wrote:
> > On Thu, Sep 29, 2022 at 3:55 AM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> > > On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
> > > > I only became aware of this when the LSM list was CC'd so I'm a
> > > > little
> > > > behind on what is going on here ... looking quickly through the
> > > > mailing list archive it looks like there is an issue with BPF map
> > > > permissions not matching well with their associated fd
> > > > permissions,
> > > > yes?  From a LSM perspective, there are a couple of hooks that
> > > > currently use the fd's permissions (read/write) to determine the
> > > > appropriate access control check.
> > >
> > > From what I understood, access control on maps is done in two
> > > steps.
> > > First, whenever someone attempts to get a fd to a map
> > > security_bpf_map() is called. LSM implementations could check
> > > access if
> > > the current process has the right to access the map (whose label
> > > can be
> > > assigned at map creation time with security_bpf_map_alloc()).
> >
> > [NOTE: SELinux is currently the only LSM which provides BPF access
> > controls, so they are going to be my example here and in the rest of
> > this email.]
> >
> > In the case of SELinux, security_bpf_map() does check the access
> > between the current task and the BPF map itself (which inherits its
> > security label from its creator), with the actual permission
> > requested
> > being determined by the fmode_t parameter passed to the LSM hook.
> > Looking at the current BPF code, the callers seem to take that from
> > various different places
> > (bpf_attr:{file_flags,map_flags,open_flags}).
> > This could be due solely to the different operations being done by
> > the
> > callers, which would make me believe everything is correct, but given
> > this thread it seems reasonable for someone with a better
> > understanding of BPF than me to double check this.  Can you help
> > verify that everything is okay here?
>
> I also don't have concerns on this part. eBPF maintainers can help to
> confirm.
>
> > Second, whenever the holder of the obtained fd wants to do an
> > > operation
> > > on the map (lookup, update, delete, ...), eBPF checks if the fd
> > > modes
> > > are compatible with the operation to perform (e.g. lookup requires
> > > FMODE_CAN_READ).
> >
> > To be clear, from what I can see, it looks like the LSM is not
> > checking the fd modes, but rather the modes stored in the bpf_attr,
> > which I get the impression do not always match the fd modes.  Yes?
> > No?
>
> Correct, there is no revalidation. Fd modes represent what LSMs granted
> to the fd holder.

The "fd modes represent what LSMs grant the fd holder" comment doesn't
make much sense to me, can you clarify this?  LSMs by design are not
authoritative so I'm not sure how they would "grant" fd mode bits;
LSMs can deny a process the ability to open a file, but that's about
it.

> eBPF allows operations on the object behind the fd
> depending on the stored fd modes (for maps, not sure about the other
> object types).

Yes, it definitely seems as though BPF does not care at all about
permissions at the fd level.  That seems odd to me, and I wonder how
many BPF users truly understand that, but I suppose as long as
unprivileged BPF is not allowed, there is at least some ability to
dismiss the security concerns from a DAC perspective.  As mentioned
previously, I do worry about that becoming a problem in the future.

I am also growing increasingly concerned that the BPF fd passing check
in bpf_fd_pass() is not correct, but it isn't clear to me what the
correct thing is in this case.  How does one determine what access can
potentially be requested when a BPF fd is passed between processes if
the fd mode bits are not meaningful?  I'm not sure we have any other
information to go on at this point in the code ... do we just have to
assume that all passed maps are read/write (and programs can be run)
regardless of the fd mode bits?

> > There is also LSM/SELinux code which checks the permissions when a
> > BPF
> > map is passed between tasks via a fd.  Currently the SELinux check
> > only looks at the file:f_mode to get the permissions to check, but if
> > the f_mode bits are not the authoritative record of what is allowed
> > in
> > the BPF map, perhaps we need to change that to use one of the
> > bpf_attr
> > mode bits (my gut feeling is bpf_attr:open_flags)?
> >
> > > One problem is that the second part is missing for some operations
> > > dealing with the map fd:
> > >
> > > Map iterators:
> > > https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/
> >
> > You'll need to treat me like an idiot when it comes to BPF maps ;)
>
> Argh, sorry!

No worries :)

> > I did a very quick read on them right now and it looks like a BPF map
> > iterator would just be a combination of BPF read and execute ("bpf {
> > map_read prog_run }" in SELinux policy terms).  Would it make more
> > sense to just use the existing security_bpf_map() and
> > security_bpf_prog() hooks here?
>
> If I can make an example, if you have a linked list, a map iterator is
> like calling a function for each element of the list, allowing the
> function to read and write the element.

Thanks, that matches with what I found and why I thought an iterator
matches well with a map read and program run permission.  Do you
agree?

> For now I didn't think about programs. Lorenz, which reported this
> issue in the first place, is planning on rethinking access control on
> all eBPF objects.

That's probably not a bad idea, or at the very least I think it would
be a very good idea to draft a document about how eBPF access control
is supposed to work.  Start with the DAC side of things and we can
extend it to include those LSMs which provide eBPF controls.  Does
something like that already exist?

> When you create a map iterator, you pass the program you want to run
> and a map reference to bpftool (user space). bpftool first retrieves
> the fds for the program and the map (thus, security_bpf_prog() and
> security_bpf_map() are called).

Okay, that's good, we should have the right LSM controls in place at
this point in the process.

> The fds are then passed to the kernel
> with another system call to create the map iterator. The map iterator
> is a named kernel object, with a corresponding file in the bpf
> filesystem.

The iterator has no additional user data, or system state, beyond the
associated BPF program and map data, yes?  If that is the case I think
we may be able to avoid treating it as a proper object from a SELinux
perspective as long as we ensure that access is checked against the
associated map and program.

> Reading that file causes the iteration to start, by running
> the program for each map element.

... and we should definitely have an LSM access control check here,
although as mentioned above we may be able to just leverage the
existing security_bpf_map() and security_bpf_prog() hooks.  Do we do
that currently?

> The only missing part is checking the fd modes granted by LSMs at the
> time the map iterator is created. If the map iterator allows read and
> write, both FMODE_CAN_READ and FMODE_CAN_WRITE should be set.

Does the kernel enforce this from a DAC perspective?

> > > Map fd directly used by eBPF programs without system call:
> > > https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/
> >
> > Another instance of "can you please explain this use case?" ;)
>
> Lorenz discovered that. Despite LSMs check which permissions user space
> requested for a map fd, there is no corresponding check of the fd
> modes. Normally, a map fd is passed in a system call to perform a map
> operation. In this case, it is not. It is set in the program code, and
> eBPF transforms the map fd into a map address suitable to be passed to
> kernel helpers which directly access the kernel object.

Yes, the LSMs use the fmode_t bits that are passed to
security_bpf_map() to determine the requested access permissions for
the map operation.  When I looked at the callers yesterday, it looked
like all of the security_bpf_map() callers were passing fmode_t bits
from a bpf_attr and not directly from the fd.  I need some help in
verifying that this is correct, but from what I've read thus far it
appears that the BPF code in general is not concerned about fd mode
bits.

> > I'm wondering if we do want to move the SELinux BPF fd passing code
> > to
> > check the bpf_attr:open_flags perms.  Thoughts?
>
> Seems correct as it is, but I'm not completely familiar with this work.
> As long as you ensure that the fd holder has the right to access the
> map, then it will be still responsibility of eBPF to just check the fd
> modes.

Looking at this further, I don't believe we would have the open_flags
perms at the time the BPF fd was passed anyway.

> > While this does not seem to me a concern from MAC perspective, as
> > > attempts to get a map fd still have to pass through
> > > security_bpf_map(),
> > > in general this should be fixed without relying on LSMs.
> >
> > Agreed.  The access controls need to work both for DAC and DAC+LSM.
>
> @all, what do you think?

It is worth reminding everyone this is pretty much how the rest of the
kernel works, you can't ignore the DAC controls (file mode bits,
capabilities, etc.) in favor of a pure LSM-based system, and you can't
avoid the necessary LSM hooks because "LSMs are stupid".  People build
Linux systems both as DAC only and DAC+LSM, we need to make sure we
can Do The Right Thing in both cases.

> > Is the plan to ensure that the map and fd permissions are correct
> > > > at
> > > > the core BPF level, or do we need to do some additional checks in
> > > > the
> > > > LSMs (currently only SELinux)?
> > >
> > > Should we add a new map_pin permission in SELinux?
> >
> > Maybe?  Maybe not?  I don't know, can you help me understand map
> > pinning a bit more first?
>
> I'm not completely sure that this is correct. Pinning a map seems to me
> like creating a new dentry for the inode.

Once again, can someone explain BPF map pinning?

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-30 20:43                               ` Paul Moore
@ 2022-10-04  8:03                                 ` Roberto Sassu
  0 siblings, 0 replies; 45+ messages in thread
From: Roberto Sassu @ 2022-10-04  8:03 UTC (permalink / raw)
  To: Paul Moore
  Cc: Toke Høiland-Jørgensen, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf,
	linux-security-module

On Fri, 2022-09-30 at 16:43 -0400, Paul Moore wrote:
> On Fri, Sep 30, 2022 at 5:57 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > On Thu, 2022-09-29 at 18:30 -0400, Paul Moore wrote:
> > > On Thu, Sep 29, 2022 at 3:55 AM Roberto Sassu
> > > <roberto.sassu@huaweicloud.com> wrote:
> > > > On Wed, 2022-09-28 at 20:24 -0400, Paul Moore wrote:
> > > > > I only became aware of this when the LSM list was CC'd so I'm
> > > > > a
> > > > > little
> > > > > behind on what is going on here ... looking quickly through
> > > > > the
> > > > > mailing list archive it looks like there is an issue with BPF
> > > > > map
> > > > > permissions not matching well with their associated fd
> > > > > permissions,
> > > > > yes?  From a LSM perspective, there are a couple of hooks
> > > > > that
> > > > > currently use the fd's permissions (read/write) to determine
> > > > > the
> > > > > appropriate access control check.
> > > > 
> > > > From what I understood, access control on maps is done in two
> > > > steps.
> > > > First, whenever someone attempts to get a fd to a map
> > > > security_bpf_map() is called. LSM implementations could check
> > > > access if
> > > > the current process has the right to access the map (whose
> > > > label
> > > > can be
> > > > assigned at map creation time with security_bpf_map_alloc()).
> > > 
> > > [NOTE: SELinux is currently the only LSM which provides BPF
> > > access
> > > controls, so they are going to be my example here and in the rest
> > > of
> > > this email.]
> > > 
> > > In the case of SELinux, security_bpf_map() does check the access
> > > between the current task and the BPF map itself (which inherits
> > > its
> > > security label from its creator), with the actual permission
> > > requested
> > > being determined by the fmode_t parameter passed to the LSM hook.
> > > Looking at the current BPF code, the callers seem to take that
> > > from
> > > various different places
> > > (bpf_attr:{file_flags,map_flags,open_flags}).
> > > This could be due solely to the different operations being done
> > > by
> > > the
> > > callers, which would make me believe everything is correct, but
> > > given
> > > this thread it seems reasonable for someone with a better
> > > understanding of BPF than me to double check this.  Can you help
> > > verify that everything is okay here?
> > 
> > I also don't have concerns on this part. eBPF maintainers can help
> > to
> > confirm.
> > 
> > > Second, whenever the holder of the obtained fd wants to do an
> > > > operation
> > > > on the map (lookup, update, delete, ...), eBPF checks if the fd
> > > > modes
> > > > are compatible with the operation to perform (e.g. lookup
> > > > requires
> > > > FMODE_CAN_READ).
> > > 
> > > To be clear, from what I can see, it looks like the LSM is not
> > > checking the fd modes, but rather the modes stored in the
> > > bpf_attr,
> > > which I get the impression do not always match the fd
> > > modes.  Yes?
> > > No?
> > 
> > Correct, there is no revalidation. Fd modes represent what LSMs
> > granted
> > to the fd holder.
> 
> The "fd modes represent what LSMs grant the fd holder" comment
> doesn't
> make much sense to me, can you clarify this?  LSMs by design are not
> authoritative so I'm not sure how they would "grant" fd mode bits;
> LSMs can deny a process the ability to open a file, but that's about
> it.

If I understood correctly, for files, FMODE_CAN_READ reflects the fact
that the requestor asked for O_RDONLY/O_RDWR, and DAC+LSM granted that
permission. The same for FMODE_CAN_WRITE.

> > eBPF allows operations on the object behind the fd
> > depending on the stored fd modes (for maps, not sure about the
> > other
> > object types).
> 
> Yes, it definitely seems as though BPF does not care at all about
> permissions at the fd level.  That seems odd to me, and I wonder how
> many BPF users truly understand that, but I suppose as long as
> unprivileged BPF is not allowed, there is at least some ability to
> dismiss the security concerns from a DAC perspective.  As mentioned
> previously, I do worry about that becoming a problem in the future.
> 
> I am also growing increasingly concerned that the BPF fd passing
> check
> in bpf_fd_pass() is not correct, but it isn't clear to me what the
> correct thing is in this case.  How does one determine what access
> can
> potentially be requested when a BPF fd is passed between processes if
> the fd mode bits are not meaningful?  I'm not sure we have any other
> information to go on at this point in the code ... do we just have to
> assume that all passed maps are read/write (and programs can be run)
> regardless of the fd mode bits?

Fd modes are meaningful in the sense that they allow certain map
operations. FMODE_CAN_READ allows read-like operations (lookup, dump,
...), FMODE_CAN_WRITE allows write-like operations (update, ...).

If we apply what I said above for files, FMODE_CAN_READ means that the
requestor passed BPF_F_RDONLY/zero to the BPF_MAP_GET_FD_BY_ID or
OBJ_GET operations of the bpf() system call, and LSMs granted that
operation (security_bpf_map() returned zero). So, at the end, eBPF
returned to the requestor a new fd with the appropriate fd modes set.

bpf_fd_pass() does the same as if the current process requested the map
fd by itself. If that process is receiving a fd with FMODE_CAN_READ, it
should pass a check as if security_bpf_map() is executed with the eBPF
flags BPF_F_RDONLY/zero.

If DAC check was also included, FMODE_CAN_READ would also mean that the
owner of the map allowed the requestor to access it.

> > > There is also LSM/SELinux code which checks the permissions when
> > > a
> > > BPF
> > > map is passed between tasks via a fd.  Currently the SELinux
> > > check
> > > only looks at the file:f_mode to get the permissions to check,
> > > but if
> > > the f_mode bits are not the authoritative record of what is
> > > allowed
> > > in
> > > the BPF map, perhaps we need to change that to use one of the
> > > bpf_attr
> > > mode bits (my gut feeling is bpf_attr:open_flags)?
> > > 
> > > > One problem is that the second part is missing for some
> > > > operations
> > > > dealing with the map fd:
> > > > 
> > > > Map iterators:
> > > > https://lore.kernel.org/bpf/20220906170301.256206-1-roberto.sassu@huaweicloud.com/
> > > 
> > > You'll need to treat me like an idiot when it comes to BPF maps
> > > ;)
> > 
> > Argh, sorry!
> 
> No worries :)
> 
> > > I did a very quick read on them right now and it looks like a BPF
> > > map
> > > iterator would just be a combination of BPF read and execute
> > > ("bpf {
> > > map_read prog_run }" in SELinux policy terms).  Would it make
> > > more
> > > sense to just use the existing security_bpf_map() and
> > > security_bpf_prog() hooks here?
> > 
> > If I can make an example, if you have a linked list, a map iterator
> > is
> > like calling a function for each element of the list, allowing the
> > function to read and write the element.
> 
> Thanks, that matches with what I found and why I thought an iterator
> matches well with a map read and program run permission.  Do you
> agree?

I thought it matches also with map write. Look at this code
(kernel/bpf/map_iter.c):

static const struct bpf_iter_reg bpf_map_elem_reg_info = {
[...]
	.ctx_arg_info		= {
		{ offsetof(struct bpf_iter__bpf_map_elem, key),
		  PTR_TO_BUF | PTR_MAYBE_NULL | MEM_RDONLY },
		{ offsetof(struct bpf_iter__bpf_map_elem, value),
		  PTR_TO_BUF | PTR_MAYBE_NULL },
	},
};

The flags above tell some attributes of each map key and value. In this
case only the key is read-only. The value is read-write. The eBPF
verifier enforces this, LSMs are not involved.

> > For now I didn't think about programs. Lorenz, which reported this
> > issue in the first place, is planning on rethinking access control
> > on
> > all eBPF objects.
> 
> That's probably not a bad idea, or at the very least I think it would
> be a very good idea to draft a document about how eBPF access control
> is supposed to work.  Start with the DAC side of things and we can
> extend it to include those LSMs which provide eBPF controls.  Does
> something like that already exist?

Not that I'm aware of. But it is a very good idea.

> > When you create a map iterator, you pass the program you want to
> > run
> > and a map reference to bpftool (user space). bpftool first
> > retrieves
> > the fds for the program and the map (thus, security_bpf_prog() and
> > security_bpf_map() are called).
> 
> Okay, that's good, we should have the right LSM controls in place at
> this point in the process.
> 
> > The fds are then passed to the kernel
> > with another system call to create the map iterator. The map
> > iterator
> > is a named kernel object, with a corresponding file in the bpf
> > filesystem.
> 
> The iterator has no additional user data, or system state, beyond the
> associated BPF program and map data, yes?  If that is the case I
> think
> we may be able to avoid treating it as a proper object from a SELinux
> perspective as long as we ensure that access is checked against the
> associated map and program.

Not totally sure about user data or system state, but yes. We probably
should be able to simply use existing map and program permissions. We
still have filesystem permissions to prevent access to the iterator.

> > Reading that file causes the iteration to start, by running
> > the program for each map element.
> 
> ... and we should definitely have an LSM access control check here,
> although as mentioned above we may be able to just leverage the
> existing security_bpf_map() and security_bpf_prog() hooks.  Do we do
> that currently?

I think so, although at the moment there is no restriction on which
program can access a map. In the long term, maybe it would be good to
have finer granularity.

> > The only missing part is checking the fd modes granted by LSMs at
> > the
> > time the map iterator is created. If the map iterator allows read
> > and
> > write, both FMODE_CAN_READ and FMODE_CAN_WRITE should be set.
> 
> Does the kernel enforce this from a DAC perspective?

At the moment not. I would place the DAC check close to
security_bpf_map().

The eBPF side of the security check (fd modes check) then remains the
same. What changed is which checks were made at the time the fd is
given to the requestor.

> > > > Map fd directly used by eBPF programs without system call:
> > > > https://lore.kernel.org/bpf/20220926154430.1552800-1-roberto.sassu@huaweicloud.com/
> > > 
> > > Another instance of "can you please explain this use case?" ;)
> > 
> > Lorenz discovered that. Despite LSMs check which permissions user
> > space
> > requested for a map fd, there is no corresponding check of the fd
> > modes. Normally, a map fd is passed in a system call to perform a
> > map
> > operation. In this case, it is not. It is set in the program code,
> > and
> > eBPF transforms the map fd into a map address suitable to be passed
> > to
> > kernel helpers which directly access the kernel object.
> 
> Yes, the LSMs use the fmode_t bits that are passed to
> security_bpf_map() to determine the requested access permissions for
> the map operation.  When I looked at the callers yesterday, it looked
> like all of the security_bpf_map() callers were passing fmode_t bits
> from a bpf_attr and not directly from the fd.  I need some help in
> verifying that this is correct, but from what I've read thus far it
> appears that the BPF code in general is not concerned about fd mode
> bits.

Probably because the fd is not obtained as result of an open() but as
result of a bpf() system call.

> > > I'm wondering if we do want to move the SELinux BPF fd passing
> > > code
> > > to
> > > check the bpf_attr:open_flags perms.  Thoughts?
> > 
> > Seems correct as it is, but I'm not completely familiar with this
> > work.
> > As long as you ensure that the fd holder has the right to access
> > the
> > map, then it will be still responsibility of eBPF to just check the
> > fd
> > modes.
> 
> Looking at this further, I don't believe we would have the open_flags
> perms at the time the BPF fd was passed anyway.

We have the fd modes, which we can translate to map permissions (not
exactly to open_flags or file_flags: FMODE_CAN_READ could be set as
result of setting those flags to BPF_F_RDONLY or zero).

> > > While this does not seem to me a concern from MAC perspective, as
> > > > attempts to get a map fd still have to pass through
> > > > security_bpf_map(),
> > > > in general this should be fixed without relying on LSMs.
> > > 
> > > Agreed.  The access controls need to work both for DAC and
> > > DAC+LSM.
> > 
> > @all, what do you think?
> 
> It is worth reminding everyone this is pretty much how the rest of
> the
> kernel works, you can't ignore the DAC controls (file mode bits,
> capabilities, etc.) in favor of a pure LSM-based system, and you
> can't
> avoid the necessary LSM hooks because "LSMs are stupid".  People
> build
> Linux systems both as DAC only and DAC+LSM, we need to make sure we
> can Do The Right Thing in both cases.
> 
> > > Is the plan to ensure that the map and fd permissions are correct
> > > > > at
> > > > > the core BPF level, or do we need to do some additional
> > > > > checks in
> > > > > the
> > > > > LSMs (currently only SELinux)?
> > > > 
> > > > Should we add a new map_pin permission in SELinux?
> > > 
> > > Maybe?  Maybe not?  I don't know, can you help me understand map
> > > pinning a bit more first?
> > 
> > I'm not completely sure that this is correct. Pinning a map seems
> > to me
> > like creating a new dentry for the inode.
> 
> Once again, can someone explain BPF map pinning?

Will let the eBPF maintainers explain this better.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-28  8:54 ` Lorenz Bauer
@ 2022-10-06  7:15   ` Roberto Sassu
  2022-10-27 16:54   ` Andrii Nakryiko
  1 sibling, 0 replies; 45+ messages in thread
From: Roberto Sassu @ 2022-10-06  7:15 UTC (permalink / raw)
  To: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev
  Cc: bpf, linux-security-module, Paul Moore, Casey Schaufler

On Wed, 2022-09-28 at 09:54 +0100, Lorenz Bauer wrote:
> On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> > Hi list,
> > 
> > Here is a summary of the talk I gave at LPC '22 titled "Closing the
> > BPF 
> > map permission loophole", with slides at [0].
> 
> I've put this topic on the agenda of the 2022-10-06 BPF office hours
> to get some maintainer attention. Details are here: 
> https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit

Thanks for setting this up. I would include security people, they might
be interested as well to join.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-09-28  8:54 ` Lorenz Bauer
  2022-10-06  7:15   ` Roberto Sassu
@ 2022-10-27 16:54   ` Andrii Nakryiko
  2022-10-28  6:27     ` Greg KH
                       ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2022-10-27 16:54 UTC (permalink / raw)
  To: Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf

On Wed, Sep 28, 2022 at 1:54 AM Lorenz Bauer <oss@lmb.io> wrote:
>
> On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> > Hi list,
> >
> > Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF
> > map permission loophole", with slides at [0].
>
> I've put this topic on the agenda of the 2022-10-06 BPF office hours to get some maintainer attention. Details are here: https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit
>
> Best

So after the office hours I had an offline whiteboard discussion with
Alexei explaining more precisely what I was proposing, and it became
apparent that some of the things I was proposing weren't exactly
clear, and thus people were left confused about the solution I was
talking about. So I'll try to summarize it a bit and add some more
specifics. Hopefully that will help, because I still believe we can
solve this problem moving forward.

But first, two notes.

1) Backporting this is going to be hard, and I don't think that should
be the goal, it's going to be too intrusive, probably.

2) It turned out that we currently don't store user-space-side
read/write permissions on struct bpf_map itself, and we'd need to do
that as a preliminary step here. Below I just assume that struct
bpf_map records all the bpf-side and user-side read/write permissions.

So, the overall idea is that instead of fetching struct bpf_map point
for all kinds of FD-based operations (bpf_obj_get, map_fd_by_id, even
bpf_map_create) we are always working with a view of a map, and that
"view" is a separate struct/object, something like:

struct bpf_map_view {
    struct bpf_map *map;
    /* BPF_F_RDONLY | BPF_F_WRONLY, but we can later also add
       BPF-side flags: BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG
    */
    int access_flags;
}

So whenever we work with map by FD, we get struct bpf_map_view (i.e.,
we store struct bpf_map_view inside file->private and
inode->i_private). The semantics of view->access_flags is that it is
superimposed on top of bpf_map->map_flags (specifically its
BPF_F_RDONLY | BPF_F_WRONLY parts, later also BPF_F_RDONLY_PROG |
BPF_F_WRONLY_PROG). This means that if struct bpf_map is R/W, but our
current bpf_map_view says BPF_F_RDONLY, then only read-only access is
allowed through that FD. On the other hand, if bpf_map itself is only
BPF_F_RDONLY, but we somehow go bpf_map_view with BPF_F_RDONLY |
BPF_F_WRONLY (e.g., due to chmod loophole), then it doesn't matter,
it's still BPF_F_RDONLY, no write access. We can try preventing such
situations in some circumstances, but as we showed with chmod() it's
impossible to prevent in general.

So, just to hopefully make it a bit clearer, let's discuss a use case
that a bunch of people had in mind. Root/CAP_BPF user created R/W
bpf_map, but wants to pin it in one BPFFS path as R/W, so that it can
be later opened as R/W and modified. This BPFFS path will be
restricted through FS permissions to only allow it to be opened by a
privileged user/group. But, that same original root/CAP_BPF user would
like to also create a read-only BPFFS pinning of that same map, and
let unprivileged user(s) to open and work with that map, but only
perform read-only operations (e.g., BPF_MAP_LOOKUP_ELEM command).

Let's see how that works in this new bpf_map_view model.

1. root/CAP_BPF user does BPF_MAP_CREATE operation, struct bpf_map is
created with map_flags BPF_F_RDONLY | BPF_F_WRONLY. Also, immediately
struct bpf_map_view is created with same BPF_F_RDONLY | BPF_F_WRONLY
access_flags. struct bpf_map_view keeps reference on struct bpf_map,
struct bpf_map_view is assigned to struct file->private, new FD is
returned to user-space.

2. That same root/CAP_BPF user does BPF_OBJ_PIN and specifies that
they want R/W pinning (through file_flags). Kernel clones/copies
struct bpf_map_view (I think bpf_map_view shouldn't be shared between
files/inodes, each file/inode has its own personal copy; but we can
work out the details later), sets (keeps in this case) its
access_flags as  BPF_F_RDONLY | BPF_F_WRONLY. After that they are free
to chown/chmod as necessary to prevent unprivileged user from doing
anything with that BPFFS file, if necessary.

3. Now, for the read-only pinning. User does another BPF_OBJ_PIN using
original R/W map FD, but now they specify file_flags to only allow
BPF_F_RDONLY (I'm too lazy to check what exact flag we pass there to
communicate this intent, it's not that important for this discussion).
At this point, kernel creates a new struct bpf_map_view, pointing to
struct bpf_map, but this time access_flags have only BPF_F_RDONLY set.
Then we proceed to creating an inode, its i_private is assigned this
new R/O bpf_map_view. The user should chmod/chown pinned BPFFS file
appropriately to allow unprivileged users to BPF_OBJ_GET it.


Now, let's assume we are unprivileged user who wants to work with that
pinned BPF map. When we do BPF_OBJ_GET on that read-only pinned file,
kernel fetches struct bpf_map_view from inode->i_private which has
access_flags as BPF_F_RDONLY. That's it, there is no way we can do
update on that map, kernel will reject that even though struct bpf_map
itself allows BPF_F_WRONLY.

Note, though, that once we checked everything, as we create a new
struct file and return new FD to user-space, that new struct file will
have *yet another copy* of struct bpf_map_view, cloned from inode's
bpf_map_view (recall that I was proposing to have 1-to-1 mapping
between file/inode and bpf_map_view).


Let's now assume we are sneaky bastards and chmod that second pinned
BPFFS file to allow r/w file permissions. When we do BPF_OBJ_GET,
again, we'll fetch struct bpf_map_view which enforce BPF_F_RDONLY
(only), despite file itself having writable permissions. We can argue
if we should reject such BPF_OBJ_GET command or silently "downgrade"
to read-only view, that's beside the point.

Hopefully this is a bit clearer.

One last note. When we are talking about BPF_OBJ_GET, we are actually
going to be dealing with 4 layers of read and write permissions:
  1) struct bpf_map's "inherent" permissions
  2) struct bpf_map_view's access_flags
  3) struct file's FS read/write permissions
  4) union bpf_attr's file_flags specified for BPF_OBJ_GET

While that's a lot, we always intersect them and keep only the most
restrictive combination. So if at any of the layers we have read-only
permissions, resulting *new struct bpf_map_view* will only specify
BPF_F_RDONLY. E.g., if at layers 1, 2, and 4 we allow BPF_F_WRONLY,
but BPFFS file permission (layer #3 above) at that moment is
read-only, we should be only getting read-only view of BPF map.

P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
user will control that. E.g., FS read-only permission, does it
restrict both user-space and BPF-view, or just user-space view? We can
certainly extend file_flags to allow users to get BPF-side read-only
and user-space-side read-write BPF map FD, for example. Obviously, BPF
verifier would need to know about struct bpf_map_view when accepting
BPF map FD in ldimm64 and such.

Alright, that's what I think I had to say. Please point out scenarios
that won't be covered by this bpf_map_view model. Thanks.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-10-27 16:54   ` Andrii Nakryiko
@ 2022-10-28  6:27     ` Greg KH
  2022-10-28 17:17       ` Andrii Nakryiko
  2022-10-29 11:18     ` Lorenz Bauer
  2022-10-31 11:53     ` Roberto Sassu
  2 siblings, 1 reply; 45+ messages in thread
From: Greg KH @ 2022-10-28  6:27 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev,
	bpf

On Thu, Oct 27, 2022 at 09:54:57AM -0700, Andrii Nakryiko wrote:
> But first, two notes.
> 
> 1) Backporting this is going to be hard, and I don't think that should
> be the goal, it's going to be too intrusive, probably.

Don't worry about backporting when doing your work.  Get it correct
first, and let others worry about any backporting if it's even needed.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-10-28  6:27     ` Greg KH
@ 2022-10-28 17:17       ` Andrii Nakryiko
  0 siblings, 0 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2022-10-28 17:17 UTC (permalink / raw)
  To: Greg KH
  Cc: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev,
	bpf

On Thu, Oct 27, 2022 at 11:26 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Thu, Oct 27, 2022 at 09:54:57AM -0700, Andrii Nakryiko wrote:
> > But first, two notes.
> >
> > 1) Backporting this is going to be hard, and I don't think that should
> > be the goal, it's going to be too intrusive, probably.
>
> Don't worry about backporting when doing your work.  Get it correct
> first, and let others worry about any backporting if it's even needed.
>

Yep, I agree, thanks, Greg!

I was pointing this out here explicitly because during previous
discussions the "backportability" aspect was a distraction and reason
for miscommunication issues and I wanted to get that out of the way.

> thanks,
>
> greg k-h

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-10-27 16:54   ` Andrii Nakryiko
  2022-10-28  6:27     ` Greg KH
@ 2022-10-29 11:18     ` Lorenz Bauer
  2022-10-31 11:53     ` Roberto Sassu
  2 siblings, 0 replies; 45+ messages in thread
From: Lorenz Bauer @ 2022-10-29 11:18 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf

On Thu, 27 Oct 2022, at 17:54, Andrii Nakryiko wrote:
>
> So after the office hours I had an offline whiteboard discussion with
> Alexei explaining more precisely what I was proposing, and it became
> apparent that some of the things I was proposing weren't exactly
> clear, and thus people were left confused about the solution I was
> talking about. So I'll try to summarize it a bit and add some more
> specifics. Hopefully that will help, because I still believe we can
> solve this problem moving forward.

Hi Andrii,

Thanks for writing down your thoughts, that helps a lot. I have a draft
email that I wanted to finish this week, but that didn't happen, sorry.
I'll be traveling next week, so I'll circle back on this issue the week after.
I hope that is OK.

Lorenz

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-10-27 16:54   ` Andrii Nakryiko
  2022-10-28  6:27     ` Greg KH
  2022-10-29 11:18     ` Lorenz Bauer
@ 2022-10-31 11:53     ` Roberto Sassu
  2022-11-04 21:10       ` Andrii Nakryiko
  2 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-10-31 11:53 UTC (permalink / raw)
  To: Andrii Nakryiko, Lorenz Bauer
  Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, LSM List,
	Paul Moore, Casey Schaufler

On 10/27/2022 6:54 PM, Andrii Nakryiko wrote:
> On Wed, Sep 28, 2022 at 1:54 AM Lorenz Bauer <oss@lmb.io> wrote:
>>
>> On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
>>> Hi list,
>>>
>>> Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF
>>> map permission loophole", with slides at [0].
>>
>> I've put this topic on the agenda of the 2022-10-06 BPF office hours to get some maintainer attention. Details are here: https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit
>>
>> Best
> 
> So after the office hours I had an offline whiteboard discussion with
> Alexei explaining more precisely what I was proposing, and it became
> apparent that some of the things I was proposing weren't exactly
> clear, and thus people were left confused about the solution I was
> talking about. So I'll try to summarize it a bit and add some more
> specifics. Hopefully that will help, because I still believe we can
> solve this problem moving forward.
> 
> But first, two notes.
> 
> 1) Backporting this is going to be hard, and I don't think that should
> be the goal, it's going to be too intrusive, probably.
> 
> 2) It turned out that we currently don't store user-space-side
> read/write permissions on struct bpf_map itself, and we'd need to do
> that as a preliminary step here. Below I just assume that struct
> bpf_map records all the bpf-side and user-side read/write permissions.

+linux-security-module, Paul, Casey

Thanks Andrii for writing such detailed proposal. It is very clear.

I was thinking about your bpf_map_view abstraction, to record per-fd
permission. My question would be, isn't the f_mode enough for this
purpose? I mean, if you want to record the access flags per fd, you
already have them in f_mode. Apart from map iterators, the eBPF code
handling the user space side of map access is already capable of
handling and enforcing based on the f_mode.

So, what remains for us to do is to ensure that a requestor gets
a fd with modes compatible with what the requestor is allowed to do.

For a moment, I exclude MAC-style controls, as I understood that
this should not be the only type of enforcement.

Then, maybe we could treat maps like inodes, meaning that we could
add to bpf_map the following fields:

m_uid
m_gid
m_mode

These fields will be populated at map creation time, depending on
who is requesting it. With similar mechanism as for inodes (umask),
we can decide the default m_mode (read-write for the owner,
read-only for the group and others). These fields are relevant only
for the user space side of map access.

We can add two new commands for bpf():

BPF_MAP_CHOWN
BPF_MAP_CHMOD

to change the fields above.

I comment below, to see if this alternative proposal works for the
use cases you described.

> So, the overall idea is that instead of fetching struct bpf_map point
> for all kinds of FD-based operations (bpf_obj_get, map_fd_by_id, even
> bpf_map_create) we are always working with a view of a map, and that
> "view" is a separate struct/object, something like:
> 
> struct bpf_map_view {
>      struct bpf_map *map;
>      /* BPF_F_RDONLY | BPF_F_WRONLY, but we can later also add
>         BPF-side flags: BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG
>      */
>      int access_flags;
> }
> 
> So whenever we work with map by FD, we get struct bpf_map_view (i.e.,
> we store struct bpf_map_view inside file->private and
> inode->i_private). The semantics of view->access_flags is that it is
> superimposed on top of bpf_map->map_flags (specifically its
> BPF_F_RDONLY | BPF_F_WRONLY parts, later also BPF_F_RDONLY_PROG |
> BPF_F_WRONLY_PROG). This means that if struct bpf_map is R/W, but our
> current bpf_map_view says BPF_F_RDONLY, then only read-only access is
> allowed through that FD. On the other hand, if bpf_map itself is only
> BPF_F_RDONLY, but we somehow go bpf_map_view with BPF_F_RDONLY |
> BPF_F_WRONLY (e.g., due to chmod loophole), then it doesn't matter,
> it's still BPF_F_RDONLY, no write access. We can try preventing such
> situations in some circumstances, but as we showed with chmod() it's
> impossible to prevent in general.
> 
> So, just to hopefully make it a bit clearer, let's discuss a use case
> that a bunch of people had in mind. Root/CAP_BPF user created R/W
> bpf_map, but wants to pin it in one BPFFS path as R/W, so that it can
> be later opened as R/W and modified. This BPFFS path will be
> restricted through FS permissions to only allow it to be opened by a
> privileged user/group. But, that same original root/CAP_BPF user would
> like to also create a read-only BPFFS pinning of that same map, and
> let unprivileged user(s) to open and work with that map, but only
> perform read-only operations (e.g., BPF_MAP_LOOKUP_ELEM command).
> 
> Let's see how that works in this new bpf_map_view model.
> 
> 1. root/CAP_BPF user does BPF_MAP_CREATE operation, struct bpf_map is
> created with map_flags BPF_F_RDONLY | BPF_F_WRONLY. Also, immediately
> struct bpf_map_view is created with same BPF_F_RDONLY | BPF_F_WRONLY
> access_flags. struct bpf_map_view keeps reference on struct bpf_map,
> struct bpf_map_view is assigned to struct file->private, new FD is
> returned to user-space.

Ok, m_uid, m_gid are taken from the current process. m_mode (for the
owner) could be set from the map creation flags (0, BPF_F_RDONLY,
BPF_F_WRONLY). The fd the owner receives has f_mode compatible with the
map creation flags.

> 2. That same root/CAP_BPF user does BPF_OBJ_PIN and specifies that
> they want R/W pinning (through file_flags). Kernel clones/copies
> struct bpf_map_view (I think bpf_map_view shouldn't be shared between
> files/inodes, each file/inode has its own personal copy; but we can
> work out the details later), sets (keeps in this case) its
> access_flags as  BPF_F_RDONLY | BPF_F_WRONLY. After that they are free
> to chown/chmod as necessary to prevent unprivileged user from doing
> anything with that BPFFS file, if necessary.

I understand that per-pinned map permissions gives a lot of flexibility.
But maybe, the owner/group/others permissions are sufficient to cover
most of the use cases. Instead of creating two pinned maps, one 
read-write and one read-only, we just create one and we define the map 
m_mode
as rw-r--r--.

At the time a requestor wants to get a fd from the pinned map through
OBJ_GET, the kernel checks from the process UID/GID if it has 
permissions in m_mode.

We can keep the permission check on the inode of the pinned map as an
additional security control.

> 3. Now, for the read-only pinning. User does another BPF_OBJ_PIN using
> original R/W map FD, but now they specify file_flags to only allow
> BPF_F_RDONLY (I'm too lazy to check what exact flag we pass there to
> communicate this intent, it's not that important for this discussion).
> At this point, kernel creates a new struct bpf_map_view, pointing to
> struct bpf_map, but this time access_flags have only BPF_F_RDONLY set.
> Then we proceed to creating an inode, its i_private is assigned this
> new R/O bpf_map_view. The user should chmod/chown pinned BPFFS file
> appropriately to allow unprivileged users to BPF_OBJ_GET it.

Now that the main access control check is based on m_mode, we might
think who can pin a map. Only the owner? Maybe we can reuse the
execute permission in m_mode to determine that.

> Now, let's assume we are unprivileged user who wants to work with that
> pinned BPF map. When we do BPF_OBJ_GET on that read-only pinned file,
> kernel fetches struct bpf_map_view from inode->i_private which has
> access_flags as BPF_F_RDONLY. That's it, there is no way we can do
> update on that map, kernel will reject that even though struct bpf_map
> itself allows BPF_F_WRONLY.

This should be clear now.

> Note, though, that once we checked everything, as we create a new
> struct file and return new FD to user-space, that new struct file will
> have *yet another copy* of struct bpf_map_view, cloned from inode's
> bpf_map_view (recall that I was proposing to have 1-to-1 mapping
> between file/inode and bpf_map_view).
> 
> 
> Let's now assume we are sneaky bastards and chmod that second pinned
> BPFFS file to allow r/w file permissions. When we do BPF_OBJ_GET,
> again, we'll fetch struct bpf_map_view which enforce BPF_F_RDONLY
> (only), despite file itself having writable permissions. We can argue
> if we should reject such BPF_OBJ_GET command or silently "downgrade"
> to read-only view, that's beside the point.

Ok, yes. Permissions on the pinned map are just an additional barrier.

> Hopefully this is a bit clearer.
> 
> One last note. When we are talking about BPF_OBJ_GET, we are actually
> going to be dealing with 4 layers of read and write permissions:
>    1) struct bpf_map's "inherent" permissions
>    2) struct bpf_map_view's access_flags
>    3) struct file's FS read/write permissions
>    4) union bpf_attr's file_flags specified for BPF_OBJ_GET

In my proposal, that would change to:

1) struct bpf_map m_uid, m_gid, m_mode
2) struct file's FS read/write permission (depends on the inode on
    BPFFS)
3) process uid, gid of the requestor
4) union bpf_attr's file_flags specified for BPF_OBJ_GET

> While that's a lot, we always intersect them and keep only the most
> restrictive combination. So if at any of the layers we have read-only
> permissions, resulting *new struct bpf_map_view* will only specify
> BPF_F_RDONLY. E.g., if at layers 1, 2, and 4 we allow BPF_F_WRONLY,
> but BPFFS file permission (layer #3 above) at that moment is
> read-only, we should be only getting read-only view of BPF map.

Ok, sure. I think more or less the proposals are aligned. If traditional
access control is sufficient, we could avoid the increased complexity of 
the new bpf_map_view layer.

> P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> user will control that. E.g., FS read-only permission, does it
> restrict both user-space and BPF-view, or just user-space view? We can
> certainly extend file_flags to allow users to get BPF-side read-only
> and user-space-side read-write BPF map FD, for example. Obviously, BPF
> verifier would need to know about struct bpf_map_view when accepting
> BPF map FD in ldimm64 and such.

I guess, this patch could be used:

https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/

When passing a fd to an eBPF program, the permissions of the user space
side cannot exceed those defined from eBPF program side.

Thanks

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-10-31 11:53     ` Roberto Sassu
@ 2022-11-04 21:10       ` Andrii Nakryiko
  2022-11-07 12:11         ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Andrii Nakryiko @ 2022-11-04 21:10 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev,
	bpf, LSM List, Paul Moore, Casey Schaufler

On Mon, Oct 31, 2022 at 4:54 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> On 10/27/2022 6:54 PM, Andrii Nakryiko wrote:
> > On Wed, Sep 28, 2022 at 1:54 AM Lorenz Bauer <oss@lmb.io> wrote:
> >>
> >> On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> >>> Hi list,
> >>>
> >>> Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF
> >>> map permission loophole", with slides at [0].
> >>
> >> I've put this topic on the agenda of the 2022-10-06 BPF office hours to get some maintainer attention. Details are here: https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit
> >>
> >> Best
> >
> > So after the office hours I had an offline whiteboard discussion with
> > Alexei explaining more precisely what I was proposing, and it became
> > apparent that some of the things I was proposing weren't exactly
> > clear, and thus people were left confused about the solution I was
> > talking about. So I'll try to summarize it a bit and add some more
> > specifics. Hopefully that will help, because I still believe we can
> > solve this problem moving forward.
> >
> > But first, two notes.
> >
> > 1) Backporting this is going to be hard, and I don't think that should
> > be the goal, it's going to be too intrusive, probably.
> >
> > 2) It turned out that we currently don't store user-space-side
> > read/write permissions on struct bpf_map itself, and we'd need to do
> > that as a preliminary step here. Below I just assume that struct
> > bpf_map records all the bpf-side and user-side read/write permissions.
>
> +linux-security-module, Paul, Casey
>
> Thanks Andrii for writing such detailed proposal. It is very clear.
>
> I was thinking about your bpf_map_view abstraction, to record per-fd
> permission. My question would be, isn't the f_mode enough for this
> purpose? I mean, if you want to record the access flags per fd, you
> already have them in f_mode. Apart from map iterators, the eBPF code
> handling the user space side of map access is already capable of
> handling and enforcing based on the f_mode.
>
> So, what remains for us to do is to ensure that a requestor gets
> a fd with modes compatible with what the requestor is allowed to do.
>
> For a moment, I exclude MAC-style controls, as I understood that
> this should not be the only type of enforcement.
>
> Then, maybe we could treat maps like inodes, meaning that we could
> add to bpf_map the following fields:
>
> m_uid
> m_gid
> m_mode
>
> These fields will be populated at map creation time, depending on
> who is requesting it. With similar mechanism as for inodes (umask),
> we can decide the default m_mode (read-write for the owner,
> read-only for the group and others). These fields are relevant only
> for the user space side of map access.
>
> We can add two new commands for bpf():
>
> BPF_MAP_CHOWN
> BPF_MAP_CHMOD
>
> to change the fields above.
>
> I comment below, to see if this alternative proposal works for the
> use cases you described.

didn't we establish that we can't trust fd permissions because we
don't control normal chmod/chown?..

>
> > So, the overall idea is that instead of fetching struct bpf_map point
> > for all kinds of FD-based operations (bpf_obj_get, map_fd_by_id, even
> > bpf_map_create) we are always working with a view of a map, and that
> > "view" is a separate struct/object, something like:
> >
> > struct bpf_map_view {
> >      struct bpf_map *map;
> >      /* BPF_F_RDONLY | BPF_F_WRONLY, but we can later also add
> >         BPF-side flags: BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG
> >      */
> >      int access_flags;
> > }
> >
> > So whenever we work with map by FD, we get struct bpf_map_view (i.e.,
> > we store struct bpf_map_view inside file->private and
> > inode->i_private). The semantics of view->access_flags is that it is
> > superimposed on top of bpf_map->map_flags (specifically its
> > BPF_F_RDONLY | BPF_F_WRONLY parts, later also BPF_F_RDONLY_PROG |
> > BPF_F_WRONLY_PROG). This means that if struct bpf_map is R/W, but our
> > current bpf_map_view says BPF_F_RDONLY, then only read-only access is
> > allowed through that FD. On the other hand, if bpf_map itself is only
> > BPF_F_RDONLY, but we somehow go bpf_map_view with BPF_F_RDONLY |
> > BPF_F_WRONLY (e.g., due to chmod loophole), then it doesn't matter,
> > it's still BPF_F_RDONLY, no write access. We can try preventing such
> > situations in some circumstances, but as we showed with chmod() it's
> > impossible to prevent in general.
> >
> > So, just to hopefully make it a bit clearer, let's discuss a use case
> > that a bunch of people had in mind. Root/CAP_BPF user created R/W
> > bpf_map, but wants to pin it in one BPFFS path as R/W, so that it can
> > be later opened as R/W and modified. This BPFFS path will be
> > restricted through FS permissions to only allow it to be opened by a
> > privileged user/group. But, that same original root/CAP_BPF user would
> > like to also create a read-only BPFFS pinning of that same map, and
> > let unprivileged user(s) to open and work with that map, but only
> > perform read-only operations (e.g., BPF_MAP_LOOKUP_ELEM command).
> >
> > Let's see how that works in this new bpf_map_view model.
> >
> > 1. root/CAP_BPF user does BPF_MAP_CREATE operation, struct bpf_map is
> > created with map_flags BPF_F_RDONLY | BPF_F_WRONLY. Also, immediately
> > struct bpf_map_view is created with same BPF_F_RDONLY | BPF_F_WRONLY
> > access_flags. struct bpf_map_view keeps reference on struct bpf_map,
> > struct bpf_map_view is assigned to struct file->private, new FD is
> > returned to user-space.
>
> Ok, m_uid, m_gid are taken from the current process. m_mode (for the
> owner) could be set from the map creation flags (0, BPF_F_RDONLY,
> BPF_F_WRONLY). The fd the owner receives has f_mode compatible with the
> map creation flags.
>
> > 2. That same root/CAP_BPF user does BPF_OBJ_PIN and specifies that
> > they want R/W pinning (through file_flags). Kernel clones/copies
> > struct bpf_map_view (I think bpf_map_view shouldn't be shared between
> > files/inodes, each file/inode has its own personal copy; but we can
> > work out the details later), sets (keeps in this case) its
> > access_flags as  BPF_F_RDONLY | BPF_F_WRONLY. After that they are free
> > to chown/chmod as necessary to prevent unprivileged user from doing
> > anything with that BPFFS file, if necessary.
>
> I understand that per-pinned map permissions gives a lot of flexibility.
> But maybe, the owner/group/others permissions are sufficient to cover
> most of the use cases. Instead of creating two pinned maps, one
> read-write and one read-only, we just create one and we define the map
> m_mode
> as rw-r--r--.
>
> At the time a requestor wants to get a fd from the pinned map through
> OBJ_GET, the kernel checks from the process UID/GID if it has
> permissions in m_mode.
>
> We can keep the permission check on the inode of the pinned map as an
> additional security control.

wouldn't inode permissions be too inflexible? E.g., if I created a map
and I'm in group1, but I want to give read-only access to group2, but
not to any other group. I can't use other part of permission and
setting group permissions to r is too restrictive (I want users in my
group to be able to open r/w view of the map).

>
> > 3. Now, for the read-only pinning. User does another BPF_OBJ_PIN using
> > original R/W map FD, but now they specify file_flags to only allow
> > BPF_F_RDONLY (I'm too lazy to check what exact flag we pass there to
> > communicate this intent, it's not that important for this discussion).
> > At this point, kernel creates a new struct bpf_map_view, pointing to
> > struct bpf_map, but this time access_flags have only BPF_F_RDONLY set.
> > Then we proceed to creating an inode, its i_private is assigned this
> > new R/O bpf_map_view. The user should chmod/chown pinned BPFFS file
> > appropriately to allow unprivileged users to BPF_OBJ_GET it.
>
> Now that the main access control check is based on m_mode, we might
> think who can pin a map. Only the owner? Maybe we can reuse the
> execute permission in m_mode to determine that.

Why such limitations that only owner should be able to pin? What if
I'm that read-only user and I want to pin it somewhere else as another
read-only pinning (for whatever reason, to share with my own
processes/users).


>
> > Now, let's assume we are unprivileged user who wants to work with that
> > pinned BPF map. When we do BPF_OBJ_GET on that read-only pinned file,
> > kernel fetches struct bpf_map_view from inode->i_private which has
> > access_flags as BPF_F_RDONLY. That's it, there is no way we can do
> > update on that map, kernel will reject that even though struct bpf_map
> > itself allows BPF_F_WRONLY.
>
> This should be clear now.
>
> > Note, though, that once we checked everything, as we create a new
> > struct file and return new FD to user-space, that new struct file will
> > have *yet another copy* of struct bpf_map_view, cloned from inode's
> > bpf_map_view (recall that I was proposing to have 1-to-1 mapping
> > between file/inode and bpf_map_view).
> >
> >
> > Let's now assume we are sneaky bastards and chmod that second pinned
> > BPFFS file to allow r/w file permissions. When we do BPF_OBJ_GET,
> > again, we'll fetch struct bpf_map_view which enforce BPF_F_RDONLY
> > (only), despite file itself having writable permissions. We can argue
> > if we should reject such BPF_OBJ_GET command or silently "downgrade"
> > to read-only view, that's beside the point.
>
> Ok, yes. Permissions on the pinned map are just an additional barrier.
>
> > Hopefully this is a bit clearer.
> >
> > One last note. When we are talking about BPF_OBJ_GET, we are actually
> > going to be dealing with 4 layers of read and write permissions:
> >    1) struct bpf_map's "inherent" permissions
> >    2) struct bpf_map_view's access_flags
> >    3) struct file's FS read/write permissions
> >    4) union bpf_attr's file_flags specified for BPF_OBJ_GET
>
> In my proposal, that would change to:
>
> 1) struct bpf_map m_uid, m_gid, m_mode
> 2) struct file's FS read/write permission (depends on the inode on
>     BPFFS)
> 3) process uid, gid of the requestor
> 4) union bpf_attr's file_flags specified for BPF_OBJ_GET
>
> > While that's a lot, we always intersect them and keep only the most
> > restrictive combination. So if at any of the layers we have read-only
> > permissions, resulting *new struct bpf_map_view* will only specify
> > BPF_F_RDONLY. E.g., if at layers 1, 2, and 4 we allow BPF_F_WRONLY,
> > but BPFFS file permission (layer #3 above) at that moment is
> > read-only, we should be only getting read-only view of BPF map.
>
> Ok, sure. I think more or less the proposals are aligned. If traditional
> access control is sufficient, we could avoid the increased complexity of
> the new bpf_map_view layer.

I think this additional complexity is fundamental to this problem. And
as I mentioned above, relying just on inode permissions doesn't seem
sufficient. But maybe I missed something in your proposal.

>
> > P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> > BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> > user will control that. E.g., FS read-only permission, does it
> > restrict both user-space and BPF-view, or just user-space view? We can
> > certainly extend file_flags to allow users to get BPF-side read-only
> > and user-space-side read-write BPF map FD, for example. Obviously, BPF
> > verifier would need to know about struct bpf_map_view when accepting
> > BPF map FD in ldimm64 and such.
>
> I guess, this patch could be used:
>
> https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/
>
> When passing a fd to an eBPF program, the permissions of the user space
> side cannot exceed those defined from eBPF program side.

Don't know, maybe. But I can see how BPF-side can be declared r/w for
BPF programs, while user-space should be restricted to read-only. I'm
a bit hesitant to artificially couple both together.

>
> Thanks
>
> Roberto
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-11-04 21:10       ` Andrii Nakryiko
@ 2022-11-07 12:11         ` Roberto Sassu
  2022-12-12 16:10           ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-11-07 12:11 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev,
	bpf, LSM List, Paul Moore, Casey Schaufler

On Fri, 2022-11-04 at 14:10 -0700, Andrii Nakryiko wrote:
> On Mon, Oct 31, 2022 at 4:54 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > On 10/27/2022 6:54 PM, Andrii Nakryiko wrote:
> > > On Wed, Sep 28, 2022 at 1:54 AM Lorenz Bauer <oss@lmb.io> wrote:
> > > > On Thu, 15 Sep 2022, at 11:30, Lorenz Bauer wrote:
> > > > > Hi list,
> > > > > 
> > > > > Here is a summary of the talk I gave at LPC '22 titled "Closing the BPF
> > > > > map permission loophole", with slides at [0].
> > > > 
> > > > I've put this topic on the agenda of the 2022-10-06 BPF office hours to get some maintainer attention. Details are here: https://docs.google.com/spreadsheets/d/1LfrDXZ9-fdhvPEp_LHkxAMYyxxpwBXjywWa0AejEveU/edit
> > > > 
> > > > Best
> > > 
> > > So after the office hours I had an offline whiteboard discussion with
> > > Alexei explaining more precisely what I was proposing, and it became
> > > apparent that some of the things I was proposing weren't exactly
> > > clear, and thus people were left confused about the solution I was
> > > talking about. So I'll try to summarize it a bit and add some more
> > > specifics. Hopefully that will help, because I still believe we can
> > > solve this problem moving forward.
> > > 
> > > But first, two notes.
> > > 
> > > 1) Backporting this is going to be hard, and I don't think that should
> > > be the goal, it's going to be too intrusive, probably.
> > > 
> > > 2) It turned out that we currently don't store user-space-side
> > > read/write permissions on struct bpf_map itself, and we'd need to do
> > > that as a preliminary step here. Below I just assume that struct
> > > bpf_map records all the bpf-side and user-side read/write permissions.
> > 
> > +linux-security-module, Paul, Casey
> > 
> > Thanks Andrii for writing such detailed proposal. It is very clear.
> > 
> > I was thinking about your bpf_map_view abstraction, to record per-fd
> > permission. My question would be, isn't the f_mode enough for this
> > purpose? I mean, if you want to record the access flags per fd, you
> > already have them in f_mode. Apart from map iterators, the eBPF code
> > handling the user space side of map access is already capable of
> > handling and enforcing based on the f_mode.
> > 
> > So, what remains for us to do is to ensure that a requestor gets
> > a fd with modes compatible with what the requestor is allowed to do.
> > 
> > For a moment, I exclude MAC-style controls, as I understood that
> > this should not be the only type of enforcement.
> > 
> > Then, maybe we could treat maps like inodes, meaning that we could
> > add to bpf_map the following fields:
> > 
> > m_uid
> > m_gid
> > m_mode
> > 
> > These fields will be populated at map creation time, depending on
> > who is requesting it. With similar mechanism as for inodes (umask),
> > we can decide the default m_mode (read-write for the owner,
> > read-only for the group and others). These fields are relevant only
> > for the user space side of map access.
> > 
> > We can add two new commands for bpf():
> > 
> > BPF_MAP_CHOWN
> > BPF_MAP_CHMOD
> > 
> > to change the fields above.
> > 
> > I comment below, to see if this alternative proposal works for the
> > use cases you described.
> 
> didn't we establish that we can't trust fd permissions because we
> don't control normal chmod/chown?..

Maybe this part is not clear to me.

We are getting a fd with:


	return anon_inode_getfd("bpf-map", &bpf_map_fops, map,
				flags | O_CLOEXEC);


from bpf_map_new_fd(). This applies for both MAP_GET_FD_BY_ID and
OBJ_GET operations.

Before this, we have the MAC-style access control. We could add DAC-
style there too.

So, once the fd is obtained, there would be a problem if we could
change the flags FMODE_CAN_READ/FMODE_CAN_WRITE without mediation,
right? Would it be possible?

> > > So, the overall idea is that instead of fetching struct bpf_map point
> > > for all kinds of FD-based operations (bpf_obj_get, map_fd_by_id, even
> > > bpf_map_create) we are always working with a view of a map, and that
> > > "view" is a separate struct/object, something like:
> > > 
> > > struct bpf_map_view {
> > >      struct bpf_map *map;
> > >      /* BPF_F_RDONLY | BPF_F_WRONLY, but we can later also add
> > >         BPF-side flags: BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG
> > >      */
> > >      int access_flags;
> > > }
> > > 
> > > So whenever we work with map by FD, we get struct bpf_map_view (i.e.,
> > > we store struct bpf_map_view inside file->private and
> > > inode->i_private). The semantics of view->access_flags is that it is
> > > superimposed on top of bpf_map->map_flags (specifically its
> > > BPF_F_RDONLY | BPF_F_WRONLY parts, later also BPF_F_RDONLY_PROG |
> > > BPF_F_WRONLY_PROG). This means that if struct bpf_map is R/W, but our
> > > current bpf_map_view says BPF_F_RDONLY, then only read-only access is
> > > allowed through that FD. On the other hand, if bpf_map itself is only
> > > BPF_F_RDONLY, but we somehow go bpf_map_view with BPF_F_RDONLY |
> > > BPF_F_WRONLY (e.g., due to chmod loophole), then it doesn't matter,
> > > it's still BPF_F_RDONLY, no write access. We can try preventing such
> > > situations in some circumstances, but as we showed with chmod() it's
> > > impossible to prevent in general.
> > > 
> > > So, just to hopefully make it a bit clearer, let's discuss a use case
> > > that a bunch of people had in mind. Root/CAP_BPF user created R/W
> > > bpf_map, but wants to pin it in one BPFFS path as R/W, so that it can
> > > be later opened as R/W and modified. This BPFFS path will be
> > > restricted through FS permissions to only allow it to be opened by a
> > > privileged user/group. But, that same original root/CAP_BPF user would
> > > like to also create a read-only BPFFS pinning of that same map, and
> > > let unprivileged user(s) to open and work with that map, but only
> > > perform read-only operations (e.g., BPF_MAP_LOOKUP_ELEM command).
> > > 
> > > Let's see how that works in this new bpf_map_view model.
> > > 
> > > 1. root/CAP_BPF user does BPF_MAP_CREATE operation, struct bpf_map is
> > > created with map_flags BPF_F_RDONLY | BPF_F_WRONLY. Also, immediately
> > > struct bpf_map_view is created with same BPF_F_RDONLY | BPF_F_WRONLY
> > > access_flags. struct bpf_map_view keeps reference on struct bpf_map,
> > > struct bpf_map_view is assigned to struct file->private, new FD is
> > > returned to user-space.
> > 
> > Ok, m_uid, m_gid are taken from the current process. m_mode (for the
> > owner) could be set from the map creation flags (0, BPF_F_RDONLY,
> > BPF_F_WRONLY). The fd the owner receives has f_mode compatible with the
> > map creation flags.
> > 
> > > 2. That same root/CAP_BPF user does BPF_OBJ_PIN and specifies that
> > > they want R/W pinning (through file_flags). Kernel clones/copies
> > > struct bpf_map_view (I think bpf_map_view shouldn't be shared between
> > > files/inodes, each file/inode has its own personal copy; but we can
> > > work out the details later), sets (keeps in this case) its
> > > access_flags as  BPF_F_RDONLY | BPF_F_WRONLY. After that they are free
> > > to chown/chmod as necessary to prevent unprivileged user from doing
> > > anything with that BPFFS file, if necessary.
> > 
> > I understand that per-pinned map permissions gives a lot of flexibility.
> > But maybe, the owner/group/others permissions are sufficient to cover
> > most of the use cases. Instead of creating two pinned maps, one
> > read-write and one read-only, we just create one and we define the map
> > m_mode
> > as rw-r--r--.
> > 
> > At the time a requestor wants to get a fd from the pinned map through
> > OBJ_GET, the kernel checks from the process UID/GID if it has
> > permissions in m_mode.
> > 
> > We can keep the permission check on the inode of the pinned map as an
> > additional security control.
> 
> wouldn't inode permissions be too inflexible? E.g., if I created a map
> and I'm in group1, but I want to give read-only access to group2, but
> not to any other group. I can't use other part of permission and
> setting group permissions to r is too restrictive (I want users in my
> group to be able to open r/w view of the map).

Ok, I guess we would need ACLs then. Maybe in addition to simple POSIX
permissions.

> > > 3. Now, for the read-only pinning. User does another BPF_OBJ_PIN using
> > > original R/W map FD, but now they specify file_flags to only allow
> > > BPF_F_RDONLY (I'm too lazy to check what exact flag we pass there to
> > > communicate this intent, it's not that important for this discussion).
> > > At this point, kernel creates a new struct bpf_map_view, pointing to
> > > struct bpf_map, but this time access_flags have only BPF_F_RDONLY set.
> > > Then we proceed to creating an inode, its i_private is assigned this
> > > new R/O bpf_map_view. The user should chmod/chown pinned BPFFS file
> > > appropriately to allow unprivileged users to BPF_OBJ_GET it.
> > 
> > Now that the main access control check is based on m_mode, we might
> > think who can pin a map. Only the owner? Maybe we can reuse the
> > execute permission in m_mode to determine that.
> 
> Why such limitations that only owner should be able to pin? What if
> I'm that read-only user and I want to pin it somewhere else as another
> read-only pinning (for whatever reason, to share with my own
> processes/users).

It was easier to determine the initial permissions/owner of the pinned
map. You get it directly from m_mode/m_uid/m_gid.

If another subject does the pinning, the initial permissions could be
different (only the permissions granted to that subject).

Regarding the use case you mentioned, sharing a read-only pinning with
your processes/users, it could be achieved with simple POSIX
permissions/ACLs. It would not matter if a read-only/read-write fd was
used, because the permissions are checked anyway.

The only impact I see in pinning, from the security point of view, is
increasing the visibility of the map, which may have impact for
confidentiality policies.

> > > Now, let's assume we are unprivileged user who wants to work with that
> > > pinned BPF map. When we do BPF_OBJ_GET on that read-only pinned file,
> > > kernel fetches struct bpf_map_view from inode->i_private which has
> > > access_flags as BPF_F_RDONLY. That's it, there is no way we can do
> > > update on that map, kernel will reject that even though struct bpf_map
> > > itself allows BPF_F_WRONLY.
> > 
> > This should be clear now.
> > 
> > > Note, though, that once we checked everything, as we create a new
> > > struct file and return new FD to user-space, that new struct file will
> > > have *yet another copy* of struct bpf_map_view, cloned from inode's
> > > bpf_map_view (recall that I was proposing to have 1-to-1 mapping
> > > between file/inode and bpf_map_view).
> > > 
> > > 
> > > Let's now assume we are sneaky bastards and chmod that second pinned
> > > BPFFS file to allow r/w file permissions. When we do BPF_OBJ_GET,
> > > again, we'll fetch struct bpf_map_view which enforce BPF_F_RDONLY
> > > (only), despite file itself having writable permissions. We can argue
> > > if we should reject such BPF_OBJ_GET command or silently "downgrade"
> > > to read-only view, that's beside the point.
> > 
> > Ok, yes. Permissions on the pinned map are just an additional barrier.
> > 
> > > Hopefully this is a bit clearer.
> > > 
> > > One last note. When we are talking about BPF_OBJ_GET, we are actually
> > > going to be dealing with 4 layers of read and write permissions:
> > >    1) struct bpf_map's "inherent" permissions
> > >    2) struct bpf_map_view's access_flags
> > >    3) struct file's FS read/write permissions
> > >    4) union bpf_attr's file_flags specified for BPF_OBJ_GET
> > 
> > In my proposal, that would change to:
> > 
> > 1) struct bpf_map m_uid, m_gid, m_mode
> > 2) struct file's FS read/write permission (depends on the inode on
> >     BPFFS)
> > 3) process uid, gid of the requestor
> > 4) union bpf_attr's file_flags specified for BPF_OBJ_GET
> > 
> > > While that's a lot, we always intersect them and keep only the most
> > > restrictive combination. So if at any of the layers we have read-only
> > > permissions, resulting *new struct bpf_map_view* will only specify
> > > BPF_F_RDONLY. E.g., if at layers 1, 2, and 4 we allow BPF_F_WRONLY,
> > > but BPFFS file permission (layer #3 above) at that moment is
> > > read-only, we should be only getting read-only view of BPF map.
> > 
> > Ok, sure. I think more or less the proposals are aligned. If traditional
> > access control is sufficient, we could avoid the increased complexity of
> > the new bpf_map_view layer.
> 
> I think this additional complexity is fundamental to this problem. And
> as I mentioned above, relying just on inode permissions doesn't seem
> sufficient. But maybe I missed something in your proposal.

Assuming that we can mediate or it is not possible to clear/set
FMODE_CAN_READ/FMODE_CAN_WRITE, fd has already the concept of view
(capability?).

Fd modes hold what DAC/MAC granted to you, and give you the ability to
perform compatible operations.

The only missing part, is to do better mediation, e.g. by introducing
DAC in addition to MAC.

> > > P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> > > BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> > > user will control that. E.g., FS read-only permission, does it
> > > restrict both user-space and BPF-view, or just user-space view? We can
> > > certainly extend file_flags to allow users to get BPF-side read-only
> > > and user-space-side read-write BPF map FD, for example. Obviously, BPF
> > > verifier would need to know about struct bpf_map_view when accepting
> > > BPF map FD in ldimm64 and such.
> > 
> > I guess, this patch could be used:
> > 
> > https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/
> > 
> > When passing a fd to an eBPF program, the permissions of the user space
> > side cannot exceed those defined from eBPF program side.
> 
> Don't know, maybe. But I can see how BPF-side can be declared r/w for
> BPF programs, while user-space should be restricted to read-only. I'm
> a bit hesitant to artificially couple both together.

Ok. At least what I would do is to forbid write, if you provide a read-
only fd.

Thanks

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-11-07 12:11         ` Roberto Sassu
@ 2022-12-12 16:10           ` Roberto Sassu
  2022-12-12 17:07             ` Alexei Starovoitov
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-12-12 16:10 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: Lorenz Bauer, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau, KP Singh, Stanislav Fomichev,
	bpf, LSM List, Paul Moore, Casey Schaufler

On Mon, 2022-11-07 at 13:11 +0100, Roberto Sassu wrote:

[...]

> > > > P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> > > > BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> > > > user will control that. E.g., FS read-only permission, does it
> > > > restrict both user-space and BPF-view, or just user-space view? We can
> > > > certainly extend file_flags to allow users to get BPF-side read-only
> > > > and user-space-side read-write BPF map FD, for example. Obviously, BPF
> > > > verifier would need to know about struct bpf_map_view when accepting
> > > > BPF map FD in ldimm64 and such.
> > > 
> > > I guess, this patch could be used:
> > > 
> > > https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/
> > > 
> > > When passing a fd to an eBPF program, the permissions of the user space
> > > side cannot exceed those defined from eBPF program side.
> > 
> > Don't know, maybe. But I can see how BPF-side can be declared r/w for
> > BPF programs, while user-space should be restricted to read-only. I'm
> > a bit hesitant to artificially couple both together.
> 
> Ok. At least what I would do is to forbid write, if you provide a read-
> only fd.

Ok, we didn't do too much progress for a while. I would like to resume
the discussion.

Can we start from the first point Lorenz mentioned? Given a read-only
map fd, it is not possible to write to the map. Can we make sure that
this properly work?

In my opinion, to achieve this particular goal, the map view
abstraction Andrii suggested, should not be necessary. For pinning too,
I think incremental changes on top of the ones suggested below would be
sufficient, so I would rather discuss them separately.
 
As far as I know, there are two open issues that prevent Lorenz's
assertion on read-only map fds from being true. They are:

- map iterator
- map fd injected to eBPF programs

In both cases, a read-only map fd is sufficient to cause a map update.

I have proposed two patches to address the issues above:

https://lore.kernel.org/bpf/20220906170301.256206-2-roberto.sassu@huaweicloud.com/

If an iterator allows the key or value to be modified by an eBPF
program, ensure that the map fd passed to it is read-write. Otherwise,
read-only is sufficient if both cannot be modified.


https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/

Let the verifier allow the minimum operations granted by eBPF-side
permissions and by fd modes. The intersection needs to be applied
because the map can be modified by the eBPF program through map
helpers, so it is eBPF-side, but at the same time whoever granted the
requestor a map fd expects that the permissions included in that fd are
enforced by any function using it.

I believe that with these patches Lorenz's assertion of a read-only map
fd would be true. I'm not aware of other ways which would make the
assertion false.

Thanks

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-12 16:10           ` Roberto Sassu
@ 2022-12-12 17:07             ` Alexei Starovoitov
  2022-12-12 18:19               ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Alexei Starovoitov @ 2022-12-12 17:07 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Andrii Nakryiko, Lorenz Bauer, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, KP Singh,
	Stanislav Fomichev, bpf, LSM List, Paul Moore, Casey Schaufler

On Mon, Dec 12, 2022 at 8:11 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> On Mon, 2022-11-07 at 13:11 +0100, Roberto Sassu wrote:
>
> [...]
>
> > > > > P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> > > > > BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> > > > > user will control that. E.g., FS read-only permission, does it
> > > > > restrict both user-space and BPF-view, or just user-space view? We can
> > > > > certainly extend file_flags to allow users to get BPF-side read-only
> > > > > and user-space-side read-write BPF map FD, for example. Obviously, BPF
> > > > > verifier would need to know about struct bpf_map_view when accepting
> > > > > BPF map FD in ldimm64 and such.
> > > >
> > > > I guess, this patch could be used:
> > > >
> > > > https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/
> > > >
> > > > When passing a fd to an eBPF program, the permissions of the user space
> > > > side cannot exceed those defined from eBPF program side.
> > >
> > > Don't know, maybe. But I can see how BPF-side can be declared r/w for
> > > BPF programs, while user-space should be restricted to read-only. I'm
> > > a bit hesitant to artificially couple both together.
> >
> > Ok. At least what I would do is to forbid write, if you provide a read-
> > only fd.
>
> Ok, we didn't do too much progress for a while. I would like to resume
> the discussion.
>
> Can we start from the first point Lorenz mentioned? Given a read-only
> map fd, it is not possible to write to the map. Can we make sure that
> this properly work?
>
> In my opinion, to achieve this particular goal, the map view
> abstraction Andrii suggested, should not be necessary.

What do you 'not necessary' ?
afair the map view abstraction is only one that actually addresses
all the issues.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-12 17:07             ` Alexei Starovoitov
@ 2022-12-12 18:19               ` Roberto Sassu
  2022-12-16 10:23                 ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-12-12 18:19 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Lorenz Bauer, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, KP Singh,
	Stanislav Fomichev, bpf, LSM List, Paul Moore, Casey Schaufler

On Mon, 2022-12-12 at 09:07 -0800, Alexei Starovoitov wrote:
> On Mon, Dec 12, 2022 at 8:11 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > On Mon, 2022-11-07 at 13:11 +0100, Roberto Sassu wrote:
> > 
> > [...]
> > 
> > > > > > P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> > > > > > BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> > > > > > user will control that. E.g., FS read-only permission, does it
> > > > > > restrict both user-space and BPF-view, or just user-space view? We can
> > > > > > certainly extend file_flags to allow users to get BPF-side read-only
> > > > > > and user-space-side read-write BPF map FD, for example. Obviously, BPF
> > > > > > verifier would need to know about struct bpf_map_view when accepting
> > > > > > BPF map FD in ldimm64 and such.
> > > > > 
> > > > > I guess, this patch could be used:
> > > > > 
> > > > > https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/
> > > > > 
> > > > > When passing a fd to an eBPF program, the permissions of the user space
> > > > > side cannot exceed those defined from eBPF program side.
> > > > 
> > > > Don't know, maybe. But I can see how BPF-side can be declared r/w for
> > > > BPF programs, while user-space should be restricted to read-only. I'm
> > > > a bit hesitant to artificially couple both together.
> > > 
> > > Ok. At least what I would do is to forbid write, if you provide a read-
> > > only fd.
> > 
> > Ok, we didn't do too much progress for a while. I would like to resume
> > the discussion.
> > 
> > Can we start from the first point Lorenz mentioned? Given a read-only
> > map fd, it is not possible to write to the map. Can we make sure that
> > this properly work?
> > 
> > In my opinion, to achieve this particular goal, the map view
> > abstraction Andrii suggested, should not be necessary.
> 
> What do you 'not necessary' ?
> afair the map view abstraction is only one that actually addresses
> all the issues.

For the first issue, map iterators, you need to ensure that the fd is
read-write if the key/value can be modified.

For the second issue, fd modes ignored by the verifier, you need to
restrict operations on the map, to meet the expectactions of whoever
granted the fd to the requestor (as Lorenz said, if you have a read-
only fd, you should not be able to write to the map).

Maybe I missed something, I didn't get how the map view abstraction
could help better in these cases.

Thanks

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-12 18:19               ` Roberto Sassu
@ 2022-12-16 10:23                 ` Roberto Sassu
  2022-12-20 20:44                   ` Paul Moore
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-12-16 10:23 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Andrii Nakryiko, Lorenz Bauer, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, KP Singh,
	Stanislav Fomichev, bpf, LSM List, Paul Moore, Casey Schaufler

On Mon, 2022-12-12 at 19:19 +0100, Roberto Sassu wrote:
> On Mon, 2022-12-12 at 09:07 -0800, Alexei Starovoitov wrote:
> > On Mon, Dec 12, 2022 at 8:11 AM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> > > On Mon, 2022-11-07 at 13:11 +0100, Roberto Sassu wrote:
> > > 
> > > [...]
> > > 
> > > > > > > P.S. We can extend this to BPF-side BPF_F_RDONLY_PROG |
> > > > > > > BPF_F_WRONLY_PROG as well, it's just that we'll need to define how
> > > > > > > user will control that. E.g., FS read-only permission, does it
> > > > > > > restrict both user-space and BPF-view, or just user-space view? We can
> > > > > > > certainly extend file_flags to allow users to get BPF-side read-only
> > > > > > > and user-space-side read-write BPF map FD, for example. Obviously, BPF
> > > > > > > verifier would need to know about struct bpf_map_view when accepting
> > > > > > > BPF map FD in ldimm64 and such.
> > > > > > 
> > > > > > I guess, this patch could be used:
> > > > > > 
> > > > > > https://lore.kernel.org/bpf/20220926154430.1552800-3-roberto.sassu@huaweicloud.com/
> > > > > > 
> > > > > > When passing a fd to an eBPF program, the permissions of the user space
> > > > > > side cannot exceed those defined from eBPF program side.
> > > > > 
> > > > > Don't know, maybe. But I can see how BPF-side can be declared r/w for
> > > > > BPF programs, while user-space should be restricted to read-only. I'm
> > > > > a bit hesitant to artificially couple both together.
> > > > 
> > > > Ok. At least what I would do is to forbid write, if you provide a read-
> > > > only fd.
> > > 
> > > Ok, we didn't do too much progress for a while. I would like to resume
> > > the discussion.
> > > 
> > > Can we start from the first point Lorenz mentioned? Given a read-only
> > > map fd, it is not possible to write to the map. Can we make sure that
> > > this properly work?
> > > 
> > > In my opinion, to achieve this particular goal, the map view
> > > abstraction Andrii suggested, should not be necessary.
> > 
> > What do you 'not necessary' ?
> > afair the map view abstraction is only one that actually addresses
> > all the issues.
> 
> For the first issue, map iterators, you need to ensure that the fd is
> read-write if the key/value can be modified.
> 
> For the second issue, fd modes ignored by the verifier, you need to
> restrict operations on the map, to meet the expectactions of whoever
> granted the fd to the requestor (as Lorenz said, if you have a read-
> only fd, you should not be able to write to the map).
> 
> Maybe I missed something, I didn't get how the map view abstraction
> could help better in these cases.

Ok, let me try to complete the solution for the issues Lorenz pointed
out. Here I discuss only the system call side of access.

I was thinking on the meaning of the permissions on the inode of a
pinned eBPF object. Given that the object exists without pinning, this
double check of permissions first on the inode and then on the object
to me looks very confusing.

So, here is a proposal: what if read and write in the context of
pinning don't refer to accessing the eBPF object itself but to the
ability to read the association between inode and eBPF object or to
write/replace the association with a different eBPF object (I guess not
supported now).

We continue to do access control only at the time a requestor asks for
a fd. Currently there is only MAC, but we can add DAC and POSIX ACL too
(Andrii wanted to give read permission to a specific group). The owner
is who created the eBPF object and who can decide (for DAC and ACL) who
can access that object.

The requestor obtains a fd with modes depending on what was granted. Fd
modes (current behavior) give the requestor the ability to do certain
operations. It is responsibility of the function performing the
operation on an eBPF object to check the fd modes first.

It does not matter if the eBPF object is accessed through ID or inode,
access control is solely based on who is accessing the object, who
created it and the object permissions. *_GET_FD_BY_ID and OBJ_GET
operations will have the same access control.

With my new proposal, once an eBPF object is pinned the owner or
whoever can access the inode could do chown/chmod. But this does not
have effect on the permissions of the object. It changes only who can
retrieve the association with the eBPF object itself.

Permissions on the eBPF object could be changed with the bpf() syscall
and with new operations (such as OBJ_CHOWN, OBJ_CHMOD). These
operations are of course subject to access control too.

The last part is who can do pinning. Again, an eBPF object can be
pinned several times by different users. It won't affect who can access
the object, but only who can access the association between inode and
eBPF object.

We can make things very simple: whoever is able to read the association
is granted with the privilege to pin the eBPF object again.

One could ask what happens if a user has only read permission on an
inode created by someone else, but has also write permission on a new
inode the user creates by pinning the eBPF object again (I assume that
changing the association makes sense). Well, that user is the owner of
the inode. If the user wants other users accessing it to see a
different eBPF object, it is the user's decision.

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-16 10:23                 ` Roberto Sassu
@ 2022-12-20 20:44                   ` Paul Moore
  2022-12-21  9:53                     ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul Moore @ 2022-12-20 20:44 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Alexei Starovoitov, Andrii Nakryiko, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, LSM List,
	Casey Schaufler

On Fri, Dec 16, 2022 at 5:24 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
> Ok, let me try to complete the solution for the issues Lorenz pointed
> out. Here I discuss only the system call side of access.
>
> I was thinking on the meaning of the permissions on the inode of a
> pinned eBPF object. Given that the object exists without pinning, this
> double check of permissions first on the inode and then on the object
> to me looks very confusing.
>
> So, here is a proposal: what if read and write in the context of
> pinning don't refer to accessing the eBPF object itself but to the
> ability to read the association between inode and eBPF object or to
> write/replace the association with a different eBPF object (I guess not
> supported now).
>
> We continue to do access control only at the time a requestor asks for
> a fd. Currently there is only MAC, but we can add DAC and POSIX ACL too
> (Andrii wanted to give read permission to a specific group). The owner
> is who created the eBPF object and who can decide (for DAC and ACL) who
> can access that object.
>
> The requestor obtains a fd with modes depending on what was granted. Fd
> modes (current behavior) give the requestor the ability to do certain
> operations. It is responsibility of the function performing the
> operation on an eBPF object to check the fd modes first.
>
> It does not matter if the eBPF object is accessed through ID or inode,
> access control is solely based on who is accessing the object, who
> created it and the object permissions. *_GET_FD_BY_ID and OBJ_GET
> operations will have the same access control.
>
> With my new proposal, once an eBPF object is pinned the owner or
> whoever can access the inode could do chown/chmod. But this does not
> have effect on the permissions of the object. It changes only who can
> retrieve the association with the eBPF object itself.

Just to make sure I understand you correctly, you're suggesting that
the access modes assigned to a pinned map's fd are simply what is
requested by the caller, and don't necessarily represent the access
control modes of the underlying map, is that correct?  That seems a
little odd to me, but I'll once again admit that I'm not familiar with
all of the subtle nuances around eBPF maps.  I could understand
allowing a process to grab a map fd where the access modes are bounded
by the map's access modes, e.g. a read-only fd for a read-write map;
however, that only makes sense if all of the map operations for *that
process* are gated by the access control policy of the fd and not
necessarily the map itself.  If the two access policies were disjoint
(fd/map), one could/should do permission checks between the calling
process and both the fd and the map ... although I'm having a hard
time trying to think of a valid use case where a map's fd would have a
*more* permissive access control policy than the map itself, I'm not
sure that makes sense.

> Permissions on the eBPF object could be changed with the bpf() syscall
> and with new operations (such as OBJ_CHOWN, OBJ_CHMOD). These
> operations are of course subject to access control too.
>
> The last part is who can do pinning. Again, an eBPF object can be
> pinned several times by different users. It won't affect who can access
> the object, but only who can access the association between inode and
> eBPF object.
>
> We can make things very simple: whoever is able to read the association
> is granted with the privilege to pin the eBPF object again.
>
> One could ask what happens if a user has only read permission on an
> inode created by someone else, but has also write permission on a new
> inode the user creates by pinning the eBPF object again (I assume that
> changing the association makes sense). Well, that user is the owner of
> the inode. If the user wants other users accessing it to see a
> different eBPF object, it is the user's decision.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-20 20:44                   ` Paul Moore
@ 2022-12-21  9:53                     ` Roberto Sassu
  2022-12-22  0:55                       ` Paul Moore
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2022-12-21  9:53 UTC (permalink / raw)
  To: Paul Moore
  Cc: Alexei Starovoitov, Andrii Nakryiko, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, LSM List,
	Casey Schaufler

On 12/20/2022 9:44 PM, Paul Moore wrote:
> On Fri, Dec 16, 2022 at 5:24 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
>> Ok, let me try to complete the solution for the issues Lorenz pointed
>> out. Here I discuss only the system call side of access.
>>
>> I was thinking on the meaning of the permissions on the inode of a
>> pinned eBPF object. Given that the object exists without pinning, this
>> double check of permissions first on the inode and then on the object
>> to me looks very confusing.
>>
>> So, here is a proposal: what if read and write in the context of
>> pinning don't refer to accessing the eBPF object itself but to the
>> ability to read the association between inode and eBPF object or to
>> write/replace the association with a different eBPF object (I guess not
>> supported now).
>>
>> We continue to do access control only at the time a requestor asks for
>> a fd. Currently there is only MAC, but we can add DAC and POSIX ACL too
>> (Andrii wanted to give read permission to a specific group). The owner
>> is who created the eBPF object and who can decide (for DAC and ACL) who
>> can access that object.
>>
>> The requestor obtains a fd with modes depending on what was granted. Fd
>> modes (current behavior) give the requestor the ability to do certain
>> operations. It is responsibility of the function performing the
>> operation on an eBPF object to check the fd modes first.
>>
>> It does not matter if the eBPF object is accessed through ID or inode,
>> access control is solely based on who is accessing the object, who
>> created it and the object permissions. *_GET_FD_BY_ID and OBJ_GET
>> operations will have the same access control.
>>
>> With my new proposal, once an eBPF object is pinned the owner or
>> whoever can access the inode could do chown/chmod. But this does not
>> have effect on the permissions of the object. It changes only who can
>> retrieve the association with the eBPF object itself.
> 
> Just to make sure I understand you correctly, you're suggesting that
> the access modes assigned to a pinned map's fd are simply what is
> requested by the caller, and don't necessarily represent the access
> control modes of the underlying map, is that correct?  That seems a

The fd modes don't necessarily represent the access control modes of the 
inode the map is pinned to. But they surely represent the access control 
modes of the map object itself.

The access control modes of the inode tell if the requestor is able to 
retrieve the map from it, before accessing the map is attempted. But, 
even if the request is granted (i.e. the inode has read permission), the 
requestor has still to pass access control on the map object, which is 
separate.

> little odd to me, but I'll once again admit that I'm not familiar with
> all of the subtle nuances around eBPF maps.  I could understand
> allowing a process to grab a map fd where the access modes are bounded
> by the map's access modes, e.g. a read-only fd for a read-write map;
> however, that only makes sense if all of the map operations for *that
> process* are gated by the access control policy of the fd and not
> necessarily the map itself.  If the two access policies were disjoint
> (fd/map), one could/should do permission checks between the calling
> process and both the fd and the map ... although I'm having a hard
> time trying to think of a valid use case where a map's fd would have a
> *more* permissive access control policy than the map itself, I'm not
> sure that makes sense.

Fd modes are bound to the map access modes, but not necessarily bound to 
the inode access modes (fd with write mode, on an inode with only read 
permission). Fd modes are later enforced by map operations by checking 
the compatibility of the operation (e.g. read-like operation requires fd 
read mode).

The last point is what it means getting a fd on the inode itself. It is 
possible, because inodes could have seq_file operations. Thus, one could 
dump the map content by just reading from the inode.

Here, I suggest that we still do two separate checks. One is for the 
open(), done by the VFS, and the other to access the map object. Not 
having read permission on the inode means that the map content cannot be 
dumped. But, having read permission on the inode does not imply the 
ability to do it (still the map object check has to be passed).

Roberto

>> Permissions on the eBPF object could be changed with the bpf() syscall
>> and with new operations (such as OBJ_CHOWN, OBJ_CHMOD). These
>> operations are of course subject to access control too.
>>
>> The last part is who can do pinning. Again, an eBPF object can be
>> pinned several times by different users. It won't affect who can access
>> the object, but only who can access the association between inode and
>> eBPF object.
>>
>> We can make things very simple: whoever is able to read the association
>> is granted with the privilege to pin the eBPF object again.
>>
>> One could ask what happens if a user has only read permission on an
>> inode created by someone else, but has also write permission on a new
>> inode the user creates by pinning the eBPF object again (I assume that
>> changing the association makes sense). Well, that user is the owner of
>> the inode. If the user wants other users accessing it to see a
>> different eBPF object, it is the user's decision.
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-21  9:53                     ` Roberto Sassu
@ 2022-12-22  0:55                       ` Paul Moore
  2023-01-10  9:11                         ` Roberto Sassu
  0 siblings, 1 reply; 45+ messages in thread
From: Paul Moore @ 2022-12-22  0:55 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Alexei Starovoitov, Andrii Nakryiko, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, LSM List,
	Casey Schaufler

On Wed, Dec 21, 2022 at 4:54 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
> On 12/20/2022 9:44 PM, Paul Moore wrote:
> > On Fri, Dec 16, 2022 at 5:24 AM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> >> Ok, let me try to complete the solution for the issues Lorenz pointed
> >> out. Here I discuss only the system call side of access.
> >>
> >> I was thinking on the meaning of the permissions on the inode of a
> >> pinned eBPF object. Given that the object exists without pinning, this
> >> double check of permissions first on the inode and then on the object
> >> to me looks very confusing.
> >>
> >> So, here is a proposal: what if read and write in the context of
> >> pinning don't refer to accessing the eBPF object itself but to the
> >> ability to read the association between inode and eBPF object or to
> >> write/replace the association with a different eBPF object (I guess not
> >> supported now).
> >>
> >> We continue to do access control only at the time a requestor asks for
> >> a fd. Currently there is only MAC, but we can add DAC and POSIX ACL too
> >> (Andrii wanted to give read permission to a specific group). The owner
> >> is who created the eBPF object and who can decide (for DAC and ACL) who
> >> can access that object.
> >>
> >> The requestor obtains a fd with modes depending on what was granted. Fd
> >> modes (current behavior) give the requestor the ability to do certain
> >> operations. It is responsibility of the function performing the
> >> operation on an eBPF object to check the fd modes first.
> >>
> >> It does not matter if the eBPF object is accessed through ID or inode,
> >> access control is solely based on who is accessing the object, who
> >> created it and the object permissions. *_GET_FD_BY_ID and OBJ_GET
> >> operations will have the same access control.
> >>
> >> With my new proposal, once an eBPF object is pinned the owner or
> >> whoever can access the inode could do chown/chmod. But this does not
> >> have effect on the permissions of the object. It changes only who can
> >> retrieve the association with the eBPF object itself.
> >
> > Just to make sure I understand you correctly, you're suggesting that
> > the access modes assigned to a pinned map's fd are simply what is
> > requested by the caller, and don't necessarily represent the access
> > control modes of the underlying map, is that correct?  That seems a
>
> The fd modes don't necessarily represent the access control modes of the
> inode the map is pinned to. But they surely represent the access control
> modes of the map object itself.
>
> The access control modes of the inode tell if the requestor is able to
> retrieve the map from it, before accessing the map is attempted. But,
> even if the request is granted (i.e. the inode has read permission), the
> requestor has still to pass access control on the map object, which is
> separate.

Okay, good.  That should work.

> Fd modes are bound to the map access modes, but not necessarily bound to
> the inode access modes (fd with write mode, on an inode with only read
> permission). Fd modes are later enforced by map operations by checking
> the compatibility of the operation (e.g. read-like operation requires fd
> read mode).
>
> The last point is what it means getting a fd on the inode itself. It is
> possible, because inodes could have seq_file operations. Thus, one could
> dump the map content by just reading from the inode.

Gotcha, yes, that would be bad.

> Here, I suggest that we still do two separate checks. One is for the
> open(), done by the VFS, and the other to access the map object. Not
> having read permission on the inode means that the map content cannot be
> dumped. But, having read permission on the inode does not imply the
> ability to do it (still the map object check has to be passed).

That makes sense to me.

-- 
paul-moore.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2022-12-22  0:55                       ` Paul Moore
@ 2023-01-10  9:11                         ` Roberto Sassu
  2023-01-13 23:44                           ` Andrii Nakryiko
  0 siblings, 1 reply; 45+ messages in thread
From: Roberto Sassu @ 2023-01-10  9:11 UTC (permalink / raw)
  To: Paul Moore
  Cc: Alexei Starovoitov, Andrii Nakryiko, Lorenz Bauer,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, KP Singh, Stanislav Fomichev, bpf, LSM List,
	Casey Schaufler

On Wed, 2022-12-21 at 19:55 -0500, Paul Moore wrote:
> On Wed, Dec 21, 2022 at 4:54 AM Roberto Sassu
> <roberto.sassu@huaweicloud.com> wrote:
> > On 12/20/2022 9:44 PM, Paul Moore wrote:
> > > On Fri, Dec 16, 2022 at 5:24 AM Roberto Sassu
> > > <roberto.sassu@huaweicloud.com> wrote:
> > > > Ok, let me try to complete the solution for the issues Lorenz pointed
> > > > out. Here I discuss only the system call side of access.
> > > > 
> > > > I was thinking on the meaning of the permissions on the inode of a
> > > > pinned eBPF object. Given that the object exists without pinning, this
> > > > double check of permissions first on the inode and then on the object
> > > > to me looks very confusing.
> > > > 
> > > > So, here is a proposal: what if read and write in the context of
> > > > pinning don't refer to accessing the eBPF object itself but to the
> > > > ability to read the association between inode and eBPF object or to
> > > > write/replace the association with a different eBPF object (I guess not
> > > > supported now).
> > > > 
> > > > We continue to do access control only at the time a requestor asks for
> > > > a fd. Currently there is only MAC, but we can add DAC and POSIX ACL too
> > > > (Andrii wanted to give read permission to a specific group). The owner
> > > > is who created the eBPF object and who can decide (for DAC and ACL) who
> > > > can access that object.
> > > > 
> > > > The requestor obtains a fd with modes depending on what was granted. Fd
> > > > modes (current behavior) give the requestor the ability to do certain
> > > > operations. It is responsibility of the function performing the
> > > > operation on an eBPF object to check the fd modes first.
> > > > 
> > > > It does not matter if the eBPF object is accessed through ID or inode,
> > > > access control is solely based on who is accessing the object, who
> > > > created it and the object permissions. *_GET_FD_BY_ID and OBJ_GET
> > > > operations will have the same access control.
> > > > 
> > > > With my new proposal, once an eBPF object is pinned the owner or
> > > > whoever can access the inode could do chown/chmod. But this does not
> > > > have effect on the permissions of the object. It changes only who can
> > > > retrieve the association with the eBPF object itself.
> > > 
> > > Just to make sure I understand you correctly, you're suggesting that
> > > the access modes assigned to a pinned map's fd are simply what is
> > > requested by the caller, and don't necessarily represent the access
> > > control modes of the underlying map, is that correct?  That seems a
> > 
> > The fd modes don't necessarily represent the access control modes of the
> > inode the map is pinned to. But they surely represent the access control
> > modes of the map object itself.
> > 
> > The access control modes of the inode tell if the requestor is able to
> > retrieve the map from it, before accessing the map is attempted. But,
> > even if the request is granted (i.e. the inode has read permission), the
> > requestor has still to pass access control on the map object, which is
> > separate.
> 
> Okay, good.  That should work.
> 
> > Fd modes are bound to the map access modes, but not necessarily bound to
> > the inode access modes (fd with write mode, on an inode with only read
> > permission). Fd modes are later enforced by map operations by checking
> > the compatibility of the operation (e.g. read-like operation requires fd
> > read mode).
> > 
> > The last point is what it means getting a fd on the inode itself. It is
> > possible, because inodes could have seq_file operations. Thus, one could
> > dump the map content by just reading from the inode.
> 
> Gotcha, yes, that would be bad.
> 
> > Here, I suggest that we still do two separate checks. One is for the
> > open(), done by the VFS, and the other to access the map object. Not
> > having read permission on the inode means that the map content cannot be
> > dumped. But, having read permission on the inode does not imply the
> > ability to do it (still the map object check has to be passed).
> 
> That makes sense to me.

Andrii, Lorenz, what do you think about the new interpretation of the
permissions of the inode of a pinned map?

Thanks

Roberto


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: Closing the BPF map permission loophole
  2023-01-10  9:11                         ` Roberto Sassu
@ 2023-01-13 23:44                           ` Andrii Nakryiko
  0 siblings, 0 replies; 45+ messages in thread
From: Andrii Nakryiko @ 2023-01-13 23:44 UTC (permalink / raw)
  To: Roberto Sassu
  Cc: Paul Moore, Alexei Starovoitov, Lorenz Bauer, Alexei Starovoitov,
	Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, KP Singh,
	Stanislav Fomichev, bpf, LSM List, Casey Schaufler

On Tue, Jan 10, 2023 at 1:12 AM Roberto Sassu
<roberto.sassu@huaweicloud.com> wrote:
>
> On Wed, 2022-12-21 at 19:55 -0500, Paul Moore wrote:
> > On Wed, Dec 21, 2022 at 4:54 AM Roberto Sassu
> > <roberto.sassu@huaweicloud.com> wrote:
> > > On 12/20/2022 9:44 PM, Paul Moore wrote:
> > > > On Fri, Dec 16, 2022 at 5:24 AM Roberto Sassu
> > > > <roberto.sassu@huaweicloud.com> wrote:
> > > > > Ok, let me try to complete the solution for the issues Lorenz pointed
> > > > > out. Here I discuss only the system call side of access.
> > > > >
> > > > > I was thinking on the meaning of the permissions on the inode of a
> > > > > pinned eBPF object. Given that the object exists without pinning, this
> > > > > double check of permissions first on the inode and then on the object
> > > > > to me looks very confusing.
> > > > >
> > > > > So, here is a proposal: what if read and write in the context of
> > > > > pinning don't refer to accessing the eBPF object itself but to the
> > > > > ability to read the association between inode and eBPF object or to
> > > > > write/replace the association with a different eBPF object (I guess not
> > > > > supported now).
> > > > >
> > > > > We continue to do access control only at the time a requestor asks for
> > > > > a fd. Currently there is only MAC, but we can add DAC and POSIX ACL too
> > > > > (Andrii wanted to give read permission to a specific group). The owner
> > > > > is who created the eBPF object and who can decide (for DAC and ACL) who
> > > > > can access that object.
> > > > >
> > > > > The requestor obtains a fd with modes depending on what was granted. Fd
> > > > > modes (current behavior) give the requestor the ability to do certain
> > > > > operations. It is responsibility of the function performing the
> > > > > operation on an eBPF object to check the fd modes first.
> > > > >
> > > > > It does not matter if the eBPF object is accessed through ID or inode,
> > > > > access control is solely based on who is accessing the object, who
> > > > > created it and the object permissions. *_GET_FD_BY_ID and OBJ_GET
> > > > > operations will have the same access control.
> > > > >
> > > > > With my new proposal, once an eBPF object is pinned the owner or
> > > > > whoever can access the inode could do chown/chmod. But this does not
> > > > > have effect on the permissions of the object. It changes only who can
> > > > > retrieve the association with the eBPF object itself.
> > > >
> > > > Just to make sure I understand you correctly, you're suggesting that
> > > > the access modes assigned to a pinned map's fd are simply what is
> > > > requested by the caller, and don't necessarily represent the access
> > > > control modes of the underlying map, is that correct?  That seems a
> > >
> > > The fd modes don't necessarily represent the access control modes of the
> > > inode the map is pinned to. But they surely represent the access control
> > > modes of the map object itself.
> > >
> > > The access control modes of the inode tell if the requestor is able to
> > > retrieve the map from it, before accessing the map is attempted. But,
> > > even if the request is granted (i.e. the inode has read permission), the
> > > requestor has still to pass access control on the map object, which is
> > > separate.
> >
> > Okay, good.  That should work.
> >
> > > Fd modes are bound to the map access modes, but not necessarily bound to
> > > the inode access modes (fd with write mode, on an inode with only read
> > > permission). Fd modes are later enforced by map operations by checking
> > > the compatibility of the operation (e.g. read-like operation requires fd
> > > read mode).
> > >
> > > The last point is what it means getting a fd on the inode itself. It is
> > > possible, because inodes could have seq_file operations. Thus, one could
> > > dump the map content by just reading from the inode.
> >
> > Gotcha, yes, that would be bad.
> >
> > > Here, I suggest that we still do two separate checks. One is for the
> > > open(), done by the VFS, and the other to access the map object. Not
> > > having read permission on the inode means that the map content cannot be
> > > dumped. But, having read permission on the inode does not imply the
> > > ability to do it (still the map object check has to be passed).
> >
> > That makes sense to me.
>
> Andrii, Lorenz, what do you think about the new interpretation of the
> permissions of the inode of a pinned map?

Hi Roberto,

Sorry, I've lost track of all these intricacies a while ago. I'd like
to hear from Lorenz as well. My role here was to expand on what I
meant by "BPF map view" back then at BPF office hours and why I think
it solves all the problems mentioned earlier.

You seem to be proposing some alternative, but I'm not clear why we
need an alternative, given BPF map view seems to be the right
abstraction here? Either way, I think Lorenz was passionate about
solving this problem, so would certainly help to get his input as
well.

Thanks.

>
> Thanks
>
> Roberto
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2023-01-13 23:44 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-09-15 10:30 Closing the BPF map permission loophole Lorenz Bauer
2022-09-16  3:03 ` Kumar Kartikeya Dwivedi
2022-09-22 14:57   ` Lorenz Bauer
2022-09-17 15:42 ` Stanislav Fomichev
2022-09-21 16:32 ` Roberto Sassu
2022-09-22 14:47   ` Lorenz Bauer
2022-09-22 15:39     ` Roberto Sassu
2022-09-22 16:21       ` Lorenz Bauer
2022-09-23  7:39         ` Roberto Sassu
2022-09-26 15:35           ` Lorenz Bauer
2022-09-26 16:18             ` Roberto Sassu
2022-09-28  8:52               ` Lorenz Bauer
2022-09-28  9:42                 ` Roberto Sassu
2022-09-28 10:33                   ` Toke Høiland-Jørgensen
2022-09-28 11:23                     ` Roberto Sassu
2022-09-29  0:24                       ` Paul Moore
2022-09-29  7:54                         ` Roberto Sassu
2022-09-29 15:27                           ` Casey Schaufler
2022-09-30  7:42                             ` Roberto Sassu
2022-09-29 22:30                           ` Paul Moore
2022-09-30  9:56                             ` Roberto Sassu
2022-09-30 20:43                               ` Paul Moore
2022-10-04  8:03                                 ` Roberto Sassu
2022-09-22 16:29 ` Lorenz Bauer
2022-09-22 23:21   ` Hao Luo
2022-09-26 15:41     ` Lorenz Bauer
2022-09-26 18:39       ` Hao Luo
2022-09-28  8:54 ` Lorenz Bauer
2022-10-06  7:15   ` Roberto Sassu
2022-10-27 16:54   ` Andrii Nakryiko
2022-10-28  6:27     ` Greg KH
2022-10-28 17:17       ` Andrii Nakryiko
2022-10-29 11:18     ` Lorenz Bauer
2022-10-31 11:53     ` Roberto Sassu
2022-11-04 21:10       ` Andrii Nakryiko
2022-11-07 12:11         ` Roberto Sassu
2022-12-12 16:10           ` Roberto Sassu
2022-12-12 17:07             ` Alexei Starovoitov
2022-12-12 18:19               ` Roberto Sassu
2022-12-16 10:23                 ` Roberto Sassu
2022-12-20 20:44                   ` Paul Moore
2022-12-21  9:53                     ` Roberto Sassu
2022-12-22  0:55                       ` Paul Moore
2023-01-10  9:11                         ` Roberto Sassu
2023-01-13 23:44                           ` Andrii Nakryiko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.