From: YiFei Zhu <>
To: Andy Lutomirski <>
Cc:, bpf <>,
	YiFei Zhu <>,
	LSM List <>,
	Alexei Starovoitov <>,
	Andrea Arcangeli <>,
	Austin Kuo <>,
	Claudio Canella <>,
	Daniel Borkmann <>,
	Daniel Gruss <>,
	Dimitrios Skarlatos <>,
	Giuseppe Scrivano <>,
	Hubertus Franke <>,
	Jann Horn <>, Jinghao Jia <>,
	Josep Torrellas <>,
	Kees Cook <>,
	Sargun Dhillon <>, Tianyin Xu <>,
	Tobin Feldman-Fitzthum <>,
	Tom Hromatka <>,
	Will Drewry <>
Subject: Re: [RFC PATCH bpf-next seccomp 00/12] eBPF seccomp filters
Date: Tue, 11 May 2021 00:21:17 -0500
Message-ID: <>
In-Reply-To: <>

On Mon, May 10, 2021 at 12:47 PM Andy Lutomirski <> wrote:
> On Mon, May 10, 2021 at 10:22 AM YiFei Zhu <> wrote:
> >
> > From: YiFei Zhu <>
> >
> > Based on:
> >
> > This patchset enables seccomp filters to be written in eBPF.
> > Supporting eBPF filters has been proposed a few times in the past.
> > The main concerns were (1) use cases and (2) security. We have
> > identified many use cases that can benefit from advanced eBPF
> > filters, such as:
> I haven't reviewed this carefully, but I think we need to distinguish
> a few things:
> 1. Using the eBPF *language*.
> 2. Allowing the use of stateful / non-pure eBPF features.
> 3. Allowing the eBPF programs to read the target process' memory.
> I'm generally in favor of (1).  I'm not at all sure about (2), and I'm
> even less convinced by (3).
> >
> >   * exec-only-once filter / apply filter after exec
> This is (2).  I'm not sure it's a good idea.

The basic idea is that a container runtime may want to execute a
program in a container without that program being able to execve
another program, stopping any attack that involves loading another
binary. Using only cBPF, the container runtime can block every syscall
except execve in the exec-ed process: a stateless filter cannot block
execve as well, because the runtime's own execve that launches the
program has to pass through the same filter.

The use case was suggested by Andrea Arcangeli and Giuseppe Scrivano.
@Andrea and @Giuseppe, could you clarify further in case I missed
anything?

> >   * syscall logging (eg. via maps)
> This is (2).  Probably useful, but doesn't obviously belong in
> seccomp, or at least not as part of the same seccomp feature as
> regular filtering.
> >   * expressiveness & better tooling (no need for DSLs like easyseccomp)
> (1).  Sounds good.
> >   * contained syscall fault injection
> (2)?  We can already do this with notifiers.

To clarify, “we can already do this with notifiers” isn’t the point
here. We can do almost everything with notifiers and ptrace, but doing
so may impose significant overhead (see the microbenchmark results).

The overhead matters for reproducing and testing certain race
conditions. Because of a race, a syscall that fails quickly in a
userspace application can produce a completely different trace from
one that fails only after a few context switches. eBPF makes quick
fault injection possible.

> > For security, for an unprivileged caller, our implementation is as
> > restrictive as user notifier + ptrace, in regards to capabilities.
> > eBPF helpers follow the privilege model of original eBPF helpers.
> eBPF doesn't really have a privilege model yet.  There was a long and
> disappointing thread about this awhile back.

The idea is that “seccomp-eBPF does not make life easier for an
adversary”. Any attack an adversary could mount using seccomp-eBPF
could equally be mounted using other eBPF features, i.e. it would be
an issue with eBPF in general rather than with seccomp’s use of eBPF.

Here, “follow the privilege model” refers to the helper lookup falling
back to bpf_base_func_proto when the caller is unprivileged
(!bpf_capable || !perfmon_capable). In that case, if an adversary could
use eBPF helpers to perform an attack, they could already do so via
another unprivileged prog type.

That said, there are a few additional helpers this patchset is adding:
* get_current_uid_gid
* get_current_pid_tgid
  These two provide public information (are namespaces a concern?). I
  have no idea what kind of exploit they could enable, unless the
  adversary somehow side-channels the task_struct. But in that case,
  how is this reading of task_struct different from how the rest of
  the kernel reads task_struct?
  Though, if knowing the global uid / pid is a concern, then the eBPF
  progs will need to keep track of namespaces, and that might not be
* probe_read_user
* probe_read_user_str
  Reduction to ptrace. Reading another process’s data (via
  process_vm_readv or ptrace(PTRACE_PEEK{TEXT,DATA})) is guarded by
  PTRACE_MODE_ATTACH_REALCREDS. Unprivileged seccomp, however, is
  safeguarded by no_new_privs. So, unless the caller has a non-uniform
  resuid & fsuid (in which case it is the caller’s failure to
  relinquish privileges), the ruid of the seccomp-eBPF executor (the
  task whose syscalls are being filtered) would be the same as the ruid
  of the applier (the task that set the seccomp mode, at the time of
  setting it).
  The main concern here is LSMs. LSMs can further restrict the scope
  of ptrace, so I also allow LSMs to deny all “the use of stateful /
  non-pure eBPF features”.
  As for side channels... copy_from_user_nofault may allow an
  adversary to observe which memory is resident and which is swapped
  out, but the adversary can already do this by timing memory
  accesses. The non-nofault variant, copy_from_user, is used
  everywhere in the kernel, so if an adversary wanted to side-channel
  the kernel via copy_from_user against an address, they could already
  do so by passing that address to any syscall that reads it with
  copy_from_user.
* task_storage_get
* task_storage_delete
  This is what I’m least sure about. The implementation of
  task_storage is more complex than the other helpers, and it also
  assumes a privileged eBPF loader, so it would slightly extend the
  attack surface. If this is a big issue, then eBPF could emulate such
  storage with a hash map keyed by PID...

> > Moreover, a mechanism for reading user memory is added. The same
> > prototypes of bpf_probe_read_user{,str} from tracing are used. However,
> > when the loader of bpf program does not have CAP_SYS_PTRACE, the helper
> > will return -EPERM if the task under seccomp filter is non-dumpable.
> > The reason for this is that if we perform reduction from seccomp-eBPF
> > to user notifier + ptrace, ptrace requires CAP_SYS_PTRACE to read from
> > a non-dumpable process. However, eBPF does not solve the TOCTOU problem
> > of user notifier, so users should not use this to enforce a policy
> > based on memory contents.
> What is this for?

Memory reading opens up lots of use cases. For example, logging which
files are being opened without the heavy performance penalty of
strace. Or acting as an accelerator for user-notify emulation, where
syscalls can be rejected on a fast path when we know the memory
contents do not satisfy conditions that the user-notify handler will
check.

YiFei Zhu
