linux-security-module.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: YiFei Zhu <zhuyifei1999@gmail.com>
To: containers@lists.linux.dev, bpf@vger.kernel.org
Cc: YiFei Zhu <yifeifz2@illinois.edu>,
	linux-security-module@vger.kernel.org,
	Alexei Starovoitov <ast@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andy Lutomirski <luto@amacapital.net>,
	Austin Kuo <hckuo2@illinois.edu>,
	Claudio Canella <claudio.canella@iaik.tugraz.at>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Daniel Gruss <daniel.gruss@iaik.tugraz.at>,
	Dimitrios Skarlatos <dskarlat@cs.cmu.edu>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Hubertus Franke <frankeh@us.ibm.com>,
	Jann Horn <jannh@google.com>, Jinghao Jia <jinghao7@illinois.edu>,
	Josep Torrellas <torrella@illinois.edu>,
	Kees Cook <keescook@chromium.org>,
	Sargun Dhillon <sargun@sargun.me>, Tianyin Xu <tyxu@illinois.edu>,
	Tobin Feldman-Fitzthum <tobin@ibm.com>,
	Tom Hromatka <tom.hromatka@oracle.com>,
	Will Drewry <wad@chromium.org>
Subject: [RFC PATCH bpf-next seccomp 00/12] eBPF seccomp filters
Date: Mon, 10 May 2021 12:22:37 -0500	[thread overview]
Message-ID: <cover.1620499942.git.yifeifz2@illinois.edu> (raw)

From: YiFei Zhu <yifeifz2@illinois.edu>

Based on: https://lists.linux-foundation.org/pipermail/containers/2018-February/038571.html

This patchset enables seccomp filters to be written in eBPF.
Supporting eBPF filters has been proposed a few times in the past.
The main concerns were (1) use cases and (2) security. We have
identified many use cases that can benefit from advanced eBPF
filters, such as:

  * exec-only-once filter / apply filter after exec
  * syscall logging (eg. via maps)
  * expressiveness & better tooling (no need for DSLs like easyseccomp)
  * contained syscall fault injection
  * Temporal System Call Specialization [1] with restrictive
    initialization phases (serving phase syscalls are filtered)
  * possible future extensions such as syscall serialization and
    argument rewriting

These features can also be achieved by user notifier + ptrace but
unfortunately user notifier is a lot of context switches (see the
benchmark results below), and hence much less efficient than eBPF.

For security, for an unprivileged caller, our implementation is as
restrictive as user notifier + ptrace, in regards to capabilities.
eBPF helpers follow the privilege model of original eBPF helpers.

Advanced eBPF feature (maps & helpers) is restricted by a new LSM
hook seccomp_extended. If LSM permits these features, then all standard
bpf helpers are permitted, and tracing helpers are permitted too if the
loader is bpf_capable and perfmon_capable. Mutable privileges should
not be a concern because if seccomp-eBPF is used to implement a mutable
policy of privileges, such policy can be implemented using user
notifier anyhow (which does not require seccomp-eBPF).

Moreover, a mechanism for reading user memory is added. The same
prototypes of bpf_probe_read_user{,str} from tracing are used. However,
when the loader of bpf program does not have CAP_PTRACE, the helper
will return -EPERM if the task under seccomp filter is non-dumpable.
The reason for this is that if we perform reduction from seccomp-eBPF
to user notifier + ptrace, ptrace requires CAP_PTRACE to read from
a non-dumpable process. However, eBPF does not solve the TOCTOU problem
of user notifier, so users should not use this to enforce a policy
based on memory contents.

In addition, a mechanism for storing process states between filter runs
is added. This uses the BPF-LSM task storage. However, since
unprivileged bpf loaders do not have access to ptr to BTF ID for use as
the task parameter to the helpers, the workaround is to use NULL as the
parameter, and the helper will fallback to current's group leader. This
is insufficient, unfortunately, because of the BTF enforcement in
bpf_local_storage_map_alloc_check, and the fact that tasks without
bpf_capable cannot load map BTFs. (Can I ask why this is restricted
this way?)

Giuseppe Scrivano shows how to support eBPF filters in crun [2], based
on which we have tested a number of stateful filters.

Performance wise, Jinghao did a test of 1,000,000 getpid() calls on an
Intel i7-9700K, with stock Ubuntu config. The syscalls are half EPERM
and half passthrough to the getpid() syscall handler [3]. The tests
are done recording a median of 10:

                user notif      eBPF            ratio
QEMU            6808104 us      80508.5 us      84.6
Bare Metal      3403667.5 us    80316 us        42.4

[1] https://www.usenix.org/conference/usenixsecurity20/presentation/ghavamnia
[2] https://github.com/giuseppe/crun/commit/3906b4fbcb671f8f188deef08c94ceae86a80120
[3] https://github.com/xlab-uiuc/seccomp-ebpf-upstream/tree/perf-test

Patch 1 moves no_new_privs check in filter loading.
Patch 2 implements basic support for seccomp-eBPF in the kernel.
Patch 3 enables a ptracer to get a fd to the eBPF for CRIU.
Patch 4 enables libbpf to recognize the section "seccomp".
Patch 5 adds a sample program test_seccomp to samples/bpf.

Patch 6 adds an LSM hook seccomp_extended.
Patch 7 allows bpf verifier hooks to restrict direct map access.
Patch 8 implements restrictions for eBPF filters depending on LSM hooks.
Patch 9 lets Yama LSM restrict seccomp-ebpf based on ptrace_scope.

Patch 10 enables seccomp-ebpf to read user memory.
Patch 11 allows bpf helpers to have nullable ptr to BTF ID as argument.
Patch 12 implements process storage using BPF-LSM task storage.

Sargun Dhillon (3):
  bpf, seccomp: Add eBPF filter capabilities
  seccomp, ptrace: Add a mechanism to retrieve attached eBPF seccomp
    filters
  samples/bpf: Add eBPF seccomp sample programs

YiFei Zhu (9):
  seccomp: Move no_new_privs check to after prepare_filter
  libbpf: recognize section "seccomp"
  lsm: New hook seccomp_extended
  bpf/verifier: allow restricting direct map access
  seccomp-ebpf: restrict filter to almost cBPF if LSM request such
  yama: (concept) restrict seccomp-eBPF with ptrace_scope
  seccomp-ebpf: Add ability to read user memory
  bpf/verifier: support NULL-able ptr to BTF ID as helper argument
  seccomp-ebpf: support task storage from BPF-LSM, defaulting to group
    leader

 arch/Kconfig                    |   7 +
 include/linux/bpf.h             |   8 ++
 include/linux/bpf_types.h       |   4 +
 include/linux/lsm_hook_defs.h   |   4 +
 include/linux/seccomp.h         |  15 +-
 include/linux/security.h        |  13 ++
 include/uapi/linux/bpf.h        |   1 +
 include/uapi/linux/ptrace.h     |   2 +
 include/uapi/linux/seccomp.h    |   1 +
 kernel/bpf/bpf_task_storage.c   |  64 +++++++--
 kernel/bpf/syscall.c            |   1 +
 kernel/bpf/verifier.c           |  15 +-
 kernel/ptrace.c                 |   4 +
 kernel/seccomp.c                | 235 ++++++++++++++++++++++++++++----
 kernel/trace/bpf_trace.c        |  42 ++++++
 samples/bpf/Makefile            |   3 +
 samples/bpf/test_seccomp_kern.c |  41 ++++++
 samples/bpf/test_seccomp_user.c |  49 +++++++
 security/security.c             |   8 ++
 security/yama/yama_lsm.c        |  30 ++++
 tools/include/uapi/linux/bpf.h  |   1 +
 tools/lib/bpf/libbpf.c          |   1 +
 22 files changed, 511 insertions(+), 38 deletions(-)
 create mode 100644 samples/bpf/test_seccomp_kern.c
 create mode 100644 samples/bpf/test_seccomp_user.c

--
2.31.1

             reply	other threads:[~2021-05-10 17:22 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-10 17:22 YiFei Zhu [this message]
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 01/12] seccomp: Move no_new_privs check to after prepare_filter YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 02/12] bpf, seccomp: Add eBPF filter capabilities YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 03/12] seccomp, ptrace: Add a mechanism to retrieve attached eBPF seccomp filters YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 04/12] libbpf: recognize section "seccomp" YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 05/12] samples/bpf: Add eBPF seccomp sample programs YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 06/12] lsm: New hook seccomp_extended YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 07/12] bpf/verifier: allow restricting direct map access YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 08/12] seccomp-ebpf: restrict filter to almost cBPF if LSM request such YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 09/12] yama: (concept) restrict seccomp-eBPF with ptrace_scope YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 10/12] seccomp-ebpf: Add ability to read user memory YiFei Zhu
2021-05-11  2:04   ` Alexei Starovoitov
2021-05-11  7:14     ` YiFei Zhu
2021-05-12 22:36       ` Alexei Starovoitov
2021-05-13  5:26         ` YiFei Zhu
2021-05-13 14:53           ` Andy Lutomirski
2021-05-13 17:12             ` YiFei Zhu
2021-05-13 17:15               ` Andy Lutomirski
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 11/12] bpf/verifier: support NULL-able ptr to BTF ID as helper argument YiFei Zhu
2021-05-10 17:22 ` [RFC PATCH bpf-next seccomp 12/12] seccomp-ebpf: support task storage from BPF-LSM, defaulting to group leader YiFei Zhu
2021-05-11  1:58   ` Alexei Starovoitov
2021-05-11  5:44     ` YiFei Zhu
2021-05-12 21:56       ` Alexei Starovoitov
2021-05-10 17:47 ` [RFC PATCH bpf-next seccomp 00/12] eBPF seccomp filters Andy Lutomirski
2021-05-11  5:21   ` YiFei Zhu
2021-05-15 15:49     ` Andy Lutomirski
2021-05-20  9:05       ` Christian Brauner
     [not found]     ` <fffbea8189794a8da539f6082af3de8e@DM5PR11MB1692.namprd11.prod.outlook.com>
2021-05-16  8:38       ` Tianyin Xu
2021-05-17 15:40         ` Tycho Andersen
2021-05-17 17:07         ` Sargun Dhillon
     [not found]         ` <108b4b9c2daa4123805d2b92cf51374b@DM5PR11MB1692.namprd11.prod.outlook.com>
2021-05-20  8:16           ` Tianyin Xu
2021-05-20  8:56             ` Christian Brauner
2021-05-20  9:37               ` Christian Brauner
2021-06-01 19:55               ` Kees Cook
2021-06-09  6:32                 ` Jinghao Jia
2021-06-09  6:27               ` Jinghao Jia
     [not found]             ` <00fe481c572d486289bc88780f48e88f@DM5PR11MB1692.namprd11.prod.outlook.com>
2021-05-20 22:13               ` Tianyin Xu
     [not found]         ` <eae2a0e5038b41c4af87edcb3d4cdc13@DM5PR11MB1692.namprd11.prod.outlook.com>
2021-05-20  8:22           ` Tianyin Xu
2021-05-24 18:55             ` Sargun Dhillon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cover.1620499942.git.yifeifz2@illinois.edu \
    --to=zhuyifei1999@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=claudio.canella@iaik.tugraz.at \
    --cc=containers@lists.linux.dev \
    --cc=daniel.gruss@iaik.tugraz.at \
    --cc=daniel@iogearbox.net \
    --cc=dskarlat@cs.cmu.edu \
    --cc=frankeh@us.ibm.com \
    --cc=gscrivan@redhat.com \
    --cc=hckuo2@illinois.edu \
    --cc=jannh@google.com \
    --cc=jinghao7@illinois.edu \
    --cc=keescook@chromium.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=sargun@sargun.me \
    --cc=tobin@ibm.com \
    --cc=tom.hromatka@oracle.com \
    --cc=torrella@illinois.edu \
    --cc=tyxu@illinois.edu \
    --cc=wad@chromium.org \
    --cc=yifeifz2@illinois.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).