containers.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
From: "Hubertus Franke" <frankeh@us.ibm.com>
To: Jann Horn <jannh@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Giuseppe Scrivano <gscrivan@redhat.com>,
	Will Drewry <wad@chromium.org>, Kees Cook <keescook@chromium.org>,
	YiFei Zhu <yifeifz2@illinois.edu>,
	Linux Containers <containers@lists.linux-foundation.org>,
	Tobin Feldman-Fitzthum <tobin@ibm.com>,
	Dimitrios Skarlatos <dskarlat@cs.cmu.edu>,
	kernel list <linux-kernel@vger.kernel.org>,
	Valentin Rothberg <vrothber@redhat.com>,
	Jack Chen <jianyan2@illinois.edu>,
	Josep Torrellas <torrella@illinois.edu>,
	bpf <bpf@vger.kernel.org>, Andy Lutomirski <luto@amacapital.net>,
	Tianyin Xu <tyxu@illinois.edu>,
	YiFei Zhu <zhuyifei1999@gmail.com>
Subject: RE: [RFC PATCH seccomp 0/2] seccomp: Add bitmap cache of arg-independent filter results that allow syscalls
Date: Mon, 21 Sep 2020 15:35:49 -0400	[thread overview]
Message-ID: <OF8837FC1A.5C0D4D64-ON852585EA.006B677F-852585EA.006BA663@notes.na.collabserv.com> (raw)
In-Reply-To: <CAG48ez3Ofqp4crXGksLmZY6=fGrF_tWyUCg7PBkAetvbbOPeOA@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 6199 bytes --]


I suggest we first bring it down to the minimal features we what and
successively build the functions as these ideas evolve.
We asked YiFei to prepare a minimal set that brings home the basic
features. Might not be 100% optimal but having the hooks, the basic cache
in place and getting a good benefit should be a good starting point
to get this integrated into a linux kernel and then enable a larger
experimentation.
Does that make sense to approach it from that point ?


Hubertus Franke
frankeh@us.ibm.com




From:	Jann Horn <jannh@google.com>
To:	YiFei Zhu <zhuyifei1999@gmail.com>
Cc:	Linux Containers <containers@lists.linux-foundation.org>, YiFei
            Zhu <yifeifz2@illinois.edu>, bpf <bpf@vger.kernel.org>, Andrea
            Arcangeli <aarcange@redhat.com>, Dimitrios Skarlatos
            <dskarlat@cs.cmu.edu>, Giuseppe Scrivano <gscrivan@redhat.com>,
            Hubertus Franke <frankeh@us.ibm.com>, Jack Chen
            <jianyan2@illinois.edu>, Josep Torrellas
            <torrella@illinois.edu>, Kees Cook <keescook@chromium.org>,
            Tianyin Xu <tyxu@illinois.edu>, Tobin Feldman-Fitzthum
            <tobin@ibm.com>, Valentin Rothberg <vrothber@redhat.com>, Andy
            Lutomirski <luto@amacapital.net>, Will Drewry
            <wad@chromium.org>, Aleksa Sarai <cyphar@cyphar.com>, kernel
            list <linux-kernel@vger.kernel.org>
Date:	09/21/2020 03:16 PM
Subject:	[EXTERNAL] Re: [RFC PATCH seccomp 0/2] seccomp: Add bitmap
            cache of arg-independent filter results that allow syscalls



On Mon, Sep 21, 2020 at 7:35 AM YiFei Zhu <zhuyifei1999@gmail.com> wrote:
> This series adds a bitmap to cache seccomp filter results if the
> result permits a syscall and is indepenent of syscall arguments.
> This visibly decreases seccomp overhead for most common seccomp
> filters with very little memory footprint.

It would be really nice if, based on this, we could have a new entry
in procfs that has one line per entry in each syscall table. Maybe
something that looks vaguely like:

X86_64 0 (read): ALLOW
X86_64 1 (write): ALLOW
X86_64 2 (open): ERRNO -1
X86_64 3 (close): ALLOW
X86_64 4 (stat): <argument-dependent>
[...]
I386 0 (restart_syscall): ALLOW
I386 1 (exit): ALLOW
I386 2 (fork): KILL
[...]

This would be useful both for inspectability (at the moment it's
pretty hard to figure out what seccomp rules really apply to a given
task) and for testing (so that we could easily write unit tests to
verify that the bitmap calculation works as expected).

But if you don't want to implement that right now, we can do that at a
later point - while it would be nice for making it easier to write
tests for this functionality, I don't see it as a blocker.


> The overhead of running Seccomp filters has been part of some past
> discussions [1][2][3]. Oftentimes, the filters have a large number
> of instructions that check syscall numbers one by one and jump based
> on that. Some users chain BPF filters which further enlarge the
> overhead. A recent work [6] comprehensively measures the Seccomp
> overhead and shows that the overhead is non-negligible and has a
> non-trivial impact on application performance.
>
> We propose SECCOMP_CACHE, a cache-based solution to minimize the
> Seccomp overhead. The basic idea is to cache the result of each
> syscall check to save the subsequent overhead of executing the
> filters. This is feasible, because the check in Seccomp is stateless.
> The checking results of the same syscall ID and argument remains
> the same.
>
> We observed some common filters, such as docker's [4] or
> systemd's [5], will make most decisions based only on the syscall
> numbers, and as past discussions considered, a bitmap where each bit
> represents a syscall makes most sense for these filters.
[...]
> Statically analyzing BPF bytecode to see if each syscall is going to
> always land in allow or reject is more of a rabbit hole, especially
> there is no current in-kernel infrastructure to enumerate all the
> possible architecture numbers for a given machine.

You could add that though. Or if you think that that's too much work,
you could just do it for x86 and arm64 and then use a Kconfig
dependency to limit this to those architectures for now.

> So rather than
> doing that, we propose to cache the results after the BPF filters are
> run.

Please don't add extra complexity just to work around a limitation in
existing code if you could instead remove that limitation in existing
code. Otherwise, code will become unnecessarily hard to understand and
inefficient.

You could let struct seccomp_filter contain three bitmasks - one for
the "native" architecture and up to two for "compat" architectures
(gated on some Kconfig flag).

alpha has 1 architecture numbers, arc has 1 (per build config), arm
has 1, arm64 has 2, c6x has 1 (per build config), csky has 1, h8300
has 1, hexagon has 1, ia64 has 1, m68k has 1, microblaze has 1, mips
has 3 (per build config), nds32 has 1 (per build config), nios2 has 1,
openrisc has 1, parisc has 2, powerpc has 2 (per build config), riscv
has 1 (per build config), s390 has 2, sh has 1 (per build config),
sparc has 2, x86 has 2, xtensa has 1.

> And since there are filters like docker's who will check
> arguments of some syscalls, but not all or none of the syscalls, when
> a filter is loaded we analyze it to find whether each syscall is
> cacheable (does not access syscall argument or instruction pointer) by
> following its control flow graph, and store the result for each filter
> in a bitmap. Changes to architecture number or the filter are expected
> to be rare and simply cause the cache to be cleared. This solution
> shall be fully transparent to userspace.

Caching whether a given syscall number has fixed per-architecture
results across all architectures is a pretty gross hack, please don't.



> Ongoing work is to further support arguments with fast hash table
> lookups. We are investigating the performance of doing so [6], and how
> to best integrate with the existing seccomp infrastructure.




[-- Attachment #1.2: graycol.gif --]
[-- Type: image/gif, Size: 105 bytes --]

[-- Attachment #2: Type: text/plain, Size: 171 bytes --]

_______________________________________________
Containers mailing list
Containers@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/containers

  reply	other threads:[~2020-09-21 19:36 UTC|newest]

Thread overview: 151+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-21  5:35 [RFC PATCH seccomp 0/2] seccomp: Add bitmap cache of arg-independent filter results that allow syscalls YiFei Zhu
2020-09-21  5:35 ` [RFC PATCH seccomp 1/2] seccomp/cache: Add "emulator" to check if filter is arg-dependent YiFei Zhu
2020-09-21 17:47   ` Jann Horn via Containers
2020-09-21 18:38     ` Jann Horn via Containers
2020-09-21 23:44     ` YiFei Zhu
2020-09-22  0:25       ` Jann Horn via Containers
2020-09-22  0:47         ` YiFei Zhu
2020-09-21  5:35 ` [RFC PATCH seccomp 2/2] seccomp/cache: Cache filter results that allow syscalls YiFei Zhu
2020-09-21 18:08   ` Jann Horn via Containers
2020-09-21 22:50     ` YiFei Zhu
2020-09-21 22:57       ` Jann Horn via Containers
2020-09-21 23:08         ` YiFei Zhu
2020-09-25  0:01   ` [PATCH v2 seccomp 2/6] asm/syscall.h: Add syscall_arches[] array Kees Cook
2020-09-25  0:15     ` Jann Horn via Containers
2020-09-25  0:18       ` Al Viro
2020-09-25  0:24         ` Jann Horn via Containers
2020-09-25  1:27     ` YiFei Zhu
2020-09-25  3:09       ` Kees Cook
2020-09-25  3:28         ` YiFei Zhu
2020-09-25 16:39           ` YiFei Zhu
2020-09-21  5:48 ` [RFC PATCH seccomp 0/2] seccomp: Add bitmap cache of arg-independent filter results that allow syscalls Sargun Dhillon
2020-09-21  7:13   ` YiFei Zhu
2020-09-21  8:30 ` Christian Brauner
2020-09-21  8:44   ` YiFei Zhu
2020-09-21 13:51 ` Tycho Andersen
2020-09-21 15:27   ` YiFei Zhu
2020-09-21 16:39     ` Tycho Andersen
2020-09-21 22:57       ` YiFei Zhu
2020-09-21 19:16 ` Jann Horn via Containers
2020-09-21 19:35   ` Hubertus Franke [this message]
2020-09-21 19:45     ` Jann Horn via Containers
2020-09-23 19:26 ` Kees Cook
2020-09-23 22:54   ` YiFei Zhu
2020-09-24  6:52     ` Kees Cook
2020-09-24 12:06 ` [PATCH seccomp 0/6] " YiFei Zhu
2020-09-24 12:06   ` [PATCH seccomp 1/6] seccomp: Move config option SECCOMP to arch/Kconfig YiFei Zhu
2020-09-24 12:06     ` YiFei Zhu
2020-09-24 12:06   ` [PATCH seccomp 2/6] asm/syscall.h: Add syscall_arches[] array YiFei Zhu
2020-09-24 12:06   ` [PATCH seccomp 3/6] seccomp/cache: Add "emulator" to check if filter is arg-dependent YiFei Zhu
2020-09-24 12:06   ` [PATCH seccomp 4/6] seccomp/cache: Lookup syscall allowlist for fast path YiFei Zhu
2020-09-24 12:06   ` [PATCH seccomp 5/6] selftests/seccomp: Compare bitmap vs filter overhead YiFei Zhu
2020-09-24 12:06   ` [PATCH seccomp 6/6] seccomp/cache: Report cache data through /proc/pid/seccomp_cache YiFei Zhu
2020-09-24 12:44   ` [PATCH v2 seccomp 0/6] seccomp: Add bitmap cache of arg-independent filter results that allow syscalls YiFei Zhu
2020-09-24 12:44     ` [PATCH v2 seccomp 1/6] seccomp: Move config option SECCOMP to arch/Kconfig YiFei Zhu
2020-09-24 19:11       ` Kees Cook
2020-10-27  9:52       ` Geert Uytterhoeven
2020-10-27 19:08         ` YiFei Zhu
2020-10-28  0:06         ` Kees Cook
2020-10-28  8:18           ` Geert Uytterhoeven
2020-10-28  9:34             ` Jann Horn via Containers
2020-09-24 12:44     ` [PATCH v2 seccomp 2/6] asm/syscall.h: Add syscall_arches[] array YiFei Zhu
2020-09-24 13:47       ` David Laight
2020-09-24 14:16         ` YiFei Zhu
2020-09-24 14:20           ` David Laight
2020-09-24 14:37             ` YiFei Zhu
2020-09-24 16:02               ` YiFei Zhu
2020-09-24 12:44     ` [PATCH v2 seccomp 3/6] seccomp/cache: Add "emulator" to check if filter is arg-dependent YiFei Zhu
2020-09-24 23:25       ` Kees Cook
2020-09-25  3:04         ` YiFei Zhu
2020-09-25 16:45           ` YiFei Zhu
2020-09-25 19:42             ` Kees Cook
2020-09-25 19:51               ` Andy Lutomirski
2020-09-25 20:37                 ` Kees Cook
2020-09-25 21:07                   ` Andy Lutomirski
2020-09-25 23:49                     ` Kees Cook
2020-09-26  0:34                       ` Andy Lutomirski
2020-09-26  1:23                     ` YiFei Zhu
2020-09-26  2:47                       ` Andy Lutomirski
2020-09-26  4:35                         ` Kees Cook
2020-09-24 12:44     ` [PATCH v2 seccomp 4/6] seccomp/cache: Lookup syscall allowlist for fast path YiFei Zhu
2020-09-24 23:46       ` Kees Cook
2020-09-25  1:55         ` YiFei Zhu
2020-09-24 12:44     ` [PATCH v2 seccomp 5/6] selftests/seccomp: Compare bitmap vs filter overhead YiFei Zhu
2020-09-24 23:47       ` Kees Cook
2020-09-25  1:35         ` YiFei Zhu
2020-09-24 12:44     ` [PATCH v2 seccomp 6/6] seccomp/cache: Report cache data through /proc/pid/seccomp_cache YiFei Zhu
2020-09-24 23:56       ` Kees Cook
2020-09-25  3:11         ` YiFei Zhu
2020-09-25  3:26           ` Kees Cook
2020-09-30 15:19 ` [PATCH v3 seccomp 0/5] seccomp: Add bitmap cache of constant allow filter results YiFei Zhu
2020-09-30 15:19   ` [PATCH v3 seccomp 1/5] x86: Enable seccomp architecture tracking YiFei Zhu
2020-09-30 21:21     ` Kees Cook
2020-09-30 21:33       ` Jann Horn via Containers
2020-09-30 22:53         ` Kees Cook
2020-09-30 23:15           ` Jann Horn via Containers
2020-09-30 15:19   ` [PATCH v3 seccomp 2/5] seccomp/cache: Add "emulator" to check if filter is constant allow YiFei Zhu
2020-09-30 22:24     ` Jann Horn via Containers
2020-09-30 22:49       ` Kees Cook
2020-10-01 11:28       ` YiFei Zhu
2020-10-01 21:08         ` Jann Horn via Containers
2020-09-30 22:40     ` Kees Cook
2020-10-01 11:52       ` YiFei Zhu
2020-10-01 21:05         ` Kees Cook
2020-10-02 11:08           ` YiFei Zhu
2020-10-09  4:47     ` YiFei Zhu
2020-10-09  5:41       ` Kees Cook
2020-09-30 15:19   ` [PATCH v3 seccomp 3/5] seccomp/cache: Lookup syscall allowlist for fast path YiFei Zhu
2020-09-30 21:32     ` Kees Cook
2020-10-09  0:17       ` YiFei Zhu
2020-10-09  5:35         ` Kees Cook
2020-09-30 15:19   ` [PATCH v3 seccomp 4/5] selftests/seccomp: Compare bitmap vs filter overhead YiFei Zhu
2020-09-30 15:19   ` [PATCH v3 seccomp 5/5] seccomp/cache: Report cache data through /proc/pid/seccomp_cache YiFei Zhu
2020-09-30 22:00     ` Jann Horn via Containers
2020-09-30 23:12       ` Kees Cook
2020-10-01 12:06       ` YiFei Zhu
2020-10-01 16:05         ` Jann Horn via Containers
2020-10-01 16:18           ` YiFei Zhu
2020-09-30 22:59     ` Kees Cook
2020-09-30 23:08       ` Jann Horn via Containers
2020-09-30 23:21         ` Kees Cook
2020-10-09 17:14   ` [PATCH v4 seccomp 0/5] seccomp: Add bitmap cache of constant allow filter results YiFei Zhu
2020-10-09 17:14     ` [PATCH v4 seccomp 1/5] seccomp/cache: Lookup syscall allowlist bitmap for fast path YiFei Zhu
2020-10-09 21:30       ` Jann Horn via Containers
2020-10-09 23:18       ` Kees Cook
2020-10-09 17:14     ` [PATCH v4 seccomp 2/5] seccomp/cache: Add "emulator" to check if filter is constant allow YiFei Zhu
2020-10-09 21:30       ` Jann Horn via Containers
2020-10-09 22:47         ` Kees Cook
2020-10-09 17:14     ` [PATCH v4 seccomp 3/5] x86: Enable seccomp architecture tracking YiFei Zhu
2020-10-09 17:25       ` Andy Lutomirski
2020-10-09 18:32         ` YiFei Zhu
2020-10-09 20:59           ` Andy Lutomirski
2020-10-09 17:14     ` [PATCH v4 seccomp 4/5] selftests/seccomp: Compare bitmap vs filter overhead YiFei Zhu
2020-10-09 17:14     ` [PATCH v4 seccomp 5/5] seccomp/cache: Report cache data through /proc/pid/seccomp_cache YiFei Zhu
2020-10-09 21:24       ` kernel test robot
2020-10-09 21:45       ` Jann Horn via Containers
2020-10-09 23:14       ` Kees Cook
2020-10-10 13:26         ` YiFei Zhu
2020-10-12 22:57           ` Kees Cook
2020-10-13  0:31             ` YiFei Zhu
2020-10-22 20:52               ` YiFei Zhu
2020-10-22 22:32                 ` Kees Cook
2020-10-22 23:40                   ` YiFei Zhu
2020-10-24  2:51                     ` Kees Cook
2020-10-30 12:18                       ` YiFei Zhu
2020-11-03 13:00                         ` YiFei Zhu
2020-11-04  0:29                           ` Kees Cook
2020-11-04 11:40                             ` YiFei Zhu
2020-11-04 18:57                               ` Kees Cook
2020-10-11 15:47     ` [PATCH v5 seccomp 0/5]seccomp: Add bitmap cache of constant allow filter results YiFei Zhu
2020-10-11 15:47       ` [PATCH v5 seccomp 1/5] seccomp/cache: Lookup syscall allowlist bitmap for fast path YiFei Zhu
2020-10-12  6:42         ` Jann Horn via Containers
2020-10-11 15:47       ` [PATCH v5 seccomp 2/5] seccomp/cache: Add "emulator" to check if filter is constant allow YiFei Zhu
2020-10-12  6:46         ` Jann Horn via Containers
2020-10-11 15:47       ` [PATCH v5 seccomp 3/5] x86: Enable seccomp architecture tracking YiFei Zhu
2020-10-11 15:47       ` [PATCH v5 seccomp 4/5] selftests/seccomp: Compare bitmap vs filter overhead YiFei Zhu
2020-10-11 15:47       ` [PATCH v5 seccomp 5/5] seccomp/cache: Report cache data through /proc/pid/seccomp_cache YiFei Zhu
2020-10-12  6:49         ` Jann Horn via Containers
2020-12-17 12:14         ` Geert Uytterhoeven
2020-12-17 18:34           ` YiFei Zhu
2020-12-18 12:35             ` Geert Uytterhoeven
2020-10-27 19:14       ` [PATCH v5 seccomp 0/5]seccomp: Add bitmap cache of constant allow filter results Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=OF8837FC1A.5C0D4D64-ON852585EA.006B677F-852585EA.006BA663@notes.na.collabserv.com \
    --to=frankeh@us.ibm.com \
    --cc=aarcange@redhat.com \
    --cc=bpf@vger.kernel.org \
    --cc=containers@lists.linux-foundation.org \
    --cc=dskarlat@cs.cmu.edu \
    --cc=gscrivan@redhat.com \
    --cc=jannh@google.com \
    --cc=jianyan2@illinois.edu \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@amacapital.net \
    --cc=tobin@ibm.com \
    --cc=torrella@illinois.edu \
    --cc=tyxu@illinois.edu \
    --cc=vrothber@redhat.com \
    --cc=wad@chromium.org \
    --cc=yifeifz2@illinois.edu \
    --cc=zhuyifei1999@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).