linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Jann Horn <jannh@google.com>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	Christian Brauner <christian@brauner.io>,
	Sargun Dhillon <sargun@sargun.me>,
	Tycho Andersen <tycho@tycho.ws>,
	"zhujianwei (C)" <zhujianwei7@huawei.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andy Lutomirski <luto@kernel.org>, Will Drewry <wad@chromium.org>,
	Shuah Khan <shuah@kernel.org>, Matt Denton <mpdenton@google.com>,
	Chris Palmer <palmer@google.com>,
	Jeffrey Vander Stoep <jeffv@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Hehuazhen <hehuazhen@huawei.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-security-module <linux-security-module@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 4/8] seccomp: Implement constant action bitmaps
Date: Tue, 16 Jun 2020 08:48:56 -0700	[thread overview]
Message-ID: <202006160757.99FD9B785@keescook> (raw)
In-Reply-To: <CAG48ez1p=dR_2ikKq=xVxkoGg0fYpTBpkhJSv1w-6BG=76PAvw@mail.gmail.com>

On Tue, Jun 16, 2020 at 02:14:47PM +0200, Jann Horn wrote:
> Wouldn't it be simpler to use a function that can run a subset of
> seccomp cBPF and bails out on anything that indicates that a syscall's
> handling is complex or on instructions it doesn't understand? For
> syscalls that have a fixed policy, a typical seccomp filter doesn't
> even use any of the BPF_ALU ops, the scratch space, or the X register;
> it just uses something like the following set of operations, which is
> easy to emulate without much code:
> 
> BPF_LD | BPF_W | BPF_ABS
> BPF_JMP | BPF_JEQ | BPF_K
> BPF_JMP | BPF_JGE | BPF_K
> BPF_JMP | BPF_JGT | BPF_K
> BPF_JMP | BPF_JA
> BPF_RET | BPF_K

Initially, I started down this path. It needed a bit of plumbing into
BPF to better control the lifetime of the cBPF "saved original filter"
(normally used by CHECKPOINT_RESTORE uses), and then I needed to keep
making exceptions (same list you have: ALU, X register, scratch, etc)
in the name of avoiding too much complexity in the emulator. I decided
I'd rather reuse the existing infrastructure to actually execute the
filter (no cBPF copy needed to be saved, no separate code, and full
instruction coverage).

> 
> Something like (completely untested):
> 
> /*
>  * Try to statically determine whether @filter will always return a fixed result
>  * when run for syscall @nr under architecture @arch.
>  * Returns true if the result could be determined; if so, the result will be
>  * stored in @action.
>  */
> static bool seccomp_check_syscall(struct sock_filter *filter, unsigned int arch,
>                                   unsigned int nr, unsigned int *action)
> {
>   int pc;
>   unsigned int reg_value = 0;
> 
>   for (pc = 0; 1; pc++) {
>     struct sock_filter *insn = &filter[pc];
>     u16 code = insn->code;
>     u32 k = insn->k;
> 
>     switch (code) {
>     case BPF_LD | BPF_W | BPF_ABS:
>       if (k == offsetof(struct seccomp_data, nr)) {
>         reg_value = nr;
>       } else if (k == offsetof(struct seccomp_data, arch)) {
>         reg_value = arch;
>       } else {
>         return false; /* can't optimize (non-constant value load) */
>       }
>       break;
>     case BPF_RET | BPF_K:
>       *action = insn->k;
>       return true; /* success: reached return with constant values only */
>     case BPF_JMP | BPF_JA:
>       pc += insn->k;
>       break;
>     case BPF_JMP | BPF_JEQ | BPF_K:
>     case BPF_JMP | BPF_JGE | BPF_K:
>     case BPF_JMP | BPF_JGT | BPF_K:
>     default:
>       if (BPF_CLASS(code) == BPF_JMP && BPF_SRC(code) == BPF_K) {
>         u16 op = BPF_OP(code);
>         bool op_res;
> 
>         switch (op) {
>         case BPF_JEQ:
>           op_res = reg_value == k;
>           break;
>         case BPF_JGE:
>           op_res = reg_value >= k;
>           break;
>         case BPF_JGT:
>           op_res = reg_value > k;
>           break;
>         default:
>           return false; /* can't optimize (unknown insn) */
>         }
> 
>         pc += op_res ? insn->jt : insn->jf;
>         break;
>       }
>       return false; /* can't optimize (unknown insn) */
>     }
>   }
> }

I didn't actually finish going down the emulator path (I stopped right
around the time I verified that libseccomp does use BPF_ALU -- though
only BPF_AND), so I didn't actually evaluate the filter contents for other
filter builders (i.e. Chrome).

But, if BPF_ALU | BPF_AND were added to your code above, it would cover
everything libseccomp generates (which covers a lot of the seccomp
filters, e.g. systemd, docker). I just felt funny about an "incomplete"
emulator.

Though now you've got me looking. It seems this is the core
of Chrome's BPF instruction generation:
https://github.com/chromium/chromium/blob/master/sandbox/linux/bpf_dsl/policy_compiler.cc
It also uses ALU|AND, but adds JMP|JSET.

So... that's only 2 more instructions to cover what I think are likely
the two largest seccomp instruction generators.

> That way, you won't need any of this complicated architecture-specific stuff.

There are two arch-specific needs, and using a cBPF-subset emulator
just gets rid of the local TLB flush. The other part is distinguishing
the archs. Neither requirement is onerous (TLB flush usually just
needs little more than an extern, arch is already documented in the
per-arch syscall_get_arch()). The awkward part I ran into for arm64
was a header include loop for compat due to how unistd is handled for
getting NR_syscalls for the bitmap sizing (which I'm sure is solvable,
but I just wanted to get the x86 RFC posted first).

-- 
Kees Cook

  reply	other threads:[~2020-06-16 16:05 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16  7:49 [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16  7:49 ` [PATCH 1/8] selftests/seccomp: Improve calibration loop Kees Cook
2020-06-16  7:49 ` [PATCH 2/8] seccomp: Use pr_fmt Kees Cook
2020-06-16  7:49 ` [PATCH 3/8] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE Kees Cook
2020-06-16 16:56   ` Andy Lutomirski
2020-06-17 15:25   ` Jann Horn
2020-06-17 15:29     ` Andy Lutomirski
2020-06-17 15:31       ` Jann Horn
2020-06-16  7:49 ` [PATCH 4/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16 12:14   ` Jann Horn
2020-06-16 15:48     ` Kees Cook [this message]
2020-06-16 18:36       ` Jann Horn
2020-06-16 18:49         ` Kees Cook
2020-06-16 21:13         ` Andy Lutomirski
2020-06-16 14:40   ` Dave Hansen
2020-06-16 16:01     ` Kees Cook
2020-06-16  7:49 ` [PATCH 5/8] selftests/seccomp: Compare bitmap vs filter overhead Kees Cook
2020-06-16  7:49 ` [PATCH 6/8] x86: Provide API for local kernel TLB flushing Kees Cook
2020-06-16 16:59   ` Andy Lutomirski
2020-06-16 18:37     ` Kees Cook
2020-06-16  7:49 ` [PATCH 7/8] x86: Enable seccomp constant action bitmaps Kees Cook
2020-06-16  7:49 ` [PATCH 8/8] [DEBUG] seccomp: Report bitmap coverage ranges Kees Cook
2020-06-16 17:01 ` [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Andy Lutomirski
2020-06-16 18:35   ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202006160757.99FD9B785@keescook \
    --to=keescook@chromium.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hehuazhen@huawei.com \
    --cc=jannh@google.com \
    --cc=jeffv@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mpdenton@google.com \
    --cc=palmer@google.com \
    --cc=sargun@sargun.me \
    --cc=shuah@kernel.org \
    --cc=tycho@tycho.ws \
    --cc=wad@chromium.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=zhujianwei7@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).