From: Kees Cook <keescook@chromium.org>
To: Jann Horn <jannh@google.com>
Cc: kernel list <linux-kernel@vger.kernel.org>,
Christian Brauner <christian@brauner.io>,
Sargun Dhillon <sargun@sargun.me>,
Tycho Andersen <tycho@tycho.ws>,
"zhujianwei (C)" <zhujianwei7@huawei.com>,
Dave Hansen <dave.hansen@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
Andy Lutomirski <luto@kernel.org>, Will Drewry <wad@chromium.org>,
Shuah Khan <shuah@kernel.org>, Matt Denton <mpdenton@google.com>,
Chris Palmer <palmer@google.com>,
Jeffrey Vander Stoep <jeffv@google.com>,
Aleksa Sarai <cyphar@cyphar.com>,
Hehuazhen <hehuazhen@huawei.com>,
the arch/x86 maintainers <x86@kernel.org>,
Linux Containers <containers@lists.linux-foundation.org>,
linux-security-module <linux-security-module@vger.kernel.org>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 4/8] seccomp: Implement constant action bitmaps
Date: Tue, 16 Jun 2020 11:49:36 -0700 [thread overview]
Message-ID: <202006161145.1E6B616CBE@keescook> (raw)
In-Reply-To: <CAG48ez2HrPLhby31PUFb4f=iM60USA4NYRE6AjE8pPQ+ctm60g@mail.gmail.com>
On Tue, Jun 16, 2020 at 08:36:28PM +0200, Jann Horn wrote:
> On Tue, Jun 16, 2020 at 5:49 PM Kees Cook <keescook@chromium.org> wrote:
> > On Tue, Jun 16, 2020 at 02:14:47PM +0200, Jann Horn wrote:
> > > Wouldn't it be simpler to use a function that can run a subset of
> > > seccomp cBPF and bails out on anything that indicates that a syscall's
> > > handling is complex or on instructions it doesn't understand? For
> > > syscalls that have a fixed policy, a typical seccomp filter doesn't
> > > even use any of the BPF_ALU ops, the scratch space, or the X register;
> > > it just uses something like the following set of operations, which is
> > > easy to emulate without much code:
> > >
> > > BPF_LD | BPF_W | BPF_ABS
> > > BPF_JMP | BPF_JEQ | BPF_K
> > > BPF_JMP | BPF_JGE | BPF_K
> > > BPF_JMP | BPF_JGT | BPF_K
> > > BPF_JMP | BPF_JA
> > > BPF_RET | BPF_K
> >
> > Initially, I started down this path. It needed a bit of plumbing into
> > BPF to better control the lifetime of the cBPF "saved original filter"
> > (normally used by CHECKPOINT_RESTORE uses)
>
> I don't think you need that? When a filter is added, you can compute
> the results of the added individual filter, and then merge the state.
That's what I thought too, but unfortunately not (unless I missed
something) -- the seccomp verifier is run as a callback from the BPF
internals, so seccomp only see what the user sends (which is unverified)
and the final eBPF filter. There isn't state I can attach during the
callback, so I opted to just do the same thing as CHECKPOINT_RESTORE,
but to then explicitly free the cBPF after bitmap generation.
> > and then I needed to keep
> > making exceptions (same list you have: ALU, X register, scratch, etc)
> > in the name of avoiding too much complexity in the emulator. I decided
> > I'd rather reuse the existing infrastructure to actually execute the
> > filter (no cBPF copy needed to be saved, no separate code, and full
> > instruction coverage).
>
> If you really think that this bit of emulation is so bad, you could
> also make a copy of the BPF filter in which you replace all load
> instructions from syscall arguments with "return NON_CONSTANT_RESULT",
> and then run that through the normal BPF infrastructure.
>
> > > Something like (completely untested):
> [...]
> > I didn't actually finish going down the emulator path (I stopped right
> > around the time I verified that libseccomp does use BPF_ALU -- though
> > only BPF_AND), so I didn't actually evaluate the filter contents for other
> > filter builders (i.e. Chrome).
> >
> > But, if BPF_ALU | BPF_AND were added to your code above, it would cover
> > everything libseccomp generates (which covers a lot of the seccomp
> > filters, e.g. systemd, docker). I just felt funny about an "incomplete"
> > emulator.
> >
> > Though now you've got me looking. It seems this is the core
> > of Chrome's BPF instruction generation:
> > https://github.com/chromium/chromium/blob/master/sandbox/linux/bpf_dsl/policy_compiler.cc
> > It also uses ALU|AND, but adds JMP|JSET.
> >
> > So... that's only 2 more instructions to cover what I think are likely
> > the two largest seccomp instruction generators.
> >
> > > That way, you won't need any of this complicated architecture-specific stuff.
> >
> > There are two arch-specific needs, and using a cBPF-subset emulator
> > just gets rid of the local TLB flush. The other part is distinguishing
> > the archs. Neither requirement is onerous (TLB flush usually just
> > needs little more than an extern, arch is already documented in the
> > per-arch syscall_get_arch()).
>
> But it's also somewhat layer-breaking and reliant on very specific
> assumptions. Normal kernel code doesn't mess around with page table
> magic, outside of very specific low-level things. And your method
> would break if the fixed-value members were not all packed together at
> the start of the structure.
Right -- that was lucky. I suspect the emulation route will win out
here.
> And from a hardening perspective: The more code we add that fiddles
> around with PTEs directly, rather than going through higher-level
> abstractions, the higher the chance that something gets horribly
> screwed up. For example, this bit from your patch looks *really*
> suspect:
>
> + preempt_disable();
> + set_pte_at(&init_mm, vaddr, ptep,
> pte_mkold(*(READ_ONCE(ptep))));
> + local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> + preempt_enable();
>
> First off, that set_pte_at() is just a memory write; I don't see why
> you put it inside a preempt_disable() region.
> But more importantly, sticking a local TLB flush inside a
> preempt_disable() region with nothing else in there looks really
> shady. How is that supposed to work? If we migrate from CPU0 to CPU1
> directly before this region, and then from CPU1 back to CPU0 directly
> afterwards, the local TLB flush will have no effect.
Yeah, true, that's another good reason not to do this.
--
Kees Cook
next prev parent reply other threads:[~2020-06-16 18:49 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-06-16 7:49 [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16 7:49 ` [PATCH 1/8] selftests/seccomp: Improve calibration loop Kees Cook
2020-06-16 7:49 ` [PATCH 2/8] seccomp: Use pr_fmt Kees Cook
2020-06-16 7:49 ` [PATCH 3/8] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE Kees Cook
2020-06-16 16:56 ` Andy Lutomirski
2020-06-17 15:25 ` Jann Horn
2020-06-17 15:29 ` Andy Lutomirski
2020-06-17 15:31 ` Jann Horn
2020-06-16 7:49 ` [PATCH 4/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16 12:14 ` Jann Horn
2020-06-16 15:48 ` Kees Cook
2020-06-16 18:36 ` Jann Horn
2020-06-16 18:49 ` Kees Cook [this message]
2020-06-16 21:13 ` Andy Lutomirski
2020-06-16 14:40 ` Dave Hansen
2020-06-16 16:01 ` Kees Cook
2020-06-16 7:49 ` [PATCH 5/8] selftests/seccomp: Compare bitmap vs filter overhead Kees Cook
2020-06-16 7:49 ` [PATCH 6/8] x86: Provide API for local kernel TLB flushing Kees Cook
2020-06-16 16:59 ` Andy Lutomirski
2020-06-16 18:37 ` Kees Cook
2020-06-16 7:49 ` [PATCH 7/8] x86: Enable seccomp constant action bitmaps Kees Cook
2020-06-16 7:49 ` [PATCH 8/8] [DEBUG] seccomp: Report bitmap coverage ranges Kees Cook
2020-06-16 17:01 ` [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Andy Lutomirski
2020-06-16 18:35 ` Kees Cook
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=202006161145.1E6B616CBE@keescook \
--to=keescook@chromium.org \
--cc=christian@brauner.io \
--cc=containers@lists.linux-foundation.org \
--cc=cyphar@cyphar.com \
--cc=dave.hansen@linux.intel.com \
--cc=hehuazhen@huawei.com \
--cc=jannh@google.com \
--cc=jeffv@google.com \
--cc=linux-api@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=luto@kernel.org \
--cc=mpdenton@google.com \
--cc=palmer@google.com \
--cc=sargun@sargun.me \
--cc=shuah@kernel.org \
--cc=tycho@tycho.ws \
--cc=wad@chromium.org \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
--cc=zhujianwei7@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).