linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Jann Horn <jannh@google.com>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	Christian Brauner <christian@brauner.io>,
	Sargun Dhillon <sargun@sargun.me>,
	Tycho Andersen <tycho@tycho.ws>,
	"zhujianwei (C)" <zhujianwei7@huawei.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andy Lutomirski <luto@kernel.org>, Will Drewry <wad@chromium.org>,
	Shuah Khan <shuah@kernel.org>, Matt Denton <mpdenton@google.com>,
	Chris Palmer <palmer@google.com>,
	Jeffrey Vander Stoep <jeffv@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Hehuazhen <hehuazhen@huawei.com>,
	the arch/x86 maintainers <x86@kernel.org>,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-security-module <linux-security-module@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 4/8] seccomp: Implement constant action bitmaps
Date: Tue, 16 Jun 2020 11:49:36 -0700	[thread overview]
Message-ID: <202006161145.1E6B616CBE@keescook> (raw)
In-Reply-To: <CAG48ez2HrPLhby31PUFb4f=iM60USA4NYRE6AjE8pPQ+ctm60g@mail.gmail.com>

On Tue, Jun 16, 2020 at 08:36:28PM +0200, Jann Horn wrote:
> On Tue, Jun 16, 2020 at 5:49 PM Kees Cook <keescook@chromium.org> wrote:
> > On Tue, Jun 16, 2020 at 02:14:47PM +0200, Jann Horn wrote:
> > > Wouldn't it be simpler to use a function that can run a subset of
> > > seccomp cBPF and bails out on anything that indicates that a syscall's
> > > handling is complex or on instructions it doesn't understand? For
> > > syscalls that have a fixed policy, a typical seccomp filter doesn't
> > > even use any of the BPF_ALU ops, the scratch space, or the X register;
> > > it just uses something like the following set of operations, which is
> > > easy to emulate without much code:
> > >
> > > BPF_LD | BPF_W | BPF_ABS
> > > BPF_JMP | BPF_JEQ | BPF_K
> > > BPF_JMP | BPF_JGE | BPF_K
> > > BPF_JMP | BPF_JGT | BPF_K
> > > BPF_JMP | BPF_JA
> > > BPF_RET | BPF_K
> >
> > Initially, I started down this path. It needed a bit of plumbing into
> > BPF to better control the lifetime of the cBPF "saved original filter"
> > (normally used by CHECKPOINT_RESTORE uses)
> 
> I don't think you need that? When a filter is added, you can compute
> the results of the added individual filter, and then merge the state.

That's what I thought too, but unfortunately not (unless I missed
something) -- the seccomp verifier is run as a callback from the BPF
internals, so seccomp only see what the user sends (which is unverified)
and the final eBPF filter. There isn't state I can attach during the
callback, so I opted to just do the same thing as CHECKPOINT_RESTORE,
but to then explicitly free the cBPF after bitmap generation.

> > and then I needed to keep
> > making exceptions (same list you have: ALU, X register, scratch, etc)
> > in the name of avoiding too much complexity in the emulator. I decided
> > I'd rather reuse the existing infrastructure to actually execute the
> > filter (no cBPF copy needed to be saved, no separate code, and full
> > instruction coverage).
> 
> If you really think that this bit of emulation is so bad, you could
> also make a copy of the BPF filter in which you replace all load
> instructions from syscall arguments with "return NON_CONSTANT_RESULT",
> and then run that through the normal BPF infrastructure.
> 
> > > Something like (completely untested):
> [...]
> > I didn't actually finish going down the emulator path (I stopped right
> > around the time I verified that libseccomp does use BPF_ALU -- though
> > only BPF_AND), so I didn't actually evaluate the filter contents for other
> > filter builders (i.e. Chrome).
> >
> > But, if BPF_ALU | BPF_AND were added to your code above, it would cover
> > everything libseccomp generates (which covers a lot of the seccomp
> > filters, e.g. systemd, docker). I just felt funny about an "incomplete"
> > emulator.
> >
> > Though now you've got me looking. It seems this is the core
> > of Chrome's BPF instruction generation:
> > https://github.com/chromium/chromium/blob/master/sandbox/linux/bpf_dsl/policy_compiler.cc
> > It also uses ALU|AND, but adds JMP|JSET.
> >
> > So... that's only 2 more instructions to cover what I think are likely
> > the two largest seccomp instruction generators.
> >
> > > That way, you won't need any of this complicated architecture-specific stuff.
> >
> > There are two arch-specific needs, and using a cBPF-subset emulator
> > just gets rid of the local TLB flush. The other part is distinguishing
> > the archs. Neither requirement is onerous (TLB flush usually just
> > needs little more than an extern, arch is already documented in the
> > per-arch syscall_get_arch()).
> 
> But it's also somewhat layer-breaking and reliant on very specific
> assumptions. Normal kernel code doesn't mess around with page table
> magic, outside of very specific low-level things. And your method
> would break if the fixed-value members were not all packed together at
> the start of the structure.

Right -- that was lucky. I suspect the emulation route will win out
here.

> And from a hardening perspective: The more code we add that fiddles
> around with PTEs directly, rather than going through higher-level
> abstractions, the higher the chance that something gets horribly
> screwed up. For example, this bit from your patch looks *really*
> suspect:
> 
> +                       preempt_disable();
> +                       set_pte_at(&init_mm, vaddr, ptep,
> pte_mkold(*(READ_ONCE(ptep))));
> +                       local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> +                       preempt_enable();
> 
> First off, that set_pte_at() is just a memory write; I don't see why
> you put it inside a preempt_disable() region.
> But more importantly, sticking a local TLB flush inside a
> preempt_disable() region with nothing else in there looks really
> shady. How is that supposed to work? If we migrate from CPU0 to CPU1
> directly before this region, and then from CPU1 back to CPU0 directly
> afterwards, the local TLB flush will have no effect.

Yeah, true, that's another good reason not to do this.

-- 
Kees Cook

  reply	other threads:[~2020-06-16 18:49 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16  7:49 [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16  7:49 ` [PATCH 1/8] selftests/seccomp: Improve calibration loop Kees Cook
2020-06-16  7:49 ` [PATCH 2/8] seccomp: Use pr_fmt Kees Cook
2020-06-16  7:49 ` [PATCH 3/8] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE Kees Cook
2020-06-16 16:56   ` Andy Lutomirski
2020-06-17 15:25   ` Jann Horn
2020-06-17 15:29     ` Andy Lutomirski
2020-06-17 15:31       ` Jann Horn
2020-06-16  7:49 ` [PATCH 4/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16 12:14   ` Jann Horn
2020-06-16 15:48     ` Kees Cook
2020-06-16 18:36       ` Jann Horn
2020-06-16 18:49         ` Kees Cook [this message]
2020-06-16 21:13         ` Andy Lutomirski
2020-06-16 14:40   ` Dave Hansen
2020-06-16 16:01     ` Kees Cook
2020-06-16  7:49 ` [PATCH 5/8] selftests/seccomp: Compare bitmap vs filter overhead Kees Cook
2020-06-16  7:49 ` [PATCH 6/8] x86: Provide API for local kernel TLB flushing Kees Cook
2020-06-16 16:59   ` Andy Lutomirski
2020-06-16 18:37     ` Kees Cook
2020-06-16  7:49 ` [PATCH 7/8] x86: Enable seccomp constant action bitmaps Kees Cook
2020-06-16  7:49 ` [PATCH 8/8] [DEBUG] seccomp: Report bitmap coverage ranges Kees Cook
2020-06-16 17:01 ` [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Andy Lutomirski
2020-06-16 18:35   ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202006161145.1E6B616CBE@keescook \
    --to=keescook@chromium.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hehuazhen@huawei.com \
    --cc=jannh@google.com \
    --cc=jeffv@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mpdenton@google.com \
    --cc=palmer@google.com \
    --cc=sargun@sargun.me \
    --cc=shuah@kernel.org \
    --cc=tycho@tycho.ws \
    --cc=wad@chromium.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=zhujianwei7@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).