All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kees Cook <keescook@chromium.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: linux-kernel@vger.kernel.org,
	Christian Brauner <christian@brauner.io>,
	Sargun Dhillon <sargun@sargun.me>,
	Tycho Andersen <tycho@tycho.ws>, Jann Horn <jannh@google.com>,
	"zhujianwei (C)" <zhujianwei7@huawei.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Matthew Wilcox <willy@infradead.org>,
	Andy Lutomirski <luto@kernel.org>, Will Drewry <wad@chromium.org>,
	Shuah Khan <shuah@kernel.org>, Matt Denton <mpdenton@google.com>,
	Chris Palmer <palmer@google.com>,
	Jeffrey Vander Stoep <jeffv@google.com>,
	Aleksa Sarai <cyphar@cyphar.com>,
	Hehuazhen <hehuazhen@huawei.com>,
	x86@kernel.org,
	Linux Containers <containers@lists.linux-foundation.org>,
	linux-security-module@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH 4/8] seccomp: Implement constant action bitmaps
Date: Tue, 16 Jun 2020 09:01:46 -0700	[thread overview]
Message-ID: <202006160851.E8F9928AAB@keescook> (raw)
In-Reply-To: <fc0c14cd-bcf0-c94c-6cba-d0ce1844e93c@intel.com>

On Tue, Jun 16, 2020 at 07:40:17AM -0700, Dave Hansen wrote:
> On 6/16/20 12:49 AM, Kees Cook wrote:
> > +	/* Mark the second page as untouched (i.e. "old") */
> > +	preempt_disable();
> > +	set_pte_at(&init_mm, vaddr, ptep, pte_mkold(*(READ_ONCE(ptep))));
> > +	local_flush_tlb_kernel_range(vaddr, vaddr + PAGE_SIZE);
> > +	preempt_enable();
> 
> If you can, I'd wrap that nugget up in a helper.  I'd also suggest being
> very explicit in a comment about what it is trying to do: ensure no TLB
> entries exist so that a future access will always set the Accessed bit.

Yeah, good idea!

> 
> > +	/* Make sure the PTE agrees that it is untouched. */
> > +	if (WARN_ON_ONCE(sd_touched(ptep)))
> > +		return;
> > +	/* Read a portion of struct seccomp_data from the second page. */
> > +	check = sd->instruction_pointer;
> > +	/* First, verify the contents are zero from vzalloc(). */
> > +	if (WARN_ON_ONCE(check))
> > +		return;
> > +	/* Now make sure the ACCESSED bit has been set after the read. */
> > +	if (!sd_touched(ptep)) {
> > +		/*
> > +		 * If autodetection fails, fall back to standard beahavior by
> > +		 * clearing the entire "allow" bitmap.
> > +		 */
> > +		pr_warn_once("seccomp: cannot build automatic syscall filters\n");
> > +		bitmap_zero(bitmaps->allow, NR_syscalls);
> > +		return;
> > +	}
> 
> I can't find any big holes with this.  It's the kind of code that makes
> me nervous, but mostly because it's pretty different that anything else
> we have in the kernel.
> 
> It's also clear to me here that you probably have a slightly different
> expectation of what the PTE accessed flag means versus the hardware
> guys.  What you are looking for it to mean is roughly: "a retired
> instruction touched this page".
> 
> The hardware guys would probably say it's closer to "a TLB entry was
> established for this page."  Remember that TLB entries can be
> established speculatively or from things like prefetchers.  While I
> don't know of anything microarchitectural today which would trip this
> mechanism, it's entirely possible that something in the future might.
> Accessing close to the page boundary is the exact kind of place folks
> might want to optimize.

Yeah, and to that end, going the cBPF emulator route removes this kind
of "weird" behavior.

> 
> *But*, at least it would err in the direction of being conservative.  It
> would say "somebody touched the page!" more often than it should, but
> never _less_ often than it should.

Right -- I made sure to design the bitmaps and the direction of the
checking to fail towards running the filter instead of bypassing it.

> One thing about the implementation (which is roughly):
> 
> 	// Touch the data:
> 	check = sd->instruction_pointer;
> 	// Examine the PTE mapping that data:
> 	if (!sd_touched(ptep)) {
> 		// something
> 	}
> 
> There aren't any barriers in there, which could lead to the sd_touched()
> check being ordered before the data touch.  I think a rmb() will
> suffice.  You could even do it inside sd_touched().

Ah yeah, I had convinced myself that READ_ONCE() gained me that
coverage, but I guess that's not actually true here.

> Was there a reason you chose to export a ranged TLB flush?  I probably
> would have just used the single-page flush_tlb_one_kernel() for this
> purpose if I were working in arch-specific code.

No particular reason -- it just seemed easiest to make available given
the interfaces. I could do the single-page version instead, if this way
of doing things survives review. ;)

Thanks for looking at it!

-- 
Kees Cook

  reply	other threads:[~2020-06-16 16:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-16  7:49 [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16  7:49 ` [PATCH 1/8] selftests/seccomp: Improve calibration loop Kees Cook
2020-06-16  7:49 ` [PATCH 2/8] seccomp: Use pr_fmt Kees Cook
2020-06-16  7:49 ` [PATCH 3/8] seccomp: Introduce SECCOMP_PIN_ARCHITECTURE Kees Cook
2020-06-16 16:56   ` Andy Lutomirski
2020-06-17 15:25   ` Jann Horn
2020-06-17 15:29     ` Andy Lutomirski
2020-06-17 15:31       ` Jann Horn
2020-06-16  7:49 ` [PATCH 4/8] seccomp: Implement constant action bitmaps Kees Cook
2020-06-16 12:14   ` Jann Horn
2020-06-16 15:48     ` Kees Cook
2020-06-16 18:36       ` Jann Horn
2020-06-16 18:49         ` Kees Cook
2020-06-16 21:13         ` Andy Lutomirski
2020-06-16 14:40   ` Dave Hansen
2020-06-16 16:01     ` Kees Cook [this message]
2020-06-16  7:49 ` [PATCH 5/8] selftests/seccomp: Compare bitmap vs filter overhead Kees Cook
2020-06-16  7:49 ` [PATCH 6/8] x86: Provide API for local kernel TLB flushing Kees Cook
2020-06-16 16:59   ` Andy Lutomirski
2020-06-16 18:37     ` Kees Cook
2020-06-16  7:49 ` [PATCH 7/8] x86: Enable seccomp constant action bitmaps Kees Cook
2020-06-16  7:49 ` [PATCH 8/8] [DEBUG] seccomp: Report bitmap coverage ranges Kees Cook
2020-06-16 17:01 ` [RFC][PATCH 0/8] seccomp: Implement constant action bitmaps Andy Lutomirski
2020-06-16 18:35   ` Kees Cook

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=202006160851.E8F9928AAB@keescook \
    --to=keescook@chromium.org \
    --cc=christian@brauner.io \
    --cc=containers@lists.linux-foundation.org \
    --cc=cyphar@cyphar.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=hehuazhen@huawei.com \
    --cc=jannh@google.com \
    --cc=jeffv@google.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mpdenton@google.com \
    --cc=palmer@google.com \
    --cc=sargun@sargun.me \
    --cc=shuah@kernel.org \
    --cc=tycho@tycho.ws \
    --cc=wad@chromium.org \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    --cc=zhujianwei7@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.