All of lore.kernel.org
 help / color / mirror / Atom feed
From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Ingo Molnar <mingo@kernel.org>, Mike Rapoport <rppt@linux.ibm.com>
Cc: linux-kernel@vger.kernel.org,
	Alexandre Chartre <alexandre.chartre@oracle.com>,
	Andy Lutomirski <luto@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Ingo Molnar <mingo@redhat.com>,
	Jonathan Adams <jwadams@google.com>,
	Kees Cook <keescook@chromium.org>, Paul Turner <pjt@google.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	linux-mm@kvack.org, linux-security-module@vger.kernel.org,
	x86@kernel.org, Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation
Date: Fri, 26 Apr 2019 07:44:49 -0700	[thread overview]
Message-ID: <1556289889.2833.17.camel@HansenPartnership.com> (raw)
In-Reply-To: <20190426083144.GA126896@gmail.com>

On Fri, 2019-04-26 at 10:31 +0200, Ingo Molnar wrote:
> * Mike Rapoport <rppt@linux.ibm.com> wrote:
> 
> > When enabled, the system call isolation (SCI) would allow execution
> > of the system calls with reduced page tables. These page tables are
> > almost identical to the user page tables in PTI. The only addition
> > is the code page containing system call entry function that will
> > continue exectution after the context switch.
> > 
> > Unlike PTI page tables, there is no sharing at higher levels and
> > all the hierarchy for SCI page tables is cloned.
> > 
> > The SCI page tables are created when a system call that requires 
> > isolation is executed for the first time.
> > 
> > Whenever a system call should be executed in the isolated
> > environment, the context is switched to the SCI page tables. Any
> > further access to the kernel memory will generate a page fault. The
> > page fault handler can verify that the access is safe and grant it
> > or kill the task otherwise.
> > 
> > The initial SCI implementation allows access to any kernel data,
> > but it limits access to the code in the following way:
> > * calls and jumps to known code symbols without offset are allowed
> > * calls and jumps into a known symbol with offset are allowed only
> > if that symbol was already accessed and the offset is in the next
> > page 
> > * all other code access are blocked
> > 
> > After the isolated system call finishes, the mappings created
> > during its execution are cleared.
> > 
> > The entire SCI page table is lazily freed at task exit() time.
> 
> So this basically uses a similar mechanism to the horrendous PTI CR3 
> switching overhead whenever a syscall seeks "protection", which
> overhead is only somewhat mitigated by PCID.
> 
> This might work on PTI-encumbered CPUs.
> 
> While AMD CPUs don't need PTI, nor do they have PCID.
> 
> So this feature is hurting the CPU maker who didn't mess up, and is 
> hurting future CPUs that don't need PTI ..
> 
> I really don't like it where this is going. In a couple of years I
> really  want to be able to think of PTI as a bad dream that is mostly
> over  fortunately.

Perhaps ROP gadgets were a bad first example.  The research object of
the current patch set is really to investigate eliminating sandboxing
for containers.  As you know, current sandboxes like gVisor and Nabla
try to reduce the exposure to horizontal exploits (ability of an
untrusted tenant to exploit the shared kernel to attack another tenant)
by running significant chunks of kernel emulation code in userspace to
reduce exposure of the tenant to code in the shared kernel.  The price
paid for this is pretty horrendous in performance terms, but the
benefit is multi-tenant safety.

The question we were looking into is if we used per-tenant in-kernel
address space isolation to improve the security of kernel system calls
such that either the exploit becomes detectable or its consequences
bounce back only on the tenant trying the exploit, we could eliminate
the emulation for that system call and instead pass it through to the
kernel, thus thinning out the sandbox layer without losing the security
benefits.

We are looking at other aspects as well, like can we simply run chunks
of the kernel in the user's address space as the sanbox emulation
currently does, or can we hide a tenant's data objects such that
they're not easily accessible from an exploited kernel.

James


  parent reply	other threads:[~2019-04-26 14:44 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-25 21:45 [RFC PATCH 0/7] x86: introduce system calls addess space isolation Mike Rapoport
2019-04-25 21:45 ` [RFC PATCH 1/7] x86/cpufeatures: add X86_FEATURE_SCI Mike Rapoport
2019-04-25 21:45 ` [RFC PATCH 2/7] x86/sci: add core implementation for system call isolation Mike Rapoport
2019-04-26  7:49   ` Peter Zijlstra
2019-04-28  5:45     ` Mike Rapoport
2019-04-26  8:31   ` Ingo Molnar
2019-04-26  9:58     ` Ingo Molnar
2019-04-26 21:26       ` Andy Lutomirski
2019-04-26 21:26         ` Andy Lutomirski
2019-04-27  8:47         ` Ingo Molnar
2019-04-27 10:46           ` Ingo Molnar
2019-04-29 18:26             ` James Morris
2019-04-29 18:43               ` Andy Lutomirski
2019-04-29 18:43                 ` Andy Lutomirski
2019-04-29 18:46             ` Andy Lutomirski
2019-04-29 18:46               ` Andy Lutomirski
2019-04-30  5:03               ` Ingo Molnar
2019-04-30  9:38                 ` Peter Zijlstra
2019-04-30 11:05                   ` Ingo Molnar
2019-05-02 11:35             ` Robert O'Callahan
2019-05-02 11:35               ` Robert O'Callahan
2019-05-02 15:20               ` Ingo Molnar
2019-05-02 21:07                 ` Robert O'Callahan
2019-05-02 21:07                   ` Robert O'Callahan
2019-04-26 14:44     ` James Bottomley [this message]
2019-04-26 14:44       ` James Bottomley
2019-04-26 14:46   ` Dave Hansen
2019-04-26 14:57     ` James Bottomley
2019-04-26 14:57       ` James Bottomley
2019-04-26 15:07       ` Andy Lutomirski
2019-04-26 15:19         ` James Bottomley
2019-04-26 15:19           ` James Bottomley
2019-04-26 17:40           ` Andy Lutomirski
2019-04-26 18:49             ` James Bottomley
2019-04-26 18:49               ` James Bottomley
2019-04-26 19:22               ` Andy Lutomirski
2019-04-25 21:45 ` [RFC PATCH 3/7] x86/entry/64: add infrastructure for switching to isolated syscall context Mike Rapoport
2019-04-25 21:45 ` [RFC PATCH 4/7] x86/sci: hook up isolated system call entry and exit Mike Rapoport
2019-04-25 21:45 ` [RFC PATCH 5/7] x86/mm/fault: hook up SCI verification Mike Rapoport
2019-04-26  7:42   ` Peter Zijlstra
2019-04-28  5:47     ` Mike Rapoport
2019-04-30 16:44       ` Andy Lutomirski
2019-04-30 16:44         ` Andy Lutomirski
2019-05-01  5:39         ` Mike Rapoport
2019-04-25 21:45 ` [RFC PATCH 6/7] security: enable system call isolation in kernel config Mike Rapoport
2019-04-25 21:45 ` [RFC PATCH 7/7] sci: add example system calls to exercse SCI Mike Rapoport
2019-04-26  0:30 ` [RFC PATCH 0/7] x86: introduce system calls addess space isolation Andy Lutomirski
2019-04-26  0:30   ` Andy Lutomirski
2019-04-26  8:07   ` Jiri Kosina
2019-04-28  6:01   ` Mike Rapoport
2019-04-26 14:41 ` Dave Hansen
2019-04-28  6:08   ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1556289889.2833.17.camel@HansenPartnership.com \
    --to=james.bottomley@hansenpartnership.com \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=alexandre.chartre@oracle.com \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=jwadams@google.com \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-security-module@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rppt@linux.ibm.com \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.