From: Andy Lutomirski <luto@amacapital.net>
To: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Nadav Amit <nadav.amit@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	Borislav Petkov <bp@alien8.de>, X86 ML <x86@kernel.org>,
	"kernel-hardening@lists.openwall.com"
	<kernel-hardening@lists.openwall.com>,
	Brian Gerst <brgerst@gmail.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>
Subject: [kernel-hardening] Re: [PATCH 00/13] Virtually mapped stacks with guard pages (x86, core)
Date: Fri, 17 Jun 2016 10:38:24 -0700
Message-ID: <CALCETrVh+w1Bj2K5QGvJwsbrGrrD_=zz1vWPC2zXUai6YB9dBA@mail.gmail.com>
In-Reply-To: <20160617072737.GA3960@osiris>

On Jun 17, 2016 12:27 AM, "Heiko Carstens" <heiko.carstens@de.ibm.com> wrote:
>
> On Thu, Jun 16, 2016 at 08:58:07PM -0700, Andy Lutomirski wrote:
> > On Wed, Jun 15, 2016 at 11:05 PM, Heiko Carstens
> > <heiko.carstens@de.ibm.com> wrote:
> > > On Wed, Jun 15, 2016 at 05:28:22PM -0700, Andy Lutomirski wrote:
> > >> Since the dawn of time, a kernel stack overflow has been a real PITA
> > >> to debug, has caused nondeterministic crashes some time after the
> > >> actual overflow, and has generally been easy to exploit for root.
> > >>
> > >> With this series, arches can enable HAVE_ARCH_VMAP_STACK.  Arches
> > >> that enable it (just x86 for now) get virtually mapped stacks with
> > >> guard pages.  This causes reliable faults when the stack overflows.
> > >>
> > >> If the arch implements it well, we get a nice OOPS on stack overflow
> > >> (as opposed to panicking directly or otherwise exploding badly).  On
> > >> x86, the OOPS is nice, has a usable call trace, and the overflowing
> > >> task is killed cleanly.
> > >
> > > Do you have numbers which reflect the performance impact of this change?
> > >
> >
> > It seems to add ~1.5µs per thread creation/join pair, which is around
> > 15% overhead.  I *think* the major cost is that vmalloc calls
> > alloc_kmem_pages_node once per page rather than using a higher-order
> > block if available.
> >
> > Anyway, if anyone wants this to become faster, I think the way to do
> > it would be to ask some friendly mm folks to see if they can speed up
> > vmalloc.  I don't really want to dig in to the guts of the page
> > allocator.  My instinct would be to add a new interface
> > (GFP_SMALLER_OK?) to ask the page allocator for a high-order
> > allocation such that, if a high-order block is not immediately
> > available (on the freelist) then it should fall back to a smaller
> > allocation rather than working hard to get a high-order allocation.
> > Then vmalloc could use this, and vfree could free pages in blocks
> > corresponding to whatever orders it got the pages in, thus avoiding
> > the need to merge all the pages back together.
> >
> > There's another speedup available: actually reuse allocations.  We
> > could keep a very small freelist of vmap_areas with their associated
> > pages so we could reuse them.  (We can't efficiently reuse a vmap_area
> > without its backing pages because we need to flush the TLB in the
> > middle if we do that.)
>
> That's rather expensive. Just for the record: on s390 we use gcc's
> architecture specific compile options (kernel: CONFIG_STACK_GUARD)
>
>  -mstack-guard=stack-guard
>  -mstack-size=stack-size
>
> These generate two additional instructions at the beginning of each
> function prologue and verify that the remaining stack space won't drop
> below a specified number of bytes. If it would, an illegal instruction
> is executed.
>
> A disassembly looks like this (r15 is the stackpointer):
>
> 0000000000000670 <setup_arch>:
>      670:       eb 6f f0 48 00 24       stmg    %r6,%r15,72(%r15)
>      676:       c0 d0 00 00 00 00       larl    %r13,676 <setup_arch+0x6>
>      67c:       a7 f1 3f 80             tmll    %r15,16256  <--- test if enough space left
>      680:       b9 04 00 ef             lgr     %r14,%r15
>      684:       a7 84 00 01             je      686 <setup_arch+0x16> <--- branch to illegal op
>      688:       e3 f0 ff 90 ff 71       lay     %r15,-112(%r15)
>
> The branch actually jumps into the branch instruction itself, since the
> 0001 part of the "je" instruction is an illegal instruction.
>
> This catches at least wild stack overflows because of too many functions
> being called.
>
> Of course it doesn't catch wild accesses outside the stack because e.g. the
> index into an array on the stack is wrong.
>
> The runtime overhead is within the noise, therefore we have this always
> enabled.
>

Neat!  What exactly does tmll do?  I assume this works by checking the
low bits of the stack pointer.
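
For reference, a rough C model of the check in the quoted disassembly. It assumes tmll tests the low 16 bits of %r15 against the mask 16256 (0x3f80) and that "je" is taken when every selected bit is zero. The 16 KiB stack size and 128-byte guard are inferred from that mask, not stated in the thread, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed values, read off the mask in the disassembly (0x4000 - 0x80). */
#define STACK_SIZE  0x4000u                      /* 16 KiB, size-aligned stack */
#define STACK_GUARD 0x80u                        /* 128-byte guard area */
#define STACK_MASK  (STACK_SIZE - STACK_GUARD)   /* 0x3f80 == 16256 */

/*
 * Model of "tmll %r15,16256" followed by "je": returns 1 when all masked
 * bits of the stack pointer are zero, meaning the pointer sits less than
 * STACK_GUARD bytes above the bottom of its stack.
 */
static int stack_check_fails(uint64_t sp)
{
    return (sp & STACK_MASK) == 0;
}
```

With a stack bottom at a 16 KiB-aligned address, a pointer 0x40 bytes above the bottom fails the check, while one 0x80 bytes or more above it passes.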

x86_64 would have to do:

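movl %esp, %r11d           # low 32 bits of the stack pointer
shll $18, %r11d            # keep only the low 14 bits (offset within the stack)
cmpl $<threshold>, %r11d   # <threshold> is the limit, also shifted left by 18
jb error                   # unsigned: too little stack left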

Or similar.  I think the cmpl could be eliminated if the threshold
were a power of two by simply testing the low bits of the stack
pointer.
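
A minimal C sketch of the two variants, to make the equivalence concrete. It assumes 16 KiB, size-aligned stacks (hence the shift by 32 - 14 = 18) and an unsigned comparison; the function names and the STACK_BITS constant are hypothetical, and the threshold must be smaller than the stack size.

```c
#include <stdint.h>

#define STACK_BITS 14u                  /* assumed: 16 KiB, size-aligned stacks */
#define SHIFT      (32u - STACK_BITS)   /* the 18 used in the shll above */

/*
 * General form: move the in-stack offset to the top of a 32-bit word and
 * compare it, unsigned, against the threshold shifted the same way.
 * Requires threshold < (1 << STACK_BITS).
 */
static int low_on_stack_cmp(uint64_t sp, uint32_t threshold)
{
    uint32_t off = (uint32_t)sp << SHIFT;
    return off < (threshold << SHIFT);
}

/*
 * Power-of-two threshold: the offset is below the threshold exactly when
 * every offset bit at or above log2(threshold) is clear, so one mask test
 * replaces the shift and compare.
 */
static int low_on_stack_mask(uint64_t sp, uint32_t threshold_pow2)
{
    uint32_t mask = ((1u << STACK_BITS) - 1u) & ~(threshold_pow2 - 1u);
    return ((uint32_t)sp & mask) == 0;
}
```

For a 256-byte threshold the mask works out to 0x3f00, and both functions agree for every offset within the stack.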

--Andy
