linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@amacapital.net>
To: Heiko Carstens <heiko.carstens@de.ibm.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	X86 ML <x86@kernel.org>, Borislav Petkov <bp@alien8.de>,
	Nadav Amit <nadav.amit@gmail.com>,
	Kees Cook <keescook@chromium.org>,
	Brian Gerst <brgerst@gmail.com>,
	"kernel-hardening@lists.openwall.com" 
	<kernel-hardening@lists.openwall.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Josh Poimboeuf <jpoimboe@redhat.com>
Subject: Re: [PATCH 00/13] Virtually mapped stacks with guard pages (x86, core)
Date: Thu, 16 Jun 2016 14:27:15 -0700	[thread overview]
Message-ID: <CALCETrVOF7VR2jd_ox32nieysLwuyQ9iuLa0XiNFn4uBd9oLrw@mail.gmail.com> (raw)
In-Reply-To: <CALCETrVsOwy=dB16OgQ5OYkkn3ShyQyUxzjyJzsaeLO4m0ohEQ@mail.gmail.com>

On Thu, Jun 16, 2016 at 11:14 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Adding Paul, because RCU blew up.
>
> On Thu, Jun 16, 2016 at 10:50 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Wed, Jun 15, 2016 at 11:05 PM, Heiko Carstens
>> <heiko.carstens@de.ibm.com> wrote:
>>> On Wed, Jun 15, 2016 at 05:28:22PM -0700, Andy Lutomirski wrote:
>>>> Since the dawn of time, a kernel stack overflow has been a real PITA
>>>> to debug, has caused nondeterministic crashes some time after the
>>>> actual overflow, and has generally been easy to exploit for root.
>>>>
>>>> With this series, arches can enable HAVE_ARCH_VMAP_STACK.  Arches
>>>> that enable it (just x86 for now) get virtually mapped stacks with
>>>> guard pages.  This causes reliable faults when the stack overflows.
>>>>
>>>> If the arch implements it well, we get a nice OOPS on stack overflow
>>>> (as opposed to panicing directly or otherwise exploding badly).  On
>>>> x86, the OOPS is nice, has a usable call trace, and the overflowing
>>>> task is killed cleanly.
>>>
>>> Do you have numbers which reflect the performance impact of this change?
>>>
>>
>> Hmm.  My attempt to benchmark it caused some of the vmalloc core code
>> to hang.  I'll dig around.
>
> [  488.482010] Call Trace:
> [  488.482389]  <IRQ>  [<ffffffff810da5f6>] sched_show_task+0xb6/0x110
> [  488.483341]  [<ffffffff81108c7a>] rcu_check_callbacks+0x83a/0x840
> [  488.484226]  [<ffffffff810dd48a>] ? account_system_time+0x7a/0x110
> [  488.485157]  [<ffffffff8111c0f0>] ? tick_sched_do_timer+0x30/0x30
> [  488.486133]  [<ffffffff8110d314>] update_process_times+0x34/0x60
> [  488.487050]  [<ffffffff8111bb51>] tick_sched_handle.isra.13+0x31/0x40
> [  488.488018]  [<ffffffff8111c128>] tick_sched_timer+0x38/0x70
> [  488.488853]  [<ffffffff8110db2a>] __hrtimer_run_queues+0xda/0x250
> [  488.489739]  [<ffffffff8110e263>] hrtimer_interrupt+0xa3/0x190
> [  488.490630]  [<ffffffff810952f3>] local_apic_timer_interrupt+0x33/0x50
> [  488.491660]  [<ffffffff81095d58>] smp_apic_timer_interrupt+0x38/0x50
> [  488.492644]  [<ffffffff8194d022>] apic_timer_interrupt+0x82/0x90
> [  488.493502]  [<ffffffff810ee1c0>] ? queued_spin_lock_slowpath+0x20/0x190
> [  488.494550]  [<ffffffff8194c21b>] _raw_spin_lock+0x1b/0x20
> [  488.495321]  [<ffffffff811b7a54>] find_vmap_area+0x14/0x60
> [  488.496197]  [<ffffffff811b8f69>] find_vm_area+0x9/0x20
> [  488.496922]  [<ffffffff810afb19>] account_kernel_stack+0x89/0x100
> [  488.497885]  [<ffffffff810aff76>] free_task+0x16/0x50
> [  488.498599]  [<ffffffff810b0042>] __put_task_struct+0x92/0x120
> [  488.499525]  [<ffffffff810b4a66>] delayed_put_task_struct+0x76/0x80
> [  488.500348]  [<ffffffff81107969>] rcu_process_callbacks+0x1f9/0x5e0
> [  488.501208]  [<ffffffff810b7ca1>] __do_softirq+0xf1/0x280
> [  488.501932]  [<ffffffff810b7f7e>] irq_exit+0x9e/0xa0
> [  488.502955]  [<ffffffff81095d5d>] smp_apic_timer_interrupt+0x3d/0x50
> [  488.503943]  [<ffffffff8194d022>] apic_timer_interrupt+0x82/0x90
> [  488.504886]  <EOI>  [<ffffffff8194c20b>] ? _raw_spin_lock+0xb/0x20
> [  488.505877]  [<ffffffff811b7c33>] ? __get_vm_area_node+0xc3/0x160
> [  488.506812]  [<ffffffff810e013e>] ? task_move_group_fair+0x7e/0x90
> [  488.507730]  [<ffffffff811b9360>] __vmalloc_node_range+0x70/0x280
> [  488.508689]  [<ffffffff810b1ce5>] ? _do_fork+0xc5/0x370
> [  488.509512]  [<ffffffff811ceb9b>] ? kmem_cache_alloc_node+0x7b/0x170
> [  488.510502]  [<ffffffff81327418>] ? current_has_perm+0x38/0x40
> [  488.511430]  [<ffffffff810b0501>] copy_process.part.46+0x141/0x1760
> [  488.512449]  [<ffffffff810b1ce5>] ? _do_fork+0xc5/0x370
> [  488.513285]  [<ffffffff8111faf3>] ? do_futex+0x293/0xad0
> [  488.514093]  [<ffffffff810df14a>] ? check_preempt_wakeup+0x10a/0x240
> [  488.515108]  [<ffffffff810d92c2>] ? wake_up_new_task+0xf2/0x180
> [  488.516043]  [<ffffffff810b1ce5>] _do_fork+0xc5/0x370
> [  488.516786]  [<ffffffff8112039d>] ? SyS_futex+0x6d/0x150
> [  488.517615]  [<ffffffff810b2014>] SyS_clone+0x14/0x20
> [  488.518385]  [<ffffffff81002b72>] do_syscall_64+0x52/0xb0
> [  488.519239]  [<ffffffff8194c4e1>] entry_SYSCALL64_slow_path+0x25/0x25
>
> The bug seems straightforward: vmap_area_lock is held, the RCU softirq
> fires, and vmap_area_lock recurses and deadlocks.  Lockdep agrees with
> my assessment and catches the bug immediately on boot.
>
> What's the right fix?  Change all spin_lock calls on vmap_area_lock to
> spin_lock_bh?  Somehow ask RCU not to call delayed_put_task_struct
> from bh context?  I would naively have expected RCU to only call its
> callbacks from thread context, but I was clearly wrong.

I fixed (worked around?) it by caching the vm_struct * so I can skip
calling find_vm_area.  vfree works in IRQ context.  IMO this is still
a wee bit ugly.

--Andy

  reply	other threads:[~2016-06-16 21:27 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-16  0:28 [PATCH 00/13] Virtually mapped stacks with guard pages (x86, core) Andy Lutomirski
2016-06-16  0:28 ` [PATCH 01/13] x86/mm/hotplug: Don't remove PGD entries in remove_pagetable() Andy Lutomirski
2016-06-16  0:28 ` [PATCH 02/13] x86/cpa: In populate_pgd, don't set the pgd entry until it's populated Andy Lutomirski
2016-06-16  0:28 ` [PATCH 03/13] x86/cpa: Warn if kernel_unmap_pages_in_pgd is used inappropriately Andy Lutomirski
2016-06-16  0:28 ` [PATCH 04/13] mm: Track NR_KERNEL_STACK in pages instead of number of stacks Andy Lutomirski
2016-06-16 11:10   ` Vladimir Davydov
2016-06-16 17:21     ` Andy Lutomirski
2016-06-16 19:20       ` Andy Lutomirski
2016-06-16 15:33   ` Josh Poimboeuf
2016-06-16 17:39     ` Andy Lutomirski
2016-06-16 19:39       ` Josh Poimboeuf
2016-06-16  0:28 ` [PATCH 05/13] mm: Move memcg stack accounting to account_kernel_stack Andy Lutomirski
2016-06-16  0:28 ` [PATCH 06/13] fork: Add generic vmalloced stack support Andy Lutomirski
2016-06-16 17:25   ` Kees Cook
2016-06-16 17:37     ` Andy Lutomirski
2016-06-16  0:28 ` [PATCH 07/13] x86/die: Don't try to recover from an OOPS on a non-default stack Andy Lutomirski
2016-06-16  0:28 ` [PATCH 08/13] x86/dumpstack: When OOPSing, rewind the stack before do_exit Andy Lutomirski
2016-06-16 17:50   ` Josh Poimboeuf
2016-06-16 17:57     ` Andy Lutomirski
2016-06-16  0:28 ` [PATCH 09/13] x86/dumpstack: When dumping stack bytes due to OOPS, start with regs->sp Andy Lutomirski
2016-06-16 11:56   ` Borislav Petkov
2016-07-08 12:07   ` [tip:x86/debug] x86/dumpstack: Honor supplied @regs arg tip-bot for Andy Lutomirski
2016-06-16  0:28 ` [PATCH 10/13] x86/dumpstack: Try harder to get a call trace on stack overflow Andy Lutomirski
2016-06-16 18:16   ` Josh Poimboeuf
2016-06-16 18:22     ` Andy Lutomirski
2016-06-16 18:33       ` Josh Poimboeuf
2016-06-16 18:37         ` Andy Lutomirski
2016-06-16 18:54           ` Josh Poimboeuf
2016-06-16  0:28 ` [PATCH 11/13] x86/dumpstack/64: Handle faults when printing the "Stack:" part of an OOPS Andy Lutomirski
2016-06-16  0:28 ` [PATCH 12/13] x86/mm/64: Enable vmapped stacks Andy Lutomirski
2016-06-16  4:17   ` Mika Penttilä
2016-06-16  5:33     ` Andy Lutomirski
2016-06-16 13:11       ` [kernel-hardening] " Rik van Riel
2016-06-16  0:28 ` [PATCH 13/13] x86/mm: Improve stack-overflow #PF handling Andy Lutomirski
2016-06-16  6:05 ` [PATCH 00/13] Virtually mapped stacks with guard pages (x86, core) Heiko Carstens
2016-06-16 17:50   ` Andy Lutomirski
2016-06-16 18:14     ` Andy Lutomirski
2016-06-16 21:27       ` Andy Lutomirski [this message]
2016-06-17  3:58   ` Andy Lutomirski
2016-06-17  7:27     ` Heiko Carstens
2016-06-17 17:38       ` Andy Lutomirski
2016-06-20  5:58         ` Heiko Carstens
2016-06-20  6:01           ` Andy Lutomirski
2016-06-20  7:07             ` Heiko Carstens
2016-06-16 17:24 ` Kees Cook
2016-07-04 22:31 [PATCH -v2 0/3] x86/MSR: Improve unhandled MSR access error message Borislav Petkov
2016-07-04 22:31 ` [PATCH -v2 1/3] x86/dumpstack: Honor supplied @regs arg Borislav Petkov
2016-07-04 22:31 ` [PATCH -v2 2/3] printk: Make the printk*once() variants return a value Borislav Petkov
2016-07-08 12:08   ` [tip:x86/debug] " tip-bot for Borislav Petkov
2016-07-09  2:40     ` Joe Perches
2016-07-09  7:50       ` Borislav Petkov
2016-07-09 17:56         ` Joe Perches
2016-07-10  6:49           ` Borislav Petkov
2016-07-10  8:23             ` Joe Perches
2016-07-10 12:06               ` Borislav Petkov
2016-07-10 12:33                 ` Joe Perches
2016-07-04 22:31 ` [PATCH -v2 3/3] x86/dumpstack: Add show_stack_regs() and use it Borislav Petkov
2016-07-08 12:08   ` [tip:x86/debug] " tip-bot for Borislav Petkov
2016-07-06 12:58 ` [PATCH -v2 0/3] x86/MSR: Improve unhandled MSR access error message Andy Lutomirski
2016-07-06 13:11   ` Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALCETrVOF7VR2jd_ox32nieysLwuyQ9iuLa0XiNFn4uBd9oLrw@mail.gmail.com \
    --to=luto@amacapital.net \
    --cc=bp@alien8.de \
    --cc=brgerst@gmail.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=jpoimboe@redhat.com \
    --cc=keescook@chromium.org \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luto@kernel.org \
    --cc=nadav.amit@gmail.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).