From mboxrd@z Thu Jan  1 00:00:00 1970
From: Andy Lutomirski
Date: Thu, 29 Jun 2017 22:05:14 -0700
Subject: Re: [PATCH v2 6/8] x86/entry: add unwind hint annotations
To: Josh Poimboeuf
Cc: Andy Lutomirski, X86 ML, linux-kernel@vger.kernel.org,
 live-patching@vger.kernel.org, Linus Torvalds, Jiri Slaby,
 Ingo Molnar, "H. Peter Anvin", Peter Zijlstra, Mike Galbraith
In-Reply-To: <20170630021249.cqkszxaqtwakmzpg@treble>
References: <20170629175333.bicpvbwo4d5pdbak@treble>
 <20170629190559.ttw52ahwtsjynayx@treble>
 <20170629214134.c36krjhvzegwkfjk@treble>
 <20170630021249.cqkszxaqtwakmzpg@treble>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 29, 2017 at 7:12 PM, Josh Poimboeuf wrote:
> On Thu, Jun 29, 2017 at 03:59:04PM -0700, Andy Lutomirski wrote:
>> >
>> > Sorry, I didn't explain it very well.  Undwarf can find the regs pointer
>> > in rdi, it just doesn't trust its value.
>> >
>> > See the stack_info.next_sp field, which is set in in_irq_stack():
>> >
>> > 	/*
>> > 	 * The next stack pointer is the first thing pushed by the entry code
>> > 	 * after switching to the irq stack.
>> > 	 */
>> > 	info->next_sp = (unsigned long *)*(end - 1);
>> >
>> > It's a safety mechanism.
>> > The unwinder needs the last word of the irq
>> > stack page to point to the previous stack.  That way it can double check
>> > that the stack pointer it calculates is within the bounds of either the
>> > current stack or the previous stack.
>> >
>> > In the above code, the previous stack pointer (or next stack pointer,
>> > depending on your perspective) hasn't been set up before it switches
>> > stacks.  So the unwinder reads an uninitialized value into
>> > info->next_sp, and compares that with the regs pointer, and then stops
>> > the unwind because it thinks it went off into the weeds.
>>
>> That should be manageable, though, I think.  With my patch applied
>> (and maybe even without it), the only exception to that rule is if
>> regs->sp points just above the top of the IRQ stack and the next
>> instruction is push reg.  In that case, the reg is exactly as
>> trustworthy as the normal rule.*  Can you teach the unwinding code
>> that this is okay?
>>
>> * If an NMI hits right there, then it relies on unwinding out of the
>> NMI correctly.  But the usual checks that the target stack is a valid
>> stack should prevent us from going off into the weeds regardless.
>
> But that would remove a safeguard against the undwarf data being
> corrupt.  Sure, it would only affect the rare case where the stack
> pointer is at the top of the IRQ stack, but still...
>
> Also, the frame pointer and guess unwinders have the same issue, and
> this solution wouldn't work for them.
>
> And, worst of all, the oops stack dumping code in show_trace_log_lvl()
> also has this issue.  It relies on those previous stack pointers.  And
> it's separated from the unwinder logic by design, so it can't ask the
> unwinder where the next stack is.

Ugh.  I feel like we had this debate before, and I thought it was
rather silly that the unwinder cared.
After all, we already have
separate safety mechanisms to make sure that the unwinder never
wanders off of the valid stacks and that it never touches any given
stack more than once.  But it is indeed useful for the oops unwinder,
so c'est la vie.

That being said, I bet we could get away with this (sorry for immense
whitespace damage):

.macro ENTER_IRQ_STACK old_rsp scratch_reg
	DEBUG_ENTRY_ASSERT_IRQS_OFF
	movq	%rsp, \old_rsp
	incl	PER_CPU_VAR(irq_count)
	jnz	.Lrecurse_irq_stack_\@

	/*
	 * Right now, we just incremented irq_count to zero, so we've
	 * claimed the IRQ stack but we haven't switched to it yet.
	 * Anything that can interrupt us here without using IST
	 * must be *extremely* careful to limit its stack usage.
	 *
	 * We write old_rsp to the top word of the IRQ stack before
	 * switching %rsp to it, for the benefit of the OOPS unwinder.
	 */
	movq	PER_CPU_VAR(irq_stack_ptr), \scratch_reg
	movq	\old_rsp, -8(\scratch_reg)
	leaq	-8(\scratch_reg), %rsp
	jmp	.Lout_\@

.Lrecurse_irq_stack_\@:
	pushq	\old_rsp
.Lout_\@:
.endm

After all, it looks like all the users have a scratch reg available.

Hmm.  There's another option that might be considerably nicer, though:
put the IRQ stack at a known (at link time) position *in percpu
space*.  (Presumably it already is -- I haven't checked.)  Then we do:

.macro ENTER_IRQ_STACK old_rsp
	DEBUG_ENTRY_ASSERT_IRQS_OFF
	movq	%rsp, \old_rsp
	incl	PER_CPU_VAR(irq_count)

	/*
	 * Right now, if we just incremented irq_count to zero, we've
	 * claimed the IRQ stack but we haven't switched to it yet.
	 * Anything that can interrupt us here without using IST
	 * must be *extremely* careful to limit its stack usage.
	 */
	jnz	.Lpush_old_rsp_\@
	movq	\old_rsp, PER_CPU_VAR(top_word_in_irq_stack)
	movq	PER_CPU_VAR(irq_stack_ptr), %rsp

.Lpush_old_rsp_\@:
	pushq	\old_rsp
.endm

This stores the old stack pointer twice on the non-recursive path, but
that's easy enough to fix if we really cared.

I think I like this variant better.  What do you think?

--Andy