From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1756137AbdETFYq (ORCPT <rfc822;w@1wt.eu>);
        Sat, 20 May 2017 01:24:46 -0400
Received: from mail.kernel.org ([198.145.29.99]:34772 "EHLO mail.kernel.org"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1755146AbdETFYm (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Sat, 20 May 2017 01:24:42 -0400
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AB77C239E8
Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org
Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=luto@kernel.org
MIME-Version: 1.0
In-Reply-To: <20170519213556.pv5kxocfprfkloay@treble>
References: <20170505122200.31436-1-jslaby@suse.cz> <20170505122200.31436-7-jslaby@suse.cz>
 <CA+55aFzEF_A1MzH8P8EX35FSci7UFj_+5LdsEFVai1DYtCk-Hg@mail.gmail.com>
 <20170507165524.cdxfuwbd5alr7v6k@treble> <20170519205354.caeyqri2k6gvso3w@treble>
 <8dbbb971-fc41-fba2-f356-931a7eabe6ef@zytor.com> <20170519212913.otir6mlujoxoy3ha@treble>
 <20170519213556.pv5kxocfprfkloay@treble>
From: Andy Lutomirski <luto@kernel.org>
Date: Fri, 19 May 2017 22:23:53 -0700
X-Gmail-Original-Message-ID: <CALCETrWxwhKvm2jYG+d2xupd6uuEDAeBkafPonYNhxF6O=+qVA@mail.gmail.com>
Message-ID: <CALCETrWxwhKvm2jYG+d2xupd6uuEDAeBkafPonYNhxF6O=+qVA@mail.gmail.com>
Subject: Re: [PATCH 7/7] DWARF: add the config option
To: Josh Poimboeuf <jpoimboe@redhat.com>, "H. J. Lu" <hjl.tools@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
        Jiri Slaby <jslaby@suse.cz>, Andrew Morton <akpm@linux-foundation.org>,
        live-patching@vger.kernel.org, Thomas Gleixner <tglx@linutronix.de>,
        Ingo Molnar <mingo@redhat.com>,
        "the arch/x86 maintainers" <x86@kernel.org>,
        Andy Lutomirski <luto@kernel.org>, Jiri Kosina <jikos@kernel.org>,
        Linus Torvalds <torvalds@linux-foundation.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, May 19, 2017 at 2:35 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Fri, May 19, 2017 at 04:29:13PM -0500, Josh Poimboeuf wrote:
>> > How are you handling control flow?
>>
>> Control flow of what?
>>
>> > > Here's the struct in its current state:
>> > >
>> > >   #define UNDWARF_REG_UNDEFINED           0
>> > >   #define UNDWARF_REG_CFA                 1
>> > >   #define UNDWARF_REG_SP                  2
>> > >   #define UNDWARF_REG_FP                  3
>> > >   #define UNDWARF_REG_SP_INDIRECT         4
>> > >   #define UNDWARF_REG_FP_INDIRECT         5
>> > >   #define UNDWARF_REG_R10                 6
>> > >   #define UNDWARF_REG_DI                  7
>> > >   #define UNDWARF_REG_DX                  8
>> > >
>> >
>> > Why only those registers?  Also, if you have the option I would really
>> > suggest using the actual x86 register numbers (ax, ex, dx, bx, sp, bp,
>> > si, di, r8-r15 in that order.)
>>
>> Those are the only registers which are ever needed as the base for
>> finding the previous stack frame.  99% of the time it's sp or bp, the
>> other registers are needed for aligned stacks and entry code.
>>
>> Using the actual register numbers isn't an option because I don't need
>> them all and they need to fit in a small number of bits.
>>
>> This construct might be useful for other arches, which is why I called
>> it "FP" instead of "BP".  But then I ruined that with the last 3 :-)
>
> BTW, here's the link to the unwinder code if you're interested:
>
>   https://github.com/jpoimboe/linux/blob/undwarf/arch/x86/kernel/unwind_undwarf.c

At the risk of overcomplicating matters, here's a suggestion.  As far
as I know, all (or nearly all?) unwinders effectively do the following
in a loop:

1. Look up the IP to figure out how to unwind from that IP.
2. Use the results of that lookup to compute the previous frame state.

The results of step 1 could perhaps be expressed like this:

struct reg_formula {
  unsigned int source_reg :4;
  long offset;
  bool dereference;  /* true: *(reg + offset); false: (reg + offset) */
  /*
   * For DWARF, I think this can be considerably more complicated, but
   * I doubt it's useful.
   */
};

struct unwind_step {
  u16 available_regs;  /* mask of the caller frame regs that we can recover */
  struct reg_formula reg_formula[16];
};

The CFA computation is just reg_formula[UNWIND_REG_SP] (or that plus
or minus sizeof(unsigned long) or whatever -- I can never remember
exactly what CFA refers to.)  For a frame pointer-based unwinder, the
entire unwind_step is a foregone conclusion independent of IP: SP = BP
+ 8 (or whatever), BP = *(BP + whatever), all other regs unknown.
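
To make the two steps concrete, here is a rough user-space sketch of
the frame-pointer case and of a generic loop body that consumes an
unwind_step.  Everything beyond the two structs is an assumption made
for illustration: the register numbers (x86 encoding, sp = 4, bp = 5),
struct frame_state, read_word_fn, the fake_stack helpers, and the
choice of two words for "BP + whatever".  The array member is named
reg_formula to match reg_formula[UNWIND_REG_SP] above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed numbering: x86 hardware encoding (ax, cx, dx, bx, sp, bp, ...). */
enum { UNWIND_REG_SP = 4, UNWIND_REG_BP = 5, UNWIND_NR_REGS = 16 };

struct reg_formula {
	unsigned int source_reg : 4;
	long offset;
	bool dereference;	/* true: *(reg + offset); false: (reg + offset) */
};

struct unwind_step {
	uint16_t available_regs;	/* mask of recoverable caller regs */
	struct reg_formula reg_formula[UNWIND_NR_REGS];
};

/* Hypothetical register snapshot for one frame. */
struct frame_state {
	uint16_t valid;
	unsigned long regs[UNWIND_NR_REGS];
};

/* Hypothetical checked read: returns false instead of faulting. */
typedef bool (*read_word_fn)(unsigned long addr, unsigned long *val, void *ctx);

/* Frame-pointer recipe, independent of IP: SP = BP + 2 words, BP = *BP. */
static void fp_unwind_step(struct unwind_step *step)
{
	memset(step, 0, sizeof(*step));
	step->available_regs = (1u << UNWIND_REG_SP) | (1u << UNWIND_REG_BP);
	step->reg_formula[UNWIND_REG_SP] = (struct reg_formula){
		.source_reg = UNWIND_REG_BP,
		.offset = 2 * (long)sizeof(unsigned long),
	};
	step->reg_formula[UNWIND_REG_BP] = (struct reg_formula){
		.source_reg = UNWIND_REG_BP,
		.dereference = true,
	};
}

/* Step 2 of the loop: evaluate every available formula against cur. */
static bool apply_step(const struct unwind_step *step,
		       const struct frame_state *cur, struct frame_state *prev,
		       read_word_fn read_word, void *ctx)
{
	assert(step && cur && prev && read_word);
	memset(prev, 0, sizeof(*prev));
	for (unsigned int i = 0; i < UNWIND_NR_REGS; i++) {
		const struct reg_formula *f = &step->reg_formula[i];
		unsigned long val;

		if (!(step->available_regs & (1u << i)))
			continue;
		if (!(cur->valid & (1u << f->source_reg)))
			return false;	/* formula needs a reg we don't have */
		val = cur->regs[f->source_reg] + (unsigned long)f->offset;
		if (f->dereference && !read_word(val, &val, ctx))
			return false;	/* unreadable stack slot */
		prev->regs[i] = val;
		prev->valid |= 1u << i;
	}
	return true;
}

/* Toy stand-in for stack memory so the sketch can run in user mode. */
struct fake_stack {
	unsigned long base;	/* address of words[0] */
	unsigned long words[8];
};

static bool fake_stack_read(unsigned long addr, unsigned long *val, void *ctx)
{
	const struct fake_stack *s = ctx;
	unsigned long off = addr - s->base;

	if (addr < s->base || off % sizeof(unsigned long) ||
	    off / sizeof(unsigned long) >= 8)
		return false;
	*val = s->words[off / sizeof(unsigned long)];
	return true;
}
```

Note that apply_step() knows nothing about where the unwind_step came
from, which is the point: a frame pointer, DWARF or undwarf lookup
would all feed the same loop body.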

Could it make sense to actually structure the code this way?  I can
see a few advantages.  It would make the actual meat of the unwind
loop totally independent of the unwinding algorithm in use, it would
make the meat be dead simple (and thus easy to verify for
non-crashiness), and I think it just might open the door for a real
in-kernel DWARF unwinder that Linus would be okay with.  Specifically,
write a function like:

bool get_dwarf_step(struct unwind_step *step, unsigned long ip);

Put this function in its own file and make it buildable as kernel code
or as user code.  Write a test case that runs it on every single
address in the kernel image (in user mode!) with address-sanitizer
enabled (or in Valgrind or both) and make sure that (a) it doesn't
blow up and (b) that the results are credible (e.g. by comparing to
objtool output).  Heck, you could even fuzz-test it where the fuzzer
is allowed to corrupt the actual DWARF data.  You could do the same
thing with whatever crazy super-compacted undwarf scheme someone comes
up with down the road, too.
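
A very rough sketch of what such a user-mode sweep might look like
follows.  The names count_bad_steps(), step_is_credible(),
toy_get_step() and MAX_PLAUSIBLE_OFFSET are made up for illustration,
and the credibility rules (SP must be recoverable, offsets must be
small) are assumptions, not a comparison against real objtool output.
toy_get_step() only stands in for get_dwarf_step() so that the harness
has something to call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNWIND_NR_REGS		16
#define UNWIND_REG_SP		4	/* assumed x86 encoding */
#define UNWIND_REG_BP		5
#define MAX_PLAUSIBLE_OFFSET	(1L << 20)	/* arbitrary sanity bound */

struct reg_formula {
	unsigned int source_reg : 4;
	long offset;
	bool dereference;
};

struct unwind_step {
	uint16_t available_regs;
	struct reg_formula reg_formula[UNWIND_NR_REGS];
};

/* Same shape as get_dwarf_step(), so any lookup can be plugged in. */
typedef bool (*get_step_fn)(struct unwind_step *step, unsigned long ip);

/* Assumed sanity rules: SP is recoverable and no offset is absurd. */
static bool step_is_credible(const struct unwind_step *step)
{
	if (!(step->available_regs & (1u << UNWIND_REG_SP)))
		return false;
	for (unsigned int i = 0; i < UNWIND_NR_REGS; i++) {
		long off = step->reg_formula[i].offset;

		if (!(step->available_regs & (1u << i)))
			continue;
		if (off > MAX_PLAUSIBLE_OFFSET || off < -MAX_PLAUSIBLE_OFFSET)
			return false;
	}
	return true;
}

/* Run the lookup on every address in [start, end); count bogus results. */
static size_t count_bad_steps(get_step_fn get_step,
			      unsigned long start, unsigned long end)
{
	size_t bad = 0;

	assert(get_step != NULL);
	for (unsigned long ip = start; ip < end; ip++) {
		struct unwind_step step;

		if (get_step(&step, ip) && !step_is_credible(&step))
			bad++;
	}
	return bad;
}

/* Toy lookup: no info for odd addresses, one deliberately broken entry. */
static bool toy_get_step(struct unwind_step *step, unsigned long ip)
{
	if (ip & 1)
		return false;
	memset(step, 0, sizeof(*step));
	if (ip == 0x1004)
		return true;	/* claims success but recovers nothing */
	step->available_regs = 1u << UNWIND_REG_SP;
	step->reg_formula[UNWIND_REG_SP].source_reg = UNWIND_REG_BP;
	step->reg_formula[UNWIND_REG_SP].offset = 16;
	return true;
}
```

Building something like this with -fsanitize=address covers (a), since
a lookup that walks off the end of its tables should trip the
sanitizer, and the credibility check is a placeholder for (b).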

I personally like the idea of using real DWARF annotations in the
entry code because it makes gdb work better (not kgdb -- real gdb
attached to KVM).  I bet that we could get entry asm annotations into
good shape if we really wanted to.  OTOH, getting DWARF to work well
for inline asm is really nasty IIRC.

(H.J., could we get a binutils feature that allows us to do:

pushq %whatever
.cfi_adjust_sp -8
...
popq %whatever
.cfi_adjust_sp 8

that will emit the right DWARF instructions regardless of whether
there's a frame pointer or not?  .cfi_adjust_cfa_offset is not
particularly helpful here because it's totally wrong if the CFA is
currently being computed based on BP.)


Also, you read the stack like this:

*val = *(unsigned long *)addr;

How about probe_kernel_read() instead?  Then a bogus address would
give you an error return rather than a fault inside the unwinder.

--Andy