linux-toolchains.vger.kernel.org archive mirror
* [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
@ 2022-09-09 18:07 Josh Poimboeuf
  2022-09-11 15:26 ` Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2022-09-09 18:07 UTC (permalink / raw)
  To: linux-toolchains
  Cc: Peter Zijlstra, Indu Bhagat, Nick Desaulniers, linux-kernel,
	Jose E. Marchesi, Miroslav Benes, Mark Rutland, Will Deacon, x86,
	linux-arm-kernel, live-patching, linuxppc-dev, Ard Biesheuvel,
	Chen Zhongjin, Sathvika Vasireddy, Christophe Leroy, Mark Brown

Hi,

Here's a preview of what I'm planning to discuss at the LPC toolchains
microconference.  Feel free to start the discussion early :-)

This is a proposal for some new minor GCC/Clang features which would
help objtool greatly.


Background
----------

Objtool is a kernel-specific tool which reverse engineers the control
flow graph (CFG) of compiled objects.  It then performs various
validations, annotations, and modifications, mostly with the goal of
improving robustness and security of the kernel.

Objtool features which use the CFG include:
validation/generation of unwinding metadata; validation of Intel SMAP
rules; and validation of kernel "noinstr" rules (preventing compiler
instrumentation in certain critical sections).

In general it's not feasible for the traditional toolchain to do any of
this work, because the kernel has a lot of "blind spots" which the
toolchain doesn't have visibility to, notably asm and inline asm.
Manual .cfi annotations are very difficult to maintain and even more
difficult to ensure correctness.  Also, due to kernel live patching, the
kernel relies on 100% correctness of unwinding metadata, whereas the
toolchain treats it as a best effort.


Challenges
----------

Reverse engineering the control flow graph is mostly quite
straightforward, with two notable exceptions:

1) Jump tables (e.g., switch statements):

   Depending on the architecture, it's somewhere between difficult and
   impossible to reliably identify which indirect jumps correspond to
   jump tables, and what are their corresponding intra-function jump
   destinations.

2) Noreturn functions:
   
   There's no reliable way to determine which functions are designated
   by the compiler to be noreturn (either explicitly via function
   attribute, or implicitly via a static function which is a wrapper
   around a noreturn function.)  This information is needed because the
   code after the call to such a function is optimized out as
   unreachable and objtool has no way of knowing that.


Proposal
--------

Add the following new compiler flags which create non-allocatable ELF
sections which "annotate" control flow:

(Note this is purely hypothetical, intended for starting a discussion.
I'm not a compiler person and I haven't written any compiler code.)


1) -fannotate-jump-table

Create an .annotate.jump_table section which is an array of the
following variable-length structure:

  struct annotate_jump_table {
	void *indirect_jmp;
	long num_targets;
	void *targets[];
  };


For example, given the following switch statement code:

  .Lswitch_jmp:
	// %rax is .Lcase_1 or .Lcase_2
	jmp %rax

  .Lcase_1:
	...
  .Lcase_2:
	...


Add the following code:

  .pushsection .annotate.jump_table
	// indirect JMP address
	.quad .Lswitch_jmp

	// num jump targets
	.quad 2

	// indirect JMP target addresses
	.quad .Lcase_1
	.quad .Lcase_2
  .popsection
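
A minimal sketch of how a consumer such as objtool might walk this
section, assuming the section contents have already been read and
relocated into an array of 64-bit words as in the x86-64 example above.
The function and type names are hypothetical and are not part of
objtool or of any compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: called once per annotated indirect jump. */
typedef void (*jt_visit_fn)(uint64_t indirect_jmp, const uint64_t *targets,
			    uint64_t num_targets, void *ctx);

/*
 * Walk the raw contents of .annotate.jump_table, viewed as 64-bit words.
 * Each record is: jump address, target count, then that many targets.
 * Returns the number of records, or -1 if a record overruns the section.
 */
long walk_jump_table_annotations(const uint64_t *words, size_t nwords,
				 jt_visit_fn visit, void *ctx)
{
	size_t pos = 0;
	long records = 0;

	assert(words != NULL || nwords == 0);

	while (pos < nwords) {
		uint64_t jmp, n;

		/* A record needs at least the jump address and the count. */
		if (nwords - pos < 2)
			return -1;
		jmp = words[pos];
		n = words[pos + 1];
		pos += 2;

		/* Variable-length part: reject counts that run off the end. */
		if (n > nwords - pos)
			return -1;
		if (visit)
			visit(jmp, &words[pos], n, ctx);
		pos += (size_t)n;
		records++;
	}
	return records;
}
```

Because the records are variable-length, the only way to find record N
is to walk the N-1 records before it, which is why the bounds checks
sit inside the loop.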


2) -fannotate-noreturn

Create an .annotate.noreturn section which is an array of pointers to
noreturn functions (both explicit/implicit and defined/undefined).


For example, given the following three noreturn functions:

  // explicit noreturn:
  __attribute__((__noreturn__)) void func1(void)
  {
	exit(1);
  }

  // explicit noreturn (extern):
  extern __attribute__((__noreturn__)) void func2(void);

  // implicit noreturn:
  static void func3(void)
  {
  	// call noreturn function
	func2();
  }


Add the following code:

  .pushsection .annotate.noreturn
	.quad func1
	.quad func2
	.quad func3
  .popsection
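
As a rough illustration (not objtool's actual code), a CFG walker could
consult the section like this, treating it as a flat array of 64-bit
function addresses after relocation.  Both function names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Is 'func' listed in the (relocated) .annotate.noreturn array? */
bool is_noreturn_func(const uint64_t *noreturn, size_t count, uint64_t func)
{
	assert(noreturn != NULL || count == 0);

	for (size_t i = 0; i < count; i++)
		if (noreturn[i] == func)
			return true;
	return false;
}

/*
 * Should the CFG walk continue at the instruction after a direct call?
 * If the destination is noreturn, whatever follows the call is not
 * reachable through it and must not be treated as a fallthrough edge.
 */
bool call_falls_through(const uint64_t *noreturn, size_t count,
			uint64_t call_dest)
{
	return !is_noreturn_func(noreturn, count, call_dest);
}
```

A linear search keeps the sketch short; a real consumer would more
likely sort the array or mark the symbols once up front.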


Alternatives
------------

Another idea which has been floated in the past is for objtool to read
DWARF (or .eh_frame) to help it figure out the control flow.  That
hasn't been tried yet, but would be considerably more difficult and
fragile IMO.


-- 
Josh

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-09 18:07 [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn} Josh Poimboeuf
@ 2022-09-11 15:26 ` Peter Zijlstra
  2022-09-11 15:31   ` Ard Biesheuvel
  2022-09-12 10:52 ` Borislav Petkov
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2022-09-11 15:26 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-toolchains, Indu Bhagat, Nick Desaulniers, linux-kernel,
	Jose E. Marchesi, Miroslav Benes, Mark Rutland, Will Deacon, x86,
	linux-arm-kernel, live-patching, linuxppc-dev, Ard Biesheuvel,
	Chen Zhongjin, Sathvika Vasireddy, Christophe Leroy, Mark Brown

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> Alternatives
> ------------
> 
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow.  That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.

I thought Ard played around with that a bit on ARM64. And yes, given that
most toolchains consider DWARF itself best-effort, I'm not holding my
breath there.

On top of that, building a kernel with DWARF on is just so much
slower.

* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-11 15:26 ` Peter Zijlstra
@ 2022-09-11 15:31   ` Ard Biesheuvel
  0 siblings, 0 replies; 20+ messages in thread
From: Ard Biesheuvel @ 2022-09-11 15:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, linux-toolchains, Indu Bhagat, Nick Desaulniers,
	linux-kernel, Jose E. Marchesi, Miroslav Benes, Mark Rutland,
	Will Deacon, x86, linux-arm-kernel, live-patching, linuxppc-dev,
	Chen Zhongjin, Sathvika Vasireddy, Christophe Leroy, Mark Brown

On Sun, 11 Sept 2022 at 16:26, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > Alternatives
> > ------------
> >
> > Another idea which has been floated in the past is for objtool to read
> > DWARF (or .eh_frame) to help it figure out the control flow.  That
> > hasn't been tried yet, but would be considerably more difficult and
> > fragile IMO.
>
> I thought Ard played around with that a bit on ARM64. And yes, given that
> most toolchains consider DWARF itself best-effort, I'm not holding my
> breath there.
>

I have patches out that use unwind data to locate pointer auth
sign/authenticate instructions in the code, in order to patch them to
shadow call stack pushes and pops at runtime if pointer authentication
is not supported by the hardware. This has little to do with objtool
or reliable stack traces.

I still think DWARF could help to make objtool's job a bit easier, but
I don't think it will be of any use with jump tables or noreturn
functions in particular.

* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-09 18:07 [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn} Josh Poimboeuf
  2022-09-11 15:26 ` Peter Zijlstra
@ 2022-09-12 10:52 ` Borislav Petkov
  2022-09-12 14:17   ` Michael Matz
  2022-09-12 11:31 ` Segher Boessenkool
  2022-09-13 22:51 ` Indu Bhagat
  3 siblings, 1 reply; 20+ messages in thread
From: Borislav Petkov @ 2022-09-12 10:52 UTC (permalink / raw)
  To: Josh Poimboeuf, Michael Matz
  Cc: linux-toolchains, Peter Zijlstra, Indu Bhagat, Nick Desaulniers,
	linux-kernel, Jose E. Marchesi, Miroslav Benes, Mark Rutland,
	Will Deacon, x86, linux-arm-kernel, live-patching, linuxppc-dev,
	Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

+ matz.

Micha, any opinions on the below are appreciated.

Thx.

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> Hi,
> 
> Here's a preview of what I'm planning to discuss at the LPC toolchains
> microconference.  Feel free to start the discussion early :-)
> 
> This is a proposal for some new minor GCC/Clang features which would
> help objtool greatly.
> 
> 
> Background
> ----------
> 
> Objtool is a kernel-specific tool which reverse engineers the control
> flow graph (CFG) of compiled objects.  It then performs various
> validations, annotations, and modifications, mostly with the goal of
> improving robustness and security of the kernel.
> 
> Objtool features which use the CFG include:
> validation/generation of unwinding metadata; validation of Intel SMAP
> rules; and validation of kernel "noinstr" rules (preventing compiler
> instrumentation in certain critical sections).
> 
> In general it's not feasible for the traditional toolchain to do any of
> this work, because the kernel has a lot of "blind spots" which the
> toolchain doesn't have visibility to, notably asm and inline asm.
> Manual .cfi annotations are very difficult to maintain and even more
> difficult to ensure correctness.  Also, due to kernel live patching, the
> kernel relies on 100% correctness of unwinding metadata, whereas the
> toolchain treats it as a best effort.
> 
> 
> Challenges
> ----------
> 
> Reverse engineering the control flow graph is mostly quite
> straightforward, with two notable exceptions:
> 
> 1) Jump tables (e.g., switch statements):
> 
>    Depending on the architecture, it's somewhere between difficult and
>    impossible to reliably identify which indirect jumps correspond to
>    jump tables, and what are their corresponding intra-function jump
>    destinations.
> 
> 2) Noreturn functions:
>    
>    There's no reliable way to determine which functions are designated
>    by the compiler to be noreturn (either explicitly via function
>    attribute, or implicitly via a static function which is a wrapper
>    around a noreturn function.)  This information is needed because the
>    code after the call to such a function is optimized out as
>    unreachable and objtool has no way of knowing that.
> 
> 
> Proposal
> --------
> 
> Add the following new compiler flags which create non-allocatable ELF
> sections which "annotate" control flow:
> 
> (Note this is purely hypothetical, intended for starting a discussion.
> I'm not a compiler person and I haven't written any compiler code.)
> 
> 
> 1) -fannotate-jump-table
> 
> Create an .annotate.jump_table section which is an array of the
> following variable-length structure:
> 
>   struct annotate_jump_table {
> 	void *indirect_jmp;
> 	long num_targets;
> 	void *targets[];
>   };
> 
> 
> For example, given the following switch statement code:
> 
>   .Lswitch_jmp:
> 	// %rax is .Lcase_1 or .Lcase_2
> 	jmp %rax
> 
>   .Lcase_1:
> 	...
>   .Lcase_2:
> 	...
> 
> 
> Add the following code:
> 
>   .pushsection .annotate.jump_table
> 	// indirect JMP address
> 	.quad .Lswitch_jmp
> 
> 	// num jump targets
> 	.quad 2
> 
> 	// indirect JMP target addresses
> 	.quad .Lcase_1
> 	.quad .Lcase_2
>   .popsection
> 
> 
> 2) -fannotate-noreturn
> 
> Create an .annotate.noreturn section which is an array of pointers to
> noreturn functions (both explicit/implicit and defined/undefined).
> 
> 
> For example, given the following three noreturn functions:
> 
>   // explicit noreturn:
>   __attribute__((__noreturn__)) void func1(void)
>   {
> 	exit(1);
>   }
> 
>   // explicit noreturn (extern):
>   extern __attribute__((__noreturn__)) void func2(void);
> 
>   // implicit noreturn:
>   static void func3(void)
>   {
>   	// call noreturn function
> 	func2();
>   }
> 
> 
> Add the following code:
> 
>   .pushsection .annotate.noreturn
> 	.quad func1
> 	.quad func2
> 	.quad func3
>   .popsection
> 
> 
> Alternatives
> ------------
> 
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow.  That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.
> 
> 
> -- 
> Josh

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-09 18:07 [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn} Josh Poimboeuf
  2022-09-11 15:26 ` Peter Zijlstra
  2022-09-12 10:52 ` Borislav Petkov
@ 2022-09-12 11:31 ` Segher Boessenkool
  2022-09-14 10:21   ` Josh Poimboeuf
  2022-09-13 22:51 ` Indu Bhagat
  3 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2022-09-12 11:31 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-toolchains, Peter Zijlstra, Indu Bhagat, Nick Desaulniers,
	linux-kernel, Jose E. Marchesi, Miroslav Benes, Mark Rutland,
	Will Deacon, x86, linux-arm-kernel, live-patching, linuxppc-dev,
	Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

Hi!

On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> 2) Noreturn functions:
>    
>    There's no reliable way to determine which functions are designated
>    by the compiler to be noreturn (either explicitly via function
>    attribute, or implicitly via a static function which is a wrapper
>    around a noreturn function.)

Or just a function that does not return for any other reason.

The compiler makes no distinction between functions that have the
attribute and functions that do not.  There are good reasons not to
put the attribute on functions that do in fact not return.  The
not-returningness of the function may be just an implementation
accident, something you do not want to be part of the API, so it
*should* not have that attribute; or you may want the callers of a
function not to be optimised according to this knowledge (you cannot
*prevent* that, the compiler can figure it out in other ways, but still)
for any other reason.

>    This information is needed because the
>    code after the call to such a function is optimized out as
>    unreachable and objtool has no way of knowing that.

Since June we (GCC) have -funreachable-traps.  This creates a trap insn
wherever control flow would otherwise go into limbo.


Segher

* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-12 10:52 ` Borislav Petkov
@ 2022-09-12 14:17   ` Michael Matz
  2022-09-14  0:04     ` Josh Poimboeuf
  2022-09-15  2:56     ` Chen Zhongjin
  0 siblings, 2 replies; 20+ messages in thread
From: Michael Matz @ 2022-09-12 14:17 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Josh Poimboeuf, linux-toolchains, Peter Zijlstra, Indu Bhagat,
	Nick Desaulniers, linux-kernel, Jose E. Marchesi, Miroslav Benes,
	Mark Rutland, Will Deacon, x86, linux-arm-kernel, live-patching,
	linuxppc-dev, Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

Hey,

On Mon, 12 Sep 2022, Borislav Petkov wrote:

> Micha, any opinions on the below are appreciated.
> 
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:

> > difficult to ensure correctness.  Also, due to kernel live patching, the
> > kernel relies on 100% correctness of unwinding metadata, whereas the
> > toolchain treats it as a best effort.

Unwinding certainly is not best effort.  It's 100% reliable as far as the 
source language or compilation options require.  But as it doesn't 
touch the discussed features I won't belabor that point.

I will mention that objtool's existence is based on mistrust, of persons 
(not correctly annotating stuff) and of tools (not correctly heeding those 
annotations).  The mistrust in persons is understandable and can be dealt 
with by tools, but the mistrust in tools can't be fixed by making tools 
more complicated by emitting even more information; there's no good reason 
to assume that one piece of info can be trusted more than other pieces.  
So, if you mistrust the tools you have already lost.  That's somewhat 
philosophical, so I won't beat that horse much more either.

Now, recovering the CFG.  I'll switch order of your two items:

2) noreturn function

> >   .pushsection .annotate.noreturn
> >     .quad func1
> >     .quad func2
> >     .quad func3
> >   .popsection

This won't work for indirect calls to noreturn functions:

  void (* __attribute__((noreturn)) noretptr)(void);
  int callnoret (int i)
  {
    noretptr();
    return i + 32;
  }

The return statement is unreachable (and removed by GCC).  To know that 
you would have to mark the call statements, not the individual functions.  
All schemes that mark functions to indicate a meaningful difference in
the calling sequence (e.g. the ABI of functions) have the same problem:
that property is part of the call expression's type, not of individual
decls.

Second problem: it's not extensible.  Today it's noreturn functions you 
want to know, and tomorrow?  So, add a flag word per entry, define bit 0 
for now to be NORETURN, and see what comes.  Add a header with a version 
(and/or identifier) as well and it's properly extensible.  For easy 
linking and identifying the blobs in the linked result include a length in 
the header.  If this were in an allocated section it would be a good idea 
to refer to the symbols in a PC-relative manner, so as to not result in 
runtime relocations.  In this case, as it's within a non-alloc section 
that doesn't matter.  So:

.section .annotate.functions
.long 1       # version
.long 0xcafe  # ident
.long 2f-1f   # length
1:
.quad func1, 1   # noreturn
.quad func2, 1   # noreturn
.quad func3, 32  # something_else_but_not_noreturn
...
2:
.long 1b-2b   # align and "checksum"

It might be that the length-and-header scheme is cumbersome if you need to 
write those section commands by hand, in which case another scheme might 
be preferable, but it should somehow be self-delimiting.
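
A minimal sketch of how a consumer might validate and walk one such
blob, assuming the layout of the example above: three 32-bit header
words, 16-byte entries (address, flags), and a trailing 32-bit word
holding the negated length.  It also assumes the reader's byte order
matches the object file's.  The struct, macro and function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ANNOTATE_IDENT		0xcafeu		/* ident from the example */
#define ANNOTATE_NORETURN	(1u << 0)	/* bit 0 means NORETURN */

struct annotate_hdr {
	uint32_t version;
	uint32_t ident;
	uint32_t length;	/* bytes of entries after the header */
};

struct annotate_entry {
	uint64_t addr;
	uint64_t flags;
};

/*
 * Count the NORETURN entries in one blob.
 * Returns -1 if the header, length or trailing word is inconsistent.
 */
long count_noreturn(const unsigned char *blob, size_t size)
{
	struct annotate_hdr hdr;
	int32_t trailer;
	long found = 0;

	assert(blob != NULL);

	if (size < sizeof(hdr) + sizeof(trailer))
		return -1;
	memcpy(&hdr, blob, sizeof(hdr));
	if (hdr.version != 1 || hdr.ident != ANNOTATE_IDENT)
		return -1;
	if (hdr.length % sizeof(struct annotate_entry) ||
	    hdr.length > size - sizeof(hdr) - sizeof(trailer))
		return -1;

	/* The trailing word is "1b-2b", i.e. the negated length. */
	memcpy(&trailer, blob + sizeof(hdr) + hdr.length, sizeof(trailer));
	if ((int64_t)trailer != -(int64_t)hdr.length)
		return -1;

	for (uint32_t off = 0; off < hdr.length;
	     off += sizeof(struct annotate_entry)) {
		struct annotate_entry e;

		memcpy(&e, blob + sizeof(hdr) + off, sizeof(e));
		if (e.flags & ANNOTATE_NORETURN)
			found++;
	}
	return found;
}
```

With the three entries from the example (flags 1, 1 and 32), this
counts two noreturn functions; the length and trailing word are what
let a reader step from one blob to the next in a linked section.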

For the above problem of indirect calls to noreturns, instead do:

  .text
  noretcalllabel:
    call noreturn
  othercall:
    call really_special_thing
  .section .annotate.noretcalls
  .quad noretcalllabel, 1  # noreturn call
  .quad othercall, 32      # call to some special(-ABI?) function

Same thoughts re extensibility and self-delimitation apply.

1) jump tables

> > Create an .annotate.jump_table section which is an array of the
> > following variable-length structure:
> > 
> >   struct annotate_jump_table {
> > 	void *indirect_jmp;
> > 	long num_targets;
> > 	void *targets[];
> >   };

It's very often the case that the compiler already emits what your 
.targets[] member would encode, just at some unknown place, length and 
encoding.  So you would save space if you instead only remember the 
encoding and places of those jump tables:

struct {
  void *indirect_jump;
  long num_tables;
  struct {
    unsigned num_entries;
    unsigned encoding;
    void *start_of_table;
  } tables[];
};

The usual encodings are: direct, PC-relative, relative-to-start-of-table.  
Usually for a specific jump instruction there's only one table, so 
optimizing for that makes sense.  For strange unthought-of cases it's 
probably a good idea to have your initial scheme as fallback, which could 
be indicated by a special .encoding value.

> > For example, given the following switch statement code:
> > 
> >   .Lswitch_jmp:
> > 	// %rax is .Lcase_1 or .Lcase_2
> > 	jmp %rax

So, usually %rax would point into a table (somewhere in .rodata/.text) 
that looks like so:

.Ljump_table:
 .quad .Lcase_1 - .Ljump_table
 .quad .Lcase_2 - .Ljump_table

(for position-independent code)

and hence you would emit this as annotation:

.quad .Lswitch_jmp
.quad 1                   # only a single table
.long 2                   # with two entries
.long RELATIVE_TO_START   # all entries are X - start_of_table
.quad .Ljump_table

In this case you won't save anything of course, but as soon as there's a 
meaningful number of cases you will.

Again, if that info were put into an allocated section you would want 
to use relative encodings of the addresses to avoid runtime relocs.  And 
the remarks about self-delimitation and extensibility also apply here.
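
A small sketch of how a reader of this annotation could turn a table
entry back into a jump target.  The enum names and values are
hypothetical (the thread does not assign numbers to the encodings), and
the PC-relative encoding is left out because its base address is not
spelled out here.  Entries are assumed to be 64-bit, as in the .quad
example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum jt_encoding {		/* hypothetical numbering */
	JT_DIRECT,		/* entry is the absolute target address */
	JT_RELATIVE_TO_START,	/* entry is target - start_of_table */
};

/*
 * Resolve entry 'index' of a jump table located at 'table_addr'.
 * Returns 0 for an encoding this sketch does not know about.
 */
uint64_t jt_target(enum jt_encoding enc, uint64_t table_addr,
		   const int64_t *entries, size_t index)
{
	assert(entries != NULL);

	switch (enc) {
	case JT_DIRECT:
		return (uint64_t)entries[index];
	case JT_RELATIVE_TO_START:
		/* Unsigned wraparound handles negative offsets correctly. */
		return table_addr + (uint64_t)entries[index];
	}
	return 0;
}
```

For example, with the table at 0x2000 and .Lcase_1 at 0x1010, the
stored entry is -0xff0 and the function returns 0x1010 again.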

> > Alternatives
> > ------------
> > 
> > Another idea which has been floated in the past is for objtool to read
> > DWARF (or .eh_frame) to help it figure out the control flow.  That
> > hasn't been tried yet, but would be considerably more difficult and
> > fragile IMO.

While noreturn functions are marked in the debug info, noreturn 
function types currently aren't quite correct.  And jump-tables aren't 
marked at all, so that would lose.


Ciao,
Michael.

* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-09 18:07 [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn} Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2022-09-12 11:31 ` Segher Boessenkool
@ 2022-09-13 22:51 ` Indu Bhagat
  2022-09-14  0:12   ` Josh Poimboeuf
  3 siblings, 1 reply; 20+ messages in thread
From: Indu Bhagat @ 2022-09-13 22:51 UTC (permalink / raw)
  To: Josh Poimboeuf, linux-toolchains
  Cc: Peter Zijlstra, Nick Desaulniers, linux-kernel, Jose E. Marchesi,
	Miroslav Benes, Mark Rutland, Will Deacon, x86, linux-arm-kernel,
	live-patching, linuxppc-dev, Ard Biesheuvel, Chen Zhongjin,
	Sathvika Vasireddy, Christophe Leroy, Mark Brown

Hi Josh,

On 9/9/22 11:07, Josh Poimboeuf wrote:
> Hi,
> 
> Here's a preview of what I'm planning to discuss at the LPC toolchains
> microconference.  Feel free to start the discussion early :-)
> 
> This is a proposal for some new minor GCC/Clang features which would
> help objtool greatly.
> 
> 
> Background
> ----------
> 
> Objtool is a kernel-specific tool which reverse engineers the control
> flow graph (CFG) of compiled objects.  It then performs various
> validations, annotations, and modifications, mostly with the goal of
> improving robustness and security of the kernel.
> 
> Objtool features which use the CFG include:
> validation/generation of unwinding metadata; validation of Intel SMAP
> rules; and validation of kernel "noinstr" rules (preventing compiler
> instrumentation in certain critical sections).
> 
> In general it's not feasible for the traditional toolchain to do any of
> this work, because the kernel has a lot of "blind spots" which the
> toolchain doesn't have visibility to, notably asm and inline asm.
> Manual .cfi annotations are very difficult to maintain and even more
> difficult to ensure correctness.  Also, due to kernel live patching, the
> kernel relies on 100% correctness of unwinding metadata, whereas the
> toolchain treats it as a best effort.
> 
> 
> Challenges
> ----------
> 
> Reverse engineering the control flow graph is mostly quite
> straightforward, with two notable exceptions:
> 
> 1) Jump tables (e.g., switch statements):
> 
>     Depending on the architecture, it's somewhere between difficult and
>     impossible to reliably identify which indirect jumps correspond to
>     jump tables, and what are their corresponding intra-function jump
>     destinations.
> 
> 2) Noreturn functions:
>     
>     There's no reliable way to determine which functions are designated
>     by the compiler to be noreturn (either explicitly via function
>     attribute, or implicitly via a static function which is a wrapper
>     around a noreturn function.)  This information is needed because the
>     code after the call to such a function is optimized out as
>     unreachable and objtool has no way of knowing that.
> 
> 

Curious to know which objtool features rely on reverse engineering the
control flow graph. Is it a larger set, or is it only ORC generation?

> Proposal
> --------
> 
> Add the following new compiler flags which create non-allocatable ELF
> sections which "annotate" control flow:
> 
> (Note this is purely hypothetical, intended for starting a discussion.
> I'm not a compiler person and I haven't written any compiler code.)
> 
> 
> 1) -fannotate-jump-table
> 
> Create an .annotate.jump_table section which is an array of the
> following variable-length structure:
> 
>    struct annotate_jump_table {
> 	void *indirect_jmp;
> 	long num_targets;
> 	void *targets[];
>    };
> 
> 
> For example, given the following switch statement code:
> 
>    .Lswitch_jmp:
> 	// %rax is .Lcase_1 or .Lcase_2
> 	jmp %rax
> 
>    .Lcase_1:
> 	...
>    .Lcase_2:
> 	...
> 
> 
> Add the following code:
> 
>    .pushsection .annotate.jump_table
> 	// indirect JMP address
> 	.quad .Lswitch_jmp
> 
> 	// num jump targets
> 	.quad 2
> 
> 	// indirect JMP target addresses
> 	.quad .Lcase_1
> 	.quad .Lcase_2
>    .popsection
> 
> 
> 2) -fannotate-noreturn
> 
> Create an .annotate.noreturn section which is an array of pointers to
> noreturn functions (both explicit/implicit and defined/undefined).
> 
> 
> For example, given the following three noreturn functions:
> 
>    // explicit noreturn:
>    __attribute__((__noreturn__)) void func1(void)
>    {
> 	exit(1);
>    }
> 
>    // explicit noreturn (extern):
>    extern __attribute__((__noreturn__)) void func2(void);
> 
>    // implicit noreturn:
>    static void func3(void)
>    {
>    	// call noreturn function
> 	func2();
>    }
> 
> 
> Add the following code:
> 
>    .pushsection .annotate.noreturn
> 	.quad func1
> 	.quad func2
> 	.quad func3
>    .popsection
> 
> 
> Alternatives
> ------------
> 
> Another idea which has been floated in the past is for objtool to read
> DWARF (or .eh_frame) to help it figure out the control flow.  That
> hasn't been tried yet, but would be considerably more difficult and
> fragile IMO.
> 



* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-12 14:17   ` Michael Matz
@ 2022-09-14  0:04     ` Josh Poimboeuf
  2022-09-14 14:00       ` Peter Zijlstra
  2022-09-15  2:56     ` Chen Zhongjin
  1 sibling, 1 reply; 20+ messages in thread
From: Josh Poimboeuf @ 2022-09-14  0:04 UTC (permalink / raw)
  To: Michael Matz
  Cc: Borislav Petkov, linux-toolchains, Peter Zijlstra, Indu Bhagat,
	Nick Desaulniers, linux-kernel, Jose E. Marchesi, Miroslav Benes,
	Mark Rutland, Will Deacon, x86, linux-arm-kernel, live-patching,
	linuxppc-dev, Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

On Mon, Sep 12, 2022 at 02:17:36PM +0000, Michael Matz wrote:
> Hey,

Hi Michael,

Thanks for looking at this.

> On Mon, 12 Sep 2022, Borislav Petkov wrote:
> 
> > Micha, any opinions on the below are appreciated.
> > 
> > On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> 
> > > difficult to ensure correctness.  Also, due to kernel live patching, the
> > > kernel relies on 100% correctness of unwinding metadata, whereas the
> > > toolchain treats it as a best effort.
> 
> Unwinding certainly is not best effort.  It's 100% reliable as far as the 
> source language or compilation options require.  But as it doesn't 
> touch the discussed features I won't belabor that point.

Ok, maybe I had the wrong impression about the reliability of DWARF.

> I will mention that objtool's existence is based on mistrust, of persons 
> (not correctly annotating stuff) and of tools (not correctly heeding those 
> annotations).  The mistrust in persons is understandable and can be dealt 
> with by tools, but the mistrust in tools can't be fixed by making tools 
> more complicated by emitting even more information; there's no good reason 
> to assume that one piece of info can be trusted more than other pieces.  
> So, if you mistrust the tools you have already lost.  That's somewhat 
> philosophical, so I won't beat that horse much more either.

Maybe this is semantics, but I wouldn't characterize objtool's existence
as being based on the mistrust of tools.  Its main motivation is to
fill in the toolchain's blind spots in asm and inline-asm, which exist
by design.

(Objtool has actually found many compiler bugs, but that's a side
benefit and not its reason for existence.)

I understand the concern about trusting one piece of info more than
others, but we have to trust the toolchain.  Also, objtool does a lot of
consistency checks, and experience shows that if there's a bug in the
existing jump table or noreturn detection logic, it almost always
quickly surfaces as an objtool warning: unreachable instruction, stack
state mismatch, falling through the end of a function, etc.

> Now, recovering the CFG.  I'll switch order of your two items:
> 
> 2) noreturn function
> 
> > >   .pushsection .annotate.noreturn
> > >     .quad func1
> > >     .quad func2
> > >     .quad func3
> > >   .popsection
> 
> This won't work for indirect calls to noreturn functions:
> 
>   void (* __attribute__((noreturn)) noretptr)(void);
>   int callnoret (int i)
>   {
>     noretptr();
>     return i + 32;
>   }
> 
> The return statement is unreachable (and removed by GCC).  To know that 
> you would have to mark the call statements, not the individual functions.  
> All schemes that mark functions to indicate a meaningful difference in
> the calling sequence (e.g. the ABI of functions) have the same problem:
> that property is part of the call expression's type, not of individual
> decls.
>
> Second problem: it's not extensible.  Today it's noreturn functions you 
> want to know, and tomorrow?  So, add a flag word per entry, define bit 0 
> for now to be NORETURN, and see what comes.  Add a header with a version 
> (and/or identifier) as well and it's properly extensible.  For easy 
> linking and identifying the blobs in the linked result include a length in 
> the header.  If this were in an allocated section it would be a good idea 
> to refer to the symbols in a PC-relative manner, so as to not result in 
> runtime relocations.  In this case, as it's within a non-alloc section 
> that doesn't matter.  So:
> 
> .section .annotate.functions
> .long 1       # version
> .long 0xcafe  # ident
> .long 2f-1f   # length
> 1:
> .quad func1, 1   # noreturn
> .quad func2, 1   # noreturn
> .quad func3, 32  # something_else_but_not_noreturn
> ...
> 2:
> .long 1b-2b   # align and "checksum"
> 
> It might be that the length-and-header scheme is cumbersome if you need to 
> write those section commands by hand, in which case another scheme might 
> be preferable, but it should somehow be self-delimiting.
> 
> For the above problem of indirect calls to noreturns, instead do:
> 
>   .text
>   noretcalllabel:
>     call noreturn
>   othercall:
>     call really_special_thing
>   .section .annotate.noretcalls
>   .quad noretcalllabel, 1  # noreturn call
>   .quad othercall, 32      # call to some special(-ABI?) function
> 
> Same thoughts re extensibility and self-delimitation apply.

Hm, I didn't know noreturn function pointers were a thing.  Annotating
the call site instead of the function would be fine.

I'm thinking PC-relative relocs are a good idea regardless: they make
the binary smaller even if the section isn't allocatable.
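
To illustrate the size argument, here is a sketch that assumes each
annotation entry is a 32-bit self-relative offset (value = target minus
the address of the entry itself, as produced by something like
".long sym - .") instead of a 64-bit absolute address.  That layout is
an assumption made for this example, not something agreed on in this
thread, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Resolve entry 'index' of a section of 32-bit self-relative offsets.
 * 'section_addr' is the address of the first entry.
 */
uint64_t resolve_rel32(uint64_t section_addr, const int32_t *entries,
		       size_t index)
{
	uint64_t entry_addr;

	assert(entries != NULL);

	/* Each offset is relative to the address of its own entry. */
	entry_addr = section_addr + index * sizeof(int32_t);
	return entry_addr + (uint64_t)(int64_t)entries[index];
}
```

Each entry then takes 4 bytes instead of 8, at the cost of limiting
targets to within 2 GiB of the annotation section.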

As far as extending goes, I had been thinking future annotation types
would just go in new sections, e.g. .annotate.retpolinecalls, each
section with its own format.  And that has the benefit of being a
simpler and easier to parse format (no headers, versions, lengths, etc).
But either way is fine I think.

> 
> 1) jump tables
> 
> > > Create an .annotate.jump_table section which is an array of the
> > > following variable-length structure:
> > > 
> > >   struct annotate_jump_table {
> > > 	void *indirect_jmp;
> > > 	long num_targets;
> > > 	void *targets[];
> > >   };
> 
> It's very often the case that the compiler already emits what your 
> .targets[] member would encode, just at some unknown place, length and 
> encoding.  So you would save space if you instead only remember the 
> encoding and places of those jump tables:
> 
> struct {
>   void *indirect_jump;
>   long num_tables;
>   struct {
>     unsigned num_entries;
>     unsigned encoding;
>     void *start_of_table;
>   } tables[];
> };
> 
> The usual encodings are: direct, PC-relative, relative-to-start-of-table.  
> Usually for a specific jump instruction there's only one table, so 
> optimizing for that makes sense.  For strange unthought-of cases it's 
> probably a good idea to have your initial scheme as fallback, which could 
> be indicated by a special .encoding value.
> 
> > > For example, given the following switch statement code:
> > > 
> > >   .Lswitch_jmp:
> > > 	// %rax is .Lcase_1 or .Lcase_2
> > > 	jmp %rax
> 
> So, usually %rax would point into a table (somewhere in .rodata/.text) 
> that looks like so:
> 
> .Ljump_table:
>  .quad .Lcase_1 - .Ljump_table
>  .quad .Lcase_2 - .Ljump_table
> 
> (for position-independent code)
> 
> and hence you would emit this as annotation:
> 
> .quad .Lswitch_jmp
> .quad 1                   # only a single table
> .long 2                   # with two entries
> .long RELATIVE_TO_START   # all entries are X - start_of_table
> .quad .Ljump_table
> 
> In this case you won't save anything of course, but as soon as there's a 
> meaningful number of cases you will.

As a user of the data, I would prefer a simpler format (something like
my original scheme) which uses more space, rather than needing headers,
fallback scheme, encodings, blob lengths, etc just to save some
non-allocatable bytes.  But the above seems fine.
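
To make the trade-off concrete, here is a minimal sketch of how a
reader of the table-based annotation might resolve one entry back to a
branch target.  The enum values are hypothetical (the thread names only
the three kinds of encoding), entries are assumed to be 64 bits wide as
in the .quad example, and "PC-relative" is assumed to mean relative to
the address of the entry itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum jt_encoding {		/* hypothetical numbering */
	JT_DIRECT,		/* entry is the absolute target address */
	JT_PC_RELATIVE,		/* entry is target minus the entry's own address */
	JT_RELATIVE_TO_START,	/* entry is target minus start_of_table */
};

/*
 * Resolve entry idx of a jump table that lives at table_addr in the
 * object and whose raw contents are in entries[].
 */
uint64_t jt_target(enum jt_encoding enc, uint64_t table_addr,
		   const int64_t *entries, size_t idx)
{
	uint64_t raw = (uint64_t)entries[idx];

	switch (enc) {
	case JT_DIRECT:
		return raw;
	case JT_PC_RELATIVE:
		return table_addr + idx * sizeof(int64_t) + raw;
	case JT_RELATIVE_TO_START:
		return table_addr + raw;
	}
	assert(!"unknown jump table encoding");
	return 0;
}
```

The original scheme needs none of this: the targets[] array already
holds the resolved addresses, at the cost of duplicating what the
compiler emitted in the table.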

-- 
Josh


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-13 22:51 ` Indu Bhagat
@ 2022-09-14  0:12   ` Josh Poimboeuf
  0 siblings, 0 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2022-09-14  0:12 UTC (permalink / raw)
  To: Indu Bhagat
  Cc: linux-toolchains, Peter Zijlstra, Nick Desaulniers, linux-kernel,
	Jose E. Marchesi, Miroslav Benes, Mark Rutland, Will Deacon, x86,
	linux-arm-kernel, live-patching, linuxppc-dev, Ard Biesheuvel,
	Chen Zhongjin, Sathvika Vasireddy, Christophe Leroy, Mark Brown

On Tue, Sep 13, 2022 at 03:51:44PM -0700, Indu Bhagat wrote:
> Curious to know what all features of objtool rely on the need to reverse
> engineer the control flow graph. Is it a larger set or it is only for ORC
> generation ?

Objtool features which rely on the CFG:

- Frame pointer rule validation (when using
  CONFIG_UNWINDER_FRAME_POINTER)

- ORC metadata generation

- Intel SMAP rule validation - ensures EFLAGS #AC is only set during
  usercopy

- "noinstr" rule validation - ensures no instrumentation/tracing
  functions are called in certain critical sections

-- 
Josh


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-12 11:31 ` Segher Boessenkool
@ 2022-09-14 10:21   ` Josh Poimboeuf
  2022-09-14 12:08     ` Michael Matz
  2022-09-14 12:16     ` Segher Boessenkool
  0 siblings, 2 replies; 20+ messages in thread
From: Josh Poimboeuf @ 2022-09-14 10:21 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Mark Rutland, Peter Zijlstra, linuxppc-dev, Chen Zhongjin, x86,
	Nick Desaulniers, linux-kernel, Mark Brown, Sathvika Vasireddy,
	linux-toolchains, Indu Bhagat, live-patching, Miroslav Benes,
	Will Deacon, Ard Biesheuvel, linux-arm-kernel, Jose E. Marchesi,
	Michael Matz

On Mon, Sep 12, 2022 at 06:31:14AM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > 2) Noreturn functions:
> >    
> >    There's no reliable way to determine which functions are designated
> >    by the compiler to be noreturn (either explicitly via function
> >    attribute, or implicitly via a static function which is a wrapper
> >    around a noreturn function.)
> 
> Or just a function that does not return for any other reason.
> 
> The compiler makes no difference between functions that have the
> attribute and functions that do not.  There are good reasons to not
> have the attribute on functions that do in fact not return.  The
> not-returningness of the function may be just an implementation
> accident, something you do not want part of the API, so it *should* not
> have that attribute; or you may want the callers to a function to not be
> optimised according to this knowledge (you cannot *prevent* that, the
> compiler can figure it out in other ways, but still) for any other
> reason.

Yes, many static functions that are wrappers around noreturn functions
have this "implicit noreturn" property.  I agree we would need to know
about those functions (or, as Michael suggested, their call sites) as
well.

> >    This information is needed because the
> >    code after the call to such a function is optimized out as
> >    unreachable and objtool has no way of knowing that.
> 
> Since June we (GCC) have -funreachable-traps.  This creates a trap insn
> wherever control flow would otherwise go into limbo.

Ah, that's interesting, though I'm not sure if we'd be able to
distinguish between "call doesn't return" traps and other traps or
reasons for UD2.

-- 
Josh


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-14 10:21   ` Josh Poimboeuf
@ 2022-09-14 12:08     ` Michael Matz
  2022-09-14 12:16     ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Michael Matz @ 2022-09-14 12:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Segher Boessenkool, Mark Rutland, Peter Zijlstra, linuxppc-dev,
	Chen Zhongjin, x86, Nick Desaulniers, linux-kernel, Mark Brown,
	Sathvika Vasireddy, linux-toolchains, Indu Bhagat, live-patching,
	Miroslav Benes, Will Deacon, Ard Biesheuvel, linux-arm-kernel,
	Jose E. Marchesi

Hello,

On Wed, 14 Sep 2022, Josh Poimboeuf wrote:

> > >    This information is needed because the
> > >    code after the call to such a function is optimized out as
> > >    unreachable and objtool has no way of knowing that.
> > 
> > Since June we (GCC) have -funreachable-traps.  This creates a trap insn
> > wherever control flow would otherwise go into limbo.
> 
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

There are two reasons (which will turn out to be the same) for a trap (say 
'UD2' on x86-64) directly after a call insn:
1) "the call shall not have returned"
2) something else jumps to that trap because it was __builtin_unreachable 
   (or equivalent), and the compiler happened to put that ud2 directly 
   after the call.  It could have done that only when the call itself was 
   noreturn:
     cmp $foo, %rax
     jne do_trap
     call noret
    do_trap:
     ud2

So, it's all the same.  If there's an ud2 (or whatever the trap maker is) 
after a call then it was because it's noreturn.
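
A minimal sketch of that rule, assuming the object has already been
decoded into a flat, address-ordered instruction list.  The struct and
the three instruction kinds are hypothetical simplifications and not
objtool's own types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum insn_kind { INSN_OTHER, INSN_CALL, INSN_TRAP };

struct insn {
	uint64_t addr;
	enum insn_kind kind;
};

/*
 * Record the address of every call that is immediately followed by a
 * trap instruction.  Returns the number of such calls; at most max
 * addresses are stored in out[].
 */
size_t find_noreturn_calls(const struct insn *insns, size_t n,
			   uint64_t *out, size_t max)
{
	size_t found = 0;

	assert(insns && out);
	for (size_t i = 0; i + 1 < n; i++) {
		if (insns[i].kind == INSN_CALL &&
		    insns[i + 1].kind == INSN_TRAP) {
			if (found < max)
				out[found] = insns[i].addr;
			found++;
		}
	}
	return found;
}
```

Run over the four-instruction example above, this reports the single
"call noret", whether or not the jne also targets the ud2.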

(But, of course this costs (little) code size, unlike the non-alloc 
checker sections)


Ciao,
Michael.


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-14 10:21   ` Josh Poimboeuf
  2022-09-14 12:08     ` Michael Matz
@ 2022-09-14 12:16     ` Segher Boessenkool
  1 sibling, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2022-09-14 12:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Mark Rutland, Peter Zijlstra, linuxppc-dev, Chen Zhongjin, x86,
	Nick Desaulniers, linux-kernel, Mark Brown, Sathvika Vasireddy,
	linux-toolchains, Indu Bhagat, live-patching, Miroslav Benes,
	Will Deacon, Ard Biesheuvel, linux-arm-kernel, Jose E. Marchesi,
	Michael Matz

On Wed, Sep 14, 2022 at 11:21:00AM +0100, Josh Poimboeuf wrote:
> On Mon, Sep 12, 2022 at 06:31:14AM -0500, Segher Boessenkool wrote:
> > On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
> > > 2) Noreturn functions:
> > >    
> > >    There's no reliable way to determine which functions are designated
> > > >    by the compiler to be noreturn (either explicitly via function
> > >    attribute, or implicitly via a static function which is a wrapper
> > >    around a noreturn function.)
> > 
> > Or just a function that does not return for any other reason.
> > 
> > The compiler makes no difference between functions that have the
> > attribute and functions that do not.  There are good reasons to not
> > have the attribute on functions that do in fact not return.  The
> > not-returningness of the function may be just an implementation
> > accident, something you do not want part of the API, so it *should* not
> > have that attribute; or you may want the callers to a function to not be
> > optimised according to this knowledge (you cannot *prevent* that, the
> > > compiler can figure it out in other ways, but still) for any other
> > reason.
> 
> Yes, many static functions that are wrappers around noreturn functions
> have this "implicit noreturn" property.

I meant functions that are noreturn intrinsically.  The trivial example:

void f(void)
{
	for (;;)
		;
}

>  I agree we would need to know
> about those functions (or, as Michael suggested, their call sites) as
> well.

Many "potentially does not return" functions (there are very many such
functions!) turn into "never returns" functions, for some inputs (or
something in the environment).  If the compiler specialises a code path
that does not return, you'll not see that marked up any way.  Of course
such a path should not be taken in the kernel, normally :-)

> > >    This information is needed because the
> > >    code after the call to such a function is optimized out as
> > >    unreachable and objtool has no way of knowing that.
> > 
> > Since June we (GCC) have -funreachable-traps.  This creates a trap insn
> > wherever control flow would otherwise go into limbo.
> 
> Ah, that's interesting, though I'm not sure if we'd be able to
> distinguish between "call doesn't return" traps and other traps or
> reasons for UD2.

The trap handler can see where the trap came from.  And then look up
that address in some tables or such.  Just like __bug_table?
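
A minimal sketch of such a lookup, assuming a table of trap addresses
that is sorted by address.  The entry layout and the reason codes are
hypothetical and only loosely modelled on the idea of __bug_table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum trap_reason { TRAP_BUG, TRAP_NORETURN_CALL };

struct trap_entry {
	uint64_t addr;			/* address of the trap instruction */
	enum trap_reason reason;	/* why the trap was emitted */
};

/* bsearch() comparator: key is a uint64_t address */
static int cmp_trap_addr(const void *key, const void *elem)
{
	uint64_t addr = *(const uint64_t *)key;
	const struct trap_entry *e = elem;

	return (addr > e->addr) - (addr < e->addr);
}

/* Return the entry for the trap at addr, or NULL if it is not in the table. */
const struct trap_entry *trap_lookup(const struct trap_entry *tbl, size_t n,
				     uint64_t addr)
{
	assert(tbl || n == 0);
	return bsearch(&addr, tbl, n, sizeof(*tbl), cmp_trap_addr);
}
```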


Segher


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-14  0:04     ` Josh Poimboeuf
@ 2022-09-14 14:00       ` Peter Zijlstra
  2022-09-14 14:28         ` Michael Matz
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2022-09-14 14:00 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Michael Matz, Borislav Petkov, linux-toolchains, Indu Bhagat,
	Nick Desaulniers, linux-kernel, Jose E. Marchesi, Miroslav Benes,
	Mark Rutland, Will Deacon, x86, linux-arm-kernel, live-patching,
	linuxppc-dev, Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

On Wed, Sep 14, 2022 at 01:04:16AM +0100, Josh Poimboeuf wrote:

> > I will mention that objtool's existence is based on mistrust, of persons 
> > (not correctly annotating stuff) and of tools (not correctly heeding those 
> > annotations).  The mistrust in persons is understandable and can be dealt 
> > with by tools, but the mistrust in tools can't be fixed by making tools 
> > more complicated by emitting even more information; there's no good reason 
> > to assume that one piece of info can be trusted more than other pieces.  
> > So, if you mistrust the tools you have already lost.  That's somewhat 
> > philosophical, so I won't beat that horse much more either.
> 
> Maybe this is semantics, but I wouldn't characterize objtool's existence
> as being based on the mistrust of tools.  Its main motivation is to
> fill in the toolchain's blind spots in asm and inline-asm, which exist
> by design.

That and a fairly deep seated loathing for the regular CFI annotations
and DWARF in general. Linus was fairly firm he didn't want anything to
do with DWARF for in-kernel unwinding.

That left us in a spot that we needed unwind information in a 'better'
format than DWARF.

Objtool was born out of those constraints. ORC not needing the CFI
annotations and ORC being *much* faster at unwinding and generation
(debug builds are slow) were all good.




* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-14 14:00       ` Peter Zijlstra
@ 2022-09-14 14:28         ` Michael Matz
  2022-09-14 14:55           ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Michael Matz @ 2022-09-14 14:28 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Borislav Petkov, linux-toolchains, Indu Bhagat,
	Nick Desaulniers, linux-kernel, Jose E. Marchesi, Miroslav Benes,
	Mark Rutland, Will Deacon, x86, linux-arm-kernel, live-patching,
	linuxppc-dev, Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

Hello,

On Wed, 14 Sep 2022, Peter Zijlstra wrote:

> > Maybe this is semantics, but I wouldn't characterize objtool's existence
> > as being based on the mistrust of tools.  Its main motivation is to
> > fill in the toolchain's blind spots in asm and inline-asm, which exist
> > by design.
> 
> That and a fairly deep seated loathing for the regular CFI annotations
> and DWARF in general. Linus was fairly firm he didn't want anything to
> do with DWARF for in-kernel unwinding.

I was referring only to the check-stuff functionality of objtool, not to 
its other parts.  Although, of course, "deep seated loathing" is a special
form of mistrust as well ;-)

> That left us in a spot that we needed unwind information in a 'better'
> format than DWARF.
> 
> Objtool was born out of those constraints. ORC not needing the CFI
> annotations and ORC being *much* faster at unwinding and generation
> (debug builds are slow) were all good.

Don't mix DWARF debug info with DWARF-based unwinding info, the latter 
doesn't imply the former.  Out of interest: how does ORC get around the 
need for CFI annotations (or equivalents to restore registers) and what 
makes it fast?  I want faster unwinding for DWARF as well, when there's 
feature parity :-)  Maybe something can be learned for integration into 
dwarf-unwind.


Ciao,
Michael.


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-14 14:28         ` Michael Matz
@ 2022-09-14 14:55           ` Peter Zijlstra
  2022-09-14 17:34             ` Segher Boessenkool
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2022-09-14 14:55 UTC (permalink / raw)
  To: Michael Matz
  Cc: Josh Poimboeuf, Borislav Petkov, linux-toolchains, Indu Bhagat,
	Nick Desaulniers, linux-kernel, Jose E. Marchesi, Miroslav Benes,
	Mark Rutland, Will Deacon, x86, linux-arm-kernel, live-patching,
	linuxppc-dev, Ard Biesheuvel, Chen Zhongjin, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

On Wed, Sep 14, 2022 at 02:28:26PM +0000, Michael Matz wrote:
> Hello,
> 
> On Wed, 14 Sep 2022, Peter Zijlstra wrote:
> 
> > > Maybe this is semantics, but I wouldn't characterize objtool's existence
> > > as being based on the mistrust of tools.  Its main motivation is to
> > > fill in the toolchain's blind spots in asm and inline-asm, which exist
> > > by design.
> > 
> > That and a fairly deep seated loathing for the regular CFI annotations
> > and DWARF in general. Linus was fairly firm he didn't want anything to
> > do with DWARF for in-kernel unwinding.
> 
> I was referring only to the check-stuff functionality of objtool, not to 
> its other parts.  Although, of course, "deep seated loathing" is a special
> form of mistrust as well ;-)

Those were born out of the DWARF unwinder itself crashing the kernel due to
its inherent complexity (tracking the whole DWARF state machine and not
being quite robust itself).

That, and the manual CFI annotations were 'always' wrong, due to humans
and no tooling verifying them.

That said, objtool does have a number of annotations as well; mostly
things telling what kind of stackframe stuff starts with.

> > That left us in a spot that we needed unwind information in a 'better'
> > format than DWARF.
> > 
> > Objtool was born out of those constraints. ORC not needing the CFI
> > annotations and ORC being *much* faster at unwinding and generation
> > (debug builds are slow) were all good.
> 
> Don't mix DWARF debug info with DWARF-based unwinding info, the latter 
> doesn't imply the former.  Out of interest: how does ORC get around the 
> need for CFI annotations (or equivalents to restore registers) and what 

Objtool 'interprets' the stackops. So it follows the call-graph and is
an interpreter for all instructions that modify the stack. Doing that it
knows what the stackframe is at 'most' places.
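
A minimal sketch of that idea, assuming x86-64 with 8-byte stack slots
and a straight-line instruction sequence (no branches).  The
instruction model is hypothetical and far simpler than what objtool
really decodes; it only tracks how far the stack pointer is below the
caller's frame before each instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum stack_op { OP_NONE, OP_PUSH, OP_POP, OP_SUB_SP, OP_ADD_SP };

struct stack_insn {
	enum stack_op op;
	int64_t imm;		/* only used by OP_SUB_SP and OP_ADD_SP */
};

/*
 * For each instruction, record the offset from the stack pointer to
 * the caller's frame (CFA) as it is *before* that instruction runs.
 */
void track_cfa(const struct stack_insn *insns, size_t n, int64_t *cfa_off)
{
	int64_t off = 8;	/* at entry only the return address is pushed */

	assert(insns && cfa_off);
	for (size_t i = 0; i < n; i++) {
		cfa_off[i] = off;
		switch (insns[i].op) {
		case OP_PUSH:
			off += 8;
			break;
		case OP_POP:
			off -= 8;
			break;
		case OP_SUB_SP:
			off += insns[i].imm;
			break;
		case OP_ADD_SP:
			off -= insns[i].imm;
			break;
		case OP_NONE:
			break;
		}
	}
}
```

For a push, "sub $0x20,%rsp", some non-stack instruction, "add
$0x20,%rsp", pop, ret sequence this yields offsets 8, 16, 48, 48, 16
and 8, which is the per-address information an unwinder needs.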

> makes it fast?  I want faster unwinding for DWARF as well, when there's 
> feature parity :-)  Maybe something can be learned for integration into 
> dwarf-unwind.

I think we have some details here:

 https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-14 14:55           ` Peter Zijlstra
@ 2022-09-14 17:34             ` Segher Boessenkool
  0 siblings, 0 replies; 20+ messages in thread
From: Segher Boessenkool @ 2022-09-14 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Michael Matz, Josh Poimboeuf, Borislav Petkov, linux-toolchains,
	Indu Bhagat, Nick Desaulniers, linux-kernel, Jose E. Marchesi,
	Miroslav Benes, Mark Rutland, Will Deacon, x86, linux-arm-kernel,
	live-patching, linuxppc-dev, Ard Biesheuvel, Chen Zhongjin,
	Sathvika Vasireddy, Christophe Leroy, Mark Brown

On Wed, Sep 14, 2022 at 04:55:27PM +0200, Peter Zijlstra wrote:
> On Wed, Sep 14, 2022 at 02:28:26PM +0000, Michael Matz wrote:
> > Don't mix DWARF debug info with DWARF-based unwinding info, the latter 
> > doesn't imply the former.  Out of interest: how does ORC get around the 
> > need for CFI annotations (or equivalents to restore registers) and what 
> 
> Objtool 'interprets' the stackops. So it follows the call-graph and is
> an interpreter for all instructions that modify the stack. Doing that it
> knows what the stackframe is at 'most' places.

To get correct backtraces on e.g. PowerPC you need to emulate many of
the integer insns.  That is why GCC enables -fasynchronous-unwind-tables
by default for us.

The same is true for s390, aarch64, and x86 (unless 32-bit w/ frame
pointer).

The problem is that you do not know how to access anything on the stack,
whether in the current frame or in a previous frame, from a random point
in the program.  GDB has many heuristics for this, and it still does not
get it right in all cases.

> > makes it fast?  I want faster unwinding for DWARF as well, when there's 
> > feature parity :-)  Maybe something can be learned for integration into 
> > dwarf-unwind.
> 
> I think we have some details here:
> 
>  https://www.kernel.org/doc/html/latest/x86/orc-unwinder.html

It is faster because it does a whole lot less.  Is that still enough?
It's not clear (to me) what exact information it wants to provide :-(


Segher


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-12 14:17   ` Michael Matz
  2022-09-14  0:04     ` Josh Poimboeuf
@ 2022-09-15  2:56     ` Chen Zhongjin
  2022-09-15  8:47       ` Peter Zijlstra
  1 sibling, 1 reply; 20+ messages in thread
From: Chen Zhongjin @ 2022-09-15  2:56 UTC (permalink / raw)
  To: Michael Matz, Borislav Petkov
  Cc: Josh Poimboeuf, linux-toolchains, Peter Zijlstra, Indu Bhagat,
	Nick Desaulniers, linux-kernel, Jose E. Marchesi, Miroslav Benes,
	Mark Rutland, Will Deacon, x86, linux-arm-kernel, live-patching,
	linuxppc-dev, Ard Biesheuvel, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

Hi,

On 2022/9/12 22:17, Michael Matz wrote:
> Hey,
>
> On Mon, 12 Sep 2022, Borislav Petkov wrote:
>
>> Micha, any opinions on the below are appreciated.
>>
>> On Fri, Sep 09, 2022 at 11:07:04AM -0700, Josh Poimboeuf wrote:
>>> difficult to ensure correctness.  Also, due to kernel live patching, the
>>> kernel relies on 100% correctness of unwinding metadata, whereas the
>>> toolchain treats it as a best effort.
> Unwinding certainly is not best effort.  It's 100% reliable as far as the
> source language or compilation options require.  But as it doesn't
> touch the discussed features I won't belabor that point.
>
> I will mention that objtool's existence is based on mistrust, of persons
> (not correctly annotating stuff) and of tools (not correctly heeding those
> annotations).  The mistrust in persons is understandable and can be dealt
> with by tools, but the mistrust in tools can't be fixed by making tools
> more complicated by emitting even more information; there's no good reason
> to assume that one piece of info can be trusted more than other pieces.
> So, if you mistrust the tools you have already lost.  That's somewhat
> philosophical, so I won't beat that horse much more either.
>
> Now, recovering the CFG.  I'll switch order of your two items:
>
> 2) noreturn function
>
>>>    .pushsection .annotate.noreturn
>>>      .quad func1
>>>      .quad func2
>>>      .quad func3
>>>    .popsection
> This won't work for indirect calls to noreturn functions:
>
>    void (* __attribute__((noreturn)) noretptr)(void);
>    int callnoret (int i)
>    {
>      noretptr();
>      return i + 32;
>    }
>
> The return statement is unreachable (and removed by GCC).  To know that
> you would have to mark the call statements, not the individual functions.
> All schemes that mark functions that somehow indicates a meaningful
> difference in the calling sequence (e.g. the ABI of functions) have the
> same problem: it's part of the call expressions type, not of individual
> decls.
>
> Second problem: it's not extensible.  Today it's noreturn functions you
> want to know, and tomorrow?  So, add a flag word per entry, define bit 0
> for now to be NORETURN, and see what comes.  Add a header with a version
> (and/or identifier) as well and it's properly extensible.  For easy
> linking and identifying the blobs in the linked result include a length in
> the header.  If this were in an allocated section it would be a good idea
> to refer to the symbols in a PC-relative manner, so as to not result in
> runtime relocations.  In this case, as it's within a non-alloc section
> that doesn't matter.  So:
>
> .section .annotate.functions
> .long 1       # version
> .long 0xcafe  # ident
> .long 2f-1f   # length
> 1:
> .quad func1, 1   # noreturn
> .quad func2, 1   # noreturn
> .quad func3, 32  # something_else_but_not_noreturn
> ...
> 2:
> .long 1b-2b   # align and "checksum"
>
> It might be that the length-and-header scheme is cumbersome if you need to
> write those section commands by hand, in which case another scheme might
> be preferable, but it should somehow be self-delimiting.
>
> For the above problem of indirect calls to noreturns, instead do:
>
>    .text
>    noretcalllabel:
>      call noreturn
>    othercall:
>      call really_special_thing
>    .section .annotate.noretcalls
>    .quad noretcalllabel, 1  # noreturn call
>    .quad othercall, 32      # call to some special(-ABI?) function
>
> Same thoughts re extensibility and self-delimitation apply.
>
> 1) jump tables
>
>>> Create an .annotate.jump_table section which is an array of the
>>> following variable-length structure:
>>>
>>>    struct annotate_jump_table {
>>> 	void *indirect_jmp;
>>> 	long num_targets;
>>> 	void *targets[];
>>>    };
> It's very often the case that the compiler already emits what your
> .targets[] member would encode, just at some unknown place, length and
> encoding.  So you would save space if you instead only remember the
> encoding and places of those jump tables:

We have found some anonymous information on x86 in .rodata.

I'm not sure whether that is *all* of what Josh wants on x86.  However,
on arm64 we did not find it in the same section, so it is a problem on
arm64 now.

Will the compiler emit these for all arches?  At least I tried and
didn't find anything meaningful (maybe I missed it).


Best,

Chen

> struct {
>    void *indirect_jump;
>    long num_tables;
>    struct {
>      unsigned num_entries;
>      unsigned encoding;
>      void *start_of_table;
>    } tables[];
> };
>
> The usual encodings are: direct, PC-relative, relative-to-start-of-table.
> Usually for a specific jump instruction there's only one table, so
> optimizing for that makes sense.  For strange unthought-of cases it's
> probably a good idea to have your initial scheme as fallback, which could
> be indicated by a special .encoding value.
>
>>> For example, given the following switch statement code:
>>>
>>>    .Lswitch_jmp:
>>> 	// %rax is .Lcase_1 or .Lcase_2
>>> 	jmp %rax
> So, usually %rax would point into a table (somewhere in .rodata/.text)
> that looks like so:
>
> .Ljump_table:
>   .quad .Lcase_1 - .Ljump_table
>   .quad .Lcase_2 - .Ljump_table
>
> (for position-independent code)
>
> and hence you would emit this as annotation:
>
> .quad .Lswitch_jmp
> .quad 1                   # only a single table
> .long 2                   # with two entries
> .long RELATIVE_TO_START   # all entries are X - start_of_table
> .quad .Ljump_table
>
> In this case you won't save anything of course, but as soon as there's a
> meaningful number of cases you will.
>
> Again, if that info would be put into an allocated section you would want
> to use relative encodings of the addresses to avoid runtime relocs.  And
> the remarks about self-delimitation and extensibility also apply here.
>
>>> Alternatives
>>> ------------
>>>
>>> Another idea which has been floated in the past is for objtool to read
>>> DWARF (or .eh_frame) to help it figure out the control flow.  That
>>> hasn't been tried yet, but would be considerably more difficult and
>>> fragile IMO.
> While noreturn functions are marked in the debug info, noreturn
> function types currently aren't quite correct.  And jump-tables aren't
> marked at all, so that would lose.
>
>
> Ciao,
> Michael.


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-15  2:56     ` Chen Zhongjin
@ 2022-09-15  8:47       ` Peter Zijlstra
  2022-09-20 16:49         ` Ard Biesheuvel
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2022-09-15  8:47 UTC (permalink / raw)
  To: Chen Zhongjin
  Cc: Michael Matz, Borislav Petkov, Josh Poimboeuf, linux-toolchains,
	Indu Bhagat, Nick Desaulniers, linux-kernel, Jose E. Marchesi,
	Miroslav Benes, Mark Rutland, Will Deacon, x86, linux-arm-kernel,
	live-patching, linuxppc-dev, Ard Biesheuvel, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:

> We have found some anonymous information on x86 in .rodata.

Well yes, but that's still a bunch of heuristics on our side.

> I'm not sure whether that is *all* of what Josh wants on x86.  However,
> on arm64 we did not find it in the same section, so it is a problem on
> arm64 now.

Nick found Bolt managed the ARM64 jumptables:

  https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484

But that does look like a less than ideal solution too.

> Will the compiler emit these for all arches?  At least I tried and
> didn't find anything meaningful (maybe I missed it).

That's the question; can we get the compiler to help us here in a well
defined manner.


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-15  8:47       ` Peter Zijlstra
@ 2022-09-20 16:49         ` Ard Biesheuvel
  2022-09-21  3:16           ` Chen Zhongjin
  0 siblings, 1 reply; 20+ messages in thread
From: Ard Biesheuvel @ 2022-09-20 16:49 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Chen Zhongjin, Michael Matz, Borislav Petkov, Josh Poimboeuf,
	linux-toolchains, Indu Bhagat, Nick Desaulniers, linux-kernel,
	Jose E. Marchesi, Miroslav Benes, Mark Rutland, Will Deacon, x86,
	linux-arm-kernel, live-patching, linuxppc-dev,
	Sathvika Vasireddy, Christophe Leroy, Mark Brown

On Thu, 15 Sept 2022 at 10:47, Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:
>
> > We have found some anonymous information on x86 in .rodata.
>
> Well yes, but that's still a bunch of heuristics on our side.
>
> > I'm not sure whether that is *all* of what Josh wants on x86.  However,
> > on arm64 we did not find it in the same section, so it is a problem on
> > arm64 now.
>
> Nick found Bolt managed the ARM64 jumptables:
>
>   https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484
>
> But that does look like a less than ideal solution too.
>
> > Will the compiler emit these for all arches?  At least I tried and
> > didn't find anything meaningful (maybe I missed it).
>
> That's the question; can we get the compiler to help us here in a well
> defined manner.

Do BTI landing pads help at all here? I.e., I assume that objtool just
treats any indirect call as a dangling edge in the control flow graph,
and the problem is identifying the valid targets. In the BTI case,
those will all start with a 'BTI J' instruction.


* Re: [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn}
  2022-09-20 16:49         ` Ard Biesheuvel
@ 2022-09-21  3:16           ` Chen Zhongjin
  0 siblings, 0 replies; 20+ messages in thread
From: Chen Zhongjin @ 2022-09-21  3:16 UTC (permalink / raw)
  To: Ard Biesheuvel, Peter Zijlstra
  Cc: Michael Matz, Borislav Petkov, Josh Poimboeuf, linux-toolchains,
	Indu Bhagat, Nick Desaulniers, linux-kernel, Jose E. Marchesi,
	Miroslav Benes, Mark Rutland, Will Deacon, x86, linux-arm-kernel,
	live-patching, linuxppc-dev, Sathvika Vasireddy,
	Christophe Leroy, Mark Brown

Hi,

On 2022/9/21 0:49, Ard Biesheuvel wrote:
> On Thu, 15 Sept 2022 at 10:47, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, Sep 15, 2022 at 10:56:58AM +0800, Chen Zhongjin wrote:
>>
>>> We have found some anonymous information on x86 in .rodata.
>> Well yes, but that's still a bunch of heuristics on our side.
>>
>>> I'm not sure whether that is *all* of what Josh wants on x86.  However,
>>> on arm64 we did not find it in the same section, so it is a problem on
>>> arm64 now.
>> Nick found Bolt managed the ARM64 jumptables:
>>
>>    https://github.com/llvm/llvm-project/blob/main/bolt/lib/Target/AArch64/AArch64MCPlusBuilder.cpp#L484
>>
>> But that does look like a less than ideal solution too.
>>
>>> Will the compiler emit these for all arches?  At least I tried and
>>> didn't find anything meaningful (maybe I missed it).
>> That's the question; can we get the compiler to help us here in a well
>> defined manner.
> Do BTI landing pads help at all here? I.e., I assume that objtool just
> treats any indirect call as a dangling edge in the control flow graph,
> and the problem is identifying the valid targets. In the BTI case,
> those will all start with a 'BTI J' instruction.

Maybe not enough, I guess.

For switch jump tables we need to know each table's *own* jump targets
so that we can go through all its branches.  If there is more than one
indirect jump inside one function, only marking targets with BTI J
can't help match each jump with its targets.


Anyway, I think this job is more for the compiler.  Switch jump tables
are different from other indirect jumps/calls.  They have a fixed
control flow, just like an if/else flow, and the indirect jump table is
just a compiler optimization which hides this.


Best,

Chen



Thread overview: 20+ messages
2022-09-09 18:07 [RFC] Objtool toolchain proposal: -fannotate-{jump-table,noreturn} Josh Poimboeuf
2022-09-11 15:26 ` Peter Zijlstra
2022-09-11 15:31   ` Ard Biesheuvel
2022-09-12 10:52 ` Borislav Petkov
2022-09-12 14:17   ` Michael Matz
2022-09-14  0:04     ` Josh Poimboeuf
2022-09-14 14:00       ` Peter Zijlstra
2022-09-14 14:28         ` Michael Matz
2022-09-14 14:55           ` Peter Zijlstra
2022-09-14 17:34             ` Segher Boessenkool
2022-09-15  2:56     ` Chen Zhongjin
2022-09-15  8:47       ` Peter Zijlstra
2022-09-20 16:49         ` Ard Biesheuvel
2022-09-21  3:16           ` Chen Zhongjin
2022-09-12 11:31 ` Segher Boessenkool
2022-09-14 10:21   ` Josh Poimboeuf
2022-09-14 12:08     ` Michael Matz
2022-09-14 12:16     ` Segher Boessenkool
2022-09-13 22:51 ` Indu Bhagat
2022-09-14  0:12   ` Josh Poimboeuf
