All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 00/10] x86: undwarf unwinder
@ 2017-06-01  5:44 Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 01/10] objtool: move checking code to check.c Josh Poimboeuf
                   ` (10 more replies)
  0 siblings, 11 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Create a new 'undwarf' unwinder, enabled by CONFIG_UNDWARF_UNWINDER, and
plug it into the x86 unwinder framework.  Objtool is used to generate
the undwarf debuginfo.  The undwarf debuginfo format is basically a
simplified version of DWARF CFI.  More details below.

The unwinder works well in my testing.  It unwinds through interrupts,
exceptions, and preemption, with and without frame pointers, across
aligned stacks and dynamically allocated stacks.  If something goes
wrong during an oops, it successfully falls back to printing the '?'
entries just like the frame pointer unwinder.

I'm not tied to the 'undwarf' name, other naming ideas are welcome.

Some potential future improvements:
- properly annotate or fix whitelisted functions and files
- reduce the number of base CFA registers needed in entry code
- compress undwarf debuginfo to use less memory
- make it easier to disable CONFIG_FRAME_POINTER
- add reliability checks for livepatch
- runtime NMI stack reliability checker

This code can also be found at:

  git://github.com/jpoimboe/linux undwarf-rfc

Here's the contents of the undwarf.txt file which explains the 'why' in
more detail:


Undwarf debuginfo generation
============================

Overview
--------

The kernel CONFIG_UNDWARF_UNWINDER option enables objtool generation of
undwarf debuginfo, which is out-of-band data which is used by the
in-kernel undwarf unwinder.  It's similar in concept to DWARF CFI
debuginfo which would be used by a DWARF unwinder.  The difference is
that the format of the undwarf data is simpler than DWARF, which in turn
allows the unwinder to be simpler.

Objtool generates the undwarf data by piggybacking on the compile-time
stack metadata validation work described in stack-validation.txt.  After
analyzing all the code paths of a .o file, it creates an array of
'struct undwarf's and writes them to the .undwarf section.

Then at vmlinux link time, the .undwarf section is sorted by the
sorttable script.  The resulting sorted array of undwarf structs is used
by the unwinder at runtime to correlate a given text address with its
stack state.


Why not just use DWARF?
-----------------------

Undwarf has some of the same benefits as DWARF.  Unlike frame pointers,
the debuginfo is out-of-band. so it has no effect on runtime
performance.  Another benefit is that it's possible to reliably unwind
across interrupts and exceptions.

Undwarf debuginfo's advantage over DWARF itself is that it's much
simpler.  It gets rid of the DWARF CFI state machine and also gets rid
of the tracking of unnecessary registers.  This allows the unwinder to
be much simpler, meaning fewer bugs, which is especially important for
mission critical oops code.

The simpler debuginfo format also enables the unwinder to be relatively
fast, which is important for perf and lockdep.

The undwarf format does have a few downsides.  The undwarf table takes
up extra memory -- something in the ballpark of 3-5MB, depending on the
kernel config.  In the future we may try to rearrange the data to
compress that a bit.

Another downside is that, as GCC evolves, it's conceivable that the
undwarf data may end up being *too* simple to describe the state of the
stack for certain optimizations.  Will we end up having to track the
state of more registers and eventually end up reinventing DWARF?

I think this is unlikely because GCC seems to save the frame pointer for
any unusual stack adjustments it does, so I suspect we'll really only
ever need to keep track of the stack pointer and the frame pointer
between call frames.  But even if we do end up having to track all the
registers DWARF tracks, at least we will still control the format, e.g.
no complex state machines.


Why generate undwarf with objtool?
----------------------------------

It should be possible to generate the undwarf data with a simple tool
which converts DWARF to undwarf.  However, such a solution would be
incomplete due to the kernel's extensive use of asm, inline asm, and
special sections like exception tables.

That could be rectified by manually annotating those special code paths
using GNU assembler .cfi annotations in .S files, and homegrown
annotations for inline asm in .c files.  But asm annotations were tried
in the past and were found to be unmaintainable.  They were often
incorrect/incomplete and made the code harder to read and keep updated.
And based on looking at glibc code, annotating inline asm in .c files
might be even worse.

With compile-time stack metadata validation, objtool already follows all
the code paths and already has all the information it needs to be able
to generate undwarf data from scratch.  So it's an easy step to go from
stack validation to undwarf generation.

Objtool still needs a few annotations, but only in code which does
unusual things to the stack like entry code.  And even then, far fewer
annotations are needed than what DWARF would need, so it's much more
maintainable than DWARF CFI annotations.

So the advantages of using objtool to generate undwarf are that it gives
more accurate debuginfo, with close to zero annotations.  It also
insulates the kernel from toolchain bugs which can be very painful to
deal with in the kernel since it often has to workaround issues in older
versions of the toolchain for years.

The downside is that the unwinder now becomes dependent on objtool's
ability to reverse engineer GCC code flows.  If GCC optimizations become
too complicated for objtool to follow, the undwarf generation might stop
working or become incomplete.  In such a case we may need to revisit the
current implementation.  Some possible solutions would be asking GCC to
make the optimizations more palatable, or having objtool use DWARF as an
additional input.


Josh Poimboeuf (10):
  objtool: move checking code to check.c
  objtool, x86: add several functions and files to the objtool whitelist
  objtool: stack validation 2.0
  objtool: add undwarf debuginfo generation
  objtool, x86: add facility for asm code to provide CFI hints
  x86/entry: add CFI hint undwarf annotations
  x86/asm: add CFI hint annotations to sync_core()
  extable: rename 'sortextable' script to 'sorttable'
  extable: add undwarf table sorting ability to sorttable script
  x86/unwind: add undwarf unwinder

 Documentation/dontdiff                           |    2 +-
 arch/um/include/asm/unwind.h                     |    7 +
 arch/x86/Kconfig                                 |    1 +
 arch/x86/Kconfig.debug                           |   26 +
 arch/x86/crypto/Makefile                         |    2 +
 arch/x86/crypto/sha1-mb/Makefile                 |    2 +
 arch/x86/crypto/sha256-mb/Makefile               |    2 +
 arch/x86/entry/Makefile                          |    1 -
 arch/x86/entry/calling.h                         |    5 +
 arch/x86/entry/entry_64.S                        |   56 +-
 arch/x86/include/asm/module.h                    |    8 +
 arch/x86/include/asm/processor.h                 |    3 +
 arch/x86/include/asm/undwarf-types.h             |  100 ++
 arch/x86/include/asm/undwarf.h                   |   97 ++
 arch/x86/include/asm/unwind.h                    |   64 +-
 arch/x86/kernel/Makefile                         |    9 +-
 arch/x86/kernel/acpi/Makefile                    |    2 +
 arch/x86/kernel/kprobes/opt.c                    |    9 +-
 arch/x86/kernel/module.c                         |    9 +-
 arch/x86/kernel/reboot.c                         |    2 +
 arch/x86/kernel/unwind_frame.c                   |   39 +-
 arch/x86/kernel/unwind_guess.c                   |    5 +
 arch/x86/kernel/unwind_undwarf.c                 |  402 +++++++
 arch/x86/kvm/svm.c                               |    2 +
 arch/x86/kvm/vmx.c                               |    3 +
 arch/x86/lib/msr-reg.S                           |    8 +-
 arch/x86/net/Makefile                            |    2 +
 arch/x86/platform/efi/Makefile                   |    1 +
 arch/x86/power/Makefile                          |    2 +
 arch/x86/xen/Makefile                            |    3 +
 include/asm-generic/vmlinux.lds.h                |   14 +
 init/Kconfig                                     |    4 +
 kernel/kexec_core.c                              |    4 +-
 lib/Kconfig.debug                                |    3 +
 scripts/.gitignore                               |    2 +-
 scripts/Makefile                                 |    4 +-
 scripts/Makefile.build                           |    3 +-
 scripts/link-vmlinux.sh                          |   12 +-
 scripts/{sortextable.c => sorttable.c}           |  182 +--
 scripts/{sortextable.h => sorttable.h}           |   69 +-
 tools/objtool/Build                              |    3 +
 tools/objtool/Documentation/stack-validation.txt |  194 ++--
 tools/objtool/Documentation/undwarf.txt          |   99 ++
 tools/objtool/Makefile                           |    5 +-
 tools/objtool/arch.h                             |   64 +-
 tools/objtool/arch/x86/decode.c                  |  400 ++++++-
 tools/objtool/builtin-check.c                    | 1280 +---------------------
 tools/objtool/builtin-undwarf.c                  |   70 ++
 tools/objtool/builtin.h                          |    1 +
 tools/objtool/cfi.h                              |   55 +
 tools/objtool/{builtin-check.c => check.c}       |  868 +++++++++++----
 tools/objtool/check.h                            |   69 ++
 tools/objtool/elf.c                              |  247 ++++-
 tools/objtool/elf.h                              |    8 +-
 tools/objtool/objtool.c                          |    3 +-
 tools/objtool/special.c                          |    6 +-
 tools/objtool/undwarf-types.h                    |  100 ++
 tools/objtool/undwarf.c                          |  308 ++++++
 tools/objtool/{builtin.h => undwarf.h}           |   19 +-
 59 files changed, 3124 insertions(+), 1846 deletions(-)
 create mode 100644 arch/um/include/asm/unwind.h
 create mode 100644 arch/x86/include/asm/undwarf-types.h
 create mode 100644 arch/x86/include/asm/undwarf.h
 create mode 100644 arch/x86/kernel/unwind_undwarf.c
 rename scripts/{sortextable.c => sorttable.c} (71%)
 rename scripts/{sortextable.h => sorttable.h} (77%)
 create mode 100644 tools/objtool/Documentation/undwarf.txt
 create mode 100644 tools/objtool/builtin-undwarf.c
 create mode 100644 tools/objtool/cfi.h
 copy tools/objtool/{builtin-check.c => check.c} (62%)
 create mode 100644 tools/objtool/check.h
 create mode 100644 tools/objtool/undwarf-types.h
 create mode 100644 tools/objtool/undwarf.c
 copy tools/objtool/{builtin.h => undwarf.h} (64%)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 01/10] objtool: move checking code to check.c
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-14  7:22   ` Jiri Slaby
  2017-06-01  5:44 ` [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist Josh Poimboeuf
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

In preparation for the new 'objtool undwarf generate' command, which
will rely on 'objtool check', move the checking code from
builtin-check.c to check.c where it can be used by other commands.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/Build                        |    1 +
 tools/objtool/builtin-check.c              | 1280 +---------------------------
 tools/objtool/{builtin-check.c => check.c} |   58 +-
 tools/objtool/check.h                      |   52 ++
 4 files changed, 71 insertions(+), 1320 deletions(-)
 copy tools/objtool/{builtin-check.c => check.c} (95%)
 create mode 100644 tools/objtool/check.h

diff --git a/tools/objtool/Build b/tools/objtool/Build
index d6cdece..6f2e198 100644
--- a/tools/objtool/Build
+++ b/tools/objtool/Build
@@ -1,5 +1,6 @@
 objtool-y += arch/$(SRCARCH)/
 objtool-y += builtin-check.o
+objtool-y += check.o
 objtool-y += elf.o
 objtool-y += special.o
 objtool-y += objtool.o
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 282a603..365c34e 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ * Copyright (C) 2015-2017 Josh Poimboeuf <jpoimboe@redhat.com>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -25,1286 +25,32 @@
  * For more information, see tools/objtool/Documentation/stack-validation.txt.
  */
 
-#include <string.h>
-#include <stdlib.h>
 #include <subcmd/parse-options.h>
-
 #include "builtin.h"
-#include "elf.h"
-#include "special.h"
-#include "arch.h"
-#include "warn.h"
-
-#include <linux/hashtable.h>
-#include <linux/kernel.h>
-
-#define STATE_FP_SAVED		0x1
-#define STATE_FP_SETUP		0x2
-#define STATE_FENTRY		0x4
-
-struct instruction {
-	struct list_head list;
-	struct hlist_node hash;
-	struct section *sec;
-	unsigned long offset;
-	unsigned int len, state;
-	unsigned char type;
-	unsigned long immediate;
-	bool alt_group, visited, dead_end;
-	struct symbol *call_dest;
-	struct instruction *jump_dest;
-	struct list_head alts;
-	struct symbol *func;
-};
-
-struct alternative {
-	struct list_head list;
-	struct instruction *insn;
-};
-
-struct objtool_file {
-	struct elf *elf;
-	struct list_head insn_list;
-	DECLARE_HASHTABLE(insn_hash, 16);
-	struct section *rodata, *whitelist;
-	bool ignore_unreachables, c_file;
-};
-
-const char *objname;
-static bool nofp;
-
-static struct instruction *find_insn(struct objtool_file *file,
-				     struct section *sec, unsigned long offset)
-{
-	struct instruction *insn;
-
-	hash_for_each_possible(file->insn_hash, insn, hash, offset)
-		if (insn->sec == sec && insn->offset == offset)
-			return insn;
-
-	return NULL;
-}
-
-static struct instruction *next_insn_same_sec(struct objtool_file *file,
-					      struct instruction *insn)
-{
-	struct instruction *next = list_next_entry(insn, list);
-
-	if (&next->list == &file->insn_list || next->sec != insn->sec)
-		return NULL;
-
-	return next;
-}
-
-static bool gcov_enabled(struct objtool_file *file)
-{
-	struct section *sec;
-	struct symbol *sym;
-
-	list_for_each_entry(sec, &file->elf->sections, list)
-		list_for_each_entry(sym, &sec->symbol_list, list)
-			if (!strncmp(sym->name, "__gcov_.", 8))
-				return true;
-
-	return false;
-}
-
-#define for_each_insn(file, insn)					\
-	list_for_each_entry(insn, &file->insn_list, list)
-
-#define func_for_each_insn(file, func, insn)				\
-	for (insn = find_insn(file, func->sec, func->offset);		\
-	     insn && &insn->list != &file->insn_list &&			\
-		insn->sec == func->sec &&				\
-		insn->offset < func->offset + func->len;		\
-	     insn = list_next_entry(insn, list))
-
-#define func_for_each_insn_continue_reverse(file, func, insn)		\
-	for (insn = list_prev_entry(insn, list);			\
-	     &insn->list != &file->insn_list &&				\
-		insn->sec == func->sec && insn->offset >= func->offset;	\
-	     insn = list_prev_entry(insn, list))
-
-#define sec_for_each_insn_from(file, insn)				\
-	for (; insn; insn = next_insn_same_sec(file, insn))
-
-
-/*
- * Check if the function has been manually whitelisted with the
- * STACK_FRAME_NON_STANDARD macro, or if it should be automatically whitelisted
- * due to its use of a context switching instruction.
- */
-static bool ignore_func(struct objtool_file *file, struct symbol *func)
-{
-	struct rela *rela;
-	struct instruction *insn;
-
-	/* check for STACK_FRAME_NON_STANDARD */
-	if (file->whitelist && file->whitelist->rela)
-		list_for_each_entry(rela, &file->whitelist->rela->rela_list, list) {
-			if (rela->sym->type == STT_SECTION &&
-			    rela->sym->sec == func->sec &&
-			    rela->addend == func->offset)
-				return true;
-			if (rela->sym->type == STT_FUNC && rela->sym == func)
-				return true;
-		}
-
-	/* check if it has a context switching instruction */
-	func_for_each_insn(file, func, insn)
-		if (insn->type == INSN_CONTEXT_SWITCH)
-			return true;
-
-	return false;
-}
-
-/*
- * This checks to see if the given function is a "noreturn" function.
- *
- * For global functions which are outside the scope of this object file, we
- * have to keep a manual list of them.
- *
- * For local functions, we have to detect them manually by simply looking for
- * the lack of a return instruction.
- *
- * Returns:
- *  -1: error
- *   0: no dead end
- *   1: dead end
- */
-static int __dead_end_function(struct objtool_file *file, struct symbol *func,
-			       int recursion)
-{
-	int i;
-	struct instruction *insn;
-	bool empty = true;
-
-	/*
-	 * Unfortunately these have to be hard coded because the noreturn
-	 * attribute isn't provided in ELF data.
-	 */
-	static const char * const global_noreturns[] = {
-		"__stack_chk_fail",
-		"panic",
-		"do_exit",
-		"do_task_dead",
-		"__module_put_and_exit",
-		"complete_and_exit",
-		"kvm_spurious_fault",
-		"__reiserfs_panic",
-		"lbug_with_loc"
-	};
-
-	if (func->bind == STB_WEAK)
-		return 0;
-
-	if (func->bind == STB_GLOBAL)
-		for (i = 0; i < ARRAY_SIZE(global_noreturns); i++)
-			if (!strcmp(func->name, global_noreturns[i]))
-				return 1;
-
-	if (!func->sec)
-		return 0;
-
-	func_for_each_insn(file, func, insn) {
-		empty = false;
-
-		if (insn->type == INSN_RETURN)
-			return 0;
-	}
-
-	if (empty)
-		return 0;
-
-	/*
-	 * A function can have a sibling call instead of a return.  In that
-	 * case, the function's dead-end status depends on whether the target
-	 * of the sibling call returns.
-	 */
-	func_for_each_insn(file, func, insn) {
-		if (insn->sec != func->sec ||
-		    insn->offset >= func->offset + func->len)
-			break;
-
-		if (insn->type == INSN_JUMP_UNCONDITIONAL) {
-			struct instruction *dest = insn->jump_dest;
-			struct symbol *dest_func;
-
-			if (!dest)
-				/* sibling call to another file */
-				return 0;
-
-			if (dest->sec != func->sec ||
-			    dest->offset < func->offset ||
-			    dest->offset >= func->offset + func->len) {
-				/* local sibling call */
-				dest_func = find_symbol_by_offset(dest->sec,
-								  dest->offset);
-				if (!dest_func)
-					continue;
-
-				if (recursion == 5) {
-					WARN_FUNC("infinite recursion (objtool bug!)",
-						  dest->sec, dest->offset);
-					return -1;
-				}
-
-				return __dead_end_function(file, dest_func,
-							   recursion + 1);
-			}
-		}
-
-		if (insn->type == INSN_JUMP_DYNAMIC && list_empty(&insn->alts))
-			/* sibling call */
-			return 0;
-	}
-
-	return 1;
-}
-
-static int dead_end_function(struct objtool_file *file, struct symbol *func)
-{
-	return __dead_end_function(file, func, 0);
-}
-
-/*
- * Call the arch-specific instruction decoder for all the instructions and add
- * them to the global instruction list.
- */
-static int decode_instructions(struct objtool_file *file)
-{
-	struct section *sec;
-	struct symbol *func;
-	unsigned long offset;
-	struct instruction *insn;
-	int ret;
-
-	list_for_each_entry(sec, &file->elf->sections, list) {
-
-		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
-			continue;
-
-		for (offset = 0; offset < sec->len; offset += insn->len) {
-			insn = malloc(sizeof(*insn));
-			memset(insn, 0, sizeof(*insn));
-
-			INIT_LIST_HEAD(&insn->alts);
-			insn->sec = sec;
-			insn->offset = offset;
-
-			ret = arch_decode_instruction(file->elf, sec, offset,
-						      sec->len - offset,
-						      &insn->len, &insn->type,
-						      &insn->immediate);
-			if (ret)
-				return ret;
-
-			if (!insn->type || insn->type > INSN_LAST) {
-				WARN_FUNC("invalid instruction type %d",
-					  insn->sec, insn->offset, insn->type);
-				return -1;
-			}
-
-			hash_add(file->insn_hash, &insn->hash, insn->offset);
-			list_add_tail(&insn->list, &file->insn_list);
-		}
-
-		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->type != STT_FUNC)
-				continue;
-
-			if (!find_insn(file, sec, func->offset)) {
-				WARN("%s(): can't find starting instruction",
-				     func->name);
-				return -1;
-			}
-
-			func_for_each_insn(file, func, insn)
-				if (!insn->func)
-					insn->func = func;
-		}
-	}
-
-	return 0;
-}
-
-/*
- * Find all uses of the unreachable() macro, which are code path dead ends.
- */
-static int add_dead_ends(struct objtool_file *file)
-{
-	struct section *sec;
-	struct rela *rela;
-	struct instruction *insn;
-	bool found;
-
-	sec = find_section_by_name(file->elf, ".rela.discard.unreachable");
-	if (!sec)
-		return 0;
-
-	list_for_each_entry(rela, &sec->rela_list, list) {
-		if (rela->sym->type != STT_SECTION) {
-			WARN("unexpected relocation symbol type in %s", sec->name);
-			return -1;
-		}
-		insn = find_insn(file, rela->sym->sec, rela->addend);
-		if (insn)
-			insn = list_prev_entry(insn, list);
-		else if (rela->addend == rela->sym->sec->len) {
-			found = false;
-			list_for_each_entry_reverse(insn, &file->insn_list, list) {
-				if (insn->sec == rela->sym->sec) {
-					found = true;
-					break;
-				}
-			}
-
-			if (!found) {
-				WARN("can't find unreachable insn at %s+0x%x",
-				     rela->sym->sec->name, rela->addend);
-				return -1;
-			}
-		} else {
-			WARN("can't find unreachable insn at %s+0x%x",
-			     rela->sym->sec->name, rela->addend);
-			return -1;
-		}
-
-		insn->dead_end = true;
-	}
-
-	return 0;
-}
-
-/*
- * Warnings shouldn't be reported for ignored functions.
- */
-static void add_ignores(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct section *sec;
-	struct symbol *func;
-
-	list_for_each_entry(sec, &file->elf->sections, list) {
-		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->type != STT_FUNC)
-				continue;
-
-			if (!ignore_func(file, func))
-				continue;
-
-			func_for_each_insn(file, func, insn)
-				insn->visited = true;
-		}
-	}
-}
-
-/*
- * Find the destination instructions for all jumps.
- */
-static int add_jump_destinations(struct objtool_file *file)
-{
-	struct instruction *insn;
-	struct rela *rela;
-	struct section *dest_sec;
-	unsigned long dest_off;
-
-	for_each_insn(file, insn) {
-		if (insn->type != INSN_JUMP_CONDITIONAL &&
-		    insn->type != INSN_JUMP_UNCONDITIONAL)
-			continue;
-
-		/* skip ignores */
-		if (insn->visited)
-			continue;
-
-		rela = find_rela_by_dest_range(insn->sec, insn->offset,
-					       insn->len);
-		if (!rela) {
-			dest_sec = insn->sec;
-			dest_off = insn->offset + insn->len + insn->immediate;
-		} else if (rela->sym->type == STT_SECTION) {
-			dest_sec = rela->sym->sec;
-			dest_off = rela->addend + 4;
-		} else if (rela->sym->sec->idx) {
-			dest_sec = rela->sym->sec;
-			dest_off = rela->sym->sym.st_value + rela->addend + 4;
-		} else {
-			/* sibling call */
-			insn->jump_dest = 0;
-			continue;
-		}
-
-		insn->jump_dest = find_insn(file, dest_sec, dest_off);
-		if (!insn->jump_dest) {
-
-			/*
-			 * This is a special case where an alt instruction
-			 * jumps past the end of the section.  These are
-			 * handled later in handle_group_alt().
-			 */
-			if (!strcmp(insn->sec->name, ".altinstr_replacement"))
-				continue;
-
-			WARN_FUNC("can't find jump dest instruction at %s+0x%lx",
-				  insn->sec, insn->offset, dest_sec->name,
-				  dest_off);
-			return -1;
-		}
-	}
-
-	return 0;
-}
-
-/*
- * Find the destination instructions for all calls.
- */
-static int add_call_destinations(struct objtool_file *file)
-{
-	struct instruction *insn;
-	unsigned long dest_off;
-	struct rela *rela;
-
-	for_each_insn(file, insn) {
-		if (insn->type != INSN_CALL)
-			continue;
-
-		rela = find_rela_by_dest_range(insn->sec, insn->offset,
-					       insn->len);
-		if (!rela) {
-			dest_off = insn->offset + insn->len + insn->immediate;
-			insn->call_dest = find_symbol_by_offset(insn->sec,
-								dest_off);
-			if (!insn->call_dest) {
-				WARN_FUNC("can't find call dest symbol at offset 0x%lx",
-					  insn->sec, insn->offset, dest_off);
-				return -1;
-			}
-		} else if (rela->sym->type == STT_SECTION) {
-			insn->call_dest = find_symbol_by_offset(rela->sym->sec,
-								rela->addend+4);
-			if (!insn->call_dest ||
-			    insn->call_dest->type != STT_FUNC) {
-				WARN_FUNC("can't find call dest symbol at %s+0x%x",
-					  insn->sec, insn->offset,
-					  rela->sym->sec->name,
-					  rela->addend + 4);
-				return -1;
-			}
-		} else
-			insn->call_dest = rela->sym;
-	}
-
-	return 0;
-}
-
-/*
- * The .alternatives section requires some extra special care, over and above
- * what other special sections require:
- *
- * 1. Because alternatives are patched in-place, we need to insert a fake jump
- *    instruction at the end so that validate_branch() skips all the original
- *    replaced instructions when validating the new instruction path.
- *
- * 2. An added wrinkle is that the new instruction length might be zero.  In
- *    that case the old instructions are replaced with noops.  We simulate that
- *    by creating a fake jump as the only new instruction.
- *
- * 3. In some cases, the alternative section includes an instruction which
- *    conditionally jumps to the _end_ of the entry.  We have to modify these
- *    jumps' destinations to point back to .text rather than the end of the
- *    entry in .altinstr_replacement.
- *
- * 4. It has been requested that we don't validate the !POPCNT feature path
- *    which is a "very very small percentage of machines".
- */
-static int handle_group_alt(struct objtool_file *file,
-			    struct special_alt *special_alt,
-			    struct instruction *orig_insn,
-			    struct instruction **new_insn)
-{
-	struct instruction *last_orig_insn, *last_new_insn, *insn, *fake_jump;
-	unsigned long dest_off;
-
-	last_orig_insn = NULL;
-	insn = orig_insn;
-	sec_for_each_insn_from(file, insn) {
-		if (insn->offset >= special_alt->orig_off + special_alt->orig_len)
-			break;
-
-		if (special_alt->skip_orig)
-			insn->type = INSN_NOP;
-
-		insn->alt_group = true;
-		last_orig_insn = insn;
-	}
-
-	if (!next_insn_same_sec(file, last_orig_insn)) {
-		WARN("%s: don't know how to handle alternatives at end of section",
-		     special_alt->orig_sec->name);
-		return -1;
-	}
-
-	fake_jump = malloc(sizeof(*fake_jump));
-	if (!fake_jump) {
-		WARN("malloc failed");
-		return -1;
-	}
-	memset(fake_jump, 0, sizeof(*fake_jump));
-	INIT_LIST_HEAD(&fake_jump->alts);
-	fake_jump->sec = special_alt->new_sec;
-	fake_jump->offset = -1;
-	fake_jump->type = INSN_JUMP_UNCONDITIONAL;
-	fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
-
-	if (!special_alt->new_len) {
-		*new_insn = fake_jump;
-		return 0;
-	}
-
-	last_new_insn = NULL;
-	insn = *new_insn;
-	sec_for_each_insn_from(file, insn) {
-		if (insn->offset >= special_alt->new_off + special_alt->new_len)
-			break;
-
-		last_new_insn = insn;
-
-		if (insn->type != INSN_JUMP_CONDITIONAL &&
-		    insn->type != INSN_JUMP_UNCONDITIONAL)
-			continue;
-
-		if (!insn->immediate)
-			continue;
-
-		dest_off = insn->offset + insn->len + insn->immediate;
-		if (dest_off == special_alt->new_off + special_alt->new_len)
-			insn->jump_dest = fake_jump;
-
-		if (!insn->jump_dest) {
-			WARN_FUNC("can't find alternative jump destination",
-				  insn->sec, insn->offset);
-			return -1;
-		}
-	}
-
-	if (!last_new_insn) {
-		WARN_FUNC("can't find last new alternative instruction",
-			  special_alt->new_sec, special_alt->new_off);
-		return -1;
-	}
-
-	list_add(&fake_jump->list, &last_new_insn->list);
-
-	return 0;
-}
-
-/*
- * A jump table entry can either convert a nop to a jump or a jump to a nop.
- * If the original instruction is a jump, make the alt entry an effective nop
- * by just skipping the original instruction.
- */
-static int handle_jump_alt(struct objtool_file *file,
-			   struct special_alt *special_alt,
-			   struct instruction *orig_insn,
-			   struct instruction **new_insn)
-{
-	if (orig_insn->type == INSN_NOP)
-		return 0;
-
-	if (orig_insn->type != INSN_JUMP_UNCONDITIONAL) {
-		WARN_FUNC("unsupported instruction at jump label",
-			  orig_insn->sec, orig_insn->offset);
-		return -1;
-	}
-
-	*new_insn = list_next_entry(orig_insn, list);
-	return 0;
-}
-
-/*
- * Read all the special sections which have alternate instructions which can be
- * patched in or redirected to at runtime.  Each instruction having alternate
- * instruction(s) has them added to its insn->alts list, which will be
- * traversed in validate_branch().
- */
-static int add_special_section_alts(struct objtool_file *file)
-{
-	struct list_head special_alts;
-	struct instruction *orig_insn, *new_insn;
-	struct special_alt *special_alt, *tmp;
-	struct alternative *alt;
-	int ret;
-
-	ret = special_get_alts(file->elf, &special_alts);
-	if (ret)
-		return ret;
-
-	list_for_each_entry_safe(special_alt, tmp, &special_alts, list) {
-		alt = malloc(sizeof(*alt));
-		if (!alt) {
-			WARN("malloc failed");
-			ret = -1;
-			goto out;
-		}
-
-		orig_insn = find_insn(file, special_alt->orig_sec,
-				      special_alt->orig_off);
-		if (!orig_insn) {
-			WARN_FUNC("special: can't find orig instruction",
-				  special_alt->orig_sec, special_alt->orig_off);
-			ret = -1;
-			goto out;
-		}
+#include "check.h"
 
-		new_insn = NULL;
-		if (!special_alt->group || special_alt->new_len) {
-			new_insn = find_insn(file, special_alt->new_sec,
-					     special_alt->new_off);
-			if (!new_insn) {
-				WARN_FUNC("special: can't find new instruction",
-					  special_alt->new_sec,
-					  special_alt->new_off);
-				ret = -1;
-				goto out;
-			}
-		}
+bool nofp;
 
-		if (special_alt->group) {
-			ret = handle_group_alt(file, special_alt, orig_insn,
-					       &new_insn);
-			if (ret)
-				goto out;
-		} else if (special_alt->jump_or_nop) {
-			ret = handle_jump_alt(file, special_alt, orig_insn,
-					      &new_insn);
-			if (ret)
-				goto out;
-		}
-
-		alt->insn = new_insn;
-		list_add_tail(&alt->list, &orig_insn->alts);
-
-		list_del(&special_alt->list);
-		free(special_alt);
-	}
-
-out:
-	return ret;
-}
-
-static int add_switch_table(struct objtool_file *file, struct symbol *func,
-			    struct instruction *insn, struct rela *table,
-			    struct rela *next_table)
-{
-	struct rela *rela = table;
-	struct instruction *alt_insn;
-	struct alternative *alt;
-
-	list_for_each_entry_from(rela, &file->rodata->rela->rela_list, list) {
-		if (rela == next_table)
-			break;
-
-		if (rela->sym->sec != insn->sec ||
-		    rela->addend <= func->offset ||
-		    rela->addend >= func->offset + func->len)
-			break;
-
-		alt_insn = find_insn(file, insn->sec, rela->addend);
-		if (!alt_insn) {
-			WARN("%s: can't find instruction at %s+0x%x",
-			     file->rodata->rela->name, insn->sec->name,
-			     rela->addend);
-			return -1;
-		}
-
-		alt = malloc(sizeof(*alt));
-		if (!alt) {
-			WARN("malloc failed");
-			return -1;
-		}
-
-		alt->insn = alt_insn;
-		list_add_tail(&alt->list, &insn->alts);
-	}
-
-	return 0;
-}
-
-/*
- * find_switch_table() - Given a dynamic jump, find the switch jump table in
- * .rodata associated with it.
- *
- * There are 3 basic patterns:
- *
- * 1. jmpq *[rodata addr](,%reg,8)
- *
- *    This is the most common case by far.  It jumps to an address in a simple
- *    jump table which is stored in .rodata.
- *
- * 2. jmpq *[rodata addr](%rip)
- *
- *    This is caused by a rare GCC quirk, currently only seen in three driver
- *    functions in the kernel, only with certain obscure non-distro configs.
- *
- *    As part of an optimization, GCC makes a copy of an existing switch jump
- *    table, modifies it, and then hard-codes the jump (albeit with an indirect
- *    jump) to use a single entry in the table.  The rest of the jump table and
- *    some of its jump targets remain as dead code.
- *
- *    In such a case we can just crudely ignore all unreachable instruction
- *    warnings for the entire object file.  Ideally we would just ignore them
- *    for the function, but that would require redesigning the code quite a
- *    bit.  And honestly that's just not worth doing: unreachable instruction
- *    warnings are of questionable value anyway, and this is such a rare issue.
- *
- * 3. mov [rodata addr],%reg1
- *    ... some instructions ...
- *    jmpq *(%reg1,%reg2,8)
- *
- *    This is a fairly uncommon pattern which is new for GCC 6.  As of this
- *    writing, there are 11 occurrences of it in the allmodconfig kernel.
- *
- *    TODO: Once we have DWARF CFI and smarter instruction decoding logic,
- *    ensure the same register is used in the mov and jump instructions.
- */
-static struct rela *find_switch_table(struct objtool_file *file,
-				      struct symbol *func,
-				      struct instruction *insn)
-{
-	struct rela *text_rela, *rodata_rela;
-	struct instruction *orig_insn = insn;
-
-	text_rela = find_rela_by_dest_range(insn->sec, insn->offset, insn->len);
-	if (text_rela && text_rela->sym == file->rodata->sym) {
-		/* case 1 */
-		rodata_rela = find_rela_by_dest(file->rodata,
-						text_rela->addend);
-		if (rodata_rela)
-			return rodata_rela;
-
-		/* case 2 */
-		rodata_rela = find_rela_by_dest(file->rodata,
-						text_rela->addend + 4);
-		if (!rodata_rela)
-			return NULL;
-		file->ignore_unreachables = true;
-		return rodata_rela;
-	}
-
-	/* case 3 */
-	func_for_each_insn_continue_reverse(file, func, insn) {
-		if (insn->type == INSN_JUMP_DYNAMIC)
-			break;
-
-		/* allow small jumps within the range */
-		if (insn->type == INSN_JUMP_UNCONDITIONAL &&
-		    insn->jump_dest &&
-		    (insn->jump_dest->offset <= insn->offset ||
-		     insn->jump_dest->offset > orig_insn->offset))
-		    break;
-
-		/* look for a relocation which references .rodata */
-		text_rela = find_rela_by_dest_range(insn->sec, insn->offset,
-						    insn->len);
-		if (!text_rela || text_rela->sym != file->rodata->sym)
-			continue;
-
-		/*
-		 * Make sure the .rodata address isn't associated with a
-		 * symbol.  gcc jump tables are anonymous data.
-		 */
-		if (find_symbol_containing(file->rodata, text_rela->addend))
-			continue;
-
-		return find_rela_by_dest(file->rodata, text_rela->addend);
-	}
-
-	return NULL;
-}
-
-static int add_func_switch_tables(struct objtool_file *file,
-				  struct symbol *func)
-{
-	struct instruction *insn, *prev_jump = NULL;
-	struct rela *rela, *prev_rela = NULL;
-	int ret;
-
-	func_for_each_insn(file, func, insn) {
-		if (insn->type != INSN_JUMP_DYNAMIC)
-			continue;
-
-		rela = find_switch_table(file, func, insn);
-		if (!rela)
-			continue;
-
-		/*
-		 * We found a switch table, but we don't know yet how big it
-		 * is.  Don't add it until we reach the end of the function or
-		 * the beginning of another switch table in the same function.
-		 */
-		if (prev_jump) {
-			ret = add_switch_table(file, func, prev_jump, prev_rela,
-					       rela);
-			if (ret)
-				return ret;
-		}
-
-		prev_jump = insn;
-		prev_rela = rela;
-	}
-
-	if (prev_jump) {
-		ret = add_switch_table(file, func, prev_jump, prev_rela, NULL);
-		if (ret)
-			return ret;
-	}
-
-	return 0;
-}
-
-/*
- * For some switch statements, gcc generates a jump table in the .rodata
- * section which contains a list of addresses within the function to jump to.
- * This finds these jump tables and adds them to the insn->alts lists.
- */
-static int add_switch_table_alts(struct objtool_file *file)
-{
-	struct section *sec;
-	struct symbol *func;
-	int ret;
-
-	if (!file->rodata || !file->rodata->rela)
-		return 0;
-
-	list_for_each_entry(sec, &file->elf->sections, list) {
-		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->type != STT_FUNC)
-				continue;
-
-			ret = add_func_switch_tables(file, func);
-			if (ret)
-				return ret;
-		}
-	}
-
-	return 0;
-}
-
-static int decode_sections(struct objtool_file *file)
-{
-	int ret;
-
-	ret = decode_instructions(file);
-	if (ret)
-		return ret;
-
-	ret = add_dead_ends(file);
-	if (ret)
-		return ret;
-
-	add_ignores(file);
-
-	ret = add_jump_destinations(file);
-	if (ret)
-		return ret;
-
-	ret = add_call_destinations(file);
-	if (ret)
-		return ret;
-
-	ret = add_special_section_alts(file);
-	if (ret)
-		return ret;
-
-	ret = add_switch_table_alts(file);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
-static bool is_fentry_call(struct instruction *insn)
-{
-	if (insn->type == INSN_CALL &&
-	    insn->call_dest->type == STT_NOTYPE &&
-	    !strcmp(insn->call_dest->name, "__fentry__"))
-		return true;
-
-	return false;
-}
-
-static bool has_modified_stack_frame(struct instruction *insn)
-{
-	return (insn->state & STATE_FP_SAVED) ||
-	       (insn->state & STATE_FP_SETUP);
-}
-
-static bool has_valid_stack_frame(struct instruction *insn)
-{
-	return (insn->state & STATE_FP_SAVED) &&
-	       (insn->state & STATE_FP_SETUP);
-}
-
-static unsigned int frame_state(unsigned long state)
-{
-	return (state & (STATE_FP_SAVED | STATE_FP_SETUP));
-}
-
-/*
- * Follow the branch starting at the given instruction, and recursively follow
- * any other branches (jumps).  Meanwhile, track the frame pointer state at
- * each instruction and validate all the rules described in
- * tools/objtool/Documentation/stack-validation.txt.
- */
-static int validate_branch(struct objtool_file *file,
-			   struct instruction *first, unsigned char first_state)
-{
-	struct alternative *alt;
-	struct instruction *insn;
-	struct section *sec;
-	struct symbol *func = NULL;
-	unsigned char state;
-	int ret;
-
-	insn = first;
-	sec = insn->sec;
-	state = first_state;
-
-	if (insn->alt_group && list_empty(&insn->alts)) {
-		WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
-			  sec, insn->offset);
-		return 1;
-	}
-
-	while (1) {
-		if (file->c_file && insn->func) {
-			if (func && func != insn->func) {
-				WARN("%s() falls through to next function %s()",
-				     func->name, insn->func->name);
-				return 1;
-			}
-
-			func = insn->func;
-		}
-
-		if (insn->visited) {
-			if (frame_state(insn->state) != frame_state(state)) {
-				WARN_FUNC("frame pointer state mismatch",
-					  sec, insn->offset);
-				return 1;
-			}
-
-			return 0;
-		}
-
-		insn->visited = true;
-		insn->state = state;
-
-		list_for_each_entry(alt, &insn->alts, list) {
-			ret = validate_branch(file, alt->insn, state);
-			if (ret)
-				return 1;
-		}
-
-		switch (insn->type) {
-
-		case INSN_FP_SAVE:
-			if (!nofp) {
-				if (state & STATE_FP_SAVED) {
-					WARN_FUNC("duplicate frame pointer save",
-						  sec, insn->offset);
-					return 1;
-				}
-				state |= STATE_FP_SAVED;
-			}
-			break;
-
-		case INSN_FP_SETUP:
-			if (!nofp) {
-				if (state & STATE_FP_SETUP) {
-					WARN_FUNC("duplicate frame pointer setup",
-						  sec, insn->offset);
-					return 1;
-				}
-				state |= STATE_FP_SETUP;
-			}
-			break;
-
-		case INSN_FP_RESTORE:
-			if (!nofp) {
-				if (has_valid_stack_frame(insn))
-					state &= ~STATE_FP_SETUP;
-
-				state &= ~STATE_FP_SAVED;
-			}
-			break;
-
-		case INSN_RETURN:
-			if (!nofp && has_modified_stack_frame(insn)) {
-				WARN_FUNC("return without frame pointer restore",
-					  sec, insn->offset);
-				return 1;
-			}
-			return 0;
-
-		case INSN_CALL:
-			if (is_fentry_call(insn)) {
-				state |= STATE_FENTRY;
-				break;
-			}
-
-			ret = dead_end_function(file, insn->call_dest);
-			if (ret == 1)
-				return 0;
-			if (ret == -1)
-				return 1;
-
-			/* fallthrough */
-		case INSN_CALL_DYNAMIC:
-			if (!nofp && !has_valid_stack_frame(insn)) {
-				WARN_FUNC("call without frame pointer save/setup",
-					  sec, insn->offset);
-				return 1;
-			}
-			break;
-
-		case INSN_JUMP_CONDITIONAL:
-		case INSN_JUMP_UNCONDITIONAL:
-			if (insn->jump_dest) {
-				ret = validate_branch(file, insn->jump_dest,
-						      state);
-				if (ret)
-					return 1;
-			} else if (has_modified_stack_frame(insn)) {
-				WARN_FUNC("sibling call from callable instruction with changed frame pointer",
-					  sec, insn->offset);
-				return 1;
-			} /* else it's a sibling call */
-
-			if (insn->type == INSN_JUMP_UNCONDITIONAL)
-				return 0;
-
-			break;
-
-		case INSN_JUMP_DYNAMIC:
-			if (list_empty(&insn->alts) &&
-			    has_modified_stack_frame(insn)) {
-				WARN_FUNC("sibling call from callable instruction with changed frame pointer",
-					  sec, insn->offset);
-				return 1;
-			}
-
-			return 0;
-
-		default:
-			break;
-		}
-
-		if (insn->dead_end)
-			return 0;
-
-		insn = next_insn_same_sec(file, insn);
-		if (!insn) {
-			WARN("%s: unexpected end of section", sec->name);
-			return 1;
-		}
-	}
-
-	return 0;
-}
-
-static bool is_kasan_insn(struct instruction *insn)
-{
-	return (insn->type == INSN_CALL &&
-		!strcmp(insn->call_dest->name, "__asan_handle_no_return"));
-}
-
-static bool is_ubsan_insn(struct instruction *insn)
-{
-	return (insn->type == INSN_CALL &&
-		!strcmp(insn->call_dest->name,
-			"__ubsan_handle_builtin_unreachable"));
-}
-
-static bool ignore_unreachable_insn(struct symbol *func,
-				    struct instruction *insn)
-{
-	int i;
-
-	if (insn->type == INSN_NOP)
-		return true;
-
-	/*
-	 * Check if this (or a subsequent) instruction is related to
-	 * CONFIG_UBSAN or CONFIG_KASAN.
-	 *
-	 * End the search at 5 instructions to avoid going into the weeds.
-	 */
-	for (i = 0; i < 5; i++) {
-
-		if (is_kasan_insn(insn) || is_ubsan_insn(insn))
-			return true;
-
-		if (insn->type == INSN_JUMP_UNCONDITIONAL && insn->jump_dest) {
-			insn = insn->jump_dest;
-			continue;
-		}
-
-		if (insn->offset + insn->len >= func->offset + func->len)
-			break;
-		insn = list_next_entry(insn, list);
-	}
-
-	return false;
-}
-
-static int validate_functions(struct objtool_file *file)
-{
-	struct section *sec;
-	struct symbol *func;
-	struct instruction *insn;
-	int ret, warnings = 0;
-
-	list_for_each_entry(sec, &file->elf->sections, list) {
-		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->type != STT_FUNC)
-				continue;
-
-			insn = find_insn(file, sec, func->offset);
-			if (!insn)
-				continue;
-
-			ret = validate_branch(file, insn, 0);
-			warnings += ret;
-		}
-	}
-
-	list_for_each_entry(sec, &file->elf->sections, list) {
-		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->type != STT_FUNC)
-				continue;
-
-			func_for_each_insn(file, func, insn) {
-				if (insn->visited)
-					continue;
-
-				insn->visited = true;
-
-				if (file->ignore_unreachables || warnings ||
-				    ignore_unreachable_insn(func, insn))
-					continue;
-
-				/*
-				 * gcov produces a lot of unreachable
-				 * instructions.  If we get an unreachable
-				 * warning and the file has gcov enabled, just
-				 * ignore it, and all other such warnings for
-				 * the file.
-				 */
-				if (!file->ignore_unreachables &&
-				    gcov_enabled(file)) {
-					file->ignore_unreachables = true;
-					continue;
-				}
-
-				WARN_FUNC("function has unreachable instruction", insn->sec, insn->offset);
-				warnings++;
-			}
-		}
-	}
-
-	return warnings;
-}
-
-static int validate_uncallable_instructions(struct objtool_file *file)
-{
-	struct instruction *insn;
-	int warnings = 0;
-
-	for_each_insn(file, insn) {
-		if (!insn->visited && insn->type == INSN_RETURN) {
-			WARN_FUNC("return instruction outside of a callable function",
-				  insn->sec, insn->offset);
-			warnings++;
-		}
-	}
-
-	return warnings;
-}
-
-static void cleanup(struct objtool_file *file)
-{
-	struct instruction *insn, *tmpinsn;
-	struct alternative *alt, *tmpalt;
-
-	list_for_each_entry_safe(insn, tmpinsn, &file->insn_list, list) {
-		list_for_each_entry_safe(alt, tmpalt, &insn->alts, list) {
-			list_del(&alt->list);
-			free(alt);
-		}
-		list_del(&insn->list);
-		hash_del(&insn->hash);
-		free(insn);
-	}
-	elf_close(file->elf);
-}
-
-const char * const check_usage[] = {
+static const char * const check_usage[] = {
 	"objtool check [<options>] file.o",
 	NULL,
 };
 
+const struct option check_options[] = {
+	OPT_BOOLEAN('f', "no-fp", &nofp, "Skip frame pointer validation"),
+	OPT_END(),
+};
+
 int cmd_check(int argc, const char **argv)
 {
-	struct objtool_file file;
-	int ret, warnings = 0;
-
-	const struct option options[] = {
-		OPT_BOOLEAN('f', "no-fp", &nofp, "Skip frame pointer validation"),
-		OPT_END(),
-	};
+	const char *objname;
 
-	argc = parse_options(argc, argv, options, check_usage, 0);
+	argc = parse_options(argc, argv, check_options, check_usage, 0);
 
 	if (argc != 1)
-		usage_with_options(check_usage, options);
+		usage_with_options(check_usage, check_options);
 
 	objname = argv[0];
 
-	file.elf = elf_open(objname);
-	if (!file.elf) {
-		fprintf(stderr, "error reading elf file %s\n", objname);
-		return 1;
-	}
-
-	INIT_LIST_HEAD(&file.insn_list);
-	hash_init(file.insn_hash);
-	file.whitelist = find_section_by_name(file.elf, ".discard.func_stack_frame_non_standard");
-	file.rodata = find_section_by_name(file.elf, ".rodata");
-	file.ignore_unreachables = false;
-	file.c_file = find_section_by_name(file.elf, ".comment");
-
-	ret = decode_sections(&file);
-	if (ret < 0)
-		goto out;
-	warnings += ret;
-
-	ret = validate_functions(&file);
-	if (ret < 0)
-		goto out;
-	warnings += ret;
-
-	ret = validate_uncallable_instructions(&file);
-	if (ret < 0)
-		goto out;
-	warnings += ret;
-
-out:
-	cleanup(&file);
-
-	/* ignore warnings for now until we get all the code cleaned up */
-	if (ret || warnings)
-		return 0;
-	return 0;
+	return check(objname, nofp);
 }
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/check.c
similarity index 95%
copy from tools/objtool/builtin-check.c
copy to tools/objtool/check.c
index 282a603..91b97db 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/check.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ * Copyright (C) 2015-2017 Josh Poimboeuf <jpoimboe@redhat.com>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -15,21 +15,10 @@
  * along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
 
-/*
- * objtool check:
- *
- * This command analyzes every .o file and ensures the validity of its stack
- * trace metadata.  It enforces a set of rules on asm code and C inline
- * assembly code so that stack traces can be reliable.
- *
- * For more information, see tools/objtool/Documentation/stack-validation.txt.
- */
-
 #include <string.h>
 #include <stdlib.h>
-#include <subcmd/parse-options.h>
 
-#include "builtin.h"
+#include "check.h"
 #include "elf.h"
 #include "special.h"
 #include "arch.h"
@@ -42,34 +31,11 @@
 #define STATE_FP_SETUP		0x2
 #define STATE_FENTRY		0x4
 
-struct instruction {
-	struct list_head list;
-	struct hlist_node hash;
-	struct section *sec;
-	unsigned long offset;
-	unsigned int len, state;
-	unsigned char type;
-	unsigned long immediate;
-	bool alt_group, visited, dead_end;
-	struct symbol *call_dest;
-	struct instruction *jump_dest;
-	struct list_head alts;
-	struct symbol *func;
-};
-
 struct alternative {
 	struct list_head list;
 	struct instruction *insn;
 };
 
-struct objtool_file {
-	struct elf *elf;
-	struct list_head insn_list;
-	DECLARE_HASHTABLE(insn_hash, 16);
-	struct section *rodata, *whitelist;
-	bool ignore_unreachables, c_file;
-};
-
 const char *objname;
 static bool nofp;
 
@@ -1250,27 +1216,13 @@ static void cleanup(struct objtool_file *file)
 	elf_close(file->elf);
 }
 
-const char * const check_usage[] = {
-	"objtool check [<options>] file.o",
-	NULL,
-};
-
-int cmd_check(int argc, const char **argv)
+int check(const char *_objname, bool _nofp)
 {
 	struct objtool_file file;
 	int ret, warnings = 0;
 
-	const struct option options[] = {
-		OPT_BOOLEAN('f', "no-fp", &nofp, "Skip frame pointer validation"),
-		OPT_END(),
-	};
-
-	argc = parse_options(argc, argv, options, check_usage, 0);
-
-	if (argc != 1)
-		usage_with_options(check_usage, options);
-
-	objname = argv[0];
+	objname = _objname;
+	nofp = _nofp;
 
 	file.elf = elf_open(objname);
 	if (!file.elf) {
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
new file mode 100644
index 0000000..b0ac3ba
--- /dev/null
+++ b/tools/objtool/check.h
@@ -0,0 +1,52 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _CHECK_H
+#define _CHECK_H
+
+#include <stdbool.h>
+#include "elf.h"
+#include "cfi.h"
+#include "arch.h"
+#include <linux/hashtable.h>
+
+struct instruction {
+	struct list_head list;
+	struct hlist_node hash;
+	struct section *sec;
+	unsigned long offset;
+	unsigned int len, state;
+	unsigned char type;
+	unsigned long immediate;
+	bool alt_group, visited, dead_end;
+	struct symbol *call_dest;
+	struct instruction *jump_dest;
+	struct list_head alts;
+	struct symbol *func;
+};
+
+struct objtool_file {
+	struct elf *elf;
+	struct list_head insn_list;
+	DECLARE_HASHTABLE(insn_hash, 16);
+	struct section *rodata, *whitelist;
+	bool ignore_unreachables, c_file;
+};
+
+int check(const char *objname, bool nofp);
+
+#endif /* _CHECK_H */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 01/10] objtool: move checking code to check.c Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-14  7:24   ` Jiri Slaby
  2017-06-01  5:44 ` [RFC PATCH 03/10] objtool: stack validation 2.0 Josh Poimboeuf
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

In preparation for an objtool rewrite which will have broader checks,
whitelist functions and files which cause problems because they do
unusual things with the stack.

These whitelists serve as a TODO list for which functions and files
don't yet have undwarf unwinder coverage.  Eventually most of the
whitelists can be removed in favor of manual CFI hint annotations or
objtool improvements.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/crypto/Makefile           | 2 ++
 arch/x86/crypto/sha1-mb/Makefile   | 2 ++
 arch/x86/crypto/sha256-mb/Makefile | 2 ++
 arch/x86/kernel/Makefile           | 1 +
 arch/x86/kernel/acpi/Makefile      | 2 ++
 arch/x86/kernel/kprobes/opt.c      | 9 ++++++++-
 arch/x86/kernel/reboot.c           | 2 ++
 arch/x86/kvm/svm.c                 | 2 ++
 arch/x86/kvm/vmx.c                 | 3 +++
 arch/x86/lib/msr-reg.S             | 8 ++++----
 arch/x86/net/Makefile              | 2 ++
 arch/x86/platform/efi/Makefile     | 1 +
 arch/x86/power/Makefile            | 2 ++
 arch/x86/xen/Makefile              | 3 +++
 kernel/kexec_core.c                | 4 +++-
 15 files changed, 39 insertions(+), 6 deletions(-)

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 34b3fa2..9e32d40 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -2,6 +2,8 @@
 # Arch-specific CryptoAPI modules.
 #
 
+OBJECT_FILES_NON_STANDARD := y
+
 avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
 avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
 				$(comma)4)$(comma)%ymm2,yes,no)
diff --git a/arch/x86/crypto/sha1-mb/Makefile b/arch/x86/crypto/sha1-mb/Makefile
index 2f87563..2e14acc 100644
--- a/arch/x86/crypto/sha1-mb/Makefile
+++ b/arch/x86/crypto/sha1-mb/Makefile
@@ -2,6 +2,8 @@
 # Arch-specific CryptoAPI modules.
 #
 
+OBJECT_FILES_NON_STANDARD := y
+
 avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
                                 $(comma)4)$(comma)%ymm2,yes,no)
 ifeq ($(avx2_supported),yes)
diff --git a/arch/x86/crypto/sha256-mb/Makefile b/arch/x86/crypto/sha256-mb/Makefile
index 41089e7..45b4fca 100644
--- a/arch/x86/crypto/sha256-mb/Makefile
+++ b/arch/x86/crypto/sha256-mb/Makefile
@@ -2,6 +2,8 @@
 # Arch-specific CryptoAPI modules.
 #
 
+OBJECT_FILES_NON_STANDARD := y
+
 avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
                                 $(comma)4)$(comma)%ymm2,yes,no)
 ifeq ($(avx2_supported),yes)
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4b99423..3c7c419 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -29,6 +29,7 @@ OBJECT_FILES_NON_STANDARD_head_$(BITS).o		:= y
 OBJECT_FILES_NON_STANDARD_relocate_kernel_$(BITS).o	:= y
 OBJECT_FILES_NON_STANDARD_ftrace_$(BITS).o		:= y
 OBJECT_FILES_NON_STANDARD_test_nx.o			:= y
+OBJECT_FILES_NON_STANDARD_paravirt_patch_$(BITS).o	:= y
 
 # If instrumentation of this dir is enabled, boot hangs during first second.
 # Probably could be more selective here, but note that files related to irqs,
diff --git a/arch/x86/kernel/acpi/Makefile b/arch/x86/kernel/acpi/Makefile
index 26b78d8..85a9e17 100644
--- a/arch/x86/kernel/acpi/Makefile
+++ b/arch/x86/kernel/acpi/Makefile
@@ -1,3 +1,5 @@
+OBJECT_FILES_NON_STANDARD_wakeup_$(BITS).o := y
+
 obj-$(CONFIG_ACPI)		+= boot.o
 obj-$(CONFIG_ACPI_SLEEP)	+= sleep.o wakeup_$(BITS).o
 obj-$(CONFIG_ACPI_APEI)		+= apei.o
diff --git a/arch/x86/kernel/kprobes/opt.c b/arch/x86/kernel/kprobes/opt.c
index 901c640..69ea0bc 100644
--- a/arch/x86/kernel/kprobes/opt.c
+++ b/arch/x86/kernel/kprobes/opt.c
@@ -28,6 +28,7 @@
 #include <linux/kdebug.h>
 #include <linux/kallsyms.h>
 #include <linux/ftrace.h>
+#include <linux/frame.h>
 
 #include <asm/text-patching.h>
 #include <asm/cacheflush.h>
@@ -94,6 +95,7 @@ static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
 }
 
 asm (
+			"optprobe_template_func:\n"
 			".global optprobe_template_entry\n"
 			"optprobe_template_entry:\n"
 #ifdef CONFIG_X86_64
@@ -131,7 +133,12 @@ asm (
 			"	popf\n"
 #endif
 			".global optprobe_template_end\n"
-			"optprobe_template_end:\n");
+			"optprobe_template_end:\n"
+			".type optprobe_template_func, @function\n"
+			".size optprobe_template_func, .-optprobe_template_func\n");
+
+void optprobe_template_func(void);
+STACK_FRAME_NON_STANDARD(optprobe_template_func);
 
 #define TMPL_MOVE_IDX \
 	((long)&optprobe_template_val - (long)&optprobe_template_entry)
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 2544700..67393fc 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -9,6 +9,7 @@
 #include <linux/sched.h>
 #include <linux/tboot.h>
 #include <linux/delay.h>
+#include <linux/frame.h>
 #include <acpi/reboot.h>
 #include <asm/io.h>
 #include <asm/apic.h>
@@ -123,6 +124,7 @@ void __noreturn machine_real_restart(unsigned int type)
 #ifdef CONFIG_APM_MODULE
 EXPORT_SYMBOL(machine_real_restart);
 #endif
+STACK_FRAME_NON_STANDARD(machine_real_restart);
 
 /*
  * Some Apple MacBook and MacBookPro's needs reboot=p to be able to reboot
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 183ddb2..bb7a502 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -36,6 +36,7 @@
 #include <linux/slab.h>
 #include <linux/amd-iommu.h>
 #include <linux/hashtable.h>
+#include <linux/frame.h>
 
 #include <asm/apic.h>
 #include <asm/perf_event.h>
@@ -4908,6 +4909,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)
 
 	mark_all_clean(svm->vmcb);
 }
+STACK_FRAME_NON_STANDARD(svm_vcpu_run);
 
 static void svm_set_cr3(struct kvm_vcpu *vcpu, unsigned long root)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 72f7839..bc8b933 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -33,6 +33,7 @@
 #include <linux/slab.h>
 #include <linux/tboot.h>
 #include <linux/hrtimer.h>
+#include <linux/frame.h>
 #include "kvm_cache_regs.h"
 #include "x86.h"
 
@@ -8675,6 +8676,7 @@ static void vmx_handle_external_intr(struct kvm_vcpu *vcpu)
 			);
 	}
 }
+STACK_FRAME_NON_STANDARD(vmx_handle_external_intr);
 
 static bool vmx_has_high_real_mode_segbase(void)
 {
@@ -9051,6 +9053,7 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx_recover_nmi_blocking(vmx);
 	vmx_complete_interrupts(vmx);
 }
+STACK_FRAME_NON_STANDARD(vmx_vcpu_run);
 
 static void vmx_switch_vmcs(struct kvm_vcpu *vcpu, struct loaded_vmcs *vmcs)
 {
diff --git a/arch/x86/lib/msr-reg.S b/arch/x86/lib/msr-reg.S
index c815564..10ffa7e 100644
--- a/arch/x86/lib/msr-reg.S
+++ b/arch/x86/lib/msr-reg.S
@@ -13,14 +13,14 @@
 .macro op_safe_regs op
 ENTRY(\op\()_safe_regs)
 	pushq %rbx
-	pushq %rbp
+	pushq %r12
 	movq	%rdi, %r10	/* Save pointer */
 	xorl	%r11d, %r11d	/* Return value */
 	movl    (%rdi), %eax
 	movl    4(%rdi), %ecx
 	movl    8(%rdi), %edx
 	movl    12(%rdi), %ebx
-	movl    20(%rdi), %ebp
+	movl    20(%rdi), %r12d
 	movl    24(%rdi), %esi
 	movl    28(%rdi), %edi
 1:	\op
@@ -29,10 +29,10 @@ ENTRY(\op\()_safe_regs)
 	movl    %ecx, 4(%r10)
 	movl    %edx, 8(%r10)
 	movl    %ebx, 12(%r10)
-	movl    %ebp, 20(%r10)
+	movl    %r12d, 20(%r10)
 	movl    %esi, 24(%r10)
 	movl    %edi, 28(%r10)
-	popq %rbp
+	popq %r12
 	popq %rbx
 	ret
 3:
diff --git a/arch/x86/net/Makefile b/arch/x86/net/Makefile
index 90568c3..fefb4b6 100644
--- a/arch/x86/net/Makefile
+++ b/arch/x86/net/Makefile
@@ -1,4 +1,6 @@
 #
 # Arch-specific network modules
 #
+OBJECT_FILES_NON_STANDARD_bpf_jit.o += y
+
 obj-$(CONFIG_BPF_JIT) += bpf_jit.o bpf_jit_comp.o
diff --git a/arch/x86/platform/efi/Makefile b/arch/x86/platform/efi/Makefile
index f1d83b3..2f56e1e 100644
--- a/arch/x86/platform/efi/Makefile
+++ b/arch/x86/platform/efi/Makefile
@@ -1,4 +1,5 @@
 OBJECT_FILES_NON_STANDARD_efi_thunk_$(BITS).o := y
+OBJECT_FILES_NON_STANDARD_efi_stub_$(BITS).o := y
 
 obj-$(CONFIG_EFI) 		+= quirks.o efi.o efi_$(BITS).o efi_stub_$(BITS).o
 obj-$(CONFIG_EARLY_PRINTK_EFI)	+= early_printk.o
diff --git a/arch/x86/power/Makefile b/arch/x86/power/Makefile
index a6a198c..0504187 100644
--- a/arch/x86/power/Makefile
+++ b/arch/x86/power/Makefile
@@ -1,3 +1,5 @@
+OBJECT_FILES_NON_STANDARD_hibernate_asm_$(BITS).o := y
+
 # __restore_processor_state() restores %gs after S3 resume and so should not
 # itself be stack-protected
 nostackp := $(call cc-option, -fno-stack-protector)
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index fffb0a1..bced7a3 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -1,3 +1,6 @@
+OBJECT_FILES_NON_STANDARD_xen-asm_$(BITS).o := y
+OBJECT_FILES_NON_STANDARD_xen-pvh.o := y
+
 ifdef CONFIG_FUNCTION_TRACER
 # Do not profile debug and lowlevel utilities
 CFLAGS_REMOVE_spinlock.o = -pg
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index ae1a3ba..154ffb4 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -38,6 +38,7 @@
 #include <linux/syscore_ops.h>
 #include <linux/compiler.h>
 #include <linux/hugetlb.h>
+#include <linux/frame.h>
 
 #include <asm/page.h>
 #include <asm/sections.h>
@@ -874,7 +875,7 @@ int kexec_load_disabled;
  * only when panic_cpu holds the current CPU number; this is the only CPU
  * which processes crash_kexec routines.
  */
-void __crash_kexec(struct pt_regs *regs)
+void __noclone __crash_kexec(struct pt_regs *regs)
 {
 	/* Take the kexec_mutex here to prevent sys_kexec_load
 	 * running on one cpu from replacing the crash kernel
@@ -896,6 +897,7 @@ void __crash_kexec(struct pt_regs *regs)
 		mutex_unlock(&kexec_mutex);
 	}
 }
+STACK_FRAME_NON_STANDARD(__crash_kexec);
 
 void crash_kexec(struct pt_regs *regs)
 {
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 03/10] objtool: stack validation 2.0
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 01/10] objtool: move checking code to check.c Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 04/10] objtool: add undwarf debuginfo generation Josh Poimboeuf
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

This is a major rewrite of objtool.  Instead of only tracking frame
pointer changes, it now tracks all stack-related operations, including
all register saves/restores.

In addition to making stack validation more robust, this also paves the
way for undwarf generation.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/Documentation/stack-validation.txt | 153 +++---
 tools/objtool/Makefile                           |   2 +-
 tools/objtool/arch.h                             |  64 ++-
 tools/objtool/arch/x86/decode.c                  | 400 +++++++++++++--
 tools/objtool/cfi.h                              |  55 +++
 tools/objtool/check.c                            | 595 ++++++++++++++++++-----
 tools/objtool/check.h                            |  18 +-
 tools/objtool/elf.c                              |  23 +-
 tools/objtool/elf.h                              |   3 +-
 tools/objtool/special.c                          |   6 +-
 10 files changed, 1023 insertions(+), 296 deletions(-)
 create mode 100644 tools/objtool/cfi.h

diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 55a60d3..17c1195 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -127,28 +127,13 @@ b) 100% reliable stack traces for DWARF enabled kernels
 
 c) Higher live patching compatibility rate
 
-   (NOTE: This is not yet implemented)
-
-   Currently with CONFIG_LIVEPATCH there's a basic live patching
-   framework which is safe for roughly 85-90% of "security" fixes.  But
-   patches can't have complex features like function dependency or
-   prototype changes, or data structure changes.
-
-   There's a strong need to support patches which have the more complex
-   features so that the patch compatibility rate for security fixes can
-   eventually approach something resembling 100%.  To achieve that, a
-   "consistency model" is needed, which allows tasks to be safely
-   transitioned from an unpatched state to a patched state.
-
-   One of the key requirements of the currently proposed livepatch
-   consistency model [*] is that it needs to walk the stack of each
-   sleeping task to determine if it can be transitioned to the patched
-   state.  If objtool can ensure that stack traces are reliable, this
-   consistency model can be used and the live patching compatibility
-   rate can be improved significantly.
-
-   [*] https://lkml.kernel.org/r/cover.1423499826.git.jpoimboe@redhat.com
+   Livepatch has an optional "consistency model", which is needed for
+   more complex patches.  In order for the consistency model to work,
+   stack traces need to be reliable (or an unreliable condition needs to
+   be detectable).  Objtool makes that possible.
 
+   For more details, see the livepatch documentation in the Linux kernel
+   source tree at Documentation/livepatch/livepatch.txt.
 
 Rules
 -----
@@ -201,80 +186,84 @@ To achieve the validation, objtool enforces the following rules:
    return normally.
 
 
-Errors in .S files
-------------------
+Objtool warnings
+----------------
 
-If you're getting an error in a compiled .S file which you don't
-understand, first make sure that the affected code follows the above
-rules.
+For asm files, if you're getting an error which doesn't make sense,
+first make sure that the affected code follows the above rules.
+
+For C files, the common culprits are inline asm statements and calls to
+"noreturn" functions.  See below for more details.
+
+Another possible cause for errors in C code is if the Makefile removes
+-fno-omit-frame-pointer or adds -fomit-frame-pointer to the gcc options.
 
 Here are some examples of common warnings reported by objtool, what
 they mean, and suggestions for how to fix them.
 
 
-1. asm_file.o: warning: objtool: func()+0x128: call without frame pointer save/setup
+1. file.o: warning: objtool: func()+0x128: call without frame pointer save/setup
 
    The func() function made a function call without first saving and/or
-   updating the frame pointer.
-
-   If func() is indeed a callable function, add proper frame pointer
-   logic using the FRAME_BEGIN and FRAME_END macros.  Otherwise, remove
-   its ELF function annotation by changing ENDPROC to END.
+   updating the frame pointer, and CONFIG_FRAME_POINTER is enabled.
 
-   If you're getting this error in a .c file, see the "Errors in .c
-   files" section.
+   If the error is for an asm file, and func() is indeed a callable
+   function, add proper frame pointer logic using the FRAME_BEGIN and
+   FRAME_END macros.  Otherwise, if it's not a callable function, remove
+   its ELF function annotation by changing ENDPROC to END, and instead
+   use the manual CFI hint macros in asm/undwarf.h.
 
+   If it's a GCC-compiled .c file, the error may be because the function
+   uses an inline asm() statement which has a "call" instruction.  An
+   asm() statement with a call instruction must declare the use of the
+   stack pointer in its output operand.  For example, on x86_64:
 
-2. asm_file.o: warning: objtool: .text+0x53: return instruction outside of a callable function
-
-   A return instruction was detected, but objtool couldn't find a way
-   for a callable function to reach the instruction.
+     register void *__sp asm("rsp");
+     asm volatile("call func" : "+r" (__sp));
 
-   If the return instruction is inside (or reachable from) a callable
-   function, the function needs to be annotated with the ENTRY/ENDPROC
-   macros.
+   Otherwise the stack frame may not get created before the call.
 
-   If you _really_ need a return instruction outside of a function, and
-   are 100% sure that it won't affect stack traces, you can tell
-   objtool to ignore it.  See the "Adding exceptions" section below.
 
+2. file.o: warning: objtool: .text+0x53: unreachable instruction
 
-3. asm_file.o: warning: objtool: func()+0x9: function has unreachable instruction
+   Objtool couldn't find a code path to reach the instruction.
 
-   The instruction lives inside of a callable function, but there's no
-   possible control flow path from the beginning of the function to the
-   instruction.
+   If the error is for an asm file, and the instruction is inside (or
+   reachable from) a callable function, the function should be annotated
+   with the ENTRY/ENDPROC macros (ENDPROC is the important one).
+   Otherwise, the code should probably be annotated with the CFI hint
+   macros in asm/undwarf.h so objtool and the unwinder can know the
+   stack state associated with the code.
 
-   If the instruction is actually needed, and it's actually in a
-   callable function, ensure that its function is properly annotated
-   with ENTRY/ENDPROC.
+   If you're 100% sure the code won't affect stack traces, or if you're
+   a just a bad person, you can tell objtool to ignore it.  See the
+   "Adding exceptions" section below.
 
    If it's not actually in a callable function (e.g. kernel entry code),
    change ENDPROC to END.
 
 
-4. asm_file.o: warning: objtool: func(): can't find starting instruction
+4. file.o: warning: objtool: func(): can't find starting instruction
    or
-   asm_file.o: warning: objtool: func()+0x11dd: can't decode instruction
+   file.o: warning: objtool: func()+0x11dd: can't decode instruction
 
-   Did you put data in a text section?  If so, that can confuse
+   Does the file have data in a text section?  If so, that can confuse
    objtool's instruction decoder.  Move the data to a more appropriate
    section like .data or .rodata.
 
 
-5. asm_file.o: warning: objtool: func()+0x6: kernel entry/exit from callable instruction
-
-   This is a kernel entry/exit instruction like sysenter or sysret.
-   Such instructions aren't allowed in a callable function, and are most
-   likely part of the kernel entry code.
+5. file.o: warning: objtool: func()+0x6: unsupported instruction in callable function
 
-   If the instruction isn't actually in a callable function, change
-   ENDPROC to END.
+   This is a kernel entry/exit instruction like sysenter or iret.  Such
+   instructions aren't allowed in a callable function, and are most
+   likely part of the kernel entry code.  They should usually not have
+   the callable function annotation (ENDPROC) and should always be
+   annotated with the CFI hint macros in asm/undwarf.h.
 
 
-6. asm_file.o: warning: objtool: func()+0x26: sibling call from callable instruction with changed frame pointer
+6. file.o: warning: objtool: func()+0x26: sibling call from callable instruction with modified stack frame
 
-   This is a dynamic jump or a jump to an undefined symbol.  Stacktool
+   This is a dynamic jump or a jump to an undefined symbol.  Objtool
    assumed it's a sibling call and detected that the frame pointer
    wasn't first restored to its original state.
 
@@ -282,24 +271,28 @@ they mean, and suggestions for how to fix them.
    destination code to the local file.
 
    If the instruction is not actually in a callable function (e.g.
-   kernel entry code), change ENDPROC to END.
+   kernel entry code), change ENDPROC to END and annotate manually with
+   the CFI hint macros in asm/undwarf.h.
 
 
-7. asm_file: warning: objtool: func()+0x5c: frame pointer state mismatch
+7. file: warning: objtool: func()+0x5c: stack state mismatch
 
    The instruction's frame pointer state is inconsistent, depending on
    which execution path was taken to reach the instruction.
 
-   Make sure the function pushes and sets up the frame pointer (for
-   x86_64, this means rbp) at the beginning of the function and pops it
-   at the end of the function.  Also make sure that no other code in the
-   function touches the frame pointer.
+   Make sure that, when CONFIG_FRAME_POINTER is enabled, the function
+   pushes and sets up the frame pointer (for x86_64, this means rbp) at
+   the beginning of the function and pops it at the end of the function.
+   Also make sure that no other code in the function touches the frame
+   pointer.
 
+   Another possibility is that the code has some asm or inline asm which
+   does some unusual things to the stack or the frame pointer.  In such
+   cases it's probably appropriate to use the CFI hint macros in
+   asm/undwarf.h.
 
-Errors in .c files
-------------------
 
-1. c_file.o: warning: objtool: funcA() falls through to next function funcB()
+8. file.o: warning: objtool: funcA() falls through to next function funcB()
 
    This means that funcA() doesn't end with a return instruction or an
    unconditional jump, and that objtool has determined that the function
@@ -318,22 +311,6 @@ Errors in .c files
       might be corrupt due to a gcc bug.  For more details, see:
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
 
-2. If you're getting any other objtool error in a compiled .c file, it
-   may be because the file uses an asm() statement which has a "call"
-   instruction.  An asm() statement with a call instruction must declare
-   the use of the stack pointer in its output operand.  For example, on
-   x86_64:
-
-     register void *__sp asm("rsp");
-     asm volatile("call func" : "+r" (__sp));
-
-   Otherwise the stack frame may not get created before the call.
-
-3. Another possible cause for errors in C code is if the Makefile removes
-   -fno-omit-frame-pointer or adds -fomit-frame-pointer to the gcc options.
-
-Also see the above section for .S file errors for more information what
-the individual error messages mean.
 
 If the error doesn't seem to make sense, it could be a bug in objtool.
 Feel free to ask the objtool maintainer for help.
diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 27e019c..0e2765e 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -25,7 +25,7 @@ OBJTOOL_IN := $(OBJTOOL)-in.o
 all: $(OBJTOOL)
 
 INCLUDES := -I$(srctree)/tools/include -I$(srctree)/tools/arch/$(HOSTARCH)/include/uapi
-CFLAGS   += -Wall -Werror $(EXTRA_WARNINGS) -fomit-frame-pointer -O2 -g $(INCLUDES)
+CFLAGS   += -Wall -Werror $(EXTRA_WARNINGS) -Wno-switch-default -Wno-switch-enum -fomit-frame-pointer -O2 -g $(INCLUDES)
 LDFLAGS  += -lelf $(LIBSUBCMD)
 
 # Allow old libelf to be used:
diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index a59e061..21aeca8 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -19,25 +19,63 @@
 #define _ARCH_H
 
 #include <stdbool.h>
+#include <linux/list.h>
 #include "elf.h"
+#include "cfi.h"
 
-#define INSN_FP_SAVE		1
-#define INSN_FP_SETUP		2
-#define INSN_FP_RESTORE		3
-#define INSN_JUMP_CONDITIONAL	4
-#define INSN_JUMP_UNCONDITIONAL	5
-#define INSN_JUMP_DYNAMIC	6
-#define INSN_CALL		7
-#define INSN_CALL_DYNAMIC	8
-#define INSN_RETURN		9
-#define INSN_CONTEXT_SWITCH	10
-#define INSN_NOP		11
-#define INSN_OTHER		12
+#define INSN_JUMP_CONDITIONAL	1
+#define INSN_JUMP_UNCONDITIONAL	2
+#define INSN_JUMP_DYNAMIC	3
+#define INSN_CALL		4
+#define INSN_CALL_DYNAMIC	5
+#define INSN_RETURN		6
+#define INSN_CONTEXT_SWITCH	7
+#define INSN_STACK		8
+#define INSN_NOP		9
+#define INSN_OTHER		10
 #define INSN_LAST		INSN_OTHER
 
+enum op_dest_type {
+	OP_DEST_REG,
+	OP_DEST_REG_INDIRECT,
+	OP_DEST_MEM,
+	OP_DEST_PUSH,
+	OP_DEST_LEAVE,
+};
+
+struct op_dest {
+	enum op_dest_type type;
+	unsigned char reg;
+	int offset;
+};
+
+enum op_src_type {
+	OP_SRC_REG,
+	OP_SRC_REG_INDIRECT,
+	OP_SRC_CONST,
+	OP_SRC_POP,
+	OP_SRC_ADD,
+	OP_SRC_AND,
+};
+
+struct op_src {
+	enum op_src_type type;
+	unsigned char reg;
+	int offset;
+};
+
+struct stack_op {
+	struct op_dest dest;
+	struct op_src src;
+};
+
+void arch_initial_func_cfi_state(struct cfi_state *state);
+
 int arch_decode_instruction(struct elf *elf, struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
 			    unsigned int *len, unsigned char *type,
-			    unsigned long *displacement);
+			    unsigned long *immediate, struct stack_op *op);
+
+bool arch_callee_saved_reg(unsigned char reg);
 
 #endif /* _ARCH_H */
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 6ac99e3..a36c2eb 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -27,6 +27,17 @@
 #include "../../arch.h"
 #include "../../warn.h"
 
+static unsigned char op_to_cfi_reg[][2] = {
+	{CFI_AX, CFI_R8},
+	{CFI_CX, CFI_R9},
+	{CFI_DX, CFI_R10},
+	{CFI_BX, CFI_R11},
+	{CFI_SP, CFI_R12},
+	{CFI_BP, CFI_R13},
+	{CFI_SI, CFI_R14},
+	{CFI_DI, CFI_R15},
+};
+
 static int is_x86_64(struct elf *elf)
 {
 	switch (elf->ehdr.e_machine) {
@@ -40,24 +51,50 @@ static int is_x86_64(struct elf *elf)
 	}
 }
 
+bool arch_callee_saved_reg(unsigned char reg)
+{
+	switch (reg) {
+	case CFI_BP:
+	case CFI_BX:
+	case CFI_R12:
+	case CFI_R13:
+	case CFI_R14:
+	case CFI_R15:
+		return true;
+
+	case CFI_AX:
+	case CFI_CX:
+	case CFI_DX:
+	case CFI_SI:
+	case CFI_DI:
+	case CFI_SP:
+	case CFI_R8:
+	case CFI_R9:
+	case CFI_R10:
+	case CFI_R11:
+	case CFI_RA:
+	default:
+		return false;
+	}
+}
+
 int arch_decode_instruction(struct elf *elf, struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
 			    unsigned int *len, unsigned char *type,
-			    unsigned long *immediate)
+			    unsigned long *immediate, struct stack_op *op)
 {
 	struct insn insn;
-	int x86_64;
-	unsigned char op1, op2, ext;
+	int x86_64, sign;
+	unsigned char op1, op2, rex = 0, rex_b = 0, rex_r = 0, rex_w = 0,
+		      modrm = 0, modrm_mod = 0, modrm_rm = 0, modrm_reg = 0,
+		      sib = 0;
 
 	x86_64 = is_x86_64(elf);
 	if (x86_64 == -1)
 		return -1;
 
-	insn_init(&insn, (void *)(sec->data + offset), maxlen, x86_64);
+	insn_init(&insn, sec->data->d_buf + offset, maxlen, x86_64);
 	insn_get_length(&insn);
-	insn_get_opcode(&insn);
-	insn_get_modrm(&insn);
-	insn_get_immediate(&insn);
 
 	if (!insn_complete(&insn)) {
 		WARN_FUNC("can't decode instruction", sec, offset);
@@ -73,67 +110,323 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 	op1 = insn.opcode.bytes[0];
 	op2 = insn.opcode.bytes[1];
 
+	if (insn.rex_prefix.nbytes) {
+		rex = insn.rex_prefix.bytes[0];
+		rex_w = X86_REX_W(rex) >> 3;
+		rex_r = X86_REX_R(rex) >> 2;
+		rex_b = X86_REX_B(rex);
+	}
+
+	if (insn.modrm.nbytes) {
+		modrm = insn.modrm.bytes[0];
+		modrm_mod = X86_MODRM_MOD(modrm);
+		modrm_reg = X86_MODRM_REG(modrm);
+		modrm_rm = X86_MODRM_RM(modrm);
+	}
+
+	if (insn.sib.nbytes)
+		sib = insn.sib.bytes[0];
+
 	switch (op1) {
-	case 0x55:
-		if (!insn.rex_prefix.nbytes)
-			/* push rbp */
-			*type = INSN_FP_SAVE;
+
+	case 0x1:
+	case 0x29:
+		if (rex_w && !rex_b && modrm_mod == 3 && modrm_rm == 4) {
+
+			/* add/sub reg, %rsp */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			op->dest.type = OP_SRC_REG;
+			op->dest.reg = CFI_SP;
+		}
+		break;
+
+	case 0x50 ... 0x57:
+
+		/* push reg */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_REG;
+		op->src.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
+		op->dest.type = OP_DEST_PUSH;
+
 		break;
 
-	case 0x5d:
-		if (!insn.rex_prefix.nbytes)
-			/* pop rbp */
-			*type = INSN_FP_RESTORE;
+	case 0x58 ... 0x5f:
+
+		/* pop reg */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_POP;
+		op->dest.type = OP_DEST_REG;
+		op->dest.reg = op_to_cfi_reg[op1 & 0x7][rex_b];
+
+		break;
+
+	case 0x68:
+	case 0x6a:
+		/* push immediate */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_CONST;
+		op->dest.type = OP_DEST_PUSH;
 		break;
 
 	case 0x70 ... 0x7f:
 		*type = INSN_JUMP_CONDITIONAL;
 		break;
 
+	case 0x81:
+	case 0x83:
+		if (rex != 0x48)
+			break;
+
+		if (modrm == 0xe4) {
+			/* and imm, %rsp */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_AND;
+			op->src.reg = CFI_SP;
+			op->src.offset = insn.immediate.value;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_SP;
+			break;
+		}
+
+		if (modrm == 0xc4)
+			sign = 1;
+		else if (modrm == 0xec)
+			sign = -1;
+		else
+			break;
+
+		/* add/sub imm, %rsp */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_ADD;
+		op->src.reg = CFI_SP;
+		op->src.offset = insn.immediate.value * sign;
+		op->dest.type = OP_DEST_REG;
+		op->dest.reg = CFI_SP;
+		break;
+
 	case 0x89:
-		if (insn.rex_prefix.nbytes == 1 &&
-		    insn.rex_prefix.bytes[0] == 0x48 &&
-		    insn.modrm.nbytes && insn.modrm.bytes[0] == 0xe5)
-			/* mov rsp, rbp */
-			*type = INSN_FP_SETUP;
+		if (rex == 0x48 && modrm == 0xe5) {
+
+			/* mov %rsp, %rbp */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_REG;
+			op->src.reg = CFI_SP;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_BP;
+			break;
+		}
+		/* fallthrough */
+	case 0x88:
+		if (!rex_b &&
+		    (modrm_mod == 1 || modrm_mod == 2) && modrm_rm == 5) {
+
+			/* mov reg, disp(%rbp) */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_REG;
+			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_BP;
+			op->dest.offset = insn.displacement.value;
+
+		} else if (rex_w && !rex_b && modrm_rm == 4 && sib == 0x24) {
+
+			/* mov reg, disp(%rsp) */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_REG;
+			op->src.reg = op_to_cfi_reg[modrm_reg][rex_r];
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = insn.displacement.value;
+		}
+
+		break;
+
+	case 0x8b:
+		if (rex_w && !rex_b && modrm_mod == 1 && modrm_rm == 5) {
+
+			/* mov disp(%rbp), reg */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_BP;
+			op->src.offset = insn.displacement.value;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
+
+		} else if (rex_w && !rex_b && sib == 0x24 &&
+			   modrm_mod != 3 && modrm_rm == 4) {
+
+			/* mov disp(%rsp), reg */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = insn.displacement.value;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = op_to_cfi_reg[modrm_reg][rex_r];
+		}
+
 		break;
 
 	case 0x8d:
-		if (insn.rex_prefix.nbytes &&
-		    insn.rex_prefix.bytes[0] == 0x48 &&
-		    insn.modrm.nbytes && insn.modrm.bytes[0] == 0x2c &&
-		    insn.sib.nbytes && insn.sib.bytes[0] == 0x24)
-			/* lea %(rsp), %rbp */
-			*type = INSN_FP_SETUP;
+		if (rex == 0x48 && modrm == 0x65) {
+
+			/* lea -disp(%rbp), %rsp */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_BP;
+			op->src.offset = insn.displacement.value;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_SP;
+			break;
+		}
+
+		if (rex == 0x4c && modrm == 0x54 && sib == 0x24 &&
+		    insn.displacement.value == 8) {
+
+			/*
+			 * lea 0x8(%rsp), %r10
+			 *
+			 * Here r10 is the "drap" pointer, used as a stack
+			 * pointer helper when the stack gets realigned.
+			 */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_SP;
+			op->src.offset = 8;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_R10;
+			break;
+		}
+
+		if (rex == 0x4c && modrm == 0x6c && sib == 0x24 &&
+		    insn.displacement.value == 16) {
+
+			/*
+			 * lea 0x10(%rsp), %r13
+			 *
+			 * Here r13 is the "drap" pointer, used as a stack
+			 * pointer helper when the stack gets realigned.
+			 */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_SP;
+			op->src.offset = 16;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_R13;
+			break;
+		}
+
+		if (rex == 0x49 && modrm == 0x62 &&
+		    insn.displacement.value == -8) {
+
+			/*
+			 * lea -0x8(%r10), %rsp
+			 *
+			 * Restoring rsp back to its original value after a
+			 * stack realignment.
+			 */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_R10;
+			op->src.offset = -8;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_SP;
+			break;
+		}
+
+		if (rex == 0x49 && modrm == 0x65 &&
+		    insn.displacement.value == -16) {
+
+			/*
+			 * lea -0x10(%r13), %rsp
+			 *
+			 * Restoring rsp back to its original value after a
+			 * stack realignment.
+			 */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_ADD;
+			op->src.reg = CFI_R13;
+			op->src.offset = -16;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = CFI_SP;
+			break;
+		}
+
+		break;
+
+	case 0x8f:
+		/* pop to mem */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_POP;
+		op->dest.type = OP_DEST_MEM;
 		break;
 
 	case 0x90:
 		*type = INSN_NOP;
 		break;
 
+	case 0x9c:
+		/* pushf */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_CONST;
+		op->dest.type = OP_DEST_PUSH;
+		break;
+
+	case 0x9d:
+		/* popf */
+		*type = INSN_STACK;
+		op->src.type = OP_SRC_POP;
+		op->dest.type = OP_DEST_MEM;
+		break;
+
 	case 0x0f:
+
 		if (op2 >= 0x80 && op2 <= 0x8f)
 			*type = INSN_JUMP_CONDITIONAL;
 		else if (op2 == 0x05 || op2 == 0x07 || op2 == 0x34 ||
 			 op2 == 0x35)
+
 			/* sysenter, sysret */
 			*type = INSN_CONTEXT_SWITCH;
+
 		else if (op2 == 0x0d || op2 == 0x1f)
+
 			/* nopl/nopw */
 			*type = INSN_NOP;
-		else if (op2 == 0x01 && insn.modrm.nbytes &&
-			 (insn.modrm.bytes[0] == 0xc2 ||
-			  insn.modrm.bytes[0] == 0xd8))
-			/* vmlaunch, vmrun */
-			*type = INSN_CONTEXT_SWITCH;
+
+		else if (op2 == 0xa0 || op2 == 0xa8) {
+
+			/* push fs/gs */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_CONST;
+			op->dest.type = OP_DEST_PUSH;
+
+		} else if (op2 == 0xa1 || op2 == 0xa9) {
+
+			/* pop fs/gs */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_POP;
+			op->dest.type = OP_DEST_MEM;
+		}
 
 		break;
 
-	case 0xc9: /* leave */
-		*type = INSN_FP_RESTORE;
+	case 0xc9:
+		/*
+		 * leave
+		 *
+		 * equivalent to:
+		 * mov bp, sp
+		 * pop bp
+		 */
+		*type = INSN_STACK;
+		op->dest.type = OP_DEST_LEAVE;
+
 		break;
 
-	case 0xe3: /* jecxz/jrcxz */
+	case 0xe3:
+		/* jecxz/jrcxz */
 		*type = INSN_JUMP_CONDITIONAL;
 		break;
 
@@ -158,14 +451,27 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 		break;
 
 	case 0xff:
-		ext = X86_MODRM_REG(insn.modrm.bytes[0]);
-		if (ext == 2 || ext == 3)
+		if (modrm_reg == 2 || modrm_reg == 3)
+
 			*type = INSN_CALL_DYNAMIC;
-		else if (ext == 4)
+
+		else if (modrm_reg == 4)
+
 			*type = INSN_JUMP_DYNAMIC;
-		else if (ext == 5) /*jmpf */
+
+		else if (modrm_reg == 5)
+
+			/* jmpf */
 			*type = INSN_CONTEXT_SWITCH;
 
+		else if (modrm_reg == 6) {
+
+			/* push from mem */
+			*type = INSN_STACK;
+			op->src.type = OP_SRC_CONST;
+			op->dest.type = OP_DEST_PUSH;
+		}
+
 		break;
 
 	default:
@@ -176,3 +482,21 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 
 	return 0;
 }
+
+void arch_initial_func_cfi_state(struct cfi_state *state)
+{
+	int i;
+
+	for (i = 0; i < CFI_NUM_REGS; i++) {
+		state->regs[i].base = CFI_UNDEFINED;
+		state->regs[i].offset = 0;
+	}
+
+	/* initial CFA (call frame address) */
+	state->cfa.base = CFI_SP;
+	state->cfa.offset = 8;
+
+	/* initial RA (return address) */
+	state->regs[16].base = CFI_CFA;
+	state->regs[16].offset = -8;
+}
diff --git a/tools/objtool/cfi.h b/tools/objtool/cfi.h
new file mode 100644
index 0000000..443ab2c
--- /dev/null
+++ b/tools/objtool/cfi.h
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2015-2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _OBJTOOL_CFI_H
+#define _OBJTOOL_CFI_H
+
+#define CFI_UNDEFINED		-1
+#define CFI_CFA			-2
+#define CFI_SP_INDIRECT		-3
+#define CFI_BP_INDIRECT		-4
+
+#define CFI_AX			0
+#define CFI_DX			1
+#define CFI_CX			2
+#define CFI_BX			3
+#define CFI_SI			4
+#define CFI_DI			5
+#define CFI_BP			6
+#define CFI_SP			7
+#define CFI_R8			8
+#define CFI_R9			9
+#define CFI_R10			10
+#define CFI_R11			11
+#define CFI_R12			12
+#define CFI_R13			13
+#define CFI_R14			14
+#define CFI_R15			15
+#define CFI_RA			16
+#define CFI_NUM_REGS	17
+
+struct cfi_reg {
+	int base;
+	int offset;
+};
+
+struct cfi_state {
+	struct cfi_reg cfa;
+	struct cfi_reg regs[CFI_NUM_REGS];
+};
+
+#endif /* _OBJTOOL_CFI_H */
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 91b97db..e924fd5 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -27,10 +27,6 @@
 #include <linux/hashtable.h>
 #include <linux/kernel.h>
 
-#define STATE_FP_SAVED		0x1
-#define STATE_FP_SETUP		0x2
-#define STATE_FENTRY		0x4
-
 struct alternative {
 	struct list_head list;
 	struct instruction *insn;
@@ -38,6 +34,7 @@ struct alternative {
 
 const char *objname;
 static bool nofp;
+struct cfi_state initial_func_cfi;
 
 static struct instruction *find_insn(struct objtool_file *file,
 				     struct section *sec, unsigned long offset)
@@ -56,7 +53,7 @@ static struct instruction *next_insn_same_sec(struct objtool_file *file,
 {
 	struct instruction *next = list_next_entry(insn, list);
 
-	if (&next->list == &file->insn_list || next->sec != insn->sec)
+	if (!next || &next->list == &file->insn_list || next->sec != insn->sec)
 		return NULL;
 
 	return next;
@@ -75,9 +72,6 @@ static bool gcov_enabled(struct objtool_file *file)
 	return false;
 }
 
-#define for_each_insn(file, insn)					\
-	list_for_each_entry(insn, &file->insn_list, list)
-
 #define func_for_each_insn(file, func, insn)				\
 	for (insn = find_insn(file, func->sec, func->offset);		\
 	     insn && &insn->list != &file->insn_list &&			\
@@ -94,6 +88,9 @@ static bool gcov_enabled(struct objtool_file *file)
 #define sec_for_each_insn_from(file, insn)				\
 	for (; insn; insn = next_insn_same_sec(file, insn))
 
+#define sec_for_each_insn_continue(file, insn)				\
+	for (insn = next_insn_same_sec(file, insn); insn;		\
+	     insn = next_insn_same_sec(file, insn))
 
 /*
  * Check if the function has been manually whitelisted with the
@@ -103,7 +100,6 @@ static bool gcov_enabled(struct objtool_file *file)
 static bool ignore_func(struct objtool_file *file, struct symbol *func)
 {
 	struct rela *rela;
-	struct instruction *insn;
 
 	/* check for STACK_FRAME_NON_STANDARD */
 	if (file->whitelist && file->whitelist->rela)
@@ -116,11 +112,6 @@ static bool ignore_func(struct objtool_file *file, struct symbol *func)
 				return true;
 		}
 
-	/* check if it has a context switching instruction */
-	func_for_each_insn(file, func, insn)
-		if (insn->type == INSN_CONTEXT_SWITCH)
-			return true;
-
 	return false;
 }
 
@@ -233,6 +224,17 @@ static int dead_end_function(struct objtool_file *file, struct symbol *func)
 	return __dead_end_function(file, func, 0);
 }
 
+static void clear_insn_state(struct insn_state *state)
+{
+	int i;
+
+	memset(state, 0, sizeof(*state));
+	state->cfa.base = CFI_UNDEFINED;
+	for (i = 0; i < CFI_NUM_REGS; i++)
+		state->regs[i].base = CFI_UNDEFINED;
+	state->drap_reg = CFI_UNDEFINED;
+}
+
 /*
  * Call the arch-specific instruction decoder for all the instructions and add
  * them to the global instruction list.
@@ -252,16 +254,22 @@ static int decode_instructions(struct objtool_file *file)
 
 		for (offset = 0; offset < sec->len; offset += insn->len) {
 			insn = malloc(sizeof(*insn));
+			if (!insn) {
+				WARN("malloc failed");
+				return -1;
+			}
 			memset(insn, 0, sizeof(*insn));
-
 			INIT_LIST_HEAD(&insn->alts);
+			clear_insn_state(&insn->state);
+
 			insn->sec = sec;
 			insn->offset = offset;
 
 			ret = arch_decode_instruction(file->elf, sec, offset,
 						      sec->len - offset,
 						      &insn->len, &insn->type,
-						      &insn->immediate);
+						      &insn->immediate,
+						      &insn->stack_op);
 			if (ret)
 				return ret;
 
@@ -360,7 +368,7 @@ static void add_ignores(struct objtool_file *file)
 				continue;
 
 			func_for_each_insn(file, func, insn)
-				insn->visited = true;
+				insn->ignore = true;
 		}
 	}
 }
@@ -380,8 +388,7 @@ static int add_jump_destinations(struct objtool_file *file)
 		    insn->type != INSN_JUMP_UNCONDITIONAL)
 			continue;
 
-		/* skip ignores */
-		if (insn->visited)
+		if (insn->ignore)
 			continue;
 
 		rela = find_rela_by_dest_range(insn->sec, insn->offset,
@@ -518,10 +525,13 @@ static int handle_group_alt(struct objtool_file *file,
 	}
 	memset(fake_jump, 0, sizeof(*fake_jump));
 	INIT_LIST_HEAD(&fake_jump->alts);
+	clear_insn_state(&fake_jump->state);
+
 	fake_jump->sec = special_alt->new_sec;
 	fake_jump->offset = -1;
 	fake_jump->type = INSN_JUMP_UNCONDITIONAL;
 	fake_jump->jump_dest = list_next_entry(last_orig_insn, list);
+	fake_jump->ignore = true;
 
 	if (!special_alt->new_len) {
 		*new_insn = fake_jump;
@@ -900,21 +910,356 @@ static bool is_fentry_call(struct instruction *insn)
 	return false;
 }
 
-static bool has_modified_stack_frame(struct instruction *insn)
+static bool has_modified_stack_frame(struct insn_state *state)
 {
-	return (insn->state & STATE_FP_SAVED) ||
-	       (insn->state & STATE_FP_SETUP);
+	int i;
+
+	if (state->cfa.base != initial_func_cfi.cfa.base ||
+	    state->cfa.offset != initial_func_cfi.cfa.offset ||
+	    state->stack_size != initial_func_cfi.cfa.offset ||
+	    state->drap)
+		return true;
+
+	for (i = 0; i < CFI_NUM_REGS; i++)
+		if (state->regs[i].base != initial_func_cfi.regs[i].base ||
+		    state->regs[i].offset != initial_func_cfi.regs[i].offset)
+			return true;
+
+	return false;
 }
 
-static bool has_valid_stack_frame(struct instruction *insn)
+static bool has_valid_stack_frame(struct insn_state *state)
 {
-	return (insn->state & STATE_FP_SAVED) &&
-	       (insn->state & STATE_FP_SETUP);
+	if (state->cfa.base == CFI_BP && state->regs[CFI_BP].base == CFI_CFA &&
+	    state->regs[CFI_BP].offset == -16)
+		return true;
+
+	if (state->drap && state->regs[CFI_BP].base == CFI_BP)
+		return true;
+
+	return false;
 }
 
-static unsigned int frame_state(unsigned long state)
+static void save_reg(struct insn_state *state, unsigned char reg, int base,
+		     int offset)
 {
-	return (state & (STATE_FP_SAVED | STATE_FP_SETUP));
+	if ((arch_callee_saved_reg(reg) ||
+	    (state->drap && reg == state->drap_reg)) &&
+	    state->regs[reg].base == CFI_UNDEFINED) {
+		state->regs[reg].base = base;
+		state->regs[reg].offset = offset;
+	}
+}
+
+static void restore_reg(struct insn_state *state, unsigned char reg)
+{
+	state->regs[reg].base = CFI_UNDEFINED;
+	state->regs[reg].offset = 0;
+}
+
+/*
+ * A note about DRAP stack alignment:
+ *
+ * GCC has the concept of a DRAP register, which is used to help keep track of
+ * the stack pointer when aligning the stack.  r10 or r13 is used as the DRAP
+ * register.  The typical DRAP pattern is:
+ *
+ *   ffffffff810079c5:	4c 8d 54 24 08		lea    0x8(%rsp),%r10
+ *   ffffffff810079ca:	48 83 e4 c0		and    $0xffffffffffffffc0,%rsp
+ *   ffffffff810079ce:	41 ff 72 f8		pushq  -0x8(%r10)
+ *   ffffffff810079d2:	55			push   %rbp
+ *   ffffffff810079d3:	48 89 e5		mov    %rsp,%rbp
+ *   ...
+ *   ffffffff810079e5:	41 52			push   %r10
+ *   ...
+ *   ffffffff810079e8:	48 81 ec 00 02 00 00	sub    $0x200,%rsp
+ *   ...
+ *   ffffffff81007b56:	48 81 c4 00 02 00 00	add    $0x200,%rsp
+ *   ...
+ *   ffffffff81007b5e:	41 5a			pop    %r10
+ *   ...
+ *   ...
+ *   ffffffff81007b68:	5d			pop    %rbp
+ *   ffffffff81007b69:	49 8d 62 f8		lea    -0x8(%r10),%rsp
+ *   ffffffff81007b6d:	c3			retq
+ *
+ * When r13 is used as the DRAP pointer, the main difference from the above is
+ * that r13 gets pushed before the realignment.
+ */
+static int update_insn_state(struct instruction *insn, struct insn_state *state)
+{
+	struct stack_op *op = &insn->stack_op;
+	struct cfi_reg *cfa = &state->cfa;
+	struct cfi_reg *regs = state->regs;
+
+	/* stack operations don't make sense with an undefined CFA */
+	if (cfa->base == CFI_UNDEFINED) {
+		if (insn->func) {
+			WARN_FUNC("undefined stack state", insn->sec, insn->offset);
+			return -1;
+		}
+		return 0;
+	}
+
+	switch (op->dest.type) {
+
+	case OP_DEST_REG:
+		switch (op->src.type) {
+
+		case OP_SRC_REG:
+			if (cfa->base == op->src.reg && cfa->base == CFI_SP &&
+			    op->dest.reg == CFI_BP && regs[CFI_BP].base == CFI_CFA &&
+			    regs[CFI_BP].offset == -cfa->offset) {
+
+				/* mov %rsp, %rbp */
+				cfa->base = op->dest.reg;
+				state->bp_scratch = false;
+			} else if (state->drap) {
+
+				/* drap: mov %rsp, %rbp */
+				regs[CFI_BP].base = CFI_BP;
+				regs[CFI_BP].offset = -state->stack_size;
+				state->bp_scratch = false;
+			} else if (!nofp) {
+
+				WARN_FUNC("unknown stack-related register move",
+					  insn->sec, insn->offset);
+				return -1;
+			}
+
+			break;
+
+		case OP_SRC_ADD:
+			if (op->dest.reg == CFI_SP && op->src.reg == CFI_SP) {
+
+				/* add imm, %rsp */
+				state->stack_size -= op->src.offset;
+				if (cfa->base == CFI_SP)
+					cfa->offset -= op->src.offset;
+				break;
+			}
+
+			if (op->dest.reg == CFI_SP && op->src.reg == CFI_BP) {
+
+				/* lea disp(%rbp), %rsp */
+				state->stack_size = -(op->src.offset + regs[CFI_BP].offset);
+				break;
+			}
+
+			if (op->dest.reg != CFI_BP && op->src.reg == CFI_SP &&
+			    cfa->base == CFI_SP) {
+
+				/* drap: lea disp(%rsp), %drap */
+				state->drap_reg = op->dest.reg;
+				break;
+			}
+
+			if (state->drap && op->dest.reg == CFI_SP &&
+			    op->src.reg == state->drap_reg) {
+
+				 /* drap: lea disp(%drap), %rsp */
+				cfa->base = CFI_SP;
+				cfa->offset = state->stack_size = -op->src.offset;
+				state->drap_reg = CFI_UNDEFINED;
+				state->drap = false;
+				break;
+			}
+
+			if (op->dest.reg == state->cfa.base) {
+				WARN_FUNC("unsupported stack register modification",
+					  insn->sec, insn->offset);
+				return -1;
+			}
+
+			break;
+
+		case OP_SRC_AND:
+			if (op->dest.reg != CFI_SP || state->drap_reg == CFI_UNDEFINED ||
+			    cfa->base != CFI_SP) {
+				WARN_FUNC("unsupported stack pointer realignment",
+					  insn->sec, insn->offset);
+				return -1;
+			}
+
+			/* and imm, %rsp */
+			cfa->base = state->drap_reg;
+			cfa->offset = state->stack_size = 0;
+			state->drap = true;
+			break;
+
+		case OP_SRC_POP:
+			if (!state->drap && op->dest.type == OP_DEST_REG &&
+			    op->dest.reg == cfa->base) {
+
+				/* pop %rbp */
+				cfa->base = CFI_SP;
+			}
+
+			state->stack_size -= 8;
+			if (cfa->base == CFI_SP)
+				cfa->offset -= 8;
+
+			if (regs[op->dest.reg].offset + state->stack_size == -8) {
+
+				if (state->drap && cfa->base == CFI_BP_INDIRECT &&
+				    op->dest.type == OP_DEST_REG &&
+				    op->dest.reg == state->drap_reg) {
+
+					/* drap: pop %drap */
+					cfa->base = state->drap_reg;
+					cfa->offset = 0;
+				}
+
+				/* the saved value is popped and gone */
+				restore_reg(state, op->dest.reg);
+			}
+
+			break;
+
+		case OP_SRC_REG_INDIRECT:
+			if (op->src.reg == cfa->base &&
+			    op->src.offset == regs[op->dest.reg].offset + cfa->offset) {
+
+				/* mov disp(%rbp), reg */
+				/* mov disp(%rsp), reg */
+				restore_reg(state, op->dest.reg);
+			}
+
+			break;
+
+		default:
+			WARN_FUNC("unknown stack-related instruction",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+
+		break;
+
+	case OP_DEST_PUSH:
+		state->stack_size += 8;
+		if (cfa->base == CFI_SP)
+			cfa->offset += 8;
+
+		if (op->src.type != OP_SRC_REG)
+			break;
+
+		/* fallthrough */
+
+	case OP_DEST_REG_INDIRECT:
+		if (state->drap) {
+			if (op->src.reg == cfa->base) {
+
+				/* drap: push %drap */
+				cfa->base = CFI_BP_INDIRECT;
+				cfa->offset = -state->stack_size;
+
+				/* save drap so we know when to undefine it */
+				save_reg(state, op->src.reg, CFI_CFA, -state->stack_size);
+
+			} else if (op->src.reg == CFI_BP) {
+
+				/* drap: push %rbp */
+				state->stack_size = 0;
+
+			} else if (regs[op->src.reg].base == CFI_UNDEFINED) {
+
+				/* drap: push reg (other than rbp) */
+				save_reg(state, op->src.reg, CFI_BP, -state->stack_size);
+			}
+
+		} else if (op->dest.type == OP_DEST_PUSH) {
+
+			/* push reg */
+			save_reg(state, op->src.reg, CFI_CFA, -state->stack_size);
+		} else if (op->dest.reg == cfa->base) {
+
+			/* mov reg, disp(%rbp) */
+			/* mov reg, disp(%rsp) */
+			save_reg(state, op->src.reg, CFI_CFA,
+				 op->dest.offset - state->cfa.offset);
+		}
+
+		/* detect when asm code uses rbp as a scratch register */
+		if (!nofp && insn->func && op->src.reg == CFI_BP &&
+		    cfa->base != CFI_BP)
+			state->bp_scratch = true;
+
+		break;
+
+	case OP_DEST_LEAVE:
+		if (cfa->base != CFI_BP) {
+			WARN_FUNC("leave instruction without stack frame pointer setup",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+
+		/* leaveq */
+		regs[CFI_BP].base = CFI_UNDEFINED;
+		regs[CFI_BP].offset = 0;
+		cfa->base = CFI_SP;
+		cfa->offset -= 8;
+		state->stack_size = cfa->offset;
+
+		break;
+
+	case OP_DEST_MEM:
+		if (op->src.type != OP_SRC_POP) {
+			WARN_FUNC("unknown stack-related memory operation",
+				  insn->sec, insn->offset);
+			return -1;
+		}
+
+		/* pop mem */
+		state->stack_size -= 8;
+		if (cfa->base == CFI_SP)
+			cfa->offset -= 8;
+
+		break;
+
+	default:
+		WARN_FUNC("unknown stack-related instruction",
+			  insn->sec, insn->offset);
+		return -1;
+	}
+
+	return 0;
+}
+
+static bool insn_state_match(struct instruction *insn, struct insn_state *state)
+{
+	struct insn_state *state1 = &insn->state, *state2 = state;
+	int i;
+
+	if (memcmp(&state1->cfa, &state2->cfa, sizeof(state1->cfa))) {
+		WARN_FUNC("stack state mismatch: cfa1=%d%+d cfa2=%d%+d",
+			  insn->sec, insn->offset,
+			  state1->cfa.base, state1->cfa.offset,
+			  state2->cfa.base, state2->cfa.offset);
+
+	} else if (memcmp(&state1->regs, &state2->regs, sizeof(state1->regs))) {
+		for (i = 0; i < CFI_NUM_REGS; i++) {
+			if (!memcmp(&state1->regs[i], &state2->regs[i],
+				    sizeof(struct cfi_reg)))
+				continue;
+
+			WARN_FUNC("stack state mismatch: reg1[%d]=%d%+d reg2[%d]=%d%+d",
+				  insn->sec, insn->offset,
+				  i, state1->regs[i].base, state1->regs[i].offset,
+				  i, state2->regs[i].base, state2->regs[i].offset);
+			break;
+		}
+
+	} else if (state1->drap != state2->drap ||
+		 (state1->drap && state1->drap_reg != state2->drap_reg)) {
+		WARN_FUNC("stack state mismatch: drap1=%d(%d) drap2=%d(%d)",
+			  insn->sec, insn->offset,
+			  state1->drap, state1->drap_reg,
+			  state2->drap, state2->drap_reg);
+
+	} else
+		return true;
+
+	return false;
 }
 
 /*
@@ -923,24 +1268,22 @@ static unsigned int frame_state(unsigned long state)
  * each instruction and validate all the rules described in
  * tools/objtool/Documentation/stack-validation.txt.
  */
-static int validate_branch(struct objtool_file *file,
-			   struct instruction *first, unsigned char first_state)
+static int validate_branch(struct objtool_file *file, struct instruction *first,
+			   struct insn_state state)
 {
 	struct alternative *alt;
 	struct instruction *insn;
 	struct section *sec;
 	struct symbol *func = NULL;
-	unsigned char state;
 	int ret;
 
 	insn = first;
 	sec = insn->sec;
-	state = first_state;
 
 	if (insn->alt_group && list_empty(&insn->alts)) {
 		WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
 			  sec, insn->offset);
-		return 1;
+		return -1;
 	}
 
 	while (1) {
@@ -950,23 +1293,21 @@ static int validate_branch(struct objtool_file *file,
 				     func->name, insn->func->name);
 				return 1;
 			}
-
-			func = insn->func;
 		}
 
+		func = insn->func;
+
 		if (insn->visited) {
-			if (frame_state(insn->state) != frame_state(state)) {
-				WARN_FUNC("frame pointer state mismatch",
-					  sec, insn->offset);
+			if (!!insn_state_match(insn, &state))
 				return 1;
-			}
 
 			return 0;
 		}
 
-		insn->visited = true;
 		insn->state = state;
 
+		insn->visited = true;
+
 		list_for_each_entry(alt, &insn->alts, list) {
 			ret = validate_branch(file, alt->insn, state);
 			if (ret)
@@ -975,50 +1316,24 @@ static int validate_branch(struct objtool_file *file,
 
 		switch (insn->type) {
 
-		case INSN_FP_SAVE:
-			if (!nofp) {
-				if (state & STATE_FP_SAVED) {
-					WARN_FUNC("duplicate frame pointer save",
-						  sec, insn->offset);
-					return 1;
-				}
-				state |= STATE_FP_SAVED;
-			}
-			break;
-
-		case INSN_FP_SETUP:
-			if (!nofp) {
-				if (state & STATE_FP_SETUP) {
-					WARN_FUNC("duplicate frame pointer setup",
-						  sec, insn->offset);
-					return 1;
-				}
-				state |= STATE_FP_SETUP;
-			}
-			break;
-
-		case INSN_FP_RESTORE:
-			if (!nofp) {
-				if (has_valid_stack_frame(insn))
-					state &= ~STATE_FP_SETUP;
-
-				state &= ~STATE_FP_SAVED;
-			}
-			break;
-
 		case INSN_RETURN:
-			if (!nofp && has_modified_stack_frame(insn)) {
-				WARN_FUNC("return without frame pointer restore",
+			if (func && has_modified_stack_frame(&state)) {
+				WARN_FUNC("return with modified stack frame",
 					  sec, insn->offset);
 				return 1;
 			}
+
+			if (state.bp_scratch) {
+				WARN("%s uses BP as a scratch register",
+				     insn->func->name);
+				return 1;
+			}
+
 			return 0;
 
 		case INSN_CALL:
-			if (is_fentry_call(insn)) {
-				state |= STATE_FENTRY;
+			if (is_fentry_call(insn))
 				break;
-			}
 
 			ret = dead_end_function(file, insn->call_dest);
 			if (ret == 1)
@@ -1028,7 +1343,7 @@ static int validate_branch(struct objtool_file *file,
 
 			/* fallthrough */
 		case INSN_CALL_DYNAMIC:
-			if (!nofp && !has_valid_stack_frame(insn)) {
+			if (!nofp && func && !has_valid_stack_frame(&state)) {
 				WARN_FUNC("call without frame pointer save/setup",
 					  sec, insn->offset);
 				return 1;
@@ -1042,8 +1357,8 @@ static int validate_branch(struct objtool_file *file,
 						      state);
 				if (ret)
 					return 1;
-			} else if (has_modified_stack_frame(insn)) {
-				WARN_FUNC("sibling call from callable instruction with changed frame pointer",
+			} else if (func && has_modified_stack_frame(&state)) {
+				WARN_FUNC("sibling call from callable instruction with modified stack frame",
 					  sec, insn->offset);
 				return 1;
 			} /* else it's a sibling call */
@@ -1054,15 +1369,29 @@ static int validate_branch(struct objtool_file *file,
 			break;
 
 		case INSN_JUMP_DYNAMIC:
-			if (list_empty(&insn->alts) &&
-			    has_modified_stack_frame(insn)) {
-				WARN_FUNC("sibling call from callable instruction with changed frame pointer",
+			if (func && list_empty(&insn->alts) &&
+			    has_modified_stack_frame(&state)) {
+				WARN_FUNC("sibling call from callable instruction with modified stack frame",
 					  sec, insn->offset);
 				return 1;
 			}
 
 			return 0;
 
+		case INSN_CONTEXT_SWITCH:
+			if (func) {
+				WARN_FUNC("unsupported instruction in callable function",
+					  sec, insn->offset);
+				return 1;
+			}
+			return 0;
+
+		case INSN_STACK:
+			if (update_insn_state(insn, &state))
+				return -1;
+
+			break;
+
 		default:
 			break;
 		}
@@ -1093,12 +1422,18 @@ static bool is_ubsan_insn(struct instruction *insn)
 			"__ubsan_handle_builtin_unreachable"));
 }
 
-static bool ignore_unreachable_insn(struct symbol *func,
-				    struct instruction *insn)
+static bool ignore_unreachable_insn(struct instruction *insn)
 {
 	int i;
 
-	if (insn->type == INSN_NOP)
+	if (insn->ignore || insn->type == INSN_NOP)
+		return true;
+
+	/*
+	 * Ignore any unused exceptions.  This can happen when a whitelisted
+	 * function has an exception table entry.
+	 */
+	if (!strcmp(insn->sec->name, ".fixup"))
 		return true;
 
 	/*
@@ -1107,6 +1442,8 @@ static bool ignore_unreachable_insn(struct symbol *func,
 	 *
 	 * End the search at 5 instructions to avoid going into the weeds.
 	 */
+	if (!insn->func)
+		return false;
 	for (i = 0; i < 5; i++) {
 
 		if (is_kasan_insn(insn) || is_ubsan_insn(insn))
@@ -1117,7 +1454,7 @@ static bool ignore_unreachable_insn(struct symbol *func,
 			continue;
 		}
 
-		if (insn->offset + insn->len >= func->offset + func->len)
+		if (insn->offset + insn->len >= insn->func->offset + insn->func->len)
 			break;
 		insn = list_next_entry(insn, list);
 	}
@@ -1130,73 +1467,58 @@ static int validate_functions(struct objtool_file *file)
 	struct section *sec;
 	struct symbol *func;
 	struct instruction *insn;
+	struct insn_state state;
 	int ret, warnings = 0;
 
+	clear_insn_state(&state);
+
+	state.cfa = initial_func_cfi.cfa;
+	memcpy(&state.regs, &initial_func_cfi.regs,
+	       CFI_NUM_REGS * sizeof(struct cfi_reg));
+	state.stack_size = initial_func_cfi.cfa.offset;
+
 	list_for_each_entry(sec, &file->elf->sections, list) {
 		list_for_each_entry(func, &sec->symbol_list, list) {
 			if (func->type != STT_FUNC)
 				continue;
 
 			insn = find_insn(file, sec, func->offset);
-			if (!insn)
+			if (!insn || insn->ignore)
 				continue;
 
-			ret = validate_branch(file, insn, 0);
+			ret = validate_branch(file, insn, state);
 			warnings += ret;
 		}
 	}
 
-	list_for_each_entry(sec, &file->elf->sections, list) {
-		list_for_each_entry(func, &sec->symbol_list, list) {
-			if (func->type != STT_FUNC)
-				continue;
-
-			func_for_each_insn(file, func, insn) {
-				if (insn->visited)
-					continue;
-
-				insn->visited = true;
-
-				if (file->ignore_unreachables || warnings ||
-				    ignore_unreachable_insn(func, insn))
-					continue;
-
-				/*
-				 * gcov produces a lot of unreachable
-				 * instructions.  If we get an unreachable
-				 * warning and the file has gcov enabled, just
-				 * ignore it, and all other such warnings for
-				 * the file.
-				 */
-				if (!file->ignore_unreachables &&
-				    gcov_enabled(file)) {
-					file->ignore_unreachables = true;
-					continue;
-				}
-
-				WARN_FUNC("function has unreachable instruction", insn->sec, insn->offset);
-				warnings++;
-			}
-		}
-	}
-
 	return warnings;
 }
 
-static int validate_uncallable_instructions(struct objtool_file *file)
+static int validate_reachable_instructions(struct objtool_file *file)
 {
 	struct instruction *insn;
-	int warnings = 0;
+
+	if (file->ignore_unreachables)
+		return 0;
 
 	for_each_insn(file, insn) {
-		if (!insn->visited && insn->type == INSN_RETURN) {
-			WARN_FUNC("return instruction outside of a callable function",
-				  insn->sec, insn->offset);
-			warnings++;
-		}
+		if (insn->visited || ignore_unreachable_insn(insn))
+			continue;
+
+		/*
+		 * gcov produces a lot of unreachable instructions.  If we get
+		 * an unreachable warning and the file has gcov enabled, just
+		 * ignore it, and all other such warnings for the file.  Do
+		 * this here because this is an expensive function.
+		 */
+		if (gcov_enabled(file))
+			return 0;
+
+		WARN_FUNC("unreachable instruction", insn->sec, insn->offset);
+		return 1;
 	}
 
-	return warnings;
+	return 0;
 }
 
 static void cleanup(struct objtool_file *file)
@@ -1237,21 +1559,28 @@ int check(const char *_objname, bool _nofp)
 	file.ignore_unreachables = false;
 	file.c_file = find_section_by_name(file.elf, ".comment");
 
+	arch_initial_func_cfi_state(&initial_func_cfi);
+
 	ret = decode_sections(&file);
 	if (ret < 0)
 		goto out;
 	warnings += ret;
 
-	ret = validate_functions(&file);
-	if (ret < 0)
+	if (list_empty(&file.insn_list))
 		goto out;
-	warnings += ret;
 
-	ret = validate_uncallable_instructions(&file);
+	ret = validate_functions(&file);
 	if (ret < 0)
 		goto out;
 	warnings += ret;
 
+	if (!warnings) {
+		ret = validate_reachable_instructions(&file);
+		if (ret < 0)
+			goto out;
+		warnings += ret;
+	}
+
 out:
 	cleanup(&file);
 
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index b0ac3ba..da85f5b 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -24,19 +24,30 @@
 #include "arch.h"
 #include <linux/hashtable.h>
 
+struct insn_state {
+	struct cfi_reg cfa;
+	struct cfi_reg regs[CFI_NUM_REGS];
+	int stack_size;
+	bool bp_scratch;
+	bool drap;
+	int drap_reg;
+};
+
 struct instruction {
 	struct list_head list;
 	struct hlist_node hash;
 	struct section *sec;
 	unsigned long offset;
-	unsigned int len, state;
+	unsigned int len;
 	unsigned char type;
 	unsigned long immediate;
-	bool alt_group, visited, dead_end;
+	bool alt_group, visited, dead_end, ignore;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;
 	struct list_head alts;
 	struct symbol *func;
+	struct stack_op stack_op;
+	struct insn_state state;
 };
 
 struct objtool_file {
@@ -49,4 +60,7 @@ struct objtool_file {
 
 int check(const char *objname, bool nofp);
 
+#define for_each_insn(file, insn)					\
+	list_for_each_entry(insn, &file->insn_list, list)
+
 #endif /* _CHECK_H */
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index d897702..3fb0747 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -182,20 +182,19 @@ static int read_sections(struct elf *elf)
 			return -1;
 		}
 
-		sec->elf_data = elf_getdata(s, NULL);
-		if (!sec->elf_data) {
+		sec->data = elf_getdata(s, NULL);
+		if (!sec->data) {
 			perror("elf_getdata");
 			return -1;
 		}
 
-		if (sec->elf_data->d_off != 0 ||
-		    sec->elf_data->d_size != sec->sh.sh_size) {
+		if (sec->data->d_off != 0 ||
+		    sec->data->d_size != sec->sh.sh_size) {
 			WARN("unexpected data attributes for %s", sec->name);
 			return -1;
 		}
 
-		sec->data = (unsigned long)sec->elf_data->d_buf;
-		sec->len = sec->elf_data->d_size;
+		sec->len = sec->data->d_size;
 	}
 
 	/* sanity check, one more call to elf_nextscn() should return NULL */
@@ -232,7 +231,7 @@ static int read_symbols(struct elf *elf)
 
 		sym->idx = i;
 
-		if (!gelf_getsym(symtab->elf_data, i, &sym->sym)) {
+		if (!gelf_getsym(symtab->data, i, &sym->sym)) {
 			perror("gelf_getsym");
 			goto err;
 		}
@@ -322,7 +321,7 @@ static int read_relas(struct elf *elf)
 			}
 			memset(rela, 0, sizeof(*rela));
 
-			if (!gelf_getrela(sec->elf_data, i, &rela->rela)) {
+			if (!gelf_getrela(sec->data, i, &rela->rela)) {
 				perror("gelf_getrela");
 				return -1;
 			}
@@ -362,12 +361,6 @@ struct elf *elf_open(const char *name)
 
 	INIT_LIST_HEAD(&elf->sections);
 
-	elf->name = strdup(name);
-	if (!elf->name) {
-		perror("strdup");
-		goto err;
-	}
-
 	elf->fd = open(name, O_RDONLY);
 	if (elf->fd == -1) {
 		perror("open");
@@ -421,8 +414,6 @@ void elf_close(struct elf *elf)
 		list_del(&sec->list);
 		free(sec);
 	}
-	if (elf->name)
-		free(elf->name);
 	if (elf->fd > 0)
 		close(elf->fd);
 	if (elf->elf)
diff --git a/tools/objtool/elf.h b/tools/objtool/elf.h
index 731973e..75e44c7 100644
--- a/tools/objtool/elf.h
+++ b/tools/objtool/elf.h
@@ -37,10 +37,9 @@ struct section {
 	DECLARE_HASHTABLE(rela_hash, 16);
 	struct section *base, *rela;
 	struct symbol *sym;
-	Elf_Data *elf_data;
+	Elf_Data *data;
 	char *name;
 	int idx;
-	unsigned long data;
 	unsigned int len;
 };
 
diff --git a/tools/objtool/special.c b/tools/objtool/special.c
index bff8abb..84f001d 100644
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -91,16 +91,16 @@ static int get_alt_entry(struct elf *elf, struct special_entry *entry,
 	alt->jump_or_nop = entry->jump_or_nop;
 
 	if (alt->group) {
-		alt->orig_len = *(unsigned char *)(sec->data + offset +
+		alt->orig_len = *(unsigned char *)(sec->data->d_buf + offset +
 						   entry->orig_len);
-		alt->new_len = *(unsigned char *)(sec->data + offset +
+		alt->new_len = *(unsigned char *)(sec->data->d_buf + offset +
 						  entry->new_len);
 	}
 
 	if (entry->feature) {
 		unsigned short feature;
 
-		feature = *(unsigned short *)(sec->data + offset +
+		feature = *(unsigned short *)(sec->data->d_buf + offset +
 					      entry->feature);
 
 		/*
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 04/10] objtool: add undwarf debuginfo generation
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (2 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 03/10] objtool: stack validation 2.0 Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-14  8:42   ` Jiri Slaby
  2017-06-01  5:44 ` [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints Josh Poimboeuf
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Now that objtool knows the states of all registers on the stack for each
instruction, it's straightforward to generate debuginfo for an unwinder
to use.

Instead of generating DWARF, generate a new format called undwarf, which
is more suitable for an in-kernel unwinder.  See
tools/objtool/Documentation/undwarf.txt for a more detailed description
of this new debuginfo format and why it's preferable to DWARF.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/Build                              |   2 +
 tools/objtool/Documentation/stack-validation.txt |  45 +---
 tools/objtool/Documentation/undwarf.txt          |  99 ++++++++
 tools/objtool/builtin-check.c                    |   2 +-
 tools/objtool/builtin-undwarf.c                  |  70 ++++++
 tools/objtool/builtin.h                          |   1 +
 tools/objtool/check.c                            |  57 ++++-
 tools/objtool/check.h                            |   7 +-
 tools/objtool/elf.c                              | 224 +++++++++++++++++
 tools/objtool/elf.h                              |   5 +
 tools/objtool/objtool.c                          |   3 +-
 tools/objtool/undwarf-types.h                    | 100 ++++++++
 tools/objtool/undwarf.c                          | 308 +++++++++++++++++++++++
 tools/objtool/{builtin.h => undwarf.h}           |  19 +-
 14 files changed, 898 insertions(+), 44 deletions(-)
 create mode 100644 tools/objtool/Documentation/undwarf.txt
 create mode 100644 tools/objtool/builtin-undwarf.c
 create mode 100644 tools/objtool/undwarf-types.h
 create mode 100644 tools/objtool/undwarf.c
 copy tools/objtool/{builtin.h => undwarf.h} (64%)

diff --git a/tools/objtool/Build b/tools/objtool/Build
index 6f2e198..845c879 100644
--- a/tools/objtool/Build
+++ b/tools/objtool/Build
@@ -1,6 +1,8 @@
 objtool-y += arch/$(SRCARCH)/
 objtool-y += builtin-check.o
+objtool-y += builtin-undwarf.o
 objtool-y += check.o
+objtool-y += undwarf.o
 objtool-y += elf.o
 objtool-y += special.o
 objtool-y += objtool.o
diff --git a/tools/objtool/Documentation/stack-validation.txt b/tools/objtool/Documentation/stack-validation.txt
index 17c1195..e961971 100644
--- a/tools/objtool/Documentation/stack-validation.txt
+++ b/tools/objtool/Documentation/stack-validation.txt
@@ -11,9 +11,6 @@ analyzes every .o file and ensures the validity of its stack metadata.
 It enforces a set of rules on asm code and C inline assembly code so
 that stack traces can be reliable.
 
-Currently it only checks frame pointer usage, but there are plans to add
-CFI validation for C files and CFI generation for asm files.
-
 For each function, it recursively follows all possible code paths and
 validates the correct frame pointer state at each instruction.
 
@@ -23,6 +20,9 @@ alternative execution paths to a given instruction (or set of
 instructions).  Similarly, it knows how to follow switch statements, for
 which gcc sometimes uses jump tables.
 
+(Objtool also has an 'undwarf generate' subcommand which generates
+debuginfo for the undwarf unwinder.  See undwarf.txt for more details.)
+
 
 Why do we need stack metadata validation?
 -----------------------------------------
@@ -93,37 +93,14 @@ a) More reliable stack traces for frame pointer enabled kernels
        or at the very end of the function after the stack frame has been
        destroyed.  This is an inherent limitation of frame pointers.
 
-b) 100% reliable stack traces for DWARF enabled kernels
-
-   (NOTE: This is not yet implemented)
-
-   As an alternative to frame pointers, DWARF Call Frame Information
-   (CFI) metadata can be used to walk the stack.  Unlike frame pointers,
-   CFI metadata is out of band.  So it doesn't affect runtime
-   performance and it can be reliable even when interrupts or exceptions
-   are involved.
-
-   For C code, gcc automatically generates DWARF CFI metadata.  But for
-   asm code, generating CFI is a tedious manual approach which requires
-   manually placed .cfi assembler macros to be scattered throughout the
-   code.  It's clumsy and very easy to get wrong, and it makes the real
-   code harder to read.
-
-   Stacktool will improve this situation in several ways.  For code
-   which already has CFI annotations, it will validate them.  For code
-   which doesn't have CFI annotations, it will generate them.  So an
-   architecture can opt to strip out all the manual .cfi annotations
-   from their asm code and have objtool generate them instead.
-
-   We might also add a runtime stack validation debug option where we
-   periodically walk the stack from schedule() and/or an NMI to ensure
-   that the stack metadata is sane and that we reach the bottom of the
-   stack.
-
-   So the benefit of objtool here will be that external tooling should
-   always show perfect stack traces.  And the same will be true for
-   kernel warning/oops traces if the architecture has a runtime DWARF
-   unwinder.
+b) Out-of-band debuginfo generation (undwarf)
+
+   As an alternative to frame pointers, undwarf metadata can be used to
+   walk the stack.  Unlike frame pointers, undwarf is out of band.  So
+   it doesn't affect runtime performance and it can be reliable even
+   when interrupts or exceptions are involved.
+
+   For more details, see undwarf.txt.
 
 c) Higher live patching compatibility rate
 
diff --git a/tools/objtool/Documentation/undwarf.txt b/tools/objtool/Documentation/undwarf.txt
new file mode 100644
index 0000000..3c7a6d6
--- /dev/null
+++ b/tools/objtool/Documentation/undwarf.txt
@@ -0,0 +1,99 @@
+Undwarf debuginfo generation
+============================
+
+Overview
+--------
+
+The kernel CONFIG_UNDWARF_UNWINDER option enables objtool generation of
+undwarf debuginfo, which is out-of-band data which is used by the
+in-kernel undwarf unwinder.  It's similar in concept to DWARF CFI
+debuginfo which would be used by a DWARF unwinder.  The difference is
+that the format of the undwarf data is simpler than DWARF, which in turn
+allows the unwinder to be simpler.
+
+Objtool generates the undwarf data by piggybacking on the compile-time
+stack metadata validation work described in stack-validation.txt.  After
+analyzing all the code paths of a .o file, it creates an array of
+'struct undwarf's and writes them to the .undwarf section.
+
+Then at vmlinux link time, the .undwarf section is sorted by the
+sorttable script.  The resulting sorted array of undwarf structs is used
+by the unwinder at runtime to correlate a given text address with its
+stack state.
+
+
+Why not just use DWARF?
+-----------------------
+
+Undwarf has some of the same benefits as DWARF.  Unlike frame pointers,
+the debuginfo is out-of-band. so it has no effect on runtime
+performance.  Another benefit is that it's possible to reliably unwind
+across interrupts and exceptions.
+
+Undwarf debuginfo's advantage over DWARF itself is that it's much
+simpler.  It gets rid of the DWARF CFI state machine and also gets rid
+of the tracking of unnecessary registers.  This allows the unwinder to
+be much simpler, meaning fewer bugs, which is especially important for
+mission critical oops code.
+
+The simpler debuginfo format also enables the unwinder to be relatively
+fast, which is important for perf and lockdep.
+
+The undwarf format does have a few downsides.  The undwarf table takes
+up extra memory -- something in the ballpark of 3-5MB, depending on the
+kernel config.  In the future we may try to rearrange the data to
+compress that a bit.
+
+Another downside is that, as GCC evolves, it's conceivable that the
+undwarf data may end up being *too* simple to describe the state of the
+stack for certain optimizations.  Will we end up having to track the
+state of more registers and eventually end up reinventing DWARF?
+
+I think this is unlikely because GCC seems to save the frame pointer for
+any unusual stack adjustments it does, so I suspect we'll really only
+ever need to keep track of the stack pointer and the frame pointer
+between call frames.  But even if we do end up having to track all the
+registers DWARF tracks, at least we will still control the format, e.g.
+no complex state machines.
+
+
+Why generate undwarf with objtool?
+----------------------------------
+
+It should be possible to generate the undwarf data with a simple tool
+which converts DWARF to undwarf.  However, such a solution would be
+incomplete due to the kernel's extensive use of asm, inline asm, and
+special sections like exception tables.
+
+That could be rectified by manually annotating those special code paths
+using GNU assembler .cfi annotations in .S files, and homegrown
+annotations for inline asm in .c files.  But asm annotations were tried
+in the past and were found to be unmaintainable.  They were often
+incorrect/incomplete and made the code harder to read and keep updated.
+And based on looking at glibc code, annotating inline asm in .c files
+might be even worse.
+
+With compile-time stack metadata validation, objtool already follows all
+the code paths and already has all the information it needs to be able
+to generate undwarf data from scratch.  So it's an easy step to go from
+stack validation to undwarf generation.
+
+Objtool still needs a few annotations, but only in code which does
+unusual things to the stack like entry code.  And even then, far fewer
+annotations are needed than what DWARF would need, so it's much more
+maintainable than DWARF CFI annotations.
+
+So the advantages of using objtool to generate undwarf are that it gives
+more accurate debuginfo, with close to zero annotations.  It also
+insulates the kernel from toolchain bugs which can be very painful to
+deal with in the kernel since it often has to workaround issues in older
+versions of the toolchain for years.
+
+The downside is that the unwinder now becomes dependent on objtool's
+ability to reverse engineer GCC code flows.  If GCC optimizations become
+too complicated for objtool to follow, the undwarf generation might stop
+working or become incomplete.  In such a case we may need to revisit the
+current implementation.  Some possible solutions would be asking GCC to
+make the optimizations more palatable, or having objtool use DWARF as an
+additional input.  (It's worth noting that live patching already has
+such a dependency on objtool.)
diff --git a/tools/objtool/builtin-check.c b/tools/objtool/builtin-check.c
index 365c34e..eedf089 100644
--- a/tools/objtool/builtin-check.c
+++ b/tools/objtool/builtin-check.c
@@ -52,5 +52,5 @@ int cmd_check(int argc, const char **argv)
 
 	objname = argv[0];
 
-	return check(objname, nofp);
+	return check(objname, nofp, false);
 }
diff --git a/tools/objtool/builtin-undwarf.c b/tools/objtool/builtin-undwarf.c
new file mode 100644
index 0000000..900b1e5
--- /dev/null
+++ b/tools/objtool/builtin-undwarf.c
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+/*
+ * objtool undwarf:
+ *
+ * This command analyzes a .o file and adds an .undwarf section to it, which is
+ * used by the in-kernel "undwarf" unwinder.
+ *
+ * This command is a superset of "objtool check".
+ */
+
+#include <string.h>
+#include <subcmd/parse-options.h>
+#include "builtin.h"
+#include "check.h"
+
+
+static const char *undwarf_usage[] = {
+	"objtool undwarf generate [<options>] file.o",
+	"objtool undwarf dump file.o",
+	NULL,
+};
+
+extern const struct option check_options[];
+extern bool nofp;
+
+int cmd_undwarf(int argc, const char **argv)
+{
+	const char *objname;
+
+	argc--; argv++;
+	if (!strncmp(argv[0], "gen", 3)) {
+		argc = parse_options(argc, argv, check_options, undwarf_usage, 0);
+		if (argc != 1)
+			usage_with_options(undwarf_usage, check_options);
+
+		objname = argv[0];
+
+		return check(objname, nofp, true);
+
+	}
+
+	if (!strcmp(argv[0], "dump")) {
+		if (argc != 2)
+			usage_with_options(undwarf_usage, check_options);
+
+		objname = argv[1];
+
+		return undwarf_dump(objname);
+	}
+
+	usage_with_options(undwarf_usage, check_options);
+
+	return 0;
+}
diff --git a/tools/objtool/builtin.h b/tools/objtool/builtin.h
index 34d2ba7..0b9722f 100644
--- a/tools/objtool/builtin.h
+++ b/tools/objtool/builtin.h
@@ -18,5 +18,6 @@
 #define _BUILTIN_H
 
 extern int cmd_check(int argc, const char **argv);
+extern int cmd_undwarf(int argc, const char **argv);
 
 #endif /* _BUILTIN_H */
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index e924fd5..ca8f4fc 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -246,12 +246,20 @@ static int decode_instructions(struct objtool_file *file)
 	unsigned long offset;
 	struct instruction *insn;
 	int ret;
+	bool needs_cfi;
 
 	list_for_each_entry(sec, &file->elf->sections, list) {
 
 		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
 			continue;
 
+		if (!strcmp(sec->name, ".altinstr_replacement") ||
+		    !strcmp(sec->name, ".altinstr_aux") ||
+		    !strncmp(sec->name, ".discard.", 9))
+			needs_cfi = false;
+		else
+			needs_cfi = true;
+
 		for (offset = 0; offset < sec->len; offset += insn->len) {
 			insn = malloc(sizeof(*insn));
 			if (!insn) {
@@ -264,6 +272,7 @@ static int decode_instructions(struct objtool_file *file)
 
 			insn->sec = sec;
 			insn->offset = offset;
+			insn->needs_cfi = needs_cfi;
 
 			ret = arch_decode_instruction(file->elf, sec, offset,
 						      sec->len - offset,
@@ -940,6 +949,30 @@ static bool has_valid_stack_frame(struct insn_state *state)
 	return false;
 }
 
+static int update_insn_state_regs(struct instruction *insn, struct insn_state *state)
+{
+	struct cfi_reg *cfa = &state->cfa;
+	struct stack_op *op = &insn->stack_op;
+
+	if (cfa->base != CFI_SP)
+		return 0;
+
+	/* push */
+	if (op->dest.type == OP_DEST_PUSH)
+		cfa->offset += 8;
+
+	/* pop */
+	if (op->src.type == OP_SRC_POP)
+		cfa->offset -= 8;
+
+	/* add immediate to sp */
+	if (op->dest.type == OP_DEST_REG && op->src.type == OP_SRC_ADD &&
+	    op->dest.reg == CFI_SP && op->src.reg == CFI_SP)
+		cfa->offset -= op->src.offset;
+
+	return 0;
+}
+
 static void save_reg(struct insn_state *state, unsigned char reg, int base,
 		     int offset)
 {
@@ -1001,6 +1034,10 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 		return 0;
 	}
 
+	if (state->type == UNDWARF_TYPE_REGS ||
+	    state->type == UNDWARF_TYPE_REGS_IRET)
+		return update_insn_state_regs(insn, state);
+
 	switch (op->dest.type) {
 
 	case OP_DEST_REG:
@@ -1249,6 +1286,10 @@ static bool insn_state_match(struct instruction *insn, struct insn_state *state)
 			break;
 		}
 
+	} else if (state1->type != state2->type) {
+		WARN_FUNC("stack state mismatch: type1=%d type2=%d",
+			  insn->sec, insn->offset, state1->type, state2->type);
+
 	} else if (state1->drap != state2->drap ||
 		 (state1->drap && state1->drap_reg != state2->drap_reg)) {
 		WARN_FUNC("stack state mismatch: drap1=%d(%d) drap2=%d(%d)",
@@ -1538,7 +1579,7 @@ static void cleanup(struct objtool_file *file)
 	elf_close(file->elf);
 }
 
-int check(const char *_objname, bool _nofp)
+int check(const char *_objname, bool _nofp, bool undwarf)
 {
 	struct objtool_file file;
 	int ret, warnings = 0;
@@ -1581,6 +1622,20 @@ int check(const char *_objname, bool _nofp)
 		warnings += ret;
 	}
 
+	if (undwarf) {
+		ret = create_undwarf(&file);
+		if (ret < 0)
+			goto out;
+
+		ret = create_undwarf_section(&file);
+		if (ret < 0)
+			goto out;
+
+		ret = update_file(&file);
+		if (ret < 0)
+			goto out;
+	}
+
 out:
 	cleanup(&file);
 
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index da85f5b..e56bb1c 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -22,12 +22,14 @@
 #include "elf.h"
 #include "cfi.h"
 #include "arch.h"
+#include "undwarf.h"
 #include <linux/hashtable.h>
 
 struct insn_state {
 	struct cfi_reg cfa;
 	struct cfi_reg regs[CFI_NUM_REGS];
 	int stack_size;
+	unsigned char type;
 	bool bp_scratch;
 	bool drap;
 	int drap_reg;
@@ -41,13 +43,14 @@ struct instruction {
 	unsigned int len;
 	unsigned char type;
 	unsigned long immediate;
-	bool alt_group, visited, dead_end, ignore;
+	bool alt_group, visited, dead_end, ignore, needs_cfi;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;
 	struct list_head alts;
 	struct symbol *func;
 	struct stack_op stack_op;
 	struct insn_state state;
+	struct undwarf undwarf;
 };
 
 struct objtool_file {
@@ -58,7 +61,7 @@ struct objtool_file {
 	bool ignore_unreachables, c_file;
 };
 
-int check(const char *objname, bool nofp);
+int check(const char *objname, bool nofp, bool undwarf);
 
 #define for_each_insn(file, insn)					\
 	list_for_each_entry(insn, &file->insn_list, list)
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 3fb0747..1ca5d9a 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -394,6 +394,230 @@ struct elf *elf_open(const char *name)
 	return NULL;
 }
 
+struct section *elf_create_section(struct elf *elf, const char *name,
+				   size_t entsize, int nr)
+{
+	struct section *sec, *shstrtab;
+	size_t size = entsize * nr;
+	char *buf;
+
+	sec = malloc(sizeof(*sec));
+	if (!sec) {
+		perror("malloc");
+		return NULL;
+	}
+	memset(sec, 0, sizeof(*sec));
+
+	INIT_LIST_HEAD(&sec->symbol_list);
+	INIT_LIST_HEAD(&sec->rela_list);
+	hash_init(sec->rela_hash);
+	hash_init(sec->symbol_hash);
+
+	list_add_tail(&sec->list, &elf->sections);
+
+	sec->name = strdup(name);
+	if (!sec->name) {
+		perror("strdup");
+		return NULL;
+	}
+
+	sec->idx = list_prev_entry(sec, list)->idx + 1;
+	sec->len = size;
+
+	sec->sh.sh_entsize = entsize;
+	sec->sh.sh_size = size;
+
+	sec->data = malloc(sizeof(*sec->data));
+	if (!sec->data) {
+		perror("malloc");
+		return NULL;
+	}
+
+	sec->data->d_buf = NULL;
+	if (size) {
+		sec->data->d_buf = malloc(size);
+		if (!sec->data->d_buf) {
+			perror("malloc");
+			return NULL;
+		}
+	}
+
+	sec->data->d_size = size;
+	sec->data->d_type = ELF_T_BYTE;
+
+	shstrtab = find_section_by_name(elf, ".shstrtab");
+	if (!shstrtab) {
+		WARN("can't find .shstrtab section");
+		return NULL;
+	}
+	size = shstrtab->len + strlen(name) + 1;
+	buf = malloc(size);
+	memcpy(buf, shstrtab->data->d_buf, shstrtab->len);
+	strcpy(buf + shstrtab->len, name);
+	sec->sh.sh_name = shstrtab->len;
+	shstrtab->data->d_buf = buf;
+	shstrtab->data->d_size = shstrtab->len = size;
+
+	return sec;
+}
+
+struct section *elf_create_rela_section(struct elf *elf, struct section *base)
+{
+	char *relaname;
+	struct section *sec;
+
+	relaname = malloc(strlen(base->name) + strlen(".rela") + 1);
+	if (!relaname) {
+		perror("malloc");
+		return NULL;
+	}
+	strcpy(relaname, ".rela");
+	strcat(relaname, base->name);
+
+	sec = elf_create_section(elf, relaname, sizeof(GElf_Rela), 0);
+	if (!sec)
+		return NULL;
+
+	base->rela = sec;
+	sec->base = base;
+
+	sec->sh.sh_type = SHT_RELA;
+	sec->sh.sh_addralign = 8;
+	sec->sh.sh_link = find_section_by_name(elf, ".symtab")->idx;
+	sec->sh.sh_info = base->idx;
+	sec->sh.sh_flags = SHF_INFO_LINK;
+
+	return sec;
+}
+
+int elf_rebuild_rela_section(struct section *sec)
+{
+	struct rela *rela;
+	int nr, index = 0, size;
+	GElf_Rela *relas;
+
+	nr = 0;
+	list_for_each_entry(rela, &sec->rela_list, list)
+		nr++;
+
+	size = nr * sizeof(*relas);
+	relas = malloc(size);
+	if (!relas) {
+		perror("malloc");
+		return -1;
+	}
+
+	sec->data->d_buf = relas;
+	sec->data->d_size = size;
+
+	sec->sh.sh_size = size;
+
+	index = 0;
+	list_for_each_entry(rela, &sec->rela_list, list) {
+		relas[index].r_offset = rela->offset;
+		relas[index].r_addend = rela->addend;
+		relas[index].r_info = GELF_R_INFO(rela->sym->idx, rela->type);
+		index++;
+	}
+
+	return 0;
+}
+
+int elf_write_to_file(struct elf *elf, char *outfile)
+{
+	int fd;
+	struct section *sec;
+	Elf *elfout;
+	GElf_Ehdr eh, ehout;
+	Elf_Scn *scn;
+	Elf_Data *data;
+	GElf_Shdr sh;
+
+	fd = creat(outfile, 0777);
+	if (fd == -1) {
+		perror("creat");
+		return -1;
+	}
+
+	elfout = elf_begin(fd, ELF_C_WRITE, NULL);
+	if (!elfout) {
+		perror("elf_begin");
+		return -1;
+	}
+
+	if (!gelf_newehdr(elfout, gelf_getclass(elf->elf))) {
+		perror("gelf_newehdr");
+		return -1;
+	}
+
+	if (!gelf_getehdr(elfout, &ehout)) {
+		perror("gelf_getehdr");
+		return -1;
+	}
+
+	if (!gelf_getehdr(elf->elf, &eh)) {
+		perror("gelf_getehdr");
+		return -1;
+	}
+
+	memset(&ehout, 0, sizeof(ehout));
+	ehout.e_ident[EI_DATA] = eh.e_ident[EI_DATA];
+	ehout.e_machine = eh.e_machine;
+	ehout.e_type = eh.e_type;
+	ehout.e_version = EV_CURRENT;
+	ehout.e_shstrndx = find_section_by_name(elf, ".shstrtab")->idx;
+
+	list_for_each_entry(sec, &elf->sections, list) {
+		if (sec->idx == 0)
+			continue;
+
+		scn = elf_newscn(elfout);
+		if (!scn) {
+			perror("elf_newscn");
+			return -1;
+		}
+
+		data = elf_newdata(scn);
+		if (!data) {
+			perror("elf_newdata");
+			return -1;
+		}
+
+		if (!elf_flagdata(data, ELF_C_SET, ELF_F_DIRTY)) {
+			perror("elf_flagdata");
+			return -1;
+		}
+
+		data->d_type = sec->data->d_type;
+		data->d_buf = sec->data->d_buf;
+		data->d_size = sec->data->d_size;
+
+		if(!gelf_getshdr(scn, &sh)) {
+			perror("gelf_getshdr");
+			return -1;
+		}
+
+		sh = sec->sh;
+
+		if (!gelf_update_shdr(scn, &sh)) {
+			perror("gelf_update_shdr");
+			return -1;
+		}
+	}
+
+	if (!gelf_update_ehdr(elfout, &ehout)) {
+		perror("gelf_update_ehdr");
+		return -1;
+	}
+
+	if (elf_update(elfout, ELF_C_WRITE) < 0) {
+		perror("elf_update");
+		return -1;
+	}
+
+	return 0;
+}
+
 void elf_close(struct elf *elf)
 {
 	struct section *sec, *tmpsec;
diff --git a/tools/objtool/elf.h b/tools/objtool/elf.h
index 75e44c7..f9d68ea 100644
--- a/tools/objtool/elf.h
+++ b/tools/objtool/elf.h
@@ -83,6 +83,11 @@ struct rela *find_rela_by_dest(struct section *sec, unsigned long offset);
 struct rela *find_rela_by_dest_range(struct section *sec, unsigned long offset,
 				     unsigned int len);
 struct symbol *find_containing_func(struct section *sec, unsigned long offset);
+struct section *elf_create_section(struct elf *elf, const char *name, size_t
+				   entsize, int nr);
+struct section *elf_create_rela_section(struct elf *elf, struct section *base);
+int elf_rebuild_rela_section(struct section *sec);
+int elf_write_to_file(struct elf *elf, char *outfile);
 void elf_close(struct elf *elf);
 
 
diff --git a/tools/objtool/objtool.c b/tools/objtool/objtool.c
index ecc5b1b..b2051d1 100644
--- a/tools/objtool/objtool.c
+++ b/tools/objtool/objtool.c
@@ -42,10 +42,11 @@ struct cmd_struct {
 };
 
 static const char objtool_usage_string[] =
-	"objtool [OPTIONS] COMMAND [ARGS]";
+	"objtool COMMAND [ARGS]";
 
 static struct cmd_struct objtool_cmds[] = {
 	{"check",	cmd_check,	"Perform stack metadata validation on an object file" },
+	{"undwarf",	cmd_undwarf,	"Generate in-place undwarf metadata for an object file" },
 };
 
 bool help;
diff --git a/tools/objtool/undwarf-types.h b/tools/objtool/undwarf-types.h
new file mode 100644
index 0000000..b624188
--- /dev/null
+++ b/tools/objtool/undwarf-types.h
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _UNDWARF_TYPES_H
+#define _UNDWARF_TYPES_H
+
+/*
+ * The UNDWARF_REG_* registers are base registers which are used to find other
+ * registers on the stack.
+ *
+ * The CFA (call frame address) is the value of the stack pointer on the
+ * previous frame, i.e. the caller's SP before it called the callee.
+ *
+ * The CFA is usually based on SP, unless a frame pointer has been saved, in
+ * which case it's based on BP.
+ *
+ * BP is usually either based on CFA or is undefined (meaning its value didn't
+ * change for the current frame).
+ *
+ * So the CFA base is usually either SP or BP, and the FP base is usually either
+ * CFA or undefined.  The rest of the base registers are needed for special
+ * cases like entry code and gcc aligned stacks.
+ */
+#define UNDWARF_REG_UNDEFINED		0
+#define UNDWARF_REG_CFA			1
+#define UNDWARF_REG_DX			2
+#define UNDWARF_REG_DI			3
+#define UNDWARF_REG_BP			4
+#define UNDWARF_REG_SP			5
+#define UNDWARF_REG_R10			6
+#define UNDWARF_REG_R13			7
+#define UNDWARF_REG_BP_INDIRECT		8
+#define UNDWARF_REG_SP_INDIRECT		9
+#define UNDWARF_REG_MAX			15
+
+/*
+ * UNDWARF_TYPE_CFA: Indicates that cfa_reg+cfa_offset points to the caller's
+ * stack pointer (aka the CFA in DWARF terms).  Used for all callable
+ * functions, i.e.  all C code and all callable asm functions.
+ *
+ * UNDWARF_TYPE_REGS: Used in entry code to indicate that cfa_reg+cfa_offset
+ * points to a fully populated pt_regs from a syscall, interrupt, or exception.
+ *
+ * UNDWARF_TYPE_REGS_IRET: Used in entry code to indicate that
+ * cfa_reg+cfa_offset points to the iret return frame.
+ *
+ * The CFI_HINT macros are only used for the undwarf_cfi_hints struct.  They
+ * are not used for the undwarf struct due to size and complexity constraints.
+ */
+#define UNDWARF_TYPE_CFA		0
+#define UNDWARF_TYPE_REGS		1
+#define UNDWARF_TYPE_REGS_IRET		2
+#define CFI_HINT_TYPE_SAVE		3
+#define CFI_HINT_TYPE_RESTORE		4
+
+#ifndef __ASSEMBLY__
+/*
+ * This struct contains a simplified version of the DWARF Call Frame
+ * Information standard.  It contains only the necessary parts of the real
+ * DWARF, simplified for ease of access by the in-kernel unwinder.  It tells
+ * the unwinder how to find the previous SP and BP (and sometimes entry regs)
+ * on the stack, given a code address (IP).
+ */
+struct undwarf {
+	int ip;
+	unsigned int len;
+	short cfa_offset;
+	short bp_offset;
+	unsigned cfa_reg:4;
+	unsigned bp_reg:4;
+	unsigned type:2;
+};
+
+/*
+ * This struct is used by asm and inline asm code to manually annotate the
+ * location of registers on the stack for the undwarf unwinder.
+ */
+struct undwarf_cfi_hint {
+	unsigned int ip;
+	short cfa_offset;
+	unsigned char cfa_reg;
+	unsigned char type;
+};
+#endif /* __ASSEMBLY__ */
+
+#endif /* _UNDWARF_TYPES_H */
diff --git a/tools/objtool/undwarf.c b/tools/objtool/undwarf.c
new file mode 100644
index 0000000..68c026f
--- /dev/null
+++ b/tools/objtool/undwarf.c
@@ -0,0 +1,308 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "undwarf.h"
+#include "check.h"
+#include "warn.h"
+
+int create_undwarf(struct objtool_file *file)
+{
+	struct instruction *insn;
+
+	for_each_insn(file, insn) {
+		struct undwarf *undwarf = &insn->undwarf;
+		struct cfi_reg *cfa = &insn->state.cfa;
+		struct cfi_reg *bp = &insn->state.regs[CFI_BP];
+
+		if (cfa->base == CFI_UNDEFINED) {
+			undwarf->cfa_reg = UNDWARF_REG_UNDEFINED;
+			continue;
+		}
+
+		switch (cfa->base) {
+		case CFI_SP:
+			undwarf->cfa_reg = UNDWARF_REG_SP;
+			break;
+		case CFI_SP_INDIRECT:
+			undwarf->cfa_reg = UNDWARF_REG_SP_INDIRECT;
+			break;
+		case CFI_BP:
+			undwarf->cfa_reg = UNDWARF_REG_BP;
+			break;
+		case CFI_BP_INDIRECT:
+			undwarf->cfa_reg = UNDWARF_REG_BP_INDIRECT;
+			break;
+		case CFI_R10:
+			undwarf->cfa_reg = UNDWARF_REG_R10;
+			break;
+		case CFI_R13:
+			undwarf->cfa_reg = UNDWARF_REG_R10;
+			break;
+		case CFI_DI:
+			undwarf->cfa_reg = UNDWARF_REG_DI;
+			break;
+		case CFI_DX:
+			undwarf->cfa_reg = UNDWARF_REG_DX;
+			break;
+		default:
+			WARN_FUNC("unknown CFA base reg %d",
+				  insn->sec, insn->offset, cfa->base);
+			return -1;
+		}
+
+		switch(bp->base) {
+		case CFI_UNDEFINED:
+			undwarf->bp_reg = UNDWARF_REG_UNDEFINED;
+			break;
+		case CFI_CFA:
+			undwarf->bp_reg = UNDWARF_REG_CFA;
+			break;
+		case CFI_BP:
+			undwarf->bp_reg = UNDWARF_REG_BP;
+			break;
+		default:
+			WARN_FUNC("unknown BP base reg %d",
+				  insn->sec, insn->offset, bp->base);
+			return -1;
+		}
+
+		undwarf->cfa_offset = cfa->offset;
+		undwarf->bp_offset = bp->offset;
+		undwarf->type = insn->state.type;
+	}
+
+	return 0;
+}
+
+int create_undwarf_section(struct objtool_file *file)
+{
+	struct instruction *insn, *prev_insn = NULL;
+	struct section *sec, *relasec;
+	struct rela *rela;
+	unsigned int index, nr = 0;
+	struct undwarf *undwarf = NULL;
+
+	sec = find_section_by_name(file->elf, ".undwarf");
+	if (sec) {
+		WARN("file already has .undwarf section, skipping");
+		return -1;
+	}
+
+	/* count number of needed undwarves */
+	for_each_insn(file, insn) {
+		if (insn->needs_cfi &&
+		    (!prev_insn || prev_insn->sec != insn->sec ||
+		     memcmp(&insn->undwarf, &prev_insn->undwarf,
+			    sizeof(struct undwarf)))) {
+			nr++;
+		}
+		prev_insn = insn;
+	}
+
+	if (!nr)
+		return 0;
+
+	/* create .undwarf and .rela.undwarf sections */
+	sec = elf_create_section(file->elf, ".undwarf",
+				 sizeof(struct undwarf), nr);
+
+	sec->sh.sh_type = SHT_PROGBITS;
+	sec->sh.sh_addralign = 1;
+	sec->sh.sh_flags = SHF_ALLOC;
+
+	relasec = elf_create_rela_section(file->elf, sec);
+	if (!relasec)
+		return -1;
+
+	/* populate sections */
+	index = 0;
+	prev_insn = NULL;
+	for_each_insn(file, insn) {
+		if (insn->needs_cfi &&
+		    (!prev_insn || prev_insn->sec != insn->sec ||
+		     memcmp(&insn->undwarf, &prev_insn->undwarf,
+			    sizeof(struct undwarf)))) {
+
+#if 0
+			printf("%s:%lx: cfa:%d+%d bp:%d+%d type:%d\n",
+			       insn->sec->name, insn->offset, insn->undwarf.cfa_reg,
+			       insn->undwarf.cfa_offset, insn->undwarf.fp_reg,
+			       insn->undwarf.fp_offset, insn->undwarf.type);
+#endif
+
+			undwarf = (struct undwarf *)sec->data->d_buf + index;
+
+			memcpy(undwarf, &insn->undwarf, sizeof(*undwarf));
+			undwarf->len = insn->len;
+
+			/* add rela for undwarf->ip */
+			rela = malloc(sizeof(*rela));
+			if (!rela) {
+				perror("malloc");
+				return -1;
+			}
+			memset(rela, 0, sizeof(*rela));
+
+			rela->sym = insn->sec->sym;
+			rela->addend = insn->offset;
+			rela->type = R_X86_64_PC32;
+			rela->offset = index * sizeof(struct undwarf);
+
+			list_add_tail(&rela->list, &relasec->rela_list);
+			hash_add(relasec->rela_hash, &rela->hash, rela->offset);
+
+			index++;
+
+		} else if (insn->needs_cfi) {
+			undwarf->len += insn->len;
+		}
+		prev_insn = insn;
+	}
+
+	if (elf_rebuild_rela_section(relasec))
+		return -1;
+
+	return 0;
+}
+
+int update_file(struct objtool_file *file)
+{
+	char *outfile;
+	int ret;
+
+	outfile = malloc(strlen(objname) + strlen(".undwarf") + 1);
+	if (!outfile) {
+		perror("malloc");
+		return -1;
+	}
+
+	strcpy(outfile, objname);
+	strcat(outfile, ".undwarf");
+	ret = elf_write_to_file(file->elf, outfile);
+	if (ret < 0)
+		return -1;
+
+	if (rename(outfile, objname) < 0) {
+		WARN("can't rename file");
+		perror("rename");
+		return -1;
+	}
+
+	free(outfile);
+
+	return 0;
+}
+
+static const char *reg_name(unsigned int reg)
+{
+	switch (reg) {
+	case UNDWARF_REG_CFA:
+		return "cfa";
+	case UNDWARF_REG_DX:
+		return "dx";
+	case UNDWARF_REG_DI:
+		return "di";
+	case UNDWARF_REG_BP:
+		return "bp";
+	case UNDWARF_REG_SP:
+		return "sp";
+	case UNDWARF_REG_R10:
+		return "r10";
+	case UNDWARF_REG_R13:
+		return "r13";
+	case UNDWARF_REG_BP_INDIRECT:
+		return "bp(ind)";
+	case UNDWARF_REG_SP_INDIRECT:
+		return "sp(ind)";
+	default:
+		return "?";
+	}
+}
+
+static const char *undwarf_type_name(unsigned int type)
+{
+	switch (type) {
+	case UNDWARF_TYPE_CFA:
+		return "cfa";
+	case UNDWARF_TYPE_REGS:
+		return "regs";
+	case UNDWARF_TYPE_REGS_IRET:
+		return "iret";
+	default:
+		return "?";
+	}
+}
+
+static void print_reg(unsigned int reg, int offset)
+{
+	if (reg == UNDWARF_REG_BP_INDIRECT)
+		printf("(bp%+d)", offset);
+	else if (reg == UNDWARF_REG_SP_INDIRECT)
+		printf("(sp%+d)", offset);
+	else if (reg == UNDWARF_REG_UNDEFINED)
+		printf("(und)");
+	else
+		printf("%s%+d", reg_name(reg), offset);
+}
+
+int undwarf_dump(const char *_objname)
+{
+	struct elf *elf;
+	struct section *sec;
+	struct rela *rela;
+	struct undwarf *undwarf;
+	int nr, i;
+
+	objname = _objname;
+
+	elf = elf_open(objname);
+	if (!elf) {
+		WARN("error reading elf file %s\n", objname);
+		return 1;
+	}
+
+	sec = find_section_by_name(elf, ".undwarf");
+	if (!sec || !sec->rela)
+		return 0;
+
+	nr = sec->len / sizeof(*undwarf);
+	for (i = 0; i < nr; i++) {
+		undwarf = (struct undwarf *)sec->data->d_buf + i;
+
+		rela = find_rela_by_dest(sec, i * sizeof(*undwarf));
+		if (!rela) {
+			WARN("can't find rela for undwarf[%d]\n", i);
+			return 1;
+		}
+
+		printf("%s+%x: len:%u cfa:",
+		       rela->sym->name, rela->addend, undwarf->len);
+
+		print_reg(undwarf->cfa_reg, undwarf->cfa_offset);
+
+		printf(" bp:");
+
+		print_reg(undwarf->bp_reg, undwarf->bp_offset);
+
+		printf(" type:%s\n", undwarf_type_name(undwarf->type));
+	}
+
+	return 0;
+}
diff --git a/tools/objtool/builtin.h b/tools/objtool/undwarf.h
similarity index 64%
copy from tools/objtool/builtin.h
copy to tools/objtool/undwarf.h
index 34d2ba7..76346e7 100644
--- a/tools/objtool/builtin.h
+++ b/tools/objtool/undwarf.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2015 Josh Poimboeuf <jpoimboe@redhat.com>
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -14,9 +14,18 @@
  * You should have received a copy of the GNU General Public License
  * along with this program; if not, see <http://www.gnu.org/licenses/>.
  */
-#ifndef _BUILTIN_H
-#define _BUILTIN_H
 
-extern int cmd_check(int argc, const char **argv);
+#ifndef _UNDWARF_H
+#define _UNDWARF_H
 
-#endif /* _BUILTIN_H */
+#include "undwarf-types.h"
+
+struct objtool_file;
+
+int create_undwarf(struct objtool_file *file);
+int create_undwarf_section(struct objtool_file *file);
+int update_file(struct objtool_file *file);
+
+int undwarf_dump(const char *objname);
+
+#endif /* _UNDWARF_H */
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (3 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 04/10] objtool: add undwarf debuginfo generation Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01 13:57   ` Andy Lutomirski
  2017-06-01  5:44 ` [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations Josh Poimboeuf
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Some asm (and inline asm) code does special things to the stack which
objtool can't understand.  (Nor can GCC or GNU assembler, for that
matter.)  In such cases we need a facility for the code to provide
annotations, so the unwinder can unwind through it.

This provides such a facility, in the form of CFI hints.  They're
similar to the GNU assembler .cfi* directives, but they give more
information, and are needed in far fewer places, because objtool can
fill in the blanks by following branches and adjusting the stack pointer
for pushes and pops.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/undwarf-types.h | 100 ++++++++++++++++++++
 arch/x86/include/asm/undwarf.h       |  97 +++++++++++++++++++
 tools/objtool/Makefile               |   3 +
 tools/objtool/check.c                | 176 ++++++++++++++++++++++++++++++++++-
 tools/objtool/check.h                |   4 +-
 5 files changed, 373 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/include/asm/undwarf-types.h
 create mode 100644 arch/x86/include/asm/undwarf.h

diff --git a/arch/x86/include/asm/undwarf-types.h b/arch/x86/include/asm/undwarf-types.h
new file mode 100644
index 0000000..b624188
--- /dev/null
+++ b/arch/x86/include/asm/undwarf-types.h
@@ -0,0 +1,100 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _UNDWARF_TYPES_H
+#define _UNDWARF_TYPES_H
+
+/*
+ * The UNDWARF_REG_* registers are base registers which are used to find other
+ * registers on the stack.
+ *
+ * The CFA (call frame address) is the value of the stack pointer on the
+ * previous frame, i.e. the caller's SP before it called the callee.
+ *
+ * The CFA is usually based on SP, unless a frame pointer has been saved, in
+ * which case it's based on BP.
+ *
+ * BP is usually either based on CFA or is undefined (meaning its value didn't
+ * change for the current frame).
+ *
+ * So the CFA base is usually either SP or BP, and the FP base is usually either
+ * CFA or undefined.  The rest of the base registers are needed for special
+ * cases like entry code and gcc aligned stacks.
+ */
+#define UNDWARF_REG_UNDEFINED		0
+#define UNDWARF_REG_CFA			1
+#define UNDWARF_REG_DX			2
+#define UNDWARF_REG_DI			3
+#define UNDWARF_REG_BP			4
+#define UNDWARF_REG_SP			5
+#define UNDWARF_REG_R10			6
+#define UNDWARF_REG_R13			7
+#define UNDWARF_REG_BP_INDIRECT		8
+#define UNDWARF_REG_SP_INDIRECT		9
+#define UNDWARF_REG_MAX			15
+
+/*
+ * UNDWARF_TYPE_CFA: Indicates that cfa_reg+cfa_offset points to the caller's
+ * stack pointer (aka the CFA in DWARF terms).  Used for all callable
+ * functions, i.e.  all C code and all callable asm functions.
+ *
+ * UNDWARF_TYPE_REGS: Used in entry code to indicate that cfa_reg+cfa_offset
+ * points to a fully populated pt_regs from a syscall, interrupt, or exception.
+ *
+ * UNDWARF_TYPE_REGS_IRET: Used in entry code to indicate that
+ * cfa_reg+cfa_offset points to the iret return frame.
+ *
+ * The CFI_HINT macros are only used for the undwarf_cfi_hints struct.  They
+ * are not used for the undwarf struct due to size and complexity constraints.
+ */
+#define UNDWARF_TYPE_CFA		0
+#define UNDWARF_TYPE_REGS		1
+#define UNDWARF_TYPE_REGS_IRET		2
+#define CFI_HINT_TYPE_SAVE		3
+#define CFI_HINT_TYPE_RESTORE		4
+
+#ifndef __ASSEMBLY__
+/*
+ * This struct contains a simplified version of the DWARF Call Frame
+ * Information standard.  It contains only the necessary parts of the real
+ * DWARF, simplified for ease of access by the in-kernel unwinder.  It tells
+ * the unwinder how to find the previous SP and BP (and sometimes entry regs)
+ * on the stack, given a code address (IP).
+ */
+struct undwarf {
+	int ip;
+	unsigned int len;
+	short cfa_offset;
+	short bp_offset;
+	unsigned cfa_reg:4;
+	unsigned bp_reg:4;
+	unsigned type:2;
+};
+
+/*
+ * This struct is used by asm and inline asm code to manually annotate the
+ * location of registers on the stack for the undwarf unwinder.
+ */
+struct undwarf_cfi_hint {
+	unsigned int ip;
+	short cfa_offset;
+	unsigned char cfa_reg;
+	unsigned char type;
+};
+#endif /* __ASSEMBLY__ */
+
+#endif /* _UNDWARF_TYPES_H */
diff --git a/arch/x86/include/asm/undwarf.h b/arch/x86/include/asm/undwarf.h
new file mode 100644
index 0000000..e763cf9
--- /dev/null
+++ b/arch/x86/include/asm/undwarf.h
@@ -0,0 +1,97 @@
+#ifndef _ASM_X86_UNDWARF_H
+#define _ASM_X86_UNDWARF_H
+
+#include "undwarf-types.h"
+
+#ifdef __ASSEMBLY__
+
+/*
+ * In asm, there are two kinds of code: normal C-type callable functions and
+ * the rest.  The normal callable functions can be called by other code, and
+ * don't do anything unusual with the stack.  Such normal callable functions
+ * are annotated with the ENTRY/ENDPROC macros.  Most asm code falls in this
+ * category.  In this case, no special debugging annotations are needed because
+ * objtool can automatically generate the .undwarf section which the undwarf
+ * unwinder reads at runtime.
+ *
+ * Anything which doesn't fall into the above category, such as syscall and
+ * interrupt handlers, tends to not be called directly by other functions, and
+ * often does unusual non-C-function-type things with the stack pointer.  Such
+ * code needs to be annotated such that objtool can understand it.  The
+ * following CFI hint macros are for this type of code.
+ *
+ * These macros provide hints to objtool about the state of the stack at each
+ * instruction.  Objtool starts from the hints and follows the code flow,
+ * making automatic CFI adjustments when it sees pushes and pops, adjusting the
+ * debuginfo as necessary.  It will also warn if it sees any inconsistencies.
+ */
+.macro CFI_HINT cfa_reg=UNDWARF_REG_SP cfa_offset=0 type=UNDWARF_TYPE_CFA
+#ifdef CONFIG_STACK_VALIDATION
+999:
+	.pushsection .discard.undwarf_cfi_hints
+		/* struct undwarf_cfi_hint */
+		.long 999b - .		/* ip		*/
+		.short \cfa_offset	/* cfa_offset	*/
+		.byte \cfa_reg		/* cfa_reg */
+		.byte \type		/* type */
+	.popsection
+#endif
+.endm
+
+.macro CFI_EMPTY
+	CFI_HINT cfa_reg=UNDWARF_REG_UNDEFINED
+.endm
+
+.macro CFI_REGS base=rsp offset=0 indirect=0 extra=1 iret=0
+	.if \base == rsp && \indirect
+		.set cfa_reg, UNDWARF_REG_SP_INDIRECT
+	.elseif \base == rsp
+		.set cfa_reg, UNDWARF_REG_SP
+	.elseif \base == rbp
+		.set cfa_reg, UNDWARF_REG_BP
+	.elseif \base == rdi
+		.set cfa_reg, UNDWARF_REG_DI
+	.elseif \base == rdx
+		.set cfa_reg, UNDWARF_REG_DX
+	.else
+		.error "CFI_REGS: bad base register"
+	.endif
+
+	.if \iret
+		.set type, UNDWARF_TYPE_REGS_IRET
+	.else
+		.set type, UNDWARF_TYPE_REGS
+	.endif
+
+	CFI_HINT cfa_reg=cfa_reg cfa_offset=\offset type=type
+.endm
+
+.macro CFI_IRET_REGS base=rsp offset=0
+	CFI_REGS base=\base offset=\offset iret=1
+.endm
+
+.macro CFI_FUNC cfa_offset=8
+	CFI_HINT cfa_offset=\cfa_offset
+.endm
+
+#else /* !__ASSEMBLY__ */
+
+#define CFI_HINT(cfa_reg, cfa_offset, type)			\
+	"999: \n\t"						\
+	".pushsection .discard.undwarf_cfi_hints\n\t"		\
+	/* struct undwarf_cfi_hint */				\
+	".long 999b - .\n\t"					\
+	".short " __stringify(cfa_offset) "\n\t"		\
+	".byte " __stringify(cfa_reg) "\n\t"			\
+	".byte " __stringify(type) "\n\t"			\
+	".popsection\n\t"
+
+#define CFI_SAVE CFI_HINT(0, 0, CFI_HINT_TYPE_SAVE)
+
+#define CFI_RESTORE CFI_HINT(0, 0, CFI_HINT_TYPE_RESTORE)
+
+
+#endif /* __ASSEMBLY__ */
+
+
+#endif /* _ASM_X86_UNDWARF_H */
diff --git a/tools/objtool/Makefile b/tools/objtool/Makefile
index 0e2765e..7997d5c 100644
--- a/tools/objtool/Makefile
+++ b/tools/objtool/Makefile
@@ -52,6 +52,9 @@ $(OBJTOOL): $(LIBSUBCMD) $(OBJTOOL_IN)
 	diff -I'^#include' arch/x86/insn/inat.h ../../arch/x86/include/asm/inat.h >/dev/null && \
 	diff -I'^#include' arch/x86/insn/inat_types.h ../../arch/x86/include/asm/inat_types.h >/dev/null) \
 	|| echo "warning: objtool: x86 instruction decoder differs from kernel" >&2 )) || true
+	@(test -d ../../kernel -a -d ../../tools -a -d ../objtool && (( \
+	diff ../../arch/x86/include/asm/undwarf-types.h undwarf-types.h >/dev/null) \
+	|| echo "warning: objtool: undwarf-types.h differs from kernel" >&2 )) || true
 	$(QUIET_LINK)$(CC) $(OBJTOOL_IN) $(LDFLAGS) -o $@
 
 
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index ca8f4fc..a409a4d 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -876,6 +876,99 @@ static int add_switch_table_alts(struct objtool_file *file)
 	return 0;
 }
 
+static int read_cfi_hints(struct objtool_file *file)
+{
+	struct section *sec, *relasec;
+	struct rela *rela;
+	struct undwarf_cfi_hint *hint;
+	struct instruction *insn;
+	struct cfi_reg *cfa;
+	int i;
+
+	sec = find_section_by_name(file->elf, ".discard.undwarf_cfi_hints");
+	if (!sec)
+		return 0;
+
+	relasec = sec->rela;
+	if (!relasec) {
+		WARN("missing .rela.discard.undwarf_cfi_hints section");
+		return -1;
+	}
+
+	if (sec->len % sizeof(struct undwarf_cfi_hint)) {
+		WARN("struct undwarf_cfi_hint size mismatch");
+		return -1;
+	}
+
+	file->hints = true;
+
+	for (i = 0; i < sec->len / sizeof(struct undwarf_cfi_hint); i++) {
+		hint = (struct undwarf_cfi_hint *)sec->data->d_buf + i;
+
+		rela = find_rela_by_dest(sec, i * sizeof(*hint));
+		if (!rela) {
+			WARN("can't find rela for undwarf_cfi_hints[%d]", i);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("can't find insn for undwarf_cfi_hints[%d]", i);
+			return -1;
+		}
+
+		cfa = &insn->state.cfa;
+
+		if (hint->type == CFI_HINT_TYPE_SAVE) {
+			insn->save = true;
+			continue;
+
+		} else if (hint->type == CFI_HINT_TYPE_RESTORE) {
+			insn->restore = true;
+			insn->hint = true;
+			continue;
+		}
+
+		insn->hint = true;
+
+		switch (hint->cfa_reg) {
+		case UNDWARF_REG_UNDEFINED:
+			cfa->base = CFI_UNDEFINED;
+			break;
+		case UNDWARF_REG_SP:
+			cfa->base = CFI_SP;
+			break;
+		case UNDWARF_REG_BP:
+			cfa->base = CFI_BP;
+			break;
+		case UNDWARF_REG_SP_INDIRECT:
+			cfa->base = CFI_SP_INDIRECT;
+			break;
+		case UNDWARF_REG_R10:
+			cfa->base = CFI_R10;
+			break;
+		case UNDWARF_REG_R13:
+			cfa->base = CFI_R13;
+			break;
+		case UNDWARF_REG_DI:
+			cfa->base = CFI_DI;
+			break;
+		case UNDWARF_REG_DX:
+			cfa->base = CFI_DX;
+			break;
+		default:
+			WARN_FUNC("unsupported undwarf_cfi_hint cfa base reg %d",
+				  insn->sec, insn->offset, hint->cfa_reg);
+			return -1;
+		}
+
+		cfa->offset = hint->cfa_offset;
+		insn->state.type = hint->type;
+	}
+
+	return 0;
+}
+
 static int decode_sections(struct objtool_file *file)
 {
 	int ret;
@@ -906,6 +999,10 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
+	ret = read_cfi_hints(file);
+	if (ret)
+		return ret;
+
 	return 0;
 }
 
@@ -1313,7 +1410,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 			   struct insn_state state)
 {
 	struct alternative *alt;
-	struct instruction *insn;
+	struct instruction *insn, *next_insn;
 	struct section *sec;
 	struct symbol *func = NULL;
 	int ret;
@@ -1328,6 +1425,8 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 	}
 
 	while (1) {
+		next_insn = next_insn_same_sec(file, insn);
+
 		if (file->c_file && insn->func) {
 			if (func && func != insn->func) {
 				WARN("%s() falls through to next function %s()",
@@ -1339,13 +1438,54 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 		func = insn->func;
 
 		if (insn->visited) {
-			if (!!insn_state_match(insn, &state))
+			if (!insn->hint && !insn_state_match(insn, &state))
 				return 1;
 
 			return 0;
 		}
 
-		insn->state = state;
+		if (insn->hint) {
+			if (insn->restore) {
+				struct instruction *save_insn, *i;
+
+				i = insn;
+				save_insn = NULL;
+				func_for_each_insn_continue_reverse(file, func, i) {
+					if (i->save) {
+						save_insn = i;
+						break;
+					}
+				}
+
+				if (!save_insn) {
+					WARN_FUNC("no corresponding CFI save for CFI restore",
+						  sec, insn->offset);
+					return 1;
+				}
+
+				if (!save_insn->visited) {
+					/*
+					 * Oops, no state to copy yet.
+					 * Hopefully we can reach this
+					 * instruction from another branch
+					 * after the save insn has been
+					 * visited.
+					 */
+					if (insn == first)
+						return 0;
+
+					WARN_FUNC("objtool isn't smart enough to handle this CFI save/restore combo",
+						  sec, insn->offset);
+					return 1;
+				}
+
+				insn->state = save_insn->state;
+			}
+
+			state = insn->state;
+
+		} else
+			insn->state = state;
 
 		insn->visited = true;
 
@@ -1420,7 +1560,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 			return 0;
 
 		case INSN_CONTEXT_SWITCH:
-			if (func) {
+			if (func && (!next_insn || !next_insn->hint)) {
 				WARN_FUNC("unsupported instruction in callable function",
 					  sec, insn->offset);
 				return 1;
@@ -1440,7 +1580,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 		if (insn->dead_end)
 			return 0;
 
-		insn = next_insn_same_sec(file, insn);
+		insn = next_insn;
 		if (!insn) {
 			WARN("%s: unexpected end of section", sec->name);
 			return 1;
@@ -1450,6 +1590,27 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 	return 0;
 }
 
+static int validate_cfi_hints(struct objtool_file *file)
+{
+	struct instruction *insn;
+	int ret, warnings = 0;
+	struct insn_state state;
+
+	if (!file->hints)
+		return 0;
+
+	clear_insn_state(&state);
+
+	for_each_insn(file, insn) {
+		if (insn->hint && !insn->visited) {
+			ret = validate_branch(file, insn, state);
+			warnings += ret;
+		}
+	}
+
+	return warnings;
+}
+
 static bool is_kasan_insn(struct instruction *insn)
 {
 	return (insn->type == INSN_CALL &&
@@ -1615,6 +1776,11 @@ int check(const char *_objname, bool _nofp, bool undwarf)
 		goto out;
 	warnings += ret;
 
+	ret = validate_cfi_hints(&file);
+	if (ret < 0)
+		goto out;
+	warnings += ret;
+
 	if (!warnings) {
 		ret = validate_reachable_instructions(&file);
 		if (ret < 0)
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index e56bb1c..d711336 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -43,7 +43,7 @@ struct instruction {
 	unsigned int len;
 	unsigned char type;
 	unsigned long immediate;
-	bool alt_group, visited, dead_end, ignore, needs_cfi;
+	bool alt_group, visited, dead_end, ignore, needs_cfi, hint, save, restore;
 	struct symbol *call_dest;
 	struct instruction *jump_dest;
 	struct list_head alts;
@@ -58,7 +58,7 @@ struct objtool_file {
 	struct list_head insn_list;
 	DECLARE_HASHTABLE(insn_hash, 16);
 	struct section *rodata, *whitelist;
-	bool ignore_unreachables, c_file;
+	bool ignore_unreachables, c_file, hints;
 };
 
 int check(const char *objname, bool nofp, bool undwarf);
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (4 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01 14:03   ` Andy Lutomirski
  2017-06-01  5:44 ` [RFC PATCH 07/10] x86/asm: add CFI hint annotations to sync_core() Josh Poimboeuf
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Add CFI hint undwarf annotations to entry_64.S.  This will enable the
undwarf unwinder to unwind through any location in the entry code
including syscalls, interrupts, and exceptions.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/entry/Makefile   |  1 -
 arch/x86/entry/calling.h  |  5 +++++
 arch/x86/entry/entry_64.S | 56 ++++++++++++++++++++++++++++++++++++++++++-----
 3 files changed, 55 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
index 9976fce..af28a8a 100644
--- a/arch/x86/entry/Makefile
+++ b/arch/x86/entry/Makefile
@@ -2,7 +2,6 @@
 # Makefile for the x86 low level entry code
 #
 
-OBJECT_FILES_NON_STANDARD_entry_$(BITS).o   := y
 OBJECT_FILES_NON_STANDARD_entry_64_compat.o := y
 
 CFLAGS_syscall_64.o		+= $(call cc-option,-Wno-override-init,)
diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 05ed3d3..bbec02e 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -1,4 +1,6 @@
 #include <linux/jump_label.h>
+#include <asm/undwarf.h>
+
 
 /*
 
@@ -112,6 +114,7 @@ For 32-bit we have the following conventions - kernel is built with
 	movq %rdx, 12*8+\offset(%rsp)
 	movq %rsi, 13*8+\offset(%rsp)
 	movq %rdi, 14*8+\offset(%rsp)
+	CFI_REGS offset=\offset extra=0
 	.endm
 	.macro SAVE_C_REGS offset=0
 	SAVE_C_REGS_HELPER \offset, 1, 1, 1, 1
@@ -136,6 +139,7 @@ For 32-bit we have the following conventions - kernel is built with
 	movq %r12, 3*8+\offset(%rsp)
 	movq %rbp, 4*8+\offset(%rsp)
 	movq %rbx, 5*8+\offset(%rsp)
+	CFI_REGS offset=\offset
 	.endm
 
 	.macro RESTORE_EXTRA_REGS offset=0
@@ -145,6 +149,7 @@ For 32-bit we have the following conventions - kernel is built with
 	movq 3*8+\offset(%rsp), %r12
 	movq 4*8+\offset(%rsp), %rbp
 	movq 5*8+\offset(%rsp), %rbx
+	CFI_REGS offset=\offset extra=0
 	.endm
 
 	.macro RESTORE_C_REGS_HELPER rstor_rax=1, rstor_rcx=1, rstor_r11=1, rstor_r8910=1, rstor_rdx=1
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 4a4c083..d280cbe 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -36,6 +36,7 @@
 #include <asm/smap.h>
 #include <asm/pgtable_types.h>
 #include <asm/export.h>
+#include <asm/frame.h>
 #include <linux/err.h>
 
 .code64
@@ -43,9 +44,10 @@
 
 #ifdef CONFIG_PARAVIRT
 ENTRY(native_usergs_sysret64)
+	CFI_EMPTY
 	swapgs
 	sysretq
-ENDPROC(native_usergs_sysret64)
+END(native_usergs_sysret64)
 #endif /* CONFIG_PARAVIRT */
 
 .macro TRACE_IRQS_IRETQ
@@ -134,6 +136,7 @@ ENDPROC(native_usergs_sysret64)
  */
 
 ENTRY(entry_SYSCALL_64)
+	CFI_EMPTY
 	/*
 	 * Interrupts are off on entry.
 	 * We do not frame this tiny irq-off block with TRACE_IRQS_OFF/ON,
@@ -169,6 +172,7 @@ GLOBAL(entry_SYSCALL_64_after_swapgs)
 	pushq	%r10				/* pt_regs->r10 */
 	pushq	%r11				/* pt_regs->r11 */
 	sub	$(6*8), %rsp			/* pt_regs->bp, bx, r12-15 not saved */
+	CFI_REGS extra=0
 
 	/*
 	 * If we need to do entry work or if we guess we'll need to do
@@ -223,6 +227,7 @@ entry_SYSCALL_64_fastpath:
 	movq	EFLAGS(%rsp), %r11
 	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RSP(%rsp), %rsp
+	CFI_EMPTY
 	USERGS_SYSRET64
 
 1:
@@ -315,6 +320,7 @@ syscall_return_via_sysret:
 	/* rcx and r11 are already restored (see code above) */
 	RESTORE_C_REGS_EXCEPT_RCX_R11
 	movq	RSP(%rsp), %rsp
+	CFI_EMPTY
 	USERGS_SYSRET64
 
 opportunistic_sysret_failed:
@@ -342,6 +348,7 @@ ENTRY(stub_ptregs_64)
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
 	popq	%rax
+	CFI_REGS extra=0
 	jmp	entry_SYSCALL64_slow_path
 
 1:
@@ -350,6 +357,7 @@ END(stub_ptregs_64)
 
 .macro ptregs_stub func
 ENTRY(ptregs_\func)
+	CFI_FUNC
 	leaq	\func(%rip), %rax
 	jmp	stub_ptregs_64
 END(ptregs_\func)
@@ -366,6 +374,7 @@ END(ptregs_\func)
  * %rsi: next task
  */
 ENTRY(__switch_to_asm)
+	CFI_FUNC
 	/*
 	 * Save callee-saved registers
 	 * This must match the order in inactive_task_frame
@@ -405,6 +414,7 @@ END(__switch_to_asm)
  * r12: kernel thread arg
  */
 ENTRY(ret_from_fork)
+	CFI_EMPTY
 	movq	%rax, %rdi
 	call	schedule_tail			/* rdi: 'prev' task parameter */
 
@@ -414,6 +424,7 @@ ENTRY(ret_from_fork)
 2:
 	movq	%rsp, %rdi
 	call	syscall_return_slowpath	/* returns with IRQs disabled */
+	CFI_REGS
 	TRACE_IRQS_ON			/* user mode is traced as IRQS on */
 	SWAPGS
 	jmp	restore_regs_and_iret
@@ -439,10 +450,11 @@ END(ret_from_fork)
 ENTRY(irq_entries_start)
     vector=FIRST_EXTERNAL_VECTOR
     .rept (FIRST_SYSTEM_VECTOR - FIRST_EXTERNAL_VECTOR)
+	CFI_IRET_REGS
 	pushq	$(~vector+0x80)			/* Note: always in signed byte range */
-    vector=vector+1
 	jmp	common_interrupt
 	.align	8
+	vector=vector+1
     .endr
 END(irq_entries_start)
 
@@ -494,7 +506,9 @@ END(irq_entries_start)
 	movq	%rsp, %rdi
 	incl	PER_CPU_VAR(irq_count)
 	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
+	CFI_REGS base=rdi
 	pushq	%rdi
+	CFI_REGS indirect=1
 	/* We entered an interrupt context - irqs are off: */
 	TRACE_IRQS_OFF
 
@@ -518,6 +532,7 @@ ret_from_intr:
 
 	/* Restore saved previous stack */
 	popq	%rsp
+	CFI_REGS
 
 	testb	$3, CS(%rsp)
 	jz	retint_kernel
@@ -560,6 +575,7 @@ restore_c_regs_and_iret:
 	INTERRUPT_RETURN
 
 ENTRY(native_iret)
+	CFI_IRET_REGS
 	/*
 	 * Are we returning to a stack segment from the LDT?  Note: in
 	 * 64-bit mode SS:RSP on the exception stack is always valid.
@@ -632,6 +648,7 @@ native_irq_return_ldt:
 	orq	PER_CPU_VAR(espfix_stack), %rax
 	SWAPGS
 	movq	%rax, %rsp
+	CFI_IRET_REGS offset=8
 
 	/*
 	 * At this point, we cannot write to the stack any more, but we can
@@ -653,6 +670,7 @@ END(common_interrupt)
  */
 .macro apicinterrupt3 num sym do_sym
 ENTRY(\sym)
+	CFI_IRET_REGS
 	ASM_CLAC
 	pushq	$~(\num)
 .Lcommon_\sym:
@@ -738,6 +756,8 @@ apicinterrupt IRQ_WORK_VECTOR			irq_work_interrupt		smp_irq_work_interrupt
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
 ENTRY(\sym)
+	CFI_IRET_REGS offset=8
+
 	/* Sanity check */
 	.if \shift_ist != -1 && \paranoid == 0
 	.error "using shift_ist requires paranoid=1"
@@ -761,6 +781,7 @@ ENTRY(\sym)
 	.else
 	call	error_entry
 	.endif
+	CFI_REGS
 	/* returned flag: ebx=0: need swapgs on exit, ebx=1: don't need it */
 
 	.if \paranoid
@@ -858,6 +879,7 @@ idtentry simd_coprocessor_error		do_simd_coprocessor_error	has_error_code=0
 	 * edi:  new selector
 	 */
 ENTRY(native_load_gs_index)
+	FRAME_BEGIN
 	pushfq
 	DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
 	SWAPGS
@@ -866,8 +888,9 @@ ENTRY(native_load_gs_index)
 2:	ALTERNATIVE "", "mfence", X86_BUG_SWAPGS_FENCE
 	SWAPGS
 	popfq
+	FRAME_END
 	ret
-END(native_load_gs_index)
+ENDPROC(native_load_gs_index)
 EXPORT_SYMBOL(native_load_gs_index)
 
 	_ASM_EXTABLE(.Lgs_change, bad_gs)
@@ -897,7 +920,7 @@ ENTRY(do_softirq_own_stack)
 	leaveq
 	decl	PER_CPU_VAR(irq_count)
 	ret
-END(do_softirq_own_stack)
+ENDPROC(do_softirq_own_stack)
 
 #ifdef CONFIG_XEN
 idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
@@ -921,13 +944,18 @@ ENTRY(xen_do_hypervisor_callback)		/* do_hypervisor_callback(struct *pt_regs) */
  * Since we don't modify %rdi, evtchn_do_upall(struct *pt_regs) will
  * see the correct pointer to the pt_regs
  */
+	CFI_FUNC
 	movq	%rdi, %rsp			/* we don't return, adjust the stack frame */
+	CFI_REGS
 11:	incl	PER_CPU_VAR(irq_count)
 	movq	%rsp, %rbp
 	cmovzq	PER_CPU_VAR(irq_stack_ptr), %rsp
+	CFI_REGS base=rbp
 	pushq	%rbp				/* frame pointer backlink */
+	CFI_REGS indirect=1
 	call	xen_evtchn_do_upcall
 	popq	%rsp
+	CFI_REGS
 	decl	PER_CPU_VAR(irq_count)
 #ifndef CONFIG_PREEMPT
 	call	xen_maybe_preempt_hcall
@@ -949,6 +977,7 @@ END(xen_do_hypervisor_callback)
  * with its current contents: any discrepancy means we in category 1.
  */
 ENTRY(xen_failsafe_callback)
+	CFI_EMPTY
 	movl	%ds, %ecx
 	cmpw	%cx, 0x10(%rsp)
 	jne	1f
@@ -968,11 +997,13 @@ ENTRY(xen_failsafe_callback)
 	pushq	$0				/* RIP */
 	pushq	%r11
 	pushq	%rcx
+	CFI_IRET_REGS offset=8
 	jmp	general_protection
 1:	/* Segment mismatch => Category 1 (Bad segment). Retry the IRET. */
 	movq	(%rsp), %rcx
 	movq	8(%rsp), %r11
 	addq	$0x30, %rsp
+	CFI_IRET_REGS
 	pushq	$-1 /* orig_ax = -1 => not a system call */
 	ALLOC_PT_GPREGS_ON_STACK
 	SAVE_C_REGS
@@ -1018,6 +1049,7 @@ idtentry machine_check					has_error_code=0	paranoid=1 do_sym=*machine_check_vec
  * Return: ebx=0: need swapgs on exit, ebx=1: otherwise
  */
 ENTRY(paranoid_entry)
+	CFI_FUNC
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
@@ -1045,6 +1077,7 @@ END(paranoid_entry)
  * On entry, ebx is "no swapgs" flag (1: don't need swapgs, 0: need it)
  */
 ENTRY(paranoid_exit)
+	CFI_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF_DEBUG
 	testl	%ebx, %ebx			/* swapgs needed? */
@@ -1066,6 +1099,7 @@ END(paranoid_exit)
  * Return: EBX=0: came from user mode; EBX=1: otherwise
  */
 ENTRY(error_entry)
+	CFI_FUNC
 	cld
 	SAVE_C_REGS 8
 	SAVE_EXTRA_REGS 8
@@ -1150,6 +1184,7 @@ END(error_entry)
  *   0: user gsbase is loaded, we need SWAPGS and standard preparation for return to usermode
  */
 ENTRY(error_exit)
+	CFI_REGS
 	DISABLE_INTERRUPTS(CLBR_ANY)
 	TRACE_IRQS_OFF
 	testl	%ebx, %ebx
@@ -1159,6 +1194,7 @@ END(error_exit)
 
 /* Runs on exception stack */
 ENTRY(nmi)
+	CFI_IRET_REGS
 	/*
 	 * Fix up the exception frame if we're on Xen.
 	 * PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
@@ -1230,11 +1266,13 @@ ENTRY(nmi)
 	cld
 	movq	%rsp, %rdx
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rsp
+	CFI_IRET_REGS base=rdx offset=8
 	pushq	5*8(%rdx)	/* pt_regs->ss */
 	pushq	4*8(%rdx)	/* pt_regs->rsp */
 	pushq	3*8(%rdx)	/* pt_regs->flags */
 	pushq	2*8(%rdx)	/* pt_regs->cs */
 	pushq	1*8(%rdx)	/* pt_regs->rip */
+	CFI_IRET_REGS
 	pushq   $-1		/* pt_regs->orig_ax */
 	pushq   %rdi		/* pt_regs->di */
 	pushq   %rsi		/* pt_regs->si */
@@ -1251,6 +1289,7 @@ ENTRY(nmi)
 	pushq	%r13		/* pt_regs->r13 */
 	pushq	%r14		/* pt_regs->r14 */
 	pushq	%r15		/* pt_regs->r15 */
+	CFI_REGS
 	ENCODE_FRAME_POINTER
 
 	/*
@@ -1405,6 +1444,7 @@ first_nmi:
 	.rept 5
 	pushq	11*8(%rsp)
 	.endr
+	CFI_IRET_REGS
 
 	/* Everything up to here is safe from nested NMIs */
 
@@ -1420,6 +1460,7 @@ first_nmi:
 	pushq	$__KERNEL_CS	/* CS */
 	pushq	$1f		/* RIP */
 	INTERRUPT_RETURN	/* continues at repeat_nmi below */
+	CFI_IRET_REGS
 1:
 #endif
 
@@ -1469,6 +1510,7 @@ end_repeat_nmi:
 	 * exceptions might do.
 	 */
 	call	paranoid_entry
+	CFI_REGS
 
 	/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
 	movq	%rsp, %rdi
@@ -1506,17 +1548,19 @@ nmi_restore:
 END(nmi)
 
 ENTRY(ignore_sysret)
+	CFI_EMPTY
 	mov	$-ENOSYS, %eax
 	sysret
 END(ignore_sysret)
 
 ENTRY(rewind_stack_do_exit)
+	CFI_FUNC
 	/* Prevent any naive code from trying to unwind to our caller. */
 	xorl	%ebp, %ebp
 
 	movq	PER_CPU_VAR(cpu_current_top_of_stack), %rax
-	leaq	-TOP_OF_KERNEL_STACK_PADDING-PTREGS_SIZE(%rax), %rsp
+	leaq	-PTREGS_SIZE(%rax), %rsp
+	CFI_FUNC cfa_offset=PTREGS_SIZE
 
 	call	do_exit
-1:	jmp 1b
 END(rewind_stack_do_exit)
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 07/10] x86/asm: add CFI hint annotations to sync_core()
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (5 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 08/10] extable: rename 'sortextable' script to 'sorttable' Josh Poimboeuf
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

This enables the unwinder to grok the iret in the middle of a C
function.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/include/asm/processor.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 3cada99..9b90129 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -22,6 +22,7 @@ struct vm86;
 #include <asm/nops.h>
 #include <asm/special_insns.h>
 #include <asm/fpu/types.h>
+#include <asm/undwarf.h>
 
 #include <linux/personality.h>
 #include <linux/cache.h>
@@ -676,6 +677,7 @@ static inline void sync_core(void)
 	unsigned int tmp;
 
 	asm volatile (
+		CFI_SAVE
 		"mov %%ss, %0\n\t"
 		"pushq %q0\n\t"
 		"pushq %%rsp\n\t"
@@ -685,6 +687,7 @@ static inline void sync_core(void)
 		"pushq %q0\n\t"
 		"pushq $1f\n\t"
 		"iretq\n\t"
+		CFI_RESTORE
 		"1:"
 		: "=&r" (tmp), "+r" (__sp) : : "cc", "memory");
 #endif
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 08/10] extable: rename 'sortextable' script to 'sorttable'
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (6 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 07/10] x86/asm: add CFI hint annotations to sync_core() Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 09/10] extable: add undwarf table sorting ability to sorttable script Josh Poimboeuf
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Soon it will be used to sort the undwarf table as well.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 Documentation/dontdiff                 | 2 +-
 scripts/.gitignore                     | 2 +-
 scripts/Makefile                       | 4 ++--
 scripts/link-vmlinux.sh                | 2 +-
 scripts/{sortextable.c => sorttable.c} | 8 ++++----
 scripts/{sortextable.h => sorttable.h} | 2 +-
 6 files changed, 10 insertions(+), 10 deletions(-)
 rename scripts/{sortextable.c => sorttable.c} (98%)
 rename scripts/{sortextable.h => sorttable.h} (99%)

diff --git a/Documentation/dontdiff b/Documentation/dontdiff
index 77b9222..b270d98 100644
--- a/Documentation/dontdiff
+++ b/Documentation/dontdiff
@@ -217,7 +217,7 @@ series
 setup
 setup.bin
 setup.elf
-sortextable
+sorttable
 sImage
 sm_tbl*
 split-include
diff --git a/scripts/.gitignore b/scripts/.gitignore
index e063daa..14b108e 100644
--- a/scripts/.gitignore
+++ b/scripts/.gitignore
@@ -9,7 +9,7 @@ ihex2fw
 recordmcount
 docproc
 check-lc_ctype
-sortextable
+sorttable
 asn1_compiler
 extract-cert
 sign-file
diff --git a/scripts/Makefile b/scripts/Makefile
index 1d80897..a7b700f 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -15,13 +15,13 @@ hostprogs-$(CONFIG_KALLSYMS)     += kallsyms
 hostprogs-$(CONFIG_LOGO)         += pnmtologo
 hostprogs-$(CONFIG_VT)           += conmakehash
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
-hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sortextable
+hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sorttable
 hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
 hostprogs-$(CONFIG_MODULE_SIG)	 += sign-file
 hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
 hostprogs-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
 
-HOSTCFLAGS_sortextable.o = -I$(srctree)/tools/include
+HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include
 HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
 HOSTLOADLIBES_sign-file = -lcrypto
 HOSTLOADLIBES_extract-cert = -lcrypto
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index c802913..18dd369 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -154,7 +154,7 @@ mksysmap()
 
 sortextable()
 {
-	${objtree}/scripts/sortextable ${1}
+	${objtree}/scripts/sorttable ${1}
 }
 
 # Delete output files in case of error
diff --git a/scripts/sortextable.c b/scripts/sorttable.c
similarity index 98%
rename from scripts/sortextable.c
rename to scripts/sorttable.c
index 365a907..17324dd 100644
--- a/scripts/sortextable.c
+++ b/scripts/sorttable.c
@@ -1,5 +1,5 @@
 /*
- * sortextable.c: Sort the kernel's exception table
+ * sorttable.c: Sort the kernel's exception table
  *
  * Copyright 2011 - 2012 Cavium, Inc.
  *
@@ -193,9 +193,9 @@ static inline unsigned int get_secindex(unsigned int shndx,
 }
 
 /* 32 bit and 64 bit are very similar */
-#include "sortextable.h"
+#include "sorttable.h"
 #define SORTEXTABLE_64
-#include "sortextable.h"
+#include "sorttable.h"
 
 static int compare_relative_table(const void *a, const void *b)
 {
@@ -367,7 +367,7 @@ main(int argc, char *argv[])
 	int i;
 
 	if (argc < 2) {
-		fprintf(stderr, "usage: sortextable vmlinux...\n");
+		fprintf(stderr, "usage: sorttable vmlinux...\n");
 		return 0;
 	}
 
diff --git a/scripts/sortextable.h b/scripts/sorttable.h
similarity index 99%
rename from scripts/sortextable.h
rename to scripts/sorttable.h
index ba87004..0de9488 100644
--- a/scripts/sortextable.h
+++ b/scripts/sorttable.h
@@ -1,5 +1,5 @@
 /*
- * sortextable.h
+ * sorttable.h
  *
  * Copyright 2011 - 2012 Cavium, Inc.
  *
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 09/10] extable: add undwarf table sorting ability to sorttable script
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (7 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 08/10] extable: rename 'sortextable' script to 'sorttable' Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
  2017-06-01  6:08 ` [RFC PATCH 00/10] x86: " Ingo Molnar
  10 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

The undwarf table needs to be sorted at vmlinux link time, just like the
extable.  Extend sorttable's functionality to do so.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 init/Kconfig            |   4 ++
 scripts/Makefile        |   2 +-
 scripts/link-vmlinux.sh |   7 +-
 scripts/sorttable.c     | 178 +++++++++++++++++++++++++-----------------------
 scripts/sorttable.h     |  71 ++++++++++---------
 5 files changed, 142 insertions(+), 120 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index 1d3475f..4c096f0 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -25,6 +25,10 @@ config IRQ_WORK
 
 config BUILDTIME_EXTABLE_SORT
 	bool
+	select SORTTABLE
+
+config SORTTABLE
+	bool
 
 config THREAD_INFO_IN_TASK
 	bool
diff --git a/scripts/Makefile b/scripts/Makefile
index a7b700f..99c05de 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -15,7 +15,7 @@ hostprogs-$(CONFIG_KALLSYMS)     += kallsyms
 hostprogs-$(CONFIG_LOGO)         += pnmtologo
 hostprogs-$(CONFIG_VT)           += conmakehash
 hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
-hostprogs-$(CONFIG_BUILDTIME_EXTABLE_SORT) += sorttable
+hostprogs-$(CONFIG_SORTTABLE)    += sorttable
 hostprogs-$(CONFIG_ASN1)	 += asn1_compiler
 hostprogs-$(CONFIG_MODULE_SIG)	 += sign-file
 hostprogs-$(CONFIG_SYSTEM_TRUSTED_KEYRING) += extract-cert
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 18dd369..f4eb9dc 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -154,7 +154,12 @@ mksysmap()
 
 sortextable()
 {
-	${objtree}/scripts/sorttable ${1}
+	${objtree}/scripts/sorttable ${1} extable
+}
+
+sortundwarf()
+{
+	${objtree}/scripts/sorttable ${1} undwarf
 }
 
 # Delete output files in case of error
diff --git a/scripts/sorttable.c b/scripts/sorttable.c
index 17324dd..299db227 100644
--- a/scripts/sorttable.c
+++ b/scripts/sorttable.c
@@ -1,5 +1,5 @@
 /*
- * sorttable.c: Sort the kernel's exception table
+ * sorttable.c: Sort vmlinux tables
  *
  * Copyright 2011 - 2012 Cavium, Inc.
  *
@@ -51,11 +51,10 @@
 #define EM_ARCV2	195
 #endif
 
-static int fd_map;	/* File descriptor for file being modified. */
-static int mmap_failed; /* Boolean flag. */
+static int fd_map = -1;	/* File descriptor for file being modified. */
+static int mmap_succeeded; /* Boolean flag. */
 static void *ehdr_curr; /* current ElfXX_Ehdr *  for resource cleanup */
 static struct stat sb;	/* Remember .st_size, etc. */
-static jmp_buf jmpenv;	/* setjmp/longjmp per-file error escape */
 
 /* setjmp() return values */
 enum {
@@ -64,20 +63,19 @@ enum {
 	SJ_SUCCEED
 };
 
+enum sectype {
+	SEC_TYPE_EXTABLE,
+	SEC_TYPE_UNDWARF,
+};
+
 /* Per-file resource cleanup when multiple files. */
 static void
 cleanup(void)
 {
-	if (!mmap_failed)
+	if (mmap_succeeded)
 		munmap(ehdr_curr, sb.st_size);
-	close(fd_map);
-}
-
-static void __attribute__((noreturn))
-fail_file(void)
-{
-	cleanup();
-	longjmp(jmpenv, SJ_FAIL);
+	if (fd_map >= 0)
+		close(fd_map);
 }
 
 /*
@@ -93,19 +91,20 @@ static void *mmap_file(char const *fname)
 	fd_map = open(fname, O_RDWR);
 	if (fd_map < 0 || fstat(fd_map, &sb) < 0) {
 		perror(fname);
-		fail_file();
+		return NULL;
 	}
 	if (!S_ISREG(sb.st_mode)) {
 		fprintf(stderr, "not a regular file: %s\n", fname);
-		fail_file();
+		return NULL;
 	}
 	addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED,
 		    fd_map, 0);
 	if (addr == MAP_FAILED) {
-		mmap_failed = 1;
 		fprintf(stderr, "Could not mmap file: %s\n", fname);
-		fail_file();
+		return NULL;
 	}
+	mmap_succeeded = 1;
+
 	return addr;
 }
 
@@ -166,7 +165,7 @@ static void (*w8)(uint64_t, uint64_t *);
 static void (*w)(uint32_t, uint32_t *);
 static void (*w2)(uint16_t, uint16_t *);
 
-typedef void (*table_sort_t)(char *, int);
+typedef void (*table_sort_t)(char *, size_t, size_t);
 
 /*
  * Move reserved section indices SHN_LORESERVE..SHN_HIRESERVE out of
@@ -194,7 +193,7 @@ static inline unsigned int get_secindex(unsigned int shndx,
 
 /* 32 bit and 64 bit are very similar */
 #include "sorttable.h"
-#define SORTEXTABLE_64
+#define SORTTABLE_64
 #include "sorttable.h"
 
 static int compare_relative_table(const void *a, const void *b)
@@ -209,36 +208,33 @@ static int compare_relative_table(const void *a, const void *b)
 	return 0;
 }
 
-static void x86_sort_relative_table(char *extab_image, int image_size)
+static void sort_relative_extable(char *image, size_t image_size, size_t entsize)
 {
 	int i;
 
+	/*
+	 * Do the same thing the runtime sort does, first normalize to
+	 * being relative to the start of the section.
+	 */
 	i = 0;
 	while (i < image_size) {
-		uint32_t *loc = (uint32_t *)(extab_image + i);
-
+		uint32_t *loc = (uint32_t *)(image + i);
 		w(r(loc) + i, loc);
-		w(r(loc + 1) + i + 4, loc + 1);
-		w(r(loc + 2) + i + 8, loc + 2);
-
-		i += sizeof(uint32_t) * 3;
+		i += 4;
 	}
 
-	qsort(extab_image, image_size / 12, 12, compare_relative_table);
+	qsort(image, image_size / entsize, entsize, compare_relative_table);
 
+	/* Now denormalize. */
 	i = 0;
 	while (i < image_size) {
-		uint32_t *loc = (uint32_t *)(extab_image + i);
-
+		uint32_t *loc = (uint32_t *)(image + i);
 		w(r(loc) - i, loc);
-		w(r(loc + 1) - (i + 4), loc + 1);
-		w(r(loc + 2) - (i + 8), loc + 2);
-
-		i += sizeof(uint32_t) * 3;
+		i += 4;
 	}
 }
 
-static void sort_relative_table(char *extab_image, int image_size)
+static void sort_undwarf_table(char *image, size_t image_size, size_t entsize)
 {
 	int i;
 
@@ -248,34 +244,39 @@ static void sort_relative_table(char *extab_image, int image_size)
 	 */
 	i = 0;
 	while (i < image_size) {
-		uint32_t *loc = (uint32_t *)(extab_image + i);
+		uint32_t *loc = (uint32_t *)(image + i);
 		w(r(loc) + i, loc);
-		i += 4;
+		i += entsize;
 	}
 
-	qsort(extab_image, image_size / 8, 8, compare_relative_table);
+	qsort(image, image_size / entsize, entsize, compare_relative_table);
 
 	/* Now denormalize. */
 	i = 0;
 	while (i < image_size) {
-		uint32_t *loc = (uint32_t *)(extab_image + i);
+		uint32_t *loc = (uint32_t *)(image + i);
 		w(r(loc) - i, loc);
-		i += 4;
+		i += entsize;
 	}
 }
 
-static void
-do_file(char const *const fname)
+static int do_file(char const *const fname, enum sectype sectype)
 {
 	table_sort_t custom_sort;
-	Elf32_Ehdr *ehdr = mmap_file(fname);
+	Elf32_Ehdr *ehdr;
+	const char *secname, *sort_needed_var;
+	size_t entsize_32, entsize_64;
+
+	ehdr = mmap_file(fname);
+	if (!ehdr)
+		return -1;
 
 	ehdr_curr = ehdr;
 	switch (ehdr->e_ident[EI_DATA]) {
 	default:
 		fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
 			ehdr->e_ident[EI_DATA], fname);
-		fail_file();
+		return -1;
 		break;
 	case ELFDATA2LSB:
 		r = rle;
@@ -298,7 +299,7 @@ do_file(char const *const fname)
 	||  (r2(&ehdr->e_type) != ET_EXEC && r2(&ehdr->e_type) != ET_DYN)
 	||  ehdr->e_ident[EI_VERSION] != EV_CURRENT) {
 		fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
-		fail_file();
+		return -1;
 	}
 
 	custom_sort = NULL;
@@ -306,11 +307,13 @@ do_file(char const *const fname)
 	default:
 		fprintf(stderr, "unrecognized e_machine %d %s\n",
 			r2(&ehdr->e_machine), fname);
-		fail_file();
-		break;
+		return -1;
 	case EM_386:
 	case EM_X86_64:
-		custom_sort = x86_sort_relative_table;
+		if (sectype == SEC_TYPE_EXTABLE) {
+			custom_sort = sort_relative_extable;
+			entsize_32 = entsize_64 = 12;
+		}
 		break;
 
 	case EM_S390:
@@ -318,7 +321,10 @@ do_file(char const *const fname)
 	case EM_PARISC:
 	case EM_PPC:
 	case EM_PPC64:
-		custom_sort = sort_relative_table;
+		if (sectype == SEC_TYPE_EXTABLE) {
+			custom_sort = sort_relative_extable;
+			entsize_32 = entsize_64 = 8;
+		}
 		break;
 	case EM_ARCOMPACT:
 	case EM_ARCV2:
@@ -326,23 +332,38 @@ do_file(char const *const fname)
 	case EM_MICROBLAZE:
 	case EM_MIPS:
 	case EM_XTENSA:
+		entsize_32 = 8;
+		entsize_64 = 16;
 		break;
 	}  /* end switch */
 
+	switch (sectype) {
+	case SEC_TYPE_EXTABLE:
+		secname = "__ex_table";
+		sort_needed_var = "main_extable_sort_needed";
+		break;
+	case SEC_TYPE_UNDWARF:
+		secname = ".undwarf";
+		custom_sort = sort_undwarf_table;
+		entsize_32 = entsize_64 = 16;
+		sort_needed_var = NULL;
+		break;
+	}
+
 	switch (ehdr->e_ident[EI_CLASS]) {
 	default:
 		fprintf(stderr, "unrecognized ELF class %d %s\n",
 			ehdr->e_ident[EI_CLASS], fname);
-		fail_file();
-		break;
+		return -1;
 	case ELFCLASS32:
 		if (r2(&ehdr->e_ehsize) != sizeof(Elf32_Ehdr)
 		||  r2(&ehdr->e_shentsize) != sizeof(Elf32_Shdr)) {
 			fprintf(stderr,
 				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
-			fail_file();
+			return -1;
 		}
-		do32(ehdr, fname, custom_sort);
+		if (do32(ehdr, fname, secname, entsize_32, custom_sort, sort_needed_var))
+			return -1;
 		break;
 	case ELFCLASS64: {
 		Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
@@ -350,51 +371,40 @@ do_file(char const *const fname)
 		||  r2(&ghdr->e_shentsize) != sizeof(Elf64_Shdr)) {
 			fprintf(stderr,
 				"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
-			fail_file();
+			return -1;
 		}
-		do64(ghdr, fname, custom_sort);
+		if (do64(ghdr, fname, secname, entsize_64, custom_sort, sort_needed_var))
+			return -1;
 		break;
 	}
 	}  /* end switch */
 
 	cleanup();
+
+	return 0;
 }
 
 int
 main(int argc, char *argv[])
 {
-	int n_error = 0;  /* gcc-4.3.0 false positive complaint */
-	int i;
+	char *file;
+	enum sectype sectype;
 
-	if (argc < 2) {
-		fprintf(stderr, "usage: sorttable vmlinux...\n");
-		return 0;
+	if (argc != 3) {
+		fprintf(stderr, "usage: sorttable <object file> <extable|undwarf>\n");
+		return -1;
 	}
 
-	/* Process each file in turn, allowing deep failure. */
-	for (i = 1; i < argc; i++) {
-		char *file = argv[i];
-		int const sjval = setjmp(jmpenv);
-
-		switch (sjval) {
-		default:
-			fprintf(stderr, "internal error: %s\n", file);
-			exit(1);
-			break;
-		case SJ_SETJMP:    /* normal sequence */
-			/* Avoid problems if early cleanup() */
-			fd_map = -1;
-			ehdr_curr = NULL;
-			mmap_failed = 1;
-			do_file(file);
-			break;
-		case SJ_FAIL:    /* error in do_file or below */
-			++n_error;
-			break;
-		case SJ_SUCCEED:    /* premature success */
-			/* do nothing */
-			break;
-		}  /* end switch */
+	file = argv[1];
+
+	if (!strcmp(argv[2], "extable"))
+		sectype = SEC_TYPE_EXTABLE;
+	else if (!strcmp(argv[2], "undwarf"))
+		sectype = SEC_TYPE_UNDWARF;
+	else  {
+		fprintf(stderr, "unsupported section type %s\n", argv[2]);
+		return -1;
 	}
-	return !!n_error;
+
+	return do_file(file, sectype);
 }
diff --git a/scripts/sorttable.h b/scripts/sorttable.h
index 0de9488..68f7200 100644
--- a/scripts/sorttable.h
+++ b/scripts/sorttable.h
@@ -1,5 +1,5 @@
 /*
- * sorttable.h
+ * sortextable.h
  *
  * Copyright 2011 - 2012 Cavium, Inc.
  *
@@ -13,7 +13,7 @@
  */
 
 #undef extable_ent_size
-#undef compare_extable
+#undef generic_compare
 #undef do_func
 #undef Elf_Addr
 #undef Elf_Ehdr
@@ -33,9 +33,8 @@
 #undef _r
 #undef _w
 
-#ifdef SORTEXTABLE_64
-# define extable_ent_size	16
-# define compare_extable	compare_extable_64
+#ifdef SORTTABLE_64
+# define generic_compare	generic_compare_64
 # define do_func		do64
 # define Elf_Addr		Elf64_Addr
 # define Elf_Ehdr		Elf64_Ehdr
@@ -55,8 +54,7 @@
 # define _r			r8
 # define _w			w8
 #else
-# define extable_ent_size	8
-# define compare_extable	compare_extable_32
+# define generic_compare	generic_compare_32
 # define do_func		do32
 # define Elf_Addr		Elf32_Addr
 # define Elf_Ehdr		Elf32_Ehdr
@@ -77,7 +75,7 @@
 # define _w			w
 #endif
 
-static int compare_extable(const void *a, const void *b)
+static int generic_compare(const void *a, const void *b)
 {
 	Elf_Addr av = _r(a);
 	Elf_Addr bv = _r(b);
@@ -89,14 +87,16 @@ static int compare_extable(const void *a, const void *b)
 	return 0;
 }
 
-static void
-do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
+static int
+do_func(Elf_Ehdr *ehdr, char const *const fname, char const *const secname,
+	size_t entsize, table_sort_t custom_sort,
+	char const *const sort_needed_var)
 {
 	Elf_Shdr *shdr;
 	Elf_Shdr *shstrtab_sec;
 	Elf_Shdr *strtab_sec = NULL;
 	Elf_Shdr *symtab_sec = NULL;
-	Elf_Shdr *extab_sec = NULL;
+	Elf_Shdr *table_sec = NULL;
 	Elf_Sym *sym;
 	const Elf_Sym *symtab;
 	Elf32_Word *symtab_shndx_start = NULL;
@@ -107,8 +107,8 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
 	uint32_t *sort_done_location;
 	const char *secstrtab;
 	const char *strtab;
-	char *extab_image;
-	int extab_index = 0;
+	char *table_image;
+	int table_index = 0;
 	int i;
 	int idx;
 	unsigned int num_sections;
@@ -128,13 +128,13 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
 	secstrtab = (const char *)ehdr + _r(&shstrtab_sec->sh_offset);
 	for (i = 0; i < num_sections; i++) {
 		idx = r(&shdr[i].sh_name);
-		if (strcmp(secstrtab + idx, "__ex_table") == 0) {
-			extab_sec = shdr + i;
-			extab_index = i;
+		if (strcmp(secstrtab + idx, secname) == 0) {
+			table_sec = shdr + i;
+			table_index = i;
 		}
 		if ((r(&shdr[i].sh_type) == SHT_REL ||
 		     r(&shdr[i].sh_type) == SHT_RELA) &&
-		    r(&shdr[i].sh_info) == extab_index) {
+		    r(&shdr[i].sh_info) == table_index) {
 			relocs = (void *)ehdr + _r(&shdr[i].sh_offset);
 			relocs_size = _r(&shdr[i].sh_size);
 		}
@@ -147,35 +147,37 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
 				(const char *)ehdr + _r(&shdr[i].sh_offset));
 	}
 	if (strtab_sec == NULL) {
-		fprintf(stderr,	"no .strtab in  file: %s\n", fname);
-		fail_file();
+		fprintf(stderr,	"no .strtab in file: %s\n", fname);
+		return -1;
 	}
 	if (symtab_sec == NULL) {
-		fprintf(stderr,	"no .symtab in  file: %s\n", fname);
-		fail_file();
+		fprintf(stderr,	"no .symtab in file: %s\n", fname);
+		return -1;
 	}
 	symtab = (const Elf_Sym *)((const char *)ehdr +
 				   _r(&symtab_sec->sh_offset));
-	if (extab_sec == NULL) {
-		fprintf(stderr,	"no __ex_table in  file: %s\n", fname);
-		fail_file();
+	if (table_sec == NULL) {
+		fprintf(stderr,	"no %s section in file: %s\n", secname, fname);
+		return -1;
 	}
 	strtab = (const char *)ehdr + _r(&strtab_sec->sh_offset);
 
-	extab_image = (void *)ehdr + _r(&extab_sec->sh_offset);
+	table_image = (void *)ehdr + _r(&table_sec->sh_offset);
 
 	if (custom_sort) {
-		custom_sort(extab_image, _r(&extab_sec->sh_size));
+		custom_sort(table_image, _r(&table_sec->sh_size), entsize);
 	} else {
-		int num_entries = _r(&extab_sec->sh_size) / extable_ent_size;
-		qsort(extab_image, num_entries,
-		      extable_ent_size, compare_extable);
+		int num_entries = _r(&table_sec->sh_size) / entsize;
+		qsort(table_image, num_entries, entsize, generic_compare);
 	}
 	/* If there were relocations, we no longer need them. */
 	if (relocs)
 		memset(relocs, 0, relocs_size);
 
-	/* find main_extable_sort_needed */
+	if (!sort_needed_var)
+		return 0;
+
+	/* find sort needed variable so we can clear it */
 	sort_needed_sym = NULL;
 	for (i = 0; i < _r(&symtab_sec->sh_size) / sizeof(Elf_Sym); i++) {
 		sym = (void *)ehdr + _r(&symtab_sec->sh_offset);
@@ -183,16 +185,16 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
 		if (ELF_ST_TYPE(sym->st_info) != STT_OBJECT)
 			continue;
 		idx = r(&sym->st_name);
-		if (strcmp(strtab + idx, "main_extable_sort_needed") == 0) {
+		if (strcmp(strtab + idx, sort_needed_var) == 0) {
 			sort_needed_sym = sym;
 			break;
 		}
 	}
 	if (sort_needed_sym == NULL) {
 		fprintf(stderr,
-			"no main_extable_sort_needed symbol in  file: %s\n",
-			fname);
-		fail_file();
+			"no %s symbol in file: %s\n",
+			sort_needed_var, fname);
+		return -1;
 	}
 	sort_needed_sec = &shdr[get_secindex(r2(&sym->st_shndx),
 					     sort_needed_sym - symtab,
@@ -208,4 +210,5 @@ do_func(Elf_Ehdr *ehdr, char const *const fname, table_sort_t custom_sort)
 #endif
 	/* We sorted it, clear the flag. */
 	w(0, sort_done_location);
+	return 0;
 }
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (8 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 09/10] extable: add undwarf table sorting ability to sorttable script Josh Poimboeuf
@ 2017-06-01  5:44 ` Josh Poimboeuf
  2017-06-01 11:05   ` Peter Zijlstra
                     ` (2 more replies)
  2017-06-01  6:08 ` [RFC PATCH 00/10] x86: " Ingo Molnar
  10 siblings, 3 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01  5:44 UTC (permalink / raw)
  To: x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Jiri Slaby, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

Add a new 'undwarf' unwinder which is enabled by
CONFIG_UNDWARF_UNWINDER.  It plugs into the existing x86 unwinder
framework.

It relies on objtool to generate the needed .undwarf section.

For more details on why undwarf is used instead of DWARF, see
tools/objtool/Documentation/undwarf.txt.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/um/include/asm/unwind.h      |   7 +
 arch/x86/Kconfig                  |   1 +
 arch/x86/Kconfig.debug            |  26 +++
 arch/x86/include/asm/module.h     |   8 +
 arch/x86/include/asm/unwind.h     |  64 +++---
 arch/x86/kernel/Makefile          |   8 +-
 arch/x86/kernel/module.c          |   9 +-
 arch/x86/kernel/unwind_frame.c    |  39 ++--
 arch/x86/kernel/unwind_guess.c    |   5 +
 arch/x86/kernel/unwind_undwarf.c  | 402 ++++++++++++++++++++++++++++++++++++++
 include/asm-generic/vmlinux.lds.h |  14 ++
 lib/Kconfig.debug                 |   3 +
 scripts/Makefile.build            |   3 +-
 scripts/link-vmlinux.sh           |   5 +
 14 files changed, 534 insertions(+), 60 deletions(-)
 create mode 100644 arch/um/include/asm/unwind.h
 create mode 100644 arch/x86/kernel/unwind_undwarf.c

diff --git a/arch/um/include/asm/unwind.h b/arch/um/include/asm/unwind.h
new file mode 100644
index 0000000..4e3f719
--- /dev/null
+++ b/arch/um/include/asm/unwind.h
@@ -0,0 +1,7 @@
+#ifndef _ASM_UML_UNWIND_H
+#define _ASM_UML_UNWIND_H
+
+static inline void
+unwind_module_init(struct module *mod, void *undwarf, size_t size) {}
+
+#endif /* _ASM_UML_UNWIND_H */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4ccfacc..869fbc5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -151,6 +151,7 @@ config X86
 	select HAVE_MEMBLOCK
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_MIXED_BREAKPOINTS_REGS
+	select HAVE_MOD_ARCH_SPECIFIC
 	select HAVE_NMI
 	select HAVE_OPROFILE
 	select HAVE_OPTPROBES
diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
index fcb7604..6717463 100644
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -357,4 +357,30 @@ config PUNIT_ATOM_DEBUG
 	  The current power state can be read from
 	  /sys/kernel/debug/punit_atom/dev_power_state
 
+config UNDWARF_UNWINDER
+	bool "undwarf unwinder"
+	depends on X86_64
+	select STACK_VALIDATION
+	select SORTTABLE
+	---help---
+	  This option enables the "undwarf" unwinder for unwinding kernel stack
+	  traces.  It uses a custom data format which is a simplified version
+	  of the DWARF Call Frame Information standard.
+
+	  This unwinder is more accurate across interrupt entry frames than the
+	  frame pointer unwinder.  This also can enable a small performance
+	  improvement across the entire kernel if CONFIG_FRAME_POINTER is
+	  disabled.
+
+	  Enabling this option will increase the kernel's runtime memory usage
+	  by roughly 3-5MB, depending on the kernel config.
+
+config FRAME_POINTER_UNWINDER
+	def_bool y
+	depends on !UNDWARF_UNWINDER && FRAME_POINTER
+
+config GUESS_UNWINDER
+	def_bool y
+	depends on !UNDWARF_UNWINDER && !FRAME_POINTER
+
 endmenu
diff --git a/arch/x86/include/asm/module.h b/arch/x86/include/asm/module.h
index e3b7819..454eeea 100644
--- a/arch/x86/include/asm/module.h
+++ b/arch/x86/include/asm/module.h
@@ -2,6 +2,14 @@
 #define _ASM_X86_MODULE_H
 
 #include <asm-generic/module.h>
+#include <asm/undwarf.h>
+
+struct mod_arch_specific {
+#ifdef CONFIG_UNDWARF_UNWINDER
+	unsigned int num_undwarves;
+	struct undwarf *undwarf;
+#endif
+};
 
 #ifdef CONFIG_X86_64
 /* X86_64 does not define MODULE_PROC_FAMILY */
diff --git a/arch/x86/include/asm/unwind.h b/arch/x86/include/asm/unwind.h
index e667649..f06be3f 100644
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -12,11 +12,13 @@ struct unwind_state {
 	struct task_struct *task;
 	int graph_idx;
 	bool error;
-#ifdef CONFIG_FRAME_POINTER
+#if defined(CONFIG_UNDWARF_UNWINDER)
+	unsigned long sp, bp, ip;
+	struct pt_regs *regs;
+#elif defined(CONFIG_FRAME_POINTER)
 	bool got_irq;
-	unsigned long *bp, *orig_sp;
+	unsigned long *bp, *orig_sp, ip;
 	struct pt_regs *regs;
-	unsigned long ip;
 #else
 	unsigned long *sp;
 #endif
@@ -24,41 +26,30 @@ struct unwind_state {
 
 void __unwind_start(struct unwind_state *state, struct task_struct *task,
 		    struct pt_regs *regs, unsigned long *first_frame);
-
 bool unwind_next_frame(struct unwind_state *state);
-
 unsigned long unwind_get_return_address(struct unwind_state *state);
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state);
 
 static inline bool unwind_done(struct unwind_state *state)
 {
 	return state->stack_info.type == STACK_TYPE_UNKNOWN;
 }
 
-static inline
-void unwind_start(struct unwind_state *state, struct task_struct *task,
-		  struct pt_regs *regs, unsigned long *first_frame)
-{
-	first_frame = first_frame ? : get_stack_pointer(task, regs);
-
-	__unwind_start(state, task, regs, first_frame);
-}
-
 static inline bool unwind_error(struct unwind_state *state)
 {
 	return state->error;
 }
 
-#ifdef CONFIG_FRAME_POINTER
-
 static inline
-unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+void unwind_start(struct unwind_state *state, struct task_struct *task,
+		  struct pt_regs *regs, unsigned long *first_frame)
 {
-	if (unwind_done(state))
-		return NULL;
+	first_frame = first_frame ? : get_stack_pointer(task, regs);
 
-	return state->regs ? &state->regs->ip : state->bp + 1;
+	__unwind_start(state, task, regs, first_frame);
 }
 
+#if defined(CONFIG_UNDWARF_UNWINDER) || defined(CONFIG_FRAME_POINTER)
 static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 {
 	if (unwind_done(state))
@@ -66,20 +57,33 @@ static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 
 	return state->regs;
 }
-
-#else /* !CONFIG_FRAME_POINTER */
-
-static inline
-unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
-{
-	return NULL;
-}
-
+#else
 static inline struct pt_regs *unwind_get_entry_regs(struct unwind_state *state)
 {
 	return NULL;
 }
+#endif
+
+#ifdef CONFIG_UNDWARF_UNWINDER
+void unwind_module_init(struct module *mod, void *undwarf, size_t size);
+#else
+static inline void
+unwind_module_init(struct module *mod, void *undwarf, size_t size) {}
+#endif
 
-#endif /* CONFIG_FRAME_POINTER */
+/*
+ * This disables KASAN checking when reading a value from another task's stack,
+ * since the other task could be running on another CPU and could have poisoned
+ * the stack in the meantime.
+ */
+#define READ_ONCE_TASK_STACK(task, x)			\
+({							\
+	unsigned long val;				\
+	if (task == current)				\
+		val = READ_ONCE(x);			\
+	else						\
+		val = READ_ONCE_NOCHECK(x);		\
+	val;						\
+})
 
 #endif /* _ASM_X86_UNWIND_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3c7c419..4865889 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -125,11 +125,9 @@ obj-$(CONFIG_PERF_EVENTS)		+= perf_regs.o
 obj-$(CONFIG_TRACING)			+= tracepoint.o
 obj-$(CONFIG_SCHED_MC_PRIO)		+= itmt.o
 
-ifdef CONFIG_FRAME_POINTER
-obj-y					+= unwind_frame.o
-else
-obj-y					+= unwind_guess.o
-endif
+obj-$(CONFIG_UNDWARF_UNWINDER)		+= unwind_undwarf.o
+obj-$(CONFIG_FRAME_POINTER_UNWINDER)	+= unwind_frame.o
+obj-$(CONFIG_GUESS_UNWINDER)		+= unwind_guess.o
 
 ###
 # 64 bit specific files
diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
index f67bd32..6756070 100644
--- a/arch/x86/kernel/module.c
+++ b/arch/x86/kernel/module.c
@@ -35,6 +35,7 @@
 #include <asm/page.h>
 #include <asm/pgtable.h>
 #include <asm/setup.h>
+#include <asm/unwind.h>
 
 #if 0
 #define DEBUGP(fmt, ...)				\
@@ -213,7 +214,7 @@ int module_finalize(const Elf_Ehdr *hdr,
 		    struct module *me)
 {
 	const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL,
-		*para = NULL;
+		*para = NULL, *undwarf = NULL;
 	char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
 
 	for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) {
@@ -225,6 +226,8 @@ int module_finalize(const Elf_Ehdr *hdr,
 			locks = s;
 		if (!strcmp(".parainstructions", secstrings + s->sh_name))
 			para = s;
+		if (!strcmp(".undwarf", secstrings + s->sh_name))
+			undwarf = s;
 	}
 
 	if (alt) {
@@ -248,6 +251,10 @@ int module_finalize(const Elf_Ehdr *hdr,
 	/* make jump label nops */
 	jump_label_apply_nops(me);
 
+	if (undwarf)
+		unwind_module_init(me, (void *)undwarf->sh_addr,
+				   undwarf->sh_size);
+
 	return 0;
 }
 
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c
index b9389d7..7574ef5 100644
--- a/arch/x86/kernel/unwind_frame.c
+++ b/arch/x86/kernel/unwind_frame.c
@@ -10,20 +10,22 @@
 
 #define FRAME_HEADER_SIZE (sizeof(long) * 2)
 
-/*
- * This disables KASAN checking when reading a value from another task's stack,
- * since the other task could be running on another CPU and could have poisoned
- * the stack in the meantime.
- */
-#define READ_ONCE_TASK_STACK(task, x)			\
-({							\
-	unsigned long val;				\
-	if (task == current)				\
-		val = READ_ONCE(x);			\
-	else						\
-		val = READ_ONCE_NOCHECK(x);		\
-	val;						\
-})
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return 0;
+
+	return __kernel_text_address(state->ip) ? state->ip : 0;
+}
+EXPORT_SYMBOL_GPL(unwind_get_return_address);
+
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return NULL;
+
+	return state->regs ? &state->regs->ip : state->bp + 1;
+}
 
 static void unwind_dump(struct unwind_state *state)
 {
@@ -66,15 +68,6 @@ static void unwind_dump(struct unwind_state *state)
 	}
 }
 
-unsigned long unwind_get_return_address(struct unwind_state *state)
-{
-	if (unwind_done(state))
-		return 0;
-
-	return __kernel_text_address(state->ip) ? state->ip : 0;
-}
-EXPORT_SYMBOL_GPL(unwind_get_return_address);
-
 static size_t regs_size(struct pt_regs *regs)
 {
 	/* x86_32 regs from kernel mode are two words shorter: */
diff --git a/arch/x86/kernel/unwind_guess.c b/arch/x86/kernel/unwind_guess.c
index 039f367..4f0e17b 100644
--- a/arch/x86/kernel/unwind_guess.c
+++ b/arch/x86/kernel/unwind_guess.c
@@ -19,6 +19,11 @@ unsigned long unwind_get_return_address(struct unwind_state *state)
 }
 EXPORT_SYMBOL_GPL(unwind_get_return_address);
 
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	return NULL;
+}
+
 bool unwind_next_frame(struct unwind_state *state)
 {
 	struct stack_info *info = &state->stack_info;
diff --git a/arch/x86/kernel/unwind_undwarf.c b/arch/x86/kernel/unwind_undwarf.c
new file mode 100644
index 0000000..8023662
--- /dev/null
+++ b/arch/x86/kernel/unwind_undwarf.c
@@ -0,0 +1,402 @@
+#include <linux/module.h>
+#include <linux/sort.h>
+#include <asm/ptrace.h>
+#include <asm/stacktrace.h>
+#include <asm/unwind.h>
+#include <asm/undwarf.h>
+
+#define undwarf_warn(fmt, ...) \
+	printk_deferred_once(KERN_WARNING pr_fmt("WARNING: " fmt), ##__VA_ARGS__)
+
+extern struct undwarf __undwarf_start[];
+extern struct undwarf __undwarf_end[];
+
+unsigned long unwind_get_return_address(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return 0;
+
+	return __kernel_text_address(state->ip) ? state->ip : 0;
+}
+EXPORT_SYMBOL_GPL(unwind_get_return_address);
+
+unsigned long *unwind_get_return_address_ptr(struct unwind_state *state)
+{
+	if (unwind_done(state))
+		return NULL;
+
+	if (state->regs)
+		return &state->regs->ip;
+
+	if (state->sp)
+		return (unsigned long *)state->sp - 1;
+
+	return NULL;
+}
+
+static inline unsigned long undwarf_ip(struct undwarf *undwarf)
+{
+	return (unsigned long)&undwarf->ip + undwarf->ip;
+}
+
+static struct undwarf *__undwarf_lookup(struct undwarf *undwarf,
+					unsigned int num, unsigned long ip)
+{
+	struct undwarf *first = undwarf;
+	struct undwarf *last = undwarf + num - 1;
+	struct undwarf *mid;
+	unsigned long u_ip;
+
+	while (first <= last) {
+		mid = first + ((last - first) / 2);
+		u_ip = undwarf_ip(mid);
+
+		if (ip >= u_ip) {
+			if (ip < u_ip + mid->len)
+				return mid;
+			first = mid + 1;
+		} else
+			last = mid - 1;
+	}
+
+	return NULL;
+}
+
+static struct undwarf *undwarf_lookup(unsigned long ip)
+{
+	struct undwarf *undwarf;
+	struct module *mod;
+
+	/* Look in vmlinux undwarf section: */
+	undwarf = __undwarf_lookup(__undwarf_start, __undwarf_end - __undwarf_start, ip);
+	if (undwarf)
+		return undwarf;
+
+	/* Look in module undwarf sections: */
+	preempt_disable();
+	mod = __module_address(ip);
+	if (!mod || !mod->arch.undwarf)
+		goto module_out;
+	undwarf = __undwarf_lookup(mod->arch.undwarf, mod->arch.num_undwarves, ip);
+
+module_out:
+	preempt_enable();
+	return undwarf;
+}
+
+static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
+			    size_t len)
+{
+	struct stack_info *info = &state->stack_info;
+
+	/*
+	 * If the next bp isn't on the current stack, switch to the next one.
+	 *
+	 * We may have to traverse multiple stacks to deal with the possibility
+	 * that info->next_sp could point to an empty stack and the next bp
+	 * could be on a subsequent stack.
+	 */
+	while (!on_stack(info, (void *)addr, len))
+		if (get_stack_info(info->next_sp, state->task, info,
+				   &state->stack_mask))
+			return false;
+
+	return true;
+}
+
+static bool deref_stack_reg(struct unwind_state *state, unsigned long addr,
+			    unsigned long *val)
+{
+	if (!stack_access_ok(state, addr, sizeof(long)))
+		return false;
+
+	*val = READ_ONCE_TASK_STACK(state->task, *(unsigned long *)addr);
+	return true;
+}
+
+#define REGS_SIZE (sizeof(struct pt_regs))
+#define SP_OFFSET (offsetof(struct pt_regs, sp))
+#define IRET_REGS_SIZE (REGS_SIZE - offsetof(struct pt_regs, ip))
+#define IRET_SP_OFFSET (SP_OFFSET - offsetof(struct pt_regs, ip))
+
+static bool deref_stack_regs(struct unwind_state *state, unsigned long addr,
+			     unsigned long *ip, unsigned long *sp, bool full)
+{
+	size_t regs_size = full ? REGS_SIZE : IRET_REGS_SIZE;
+	size_t sp_offset = full ? SP_OFFSET : IRET_SP_OFFSET;
+	struct pt_regs *regs = (struct pt_regs *)(addr + regs_size - REGS_SIZE);
+
+	if (IS_ENABLED(CONFIG_X86_64)) {
+		if (!stack_access_ok(state, addr, regs_size))
+			return false;
+
+		*ip = regs->ip;
+		*sp = regs->sp;
+
+		return true;
+	}
+
+	if (!stack_access_ok(state, addr, sp_offset))
+		return false;
+
+	*ip = regs->ip;
+
+	if (user_mode(regs)) {
+		if (!stack_access_ok(state, addr + sp_offset, REGS_SIZE - SP_OFFSET))
+			return false;
+
+		*sp = regs->sp;
+	} else
+		*sp = (unsigned long)&regs->sp;
+
+	return true;
+}
+
+bool unwind_next_frame(struct unwind_state *state)
+{
+	struct undwarf *undwarf;
+	unsigned long cfa;
+	bool indirect = false;
+	enum stack_type prev_type = state->stack_info.type;
+	unsigned long ip_p, prev_sp = state->sp;
+
+	if (unwind_done(state))
+		return false;
+
+	/* Have we reached the end? */
+	if (state->regs && user_mode(state->regs))
+		goto done;
+
+	/* Look up the instruction address in the .undwarf table: */
+	undwarf = undwarf_lookup(state->ip);
+	if (!undwarf || undwarf->cfa_reg == UNDWARF_REG_UNDEFINED)
+		goto done;
+
+	/* Calculate the CFA (caller frame address): */
+	switch (undwarf->cfa_reg) {
+	case UNDWARF_REG_SP:
+		cfa = state->sp + undwarf->cfa_offset;
+		break;
+
+	case UNDWARF_REG_BP:
+		cfa = state->bp + undwarf->cfa_offset;
+		break;
+
+	case UNDWARF_REG_SP_INDIRECT:
+		cfa = state->sp + undwarf->cfa_offset;
+		indirect = true;
+		break;
+
+	case UNDWARF_REG_BP_INDIRECT:
+		cfa = state->bp + undwarf->cfa_offset;
+		indirect = true;
+		break;
+
+	case UNDWARF_REG_R10:
+		if (!state->regs) {
+			undwarf_warn("missing regs for base reg R10 at ip %p\n",
+				     (void *)state->ip);
+			goto done;
+		}
+		cfa = state->regs->r10;
+		break;
+
+	case UNDWARF_REG_DI:
+		if (!state->regs) {
+			undwarf_warn("missing regs for base reg DI at ip %p\n",
+				     (void *)state->ip);
+			goto done;
+		}
+		cfa = state->regs->di;
+		break;
+
+	case UNDWARF_REG_DX:
+		if (!state->regs) {
+			undwarf_warn("missing regs for base reg DI at ip %p\n",
+				     (void *)state->ip);
+			goto done;
+		}
+		cfa = state->regs->dx;
+		break;
+
+	default:
+		undwarf_warn("unknown CFA base reg %d for ip %p\n",
+			     undwarf->cfa_reg, (void *)state->ip);
+		goto done;
+	}
+
+	if (indirect) {
+		if (!deref_stack_reg(state, cfa, &cfa))
+			goto done;
+	}
+
+	/* Find IP, SP and possibly regs: */
+	switch (undwarf->type) {
+	case UNDWARF_TYPE_CFA:
+		ip_p = cfa - sizeof(long);
+
+		if (!deref_stack_reg(state, ip_p, &state->ip))
+			goto done;
+
+		state->ip = ftrace_graph_ret_addr(state->task, &state->graph_idx,
+						  state->ip, (void *)ip_p);
+
+		state->sp = cfa;
+		state->regs = NULL;
+		break;
+
+	case UNDWARF_TYPE_REGS:
+		if (!deref_stack_regs(state, cfa, &state->ip, &state->sp, true)) {
+			undwarf_warn("can't dereference registers at %p for ip %p\n",
+				     (void *)cfa, (void *)state->ip);
+			goto done;
+		}
+
+		state->regs = (struct pt_regs *)cfa;
+		break;
+
+	case UNDWARF_TYPE_REGS_IRET:
+		if (!deref_stack_regs(state, cfa, &state->ip, &state->sp, false)) {
+			undwarf_warn("can't dereference iret registers at %p for ip %p\n",
+				     (void *)cfa, (void *)state->ip);
+			goto done;
+		}
+
+		state->regs = NULL;
+		break;
+
+	default:
+		undwarf_warn("unknown undwarf type %d\n", undwarf->type);
+		break;
+	}
+
+	/* Find BP: */
+	switch (undwarf->bp_reg) {
+	case UNDWARF_REG_UNDEFINED:
+		if (state->regs)
+			state->bp = state->regs->bp;
+		break;
+
+	case UNDWARF_REG_CFA:
+		if (!deref_stack_reg(state, cfa + undwarf->bp_offset, &state->bp))
+			goto done;
+		break;
+
+	case UNDWARF_REG_BP:
+		if (!deref_stack_reg(state, state->bp + undwarf->bp_offset, &state->bp))
+			goto done;
+		break;
+
+	default:
+		undwarf_warn("unknown BP base reg %d for ip %p\n",
+			     undwarf->bp_reg, (void *)undwarf_ip(undwarf));
+		goto done;
+	}
+
+	/* Prevent a recursive loop due to bad .undwarf data: */
+	if (state->stack_info.type == prev_type &&
+	    on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&
+	    state->sp <= prev_sp) {
+		undwarf_warn("stack going in the wrong direction? ip=%p\n",
+			     (void *)state->ip);
+		goto done;
+	}
+
+	return true;
+
+done:
+	state->stack_info.type = STACK_TYPE_UNKNOWN;
+	return false;
+}
+EXPORT_SYMBOL_GPL(unwind_next_frame);
+
+void __unwind_start(struct unwind_state *state, struct task_struct *task,
+		    struct pt_regs *regs, unsigned long *first_frame)
+{
+	memset(state, 0, sizeof(*state));
+	state->task = task;
+
+	if (regs) {
+		if (user_mode(regs)) {
+			state->stack_info.type = STACK_TYPE_UNKNOWN;
+			return;
+		}
+
+		state->ip = regs->ip;
+		state->sp = kernel_stack_pointer(regs);
+		state->bp = regs->bp;
+		state->regs = regs;
+
+	} else if (task == current) {
+		register void *__sp asm(_ASM_SP);
+
+		asm volatile("lea (%%rip), %0\n\t"
+			     "mov %%rsp, %1\n\t"
+			     "mov %%rbp, %2\n\t"
+			     : "=r" (state->ip), "=r" (state->sp),
+			       "=r" (state->bp), "+r" (__sp));
+
+		state->regs = NULL;
+
+	} else {
+		struct inactive_task_frame *frame = (void *)task->thread.sp;
+
+		state->ip = frame->ret_addr;
+		state->sp = task->thread.sp;
+		state->bp = frame->bp;
+		state->regs = NULL;
+	}
+
+	if (get_stack_info((unsigned long *)state->sp, state->task,
+			   &state->stack_info, &state->stack_mask))
+		return;
+
+	/*
+	 * The caller can provide the address of the first frame directly
+	 * (first_frame) or indirectly (regs->sp) to indicate which stack frame
+	 * to start unwinding at.  Skip ahead until we reach it.
+	 */
+	while (!unwind_done(state) &&
+	       (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
+			state->sp <= (unsigned long)first_frame))
+		unwind_next_frame(state);
+}
+EXPORT_SYMBOL_GPL(__unwind_start);
+
+static void undwarf_sort_swap(void *_a, void *_b, int size)
+{
+	struct undwarf *a = _a, *b = _b, tmp;
+	int delta = _b - _a;
+
+	tmp = *a;
+	*a = *b;
+	*b = tmp;
+
+	a->ip += delta;
+	b->ip -= delta;
+}
+
+static int undwarf_sort_cmp(const void *_a, const void *_b)
+{
+	unsigned long a = undwarf_ip((struct undwarf *)_a);
+	unsigned long b = undwarf_ip((struct undwarf *)_b);
+
+	if (a > b)
+		return 1;
+	if (a < b)
+		return -1;
+	return 0;
+}
+
+void unwind_module_init(struct module *mod, void *u, size_t size)
+{
+	struct undwarf *undwarf = u;
+	unsigned int num = size / sizeof(*undwarf);
+
+	WARN_ON_ONCE(size % sizeof(*undwarf) != 0);
+
+	sort(undwarf, num, sizeof(*undwarf), undwarf_sort_cmp, undwarf_sort_swap);
+
+	mod->arch.undwarf = undwarf;
+	mod->arch.num_undwarves = num;
+}
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 314a0b9..e350116 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -324,6 +324,8 @@
 									\
 	TRACEDATA							\
 									\
+	UNDWARF_TABLE							\
+									\
 	/* Kernel symbol table: Normal symbols */			\
 	__ksymtab         : AT(ADDR(__ksymtab) - LOAD_OFFSET) {		\
 		VMLINUX_SYMBOL(__start___ksymtab) = .;			\
@@ -669,6 +671,18 @@
 #define BUG_TABLE
 #endif
 
+#ifdef CONFIG_UNDWARF_UNWINDER
+#define UNDWARF_TABLE							\
+	. = ALIGN(16);							\
+	.undwarf : AT(ADDR(.undwarf) - LOAD_OFFSET) {			\
+		VMLINUX_SYMBOL(__undwarf_start) = .;			\
+		KEEP(*(.undwarf))					\
+		VMLINUX_SYMBOL(__undwarf_end) = .;			\
+	}
+#else
+#define UNDWARF_TABLE
+#endif
+
 #ifdef CONFIG_PM_TRACE
 #define TRACEDATA							\
 	. = ALIGN(4);							\
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index e4587eb..31f73d9 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -374,6 +374,9 @@ config STACK_VALIDATION
 	  pointers (if CONFIG_FRAME_POINTER is enabled).  This helps ensure
 	  that runtime stack traces are more reliable.
 
+	  This is also a prerequisite for creation of the undwarf format which
+	  is needed for CONFIG_UNDWARF_UNWINDER.
+
 	  For more information, see
 	  tools/objtool/Documentation/stack-validation.txt.
 
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 733e044..b43dddf 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -258,7 +258,8 @@ ifneq ($(SKIP_STACK_VALIDATION),1)
 
 __objtool_obj := $(objtree)/tools/objtool/objtool
 
-objtool_args = check
+objtool_args = $(if $(CONFIG_UNDWARF_UNWINDER),undwarf generate,check)
+
 ifndef CONFIG_FRAME_POINTER
 objtool_args += --no-fp
 endif
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index f4eb9dc..286ea8d 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -296,6 +296,11 @@ if [ -n "${CONFIG_BUILDTIME_EXTABLE_SORT}" ]; then
 	sortextable vmlinux
 fi
 
+if [ -n "${CONFIG_UNDWARF_UNWINDER}" ]; then
+	info SORTUD vmlinux
+	sortundwarf vmlinux
+fi
+
 info SYSMAP System.map
 mksysmap vmlinux System.map
 
-- 
2.7.4

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
                   ` (9 preceding siblings ...)
  2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
@ 2017-06-01  6:08 ` Ingo Molnar
  2017-06-01 11:58   ` Josh Poimboeuf
  10 siblings, 1 reply; 55+ messages in thread
From: Ingo Molnar @ 2017-06-01  6:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin, Peter Zijlstra


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Here's the contents of the undwarf.txt file which explains the 'why' in
> more detail:

Ok, so the code quality looks pretty convincing to me - the new core 'undwarf' 
unwinder code is a _lot_ more readable than any of the Dwarf based attempts 
before.

That we control the debug info generation at build time is icing on the cake to 
me.

One thing I'd like to see on the list of benefits side of the equation is a size 
comparison of kernel .text, with frame pointers vs. undwarf, on 64-bit kernels.

Being able to generate more optimal code in the hottest code paths of the kernel 
is the _real_, primary upstream kernel benefit of a different debuginfo method - 
which has to be weighed against the pain of introducing a new unwinder. But this 
submission does not talk about that aspect at all, which should be fixed I think.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
@ 2017-06-01 11:05   ` Peter Zijlstra
  2017-06-01 12:26     ` Josh Poimboeuf
  2017-06-01 12:13   ` Peter Zijlstra
  2017-06-14 11:45   ` Jiri Slaby
  2 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 11:05 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 12:44:16AM -0500, Josh Poimboeuf wrote:

> +static struct undwarf *__undwarf_lookup(struct undwarf *undwarf,
> +					unsigned int num, unsigned long ip)
> +{
> +	struct undwarf *first = undwarf;
> +	struct undwarf *last = undwarf + num - 1;
> +	struct undwarf *mid;
> +	unsigned long u_ip;
> +
> +	while (first <= last) {
> +		mid = first + ((last - first) / 2);
> +		u_ip = undwarf_ip(mid);
> +
> +		if (ip >= u_ip) {
> +			if (ip < u_ip + mid->len)
> +				return mid;
> +			first = mid + 1;
> +		} else
> +			last = mid - 1;
> +	}
> +
> +	return NULL;
> +}

That's a bog standard binary search thing, don't we have a helper for
that someplace?

> +static struct undwarf *undwarf_lookup(unsigned long ip)
> +{
> +	struct undwarf *undwarf;
> +	struct module *mod;
> +
> +	/* Look in vmlinux undwarf section: */
> +	undwarf = __undwarf_lookup(__undwarf_start, __undwarf_end - __undwarf_start, ip);
> +	if (undwarf)
> +		return undwarf;
> +
> +	/* Look in module undwarf sections: */
> +	preempt_disable();
> +	mod = __module_address(ip);
> +	if (!mod || !mod->arch.undwarf)
> +		goto module_out;
> +	undwarf = __undwarf_lookup(mod->arch.undwarf, mod->arch.num_undwarves, ip);
> +
> +module_out:
> +	preempt_enable();
> +	return undwarf;
> +}

A few points here:

 - that first lookup is entirely pointless if !core_kernel_text(ip)

 - that preempt_{dis,en}able() muck is 'pointless', for while it shuts
   up the warnings from __modules_address(), nothing preserves the
   struct undwarf you get a pointer to after the preempt_enable().

 - what about 'interesting' things like, ftrace_trampoline, kprobe insn
   slots and bpf text?

> +static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
> +			    size_t len)
> +{
> +	struct stack_info *info = &state->stack_info;
> +
> +	/*
> +	 * If the next bp isn't on the current stack, switch to the next one.
> +	 *
> +	 * We may have to traverse multiple stacks to deal with the possibility
> +	 * that info->next_sp could point to an empty stack and the next bp
> +	 * could be on a subsequent stack.
> +	 */
> +	while (!on_stack(info, (void *)addr, len)) {
> +		if (get_stack_info(info->next_sp, state->task, info,
> +				   &state->stack_mask))
> +			return false;
	}
> +
> +	return true;
> +}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01  6:08 ` [RFC PATCH 00/10] x86: " Ingo Molnar
@ 2017-06-01 11:58   ` Josh Poimboeuf
  2017-06-01 12:17     ` Peter Zijlstra
  2017-06-01 13:50     ` Ingo Molnar
  0 siblings, 2 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 11:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 08:08:24AM +0200, Ingo Molnar wrote:
> 
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > Here's the contents of the undwarf.txt file which explains the 'why' in
> > more detail:
> 
> Ok, so the code quality looks pretty convincing to me - the new core 'undwarf' 
> unwinder code is a _lot_ more readable than any of the Dwarf based attempts 
> before.
> 
> That we control the debug info generation at build time is icing on the cake to 
> me.
> 
> One thing I'd like to see on the list of benefits side of the equation is a size 
> comparison of kernel .text, with frame pointers vs. undwarf, on 64-bit kernels.

Ok, will do a text size comparison.  The only difficulty I encountered
there is that the 'size' tool considers the .undwarf section to be text
for some reason.  So the "text" size grew considerably :-)

> Being able to generate more optimal code in the hottest code paths of the kernel 
> is the _real_, primary upstream kernel benefit of a different debuginfo method - 
> which has to be weighed against the pain of introducing a new unwinder. But this 
> submission does not talk about that aspect at all, which should be fixed I think.

Actually I devoted an entire one-sentence paragraph to performance in
the documentation:

  The simpler debuginfo format also enables the unwinder to be relatively
  fast, which is important for perf and lockdep.

But I'll try to highlight that a little more.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
  2017-06-01 11:05   ` Peter Zijlstra
@ 2017-06-01 12:13   ` Peter Zijlstra
  2017-06-01 12:36     ` Josh Poimboeuf
  2017-06-14 11:45   ` Jiri Slaby
  2 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 12:13 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 12:44:16AM -0500, Josh Poimboeuf wrote:

> +static struct undwarf *undwarf_lookup(unsigned long ip)
> +{
> +	struct undwarf *undwarf;
> +	struct module *mod;
> +
> +	/* Look in vmlinux undwarf section: */
> +	undwarf = __undwarf_lookup(__undwarf_start, __undwarf_end - __undwarf_start, ip);
> +	if (undwarf)
> +		return undwarf;
> +
> +	/* Look in module undwarf sections: */
> +	preempt_disable();
> +	mod = __module_address(ip);
> +	if (!mod || !mod->arch.undwarf)
> +		goto module_out;
> +	undwarf = __undwarf_lookup(mod->arch.undwarf, mod->arch.num_undwarves, ip);
> +
> +module_out:
> +	preempt_enable();
> +	return undwarf;
> +}

> +bool unwind_next_frame(struct unwind_state *state)
> +{
> +	struct undwarf *undwarf;
> +	unsigned long cfa;
> +	bool indirect = false;
> +	enum stack_type prev_type = state->stack_info.type;
> +	unsigned long ip_p, prev_sp = state->sp;
> +
> +	if (unwind_done(state))
> +		return false;
> +
> +	/* Have we reached the end? */
> +	if (state->regs && user_mode(state->regs))
> +		goto done;
> +
> +	/* Look up the instruction address in the .undwarf table: */
> +	undwarf = undwarf_lookup(state->ip);
> +	if (!undwarf || undwarf->cfa_reg == UNDWARF_REG_UNDEFINED)
> +		goto done;
> +

	....

> +}
> +EXPORT_SYMBOL_GPL(unwind_next_frame);
> +
> +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> +		    struct pt_regs *regs, unsigned long *first_frame)
> +{

	...

> +	while (!unwind_done(state) &&
> +	       (!on_stack(&state->stack_info, first_frame, sizeof(long)) ||
> +			state->sp <= (unsigned long)first_frame))
> +		unwind_next_frame(state);
> +}

So we do that lookup for every single frame. That's going to hurt.

Would it make sense to cache the last 'module' in an attempt to at least
avoid that lookup again? Something like so:

---
--- a/arch/x86/include/asm/unwind.h
+++ b/arch/x86/include/asm/unwind.h
@@ -15,6 +15,7 @@ struct unwind_state {
 #if defined(CONFIG_UNDWARF_UNWINDER)
 	unsigned long sp, bp, ip;
 	struct pt_regs *regs;
+	struct module *mod;
 #elif defined(CONFIG_FRAME_POINTER)
 	bool got_irq;
 	unsigned long *bp, *orig_sp, ip;
--- a/arch/x86/kernel/unwind_undwarf.c
+++ b/arch/x86/kernel/unwind_undwarf.c
@@ -62,26 +62,45 @@ static struct undwarf *__undwarf_lookup(
 	return NULL;
 }
 
-static struct undwarf *undwarf_lookup(unsigned long ip)
+static struct undwarf *undwarf_lookup(struct unwind_state *state)
 {
+	struct module *mod = state->mod;
+	unsigned long ip = state->ip;
 	struct undwarf *undwarf;
-	struct module *mod;
+	unsigned int num;
 
-	/* Look in vmlinux undwarf section: */
-	undwarf = __undwarf_lookup(__undwarf_start, __undwarf_end - __undwarf_start, ip);
-	if (undwarf)
-		return undwarf;
+	if (mod) {
+		if (within_module(ip, mod)) {
+			undwarf = mod->arch.undwarf;
+			num	= mod->arch.num_undwarves;
+			goto lookup;
+		}
+		mod = NULL;
+	}
+
+	if (core_kernel_text(ip)) {
+		undwarf = __undwarf_start;
+		num	= __undwarf_end - __undwarf_start;
+		goto lookup;
+	}
 
-	/* Look in module undwarf sections: */
+	/*
+	 * Shut up the warning from __module_address(), regardless the undwarf
+	 * pointer can disappear from under us.
+	 */
 	preempt_disable();
 	mod = __module_address(ip);
+	preempt_enable();
+
 	if (!mod || !mod->arch.undwarf)
-		goto module_out;
-	undwarf = __undwarf_lookup(mod->arch.undwarf, mod->arch.num_undwarves, ip);
+		return NULL;
 
-module_out:
-	preempt_enable();
-	return undwarf;
+	undwarf	= mod->arch.undwarf;
+	num	= mod->arch.num_undwarves;
+
+lookup:
+	state->mod = mod;
+	return __undwarf_lookup(undwarf, num, ip);
 }
 
 static bool stack_access_ok(struct unwind_state *state, unsigned long addr,
@@ -168,7 +187,7 @@ bool unwind_next_frame(struct unwind_sta
 		goto done;
 
 	/* Look up the instruction address in the .undwarf table: */
-	undwarf = undwarf_lookup(state->ip);
+	undwarf = undwarf_lookup(state);
 	if (!undwarf || undwarf->cfa_reg == UNDWARF_REG_UNDEFINED)
 		goto done;
 

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 11:58   ` Josh Poimboeuf
@ 2017-06-01 12:17     ` Peter Zijlstra
  2017-06-01 12:33       ` Jiri Slaby
  2017-06-01 12:47       ` Josh Poimboeuf
  2017-06-01 13:50     ` Ingo Molnar
  1 sibling, 2 replies; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 12:17 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin

On Thu, Jun 01, 2017 at 06:58:20AM -0500, Josh Poimboeuf wrote:
> > Being able to generate more optimal code in the hottest code paths of the kernel 
> > is the _real_, primary upstream kernel benefit of a different debuginfo method - 
> > which has to be weighed against the pain of introducing a new unwinder. But this 
> > submission does not talk about that aspect at all, which should be fixed I think.
> 
> Actually I devoted an entire one-sentence paragraph to performance in
> the documentation:
> 
>   The simpler debuginfo format also enables the unwinder to be relatively
>   fast, which is important for perf and lockdep.
> 
> But I'll try to highlight that a little more.

That's relative to a DWARF unwinder. It doesn't appear to be possible to
get anywhere near a frame-pointer unwinder due to having to do this
log(n) lookup for every single frame.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 11:05   ` Peter Zijlstra
@ 2017-06-01 12:26     ` Josh Poimboeuf
  2017-06-01 12:47       ` Jiri Slaby
  2017-06-01 13:10       ` Peter Zijlstra
  0 siblings, 2 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 12:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 01:05:39PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 01, 2017 at 12:44:16AM -0500, Josh Poimboeuf wrote:
> 
> > +static struct undwarf *__undwarf_lookup(struct undwarf *undwarf,
> > +					unsigned int num, unsigned long ip)
> > +{
> > +	struct undwarf *first = undwarf;
> > +	struct undwarf *last = undwarf + num - 1;
> > +	struct undwarf *mid;
> > +	unsigned long u_ip;
> > +
> > +	while (first <= last) {
> > +		mid = first + ((last - first) / 2);
> > +		u_ip = undwarf_ip(mid);
> > +
> > +		if (ip >= u_ip) {
> > +			if (ip < u_ip + mid->len)
> > +				return mid;
> > +			first = mid + 1;
> > +		} else
> > +			last = mid - 1;
> > +	}
> > +
> > +	return NULL;
> > +}
> 
> That's a bog standard binary search thing, don't we have a helper for
> that someplace?

I wasn't able to find one...

> > +static struct undwarf *undwarf_lookup(unsigned long ip)
> > +{
> > +	struct undwarf *undwarf;
> > +	struct module *mod;
> > +
> > +	/* Look in vmlinux undwarf section: */
> > +	undwarf = __undwarf_lookup(__undwarf_start, __undwarf_end - __undwarf_start, ip);
> > +	if (undwarf)
> > +		return undwarf;
> > +
> > +	/* Look in module undwarf sections: */
> > +	preempt_disable();
> > +	mod = __module_address(ip);
> > +	if (!mod || !mod->arch.undwarf)
> > +		goto module_out;
> > +	undwarf = __undwarf_lookup(mod->arch.undwarf, mod->arch.num_undwarves, ip);
> > +
> > +module_out:
> > +	preempt_enable();
> > +	return undwarf;
> > +}
> 
> A few points here:
> 
>  - that first lookup is entirely pointless if !core_kernel_text(ip)

True.

>  - that preempt_{dis,en}able() muck is 'pointless', for while it shuts
>    up the warnings from __modules_address(), nothing preserves the
>    struct undwarf you get a pointer to after the preempt_enable().

Oops!

>  - what about 'interesting' things like, ftrace_trampoline, kprobe insn
>    slots and bpf text?

I think support for generated code can come later.  My current plan is
to have some kind of registration interface for associating debuginfo
with generated code.  Maybe as part of the generated code management
thing we talked about before:

  https://lkml.kernel.org/r/20170110085923.GD3092@twins.programming.kicks-ass.net

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 12:17     ` Peter Zijlstra
@ 2017-06-01 12:33       ` Jiri Slaby
  2017-06-01 12:52         ` Josh Poimboeuf
  2017-06-01 12:47       ` Josh Poimboeuf
  1 sibling, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-01 12:33 UTC (permalink / raw)
  To: Peter Zijlstra, Josh Poimboeuf
  Cc: Ingo Molnar, x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, H. Peter Anvin

On 06/01/2017, 02:17 PM, Peter Zijlstra wrote:
> On Thu, Jun 01, 2017 at 06:58:20AM -0500, Josh Poimboeuf wrote:
>>> Being able to generate more optimal code in the hottest code paths of the kernel 
>>> is the _real_, primary upstream kernel benefit of a different debuginfo method - 
>>> which has to be weighed against the pain of introducing a new unwinder. But this 
>>> submission does not talk about that aspect at all, which should be fixed I think.
>>
>> Actually I devoted an entire one-sentence paragraph to performance in
>> the documentation:
>>
>>   The simpler debuginfo format also enables the unwinder to be relatively
>>   fast, which is important for perf and lockdep.
>>
>> But I'll try to highlight that a little more.
> 
> That's relative to a DWARF unwinder. It doesn't appear to be possible to
> get anywhere near a frame-pointer unwinder due to having to do this
> log(n) lookup for every single frame.

This is ~ 20 times faster than my DWARF unwinder by a quick measurement
(20000 calls to save_stack_trace via single vfs_write).

perf profile, if you care:

__save_stack_trace
|
|--65.89%--unwind_next_frame
|          |
|          |--53.64%--__undwarf_lookup
|          |
|           --5.30%--deref_stack_reg
|                     |
|                      --2.32%--stack_access_ok
|
|--24.17%--__unwind_start
|          |
|          |--21.52%--unwind_next_frame
|          |          |
|          |          |--14.24%--__undwarf_lookup
|          |          |
|          |           --2.98%--deref_stack_reg
|          |                     |
|          |                      --1.32%--stack_access_ok
|          |
|           --1.32%--get_stack_info
|                     |
|                      --0.66%--in_task_stack
|
|--3.31%--unwind_get_return_address
|          __kernel_text_address
|          |
|          |--0.99%--is_ftrace_trampoline
|          |
|          |--0.99%--__is_insn_slot_addr
|          |          |
|          |           --0.66%--__rcu_read_unlock
|          |
|           --0.66%--is_bpf_text_address
|
 --1.66%--save_stack_address


-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 12:13   ` Peter Zijlstra
@ 2017-06-01 12:36     ` Josh Poimboeuf
  2017-06-01 13:12       ` Peter Zijlstra
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 12:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 02:13:56PM +0200, Peter Zijlstra wrote:
> So we do that lookup for every single frame. That's going to hurt.
> 
> Would it make sense to cache the last 'module' in an attempt to at least
> avoid that lookup again? Something like so:

The only thing with caching the module is, what if the module goes away?

Based on your previous comment I was thinking I would disable preemption
for the entire unwind_next_frame() step, but not *between* steps.  I
suppose we could require the unwind caller to disable preemption but I'd
like to avoid that if possible.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 12:17     ` Peter Zijlstra
  2017-06-01 12:33       ` Jiri Slaby
@ 2017-06-01 12:47       ` Josh Poimboeuf
  2017-06-01 13:25         ` Peter Zijlstra
  2017-06-01 13:50         ` Andy Lutomirski
  1 sibling, 2 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin

On Thu, Jun 01, 2017 at 02:17:21PM +0200, Peter Zijlstra wrote:
> On Thu, Jun 01, 2017 at 06:58:20AM -0500, Josh Poimboeuf wrote:
> > > Being able to generate more optimal code in the hottest code paths of the kernel 
> > > is the _real_, primary upstream kernel benefit of a different debuginfo method - 
> > > which has to be weighed against the pain of introducing a new unwinder. But this 
> > > submission does not talk about that aspect at all, which should be fixed I think.
> > 
> > Actually I devoted an entire one-sentence paragraph to performance in
> > the documentation:
> > 
> >   The simpler debuginfo format also enables the unwinder to be relatively
> >   fast, which is important for perf and lockdep.
> > 
> > But I'll try to highlight that a little more.
> 
> That's relative to a DWARF unwinder.

Yes.

> It doesn't appear to be possible to get anywhere near a frame-pointer
> unwinder due to having to do this log(n) lookup for every single
> frame.

Hm, is there something faster, yet not substantially bigger?  Hash?
Trie?

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 12:26     ` Josh Poimboeuf
@ 2017-06-01 12:47       ` Jiri Slaby
  2017-06-01 13:02         ` Josh Poimboeuf
  2017-06-01 13:42         ` Peter Zijlstra
  2017-06-01 13:10       ` Peter Zijlstra
  1 sibling, 2 replies; 55+ messages in thread
From: Jiri Slaby @ 2017-06-01 12:47 UTC (permalink / raw)
  To: Josh Poimboeuf, Peter Zijlstra
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin

On 06/01/2017, 02:26 PM, Josh Poimboeuf wrote:
> On Thu, Jun 01, 2017 at 01:05:39PM +0200, Peter Zijlstra wrote:
>> On Thu, Jun 01, 2017 at 12:44:16AM -0500, Josh Poimboeuf wrote:
>>
>>> +static struct undwarf *__undwarf_lookup(struct undwarf *undwarf,
>>> +					unsigned int num, unsigned long ip)
>>> +{
>>> +	struct undwarf *first = undwarf;
>>> +	struct undwarf *last = undwarf + num - 1;
>>> +	struct undwarf *mid;
>>> +	unsigned long u_ip;
>>> +
>>> +	while (first <= last) {
>>> +		mid = first + ((last - first) / 2);
>>> +		u_ip = undwarf_ip(mid);
>>> +
>>> +		if (ip >= u_ip) {
>>> +			if (ip < u_ip + mid->len)
>>> +				return mid;
>>> +			first = mid + 1;
>>> +		} else
>>> +			last = mid - 1;
>>> +	}
>>> +
>>> +	return NULL;
>>> +}
>>
>> That's a bog standard binary search thing, don't we have a helper for
>> that someplace?
> 
> I wasn't able to find one...

There is bsearch, but that doesn't support searching for a value in
between of 2 keys. I.e. what we typically have is these keys:
  some_function1 at 0x1000
  some_function2 at 0x2000
and we look for IP which can be e.g. 0x1010. The bsearch's cmp function
currently has no option to say, yes, the last one you asked me was the
right one, this one is after it already.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 12:33       ` Jiri Slaby
@ 2017-06-01 12:52         ` Josh Poimboeuf
  2017-06-01 12:57           ` Jiri Slaby
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 12:52 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Peter Zijlstra, Ingo Molnar, x86, linux-kernel, live-patching,
	Linus Torvalds, Andy Lutomirski, H. Peter Anvin

On Thu, Jun 01, 2017 at 02:33:20PM +0200, Jiri Slaby wrote:
> On 06/01/2017, 02:17 PM, Peter Zijlstra wrote:
> > On Thu, Jun 01, 2017 at 06:58:20AM -0500, Josh Poimboeuf wrote:
> >>> Being able to generate more optimal code in the hottest code paths of the kernel 
> >>> is the _real_, primary upstream kernel benefit of a different debuginfo method - 
> >>> which has to be weighed against the pain of introducing a new unwinder. But this 
> >>> submission does not talk about that aspect at all, which should be fixed I think.
> >>
> >> Actually I devoted an entire one-sentence paragraph to performance in
> >> the documentation:
> >>
> >>   The simpler debuginfo format also enables the unwinder to be relatively
> >>   fast, which is important for perf and lockdep.
> >>
> >> But I'll try to highlight that a little more.
> > 
> > That's relative to a DWARF unwinder. It doesn't appear to be possible to
> > get anywhere near a frame-pointer unwinder due to having to do this
> > log(n) lookup for every single frame.
> 
> This is ~ 20 times faster than my DWARF unwinder by a quick measurement
> (20000 calls to save_stack_trace via single vfs_write).

Wow!  Thanks for quantifying that.  Looks like the lookup is indeed the
bottleneck as expected.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 12:52         ` Josh Poimboeuf
@ 2017-06-01 12:57           ` Jiri Slaby
  0 siblings, 0 replies; 55+ messages in thread
From: Jiri Slaby @ 2017-06-01 12:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Ingo Molnar, x86, linux-kernel, live-patching,
	Linus Torvalds, Andy Lutomirski, H. Peter Anvin

On 06/01/2017, 02:52 PM, Josh Poimboeuf wrote:
>> This is ~ 20 times faster than my DWARF unwinder by a quick measurement
>> (20000 calls to save_stack_trace via single vfs_write).
> 
> Wow!

BTW the most time spent when unwinding DWARF was in those routines
decrypting uleb/sleb128 into u/longs.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 12:47       ` Jiri Slaby
@ 2017-06-01 13:02         ` Josh Poimboeuf
  2017-06-01 13:42         ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 13:02 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Peter Zijlstra, x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 02:47:48PM +0200, Jiri Slaby wrote:
> On 06/01/2017, 02:26 PM, Josh Poimboeuf wrote:
> > On Thu, Jun 01, 2017 at 01:05:39PM +0200, Peter Zijlstra wrote:
> >> On Thu, Jun 01, 2017 at 12:44:16AM -0500, Josh Poimboeuf wrote:
> >>
> >>> +static struct undwarf *__undwarf_lookup(struct undwarf *undwarf,
> >>> +					unsigned int num, unsigned long ip)
> >>> +{
> >>> +	struct undwarf *first = undwarf;
> >>> +	struct undwarf *last = undwarf + num - 1;
> >>> +	struct undwarf *mid;
> >>> +	unsigned long u_ip;
> >>> +
> >>> +	while (first <= last) {
> >>> +		mid = first + ((last - first) / 2);
> >>> +		u_ip = undwarf_ip(mid);
> >>> +
> >>> +		if (ip >= u_ip) {
> >>> +			if (ip < u_ip + mid->len)
> >>> +				return mid;
> >>> +			first = mid + 1;
> >>> +		} else
> >>> +			last = mid - 1;
> >>> +	}
> >>> +
> >>> +	return NULL;
> >>> +}
> >>
> >> That's a bog standard binary search thing, don't we have a helper for
> >> that someplace?
> > 
> > I wasn't able to find one...
> 
> There is bsearch, but that doesn't support searching for a value in
> between of 2 keys. I.e. what we typically have is these keys:
>   some_function1 at 0x1000
>   some_function2 at 0x2000
> and we look for IP which can be e.g. 0x1010. The bsearch's cmp function
> currently has no option to say, yes, the last one you asked me was the
> right one, this one is after it already.

Actually, I think bsearch would work with the latest version of the data
structure.  It now has a len field, so we can do a self-contained
compare.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 12:26     ` Josh Poimboeuf
  2017-06-01 12:47       ` Jiri Slaby
@ 2017-06-01 13:10       ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 13:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 07:26:51AM -0500, Josh Poimboeuf wrote:
> On Thu, Jun 01, 2017 at 01:05:39PM +0200, Peter Zijlstra wrote:
> > On Thu, Jun 01, 2017 at 12:44:16AM -0500, Josh Poimboeuf wrote:
> > 
> > > +static struct undwarf *__undwarf_lookup(struct undwarf *undwarf,
> > > +					unsigned int num, unsigned long ip)
> > > +{
> > > +	struct undwarf *first = undwarf;
> > > +	struct undwarf *last = undwarf + num - 1;
> > > +	struct undwarf *mid;
> > > +	unsigned long u_ip;
> > > +
> > > +	while (first <= last) {
> > > +		mid = first + ((last - first) / 2);
> > > +		u_ip = undwarf_ip(mid);
> > > +
> > > +		if (ip >= u_ip) {
> > > +			if (ip < u_ip + mid->len)
> > > +				return mid;
> > > +			first = mid + 1;
> > > +		} else
> > > +			last = mid - 1;
> > > +	}
> > > +
> > > +	return NULL;
> > > +}
> > 
> > That's a bog standard binary search thing, don't we have a helper for
> > that someplace?
> 
> I wasn't able to find one...

Yeah, I looked too and couldn't find it either. A well.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 12:36     ` Josh Poimboeuf
@ 2017-06-01 13:12       ` Peter Zijlstra
  2017-06-01 15:03         ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 13:12 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 07:36:09AM -0500, Josh Poimboeuf wrote:
> On Thu, Jun 01, 2017 at 02:13:56PM +0200, Peter Zijlstra wrote:
> > So we do that lookup for every single frame. That's going to hurt.
> > 
> > Would it make sense to cache the last 'module' in an attempt to at least
> > avoid that lookup again? Something like so:
> 
> The only thing with caching the module is, what if the module goes away?

Yeah.. *boom* ;-) We could of course play games with module_get() and
module_put(), but meh.

> Based on your previous comment I was thinking I would disable preemption
> for the entire unwind_next_frame() step, but not *between* steps.  I
> suppose we could require the unwind caller to disable preemption but I'd
> like to avoid that if possible.

Right, keeping it disabled across a frame should be ok I suppose.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 12:47       ` Josh Poimboeuf
@ 2017-06-01 13:25         ` Peter Zijlstra
  2017-06-06 14:14           ` Sergey Senozhatsky
  2017-06-01 13:50         ` Andy Lutomirski
  1 sibling, 1 reply; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 13:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Ingo Molnar, x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin

On Thu, Jun 01, 2017 at 07:47:05AM -0500, Josh Poimboeuf wrote:

> > It doesn't appear to be possible to get anywhere near a frame-pointer
> > unwinder due to having to do this log(n) lookup for every single
> > frame.
> 
> Hm, is there something faster, yet not substantially bigger?  Hash?
> Trie?

Not sure how to make a Hash work with nearest neighbour searches. And a
trie will only give you a constant speedup over the binary search but
not an improvement in complexity IIRC.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 12:47       ` Jiri Slaby
  2017-06-01 13:02         ` Josh Poimboeuf
@ 2017-06-01 13:42         ` Peter Zijlstra
  1 sibling, 0 replies; 55+ messages in thread
From: Peter Zijlstra @ 2017-06-01 13:42 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Josh Poimboeuf, x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 02:47:48PM +0200, Jiri Slaby wrote:

> There is bsearch,

Shiny, should we move that into lib/sort.h and maybe get more people to
use it? search_extable() seems like something that could use it.

And __bug_table is something that seems to want all things sort applied.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 11:58   ` Josh Poimboeuf
  2017-06-01 12:17     ` Peter Zijlstra
@ 2017-06-01 13:50     ` Ingo Molnar
  2017-06-01 13:58       ` Jiri Slaby
                         ` (2 more replies)
  1 sibling, 3 replies; 55+ messages in thread
From: Ingo Molnar @ 2017-06-01 13:50 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin, Peter Zijlstra


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Jun 01, 2017 at 08:08:24AM +0200, Ingo Molnar wrote:
> > 
> > * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > 
> > > Here's the contents of the undwarf.txt file which explains the 'why' in
> > > more detail:
> > 
> > Ok, so the code quality looks pretty convincing to me - the new core 'undwarf' 
> > unwinder code is a _lot_ more readable than any of the Dwarf based attempts 
> > before.
> > 
> > That we control the debug info generation at build time is icing on the cake to 
> > me.
> > 
> > One thing I'd like to see on the list of benefits side of the equation is a size 
> > comparison of kernel .text, with frame pointers vs. undwarf, on 64-bit kernels.
> 
> Ok, will do a text size comparison.  The only difficulty I encountered
> there is that the 'size' tool considers the .undwarf section to be text
> for some reason.  So the "text" size grew considerably :-)

One trick I sometimes use is to only size some of the key builtin.o files.

> > Being able to generate more optimal code in the hottest code paths of the kernel 
> > is the _real_, primary upstream kernel benefit of a different debuginfo method - 
> > which has to be weighed against the pain of introducing a new unwinder. But this 
> > submission does not talk about that aspect at all, which should be fixed I think.
> 
> Actually I devoted an entire one-sentence paragraph to performance in
> the documentation:
> 
>   The simpler debuginfo format also enables the unwinder to be relatively
>   fast, which is important for perf and lockdep.
> 
> But I'll try to highlight that a little more.

That's not what I meant! The speedup comes from (hopefully) being able to disable 
CONFIG_FRAME_POINTER, which:

 - creates simpler/faster function prologues and epilogues - no managing of RBP 
   needed

 - gives one more generic purpose register to work from. This matters less on 
   64-bit kernels but it's a small effect.

I've seen numbers of 1-2% of instruction count reduction in common kernel 
workloads, which would be pretty significant on well cached workloads.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 12:47       ` Josh Poimboeuf
  2017-06-01 13:25         ` Peter Zijlstra
@ 2017-06-01 13:50         ` Andy Lutomirski
  1 sibling, 0 replies; 55+ messages in thread
From: Andy Lutomirski @ 2017-06-01 13:50 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Peter Zijlstra, Ingo Molnar, X86 ML, linux-kernel, live-patching,
	Linus Torvalds, Andy Lutomirski, Jiri Slaby, H. Peter Anvin

On Thu, Jun 1, 2017 at 5:47 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jun 01, 2017 at 02:17:21PM +0200, Peter Zijlstra wrote:
>> On Thu, Jun 01, 2017 at 06:58:20AM -0500, Josh Poimboeuf wrote:
>> > > Being able to generate more optimal code in the hottest code paths of the kernel
>> > > is the _real_, primary upstream kernel benefit of a different debuginfo method -
>> > > which has to be weighed against the pain of introducing a new unwinder. But this
>> > > submission does not talk about that aspect at all, which should be fixed I think.
>> >
>> > Actually I devoted an entire one-sentence paragraph to performance in
>> > the documentation:
>> >
>> >   The simpler debuginfo format also enables the unwinder to be relatively
>> >   fast, which is important for perf and lockdep.
>> >
>> > But I'll try to highlight that a little more.
>>
>> That's relative to a DWARF unwinder.
>
> Yes.
>
>> It doesn't appear to be possible to get anywhere near a frame-pointer
>> unwinder due to having to do this log(n) lookup for every single
>> frame.
>
> Hm, is there something faster, yet not substantially bigger?  Hash?
> Trie?

You have, roughly, a set of (key_start, value) pairs where, for any
given key, you want to find the (key_start, value) with the largest
key_start that doesn't exceed key.  Binary search gives you log_2(n)
queries, but its locality of reference sucks.  Here are two
suggestions for improving it:

1. Change the data layout.  Instead of having an array of undwarf
entries, have two parallel arrays, one with the ip addresses and one
with everything else.  This has no effect on the amount of space used,
but it makes the part used during search more compact.

2. Your key space is fairly small and your table entries should be
reasonably uniformly distributed.  Let the first IP you have unwind
data for be IP0.  Make an array mapping (IP - IP0) / B to the index of
the unwind entry for that IP for some suitable block size B.  Then, to
look up an IP, you'd find the indices of the unwind entries for (IP -
IP0) / B and (IP - IP0) / B + 1 and binary search between them.  With
constant B, this gives you O(1) performance instead of O(log(n)).
With B = 1, it's very fast, but the table is huge.  With B = 64k or
so, maybe you'd get a nice tradeoff of speedup vs size.  (With
modules, you'd presumably first search an rbtree to find which
instance of this data structure you're using and then do the lookup.)

3. Use a B-tree.  B-trees are simple if you don't need to deal with
insertion and deletion.  Presumably you'd choose your internal node
size so each internal node is exactly 64 or 128 bytes for good cache
performance.  This is still O(log(n)) and it uses more comparisons
than binary search, but you touch many fewer cache lines.

I expect that, if you do #1 and #2, you'd get excellent performance at
very little cost to the complexity of the code.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints
  2017-06-01  5:44 ` [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints Josh Poimboeuf
@ 2017-06-01 13:57   ` Andy Lutomirski
  2017-06-01 14:16     ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2017-06-01 13:57 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra

On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Some asm (and inline asm) code does special things to the stack which
> objtool can't understand.  (Nor can GCC or GNU assembler, for that
> matter.)  In such cases we need a facility for the code to provide
> annotations, so the unwinder can unwind through it.
>
> This provides such a facility, in the form of CFI hints.  They're
> similar to the GNU assembler .cfi* directives, but they give more
> information, and are needed in far fewer places, because objtool can
> fill in the blanks by following branches and adjusting the stack pointer
> for pushes and pops.

Two minor suggestions:

Could you prefix these with something other than "CFI_"?  For those of
use who have read the binutils manual, using "CFI_" sounds awfully
like .cfi_, and people might expect the semantics to be the same.

> +#define CFI_HINT(cfa_reg, cfa_offset, type)                    \
> +       "999: \n\t"                                             \

Have you checked if 999 is used elsewhere?  My personal preference is to use:

.Ldescriptive_text_\@:

instead of a hopefully-unique number.  I never researched the history,
but I suspect that the convention of using large numbers came from
early binutils versions that didn't have \@, but we use \@ fairly
extensively in the kernel these days, so it would seem that we no
longer support binutils versions that old.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 13:50     ` Ingo Molnar
@ 2017-06-01 13:58       ` Jiri Slaby
  2017-06-02  8:30         ` Jiri Slaby
  2017-06-01 14:05       ` Josh Poimboeuf
  2017-06-01 14:08       ` Jiri Slaby
  2 siblings, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-01 13:58 UTC (permalink / raw)
  To: Ingo Molnar, Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, H. Peter Anvin, Peter Zijlstra

On 06/01/2017, 03:50 PM, Ingo Molnar wrote:
> That's not what I meant! The speedup comes from (hopefully) being able to disable 
> CONFIG_FRAME_POINTER, which:

BTW when you are mentioning this -- my measurements were with FP disabled.

Is there any reasonably simple-to-use benchmark I could run with FP=y
and FP=n quickly?

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations
  2017-06-01  5:44 ` [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations Josh Poimboeuf
@ 2017-06-01 14:03   ` Andy Lutomirski
  2017-06-01 14:23     ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2017-06-01 14:03 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra

On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Add CFI hint undwarf annotations to entry_64.S.  This will enable the
> undwarf unwinder to unwind through any location in the entry code
> including syscalls, interrupts, and exceptions.
>
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/entry/Makefile   |  1 -
>  arch/x86/entry/calling.h  |  5 +++++
>  arch/x86/entry/entry_64.S | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>  3 files changed, 55 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
> index 9976fce..af28a8a 100644
> --- a/arch/x86/entry/Makefile
> +++ b/arch/x86/entry/Makefile
> @@ -2,7 +2,6 @@
>  # Makefile for the x86 low level entry code
>  #
>
> -OBJECT_FILES_NON_STANDARD_entry_$(BITS).o   := y
>  OBJECT_FILES_NON_STANDARD_entry_64_compat.o := y
>
>  CFLAGS_syscall_64.o            += $(call cc-option,-Wno-override-init,)
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 05ed3d3..bbec02e 100644
> --- a/arch/x86/entry/calling.h
> +++ b/arch/x86/entry/calling.h
> @@ -1,4 +1,6 @@
>  #include <linux/jump_label.h>
> +#include <asm/undwarf.h>
> +
>
>  /*
>

Just to make sure I understand this, if we unwind from...

> @@ -112,6 +114,7 @@ For 32-bit we have the following conventions - kernel is built with
>         movq %rdx, 12*8+\offset(%rsp)
>         movq %rsi, 13*8+\offset(%rsp)

...here..., will objtool think that rdx and rsi (etc) still live in
their respective regs, or will it find them in the on-stack data given
by CFI_REGS?  If the former, how does undwarf deal with the
corresponding pops?

>         movq %rdi, 14*8+\offset(%rsp)
> +       CFI_REGS offset=\offset extra=0

> @@ -414,6 +424,7 @@ ENTRY(ret_from_fork)
>  2:
>         movq    %rsp, %rdi
>         call    syscall_return_slowpath /* returns with IRQs disabled */
> +       CFI_REGS

I'm confused.  syscall_return_slowpath didn't change anything relevant
to unwinding, right?  What's CFI_REGS here for?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 13:50     ` Ingo Molnar
  2017-06-01 13:58       ` Jiri Slaby
@ 2017-06-01 14:05       ` Josh Poimboeuf
  2017-06-01 14:08       ` Jiri Slaby
  2 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 14:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 03:50:05PM +0200, Ingo Molnar wrote:
> * Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jun 01, 2017 at 08:08:24AM +0200, Ingo Molnar wrote:
> > > Being able to generate more optimal code in the hottest code paths of the kernel 
> > > is the _real_, primary upstream kernel benefit of a different debuginfo method - 
> > > which has to be weighed against the pain of introducing a new unwinder. But this 
> > > submission does not talk about that aspect at all, which should be fixed I think.
> > 
> > Actually I devoted an entire one-sentence paragraph to performance in
> > the documentation:
> > 
> >   The simpler debuginfo format also enables the unwinder to be relatively
> >   fast, which is important for perf and lockdep.
> > 
> > But I'll try to highlight that a little more.
> 
> That's not what I meant! The speedup comes from (hopefully) being able to disable 
> CONFIG_FRAME_POINTER, which:
> 
>  - creates simpler/faster function prologues and epilogues - no managing of RBP 
>    needed
> 
>  - gives one more generic purpose register to work from. This matters less on 
>    64-bit kernels but it's a small effect.
> 
> I've seen numbers of 1-2% of instruction count reduction in common kernel 
> workloads, which would be pretty significant on well cached workloads.

Ah, you meant runtime performance with FP disabled.  I also dedicated a
whole sentence to that one :-)

  Unlike frame pointers, the debuginfo is out-of-band, so it has no
  effect on runtime performance.

I'll try to flesh that out and maybe come up with some numbers.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 13:50     ` Ingo Molnar
  2017-06-01 13:58       ` Jiri Slaby
  2017-06-01 14:05       ` Josh Poimboeuf
@ 2017-06-01 14:08       ` Jiri Slaby
  2017-06-02 10:40         ` Mel Gorman
  2 siblings, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-01 14:08 UTC (permalink / raw)
  To: Ingo Molnar, Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, H. Peter Anvin, Peter Zijlstra, Mel Gorman

Ccing Mel who did proper measurements and can hopefully comment on his
results.

On 06/01/2017, 03:50 PM, Ingo Molnar wrote:
> That's not what I meant! The speedup comes from (hopefully) being able to disable 
> CONFIG_FRAME_POINTER, which:
> 
>  - creates simpler/faster function prologues and epilogues - no managing of RBP 
>    needed
> 
>  - gives one more generic purpose register to work from. This matters less on 
>    64-bit kernels but it's a small effect.
> 
> I've seen numbers of 1-2% of instruction count reduction in common kernel 
> workloads, which would be pretty significant on well cached workloads.

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints
  2017-06-01 13:57   ` Andy Lutomirski
@ 2017-06-01 14:16     ` Josh Poimboeuf
  2017-06-01 14:40       ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 14:16 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds, Jiri Slaby,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 06:57:24AM -0700, Andy Lutomirski wrote:
> On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Some asm (and inline asm) code does special things to the stack which
> > objtool can't understand.  (Nor can GCC or GNU assembler, for that
> > matter.)  In such cases we need a facility for the code to provide
> > annotations, so the unwinder can unwind through it.
> >
> > This provides such a facility, in the form of CFI hints.  They're
> > similar to the GNU assembler .cfi* directives, but they give more
> > information, and are needed in far fewer places, because objtool can
> > fill in the blanks by following branches and adjusting the stack pointer
> > for pushes and pops.
> 
> Two minor suggestions:
> 
> Could you prefix these with something other than "CFI_"?  For those of
> use who have read the binutils manual, using "CFI_" sounds awfully
> like .cfi_, and people might expect the semantics to be the same.

The intention was that even if this undwarf thing doesn't work out, the
CFI_ macros could still be used by objtool to generate proper DWARF.
Would prefixing them with CFI_HINT_ be better?  Or UNWIND_HINT_?

> > +#define CFI_HINT(cfa_reg, cfa_offset, type)                    \
> > +       "999: \n\t"                                             \
> 
> Have you checked if 999 is used elsewhere?  My personal preference is to use:
> 
> .Ldescriptive_text_\@:
> 
> instead of a hopefully-unique number.  I never researched the history,
> but I suspect that the convention of using large numbers came from
> early binutils versions that didn't have \@, but we use \@ fairly
> extensively in the kernel these days, so it would seem that we no
> longer support binutils versions that old.

Yeah, that would be a lot better, thanks.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations
  2017-06-01 14:03   ` Andy Lutomirski
@ 2017-06-01 14:23     ` Josh Poimboeuf
  2017-06-01 14:28       ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 14:23 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds, Jiri Slaby,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 07:03:18AM -0700, Andy Lutomirski wrote:
> On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Add CFI hint undwarf annotations to entry_64.S.  This will enable the
> > undwarf unwinder to unwind through any location in the entry code
> > including syscalls, interrupts, and exceptions.
> >
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/entry/Makefile   |  1 -
> >  arch/x86/entry/calling.h  |  5 +++++
> >  arch/x86/entry/entry_64.S | 56 ++++++++++++++++++++++++++++++++++++++++++-----
> >  3 files changed, 55 insertions(+), 7 deletions(-)
> >
> > diff --git a/arch/x86/entry/Makefile b/arch/x86/entry/Makefile
> > index 9976fce..af28a8a 100644
> > --- a/arch/x86/entry/Makefile
> > +++ b/arch/x86/entry/Makefile
> > @@ -2,7 +2,6 @@
> >  # Makefile for the x86 low level entry code
> >  #
> >
> > -OBJECT_FILES_NON_STANDARD_entry_$(BITS).o   := y
> >  OBJECT_FILES_NON_STANDARD_entry_64_compat.o := y
> >
> >  CFLAGS_syscall_64.o            += $(call cc-option,-Wno-override-init,)
> > diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> > index 05ed3d3..bbec02e 100644
> > --- a/arch/x86/entry/calling.h
> > +++ b/arch/x86/entry/calling.h
> > @@ -1,4 +1,6 @@
> >  #include <linux/jump_label.h>
> > +#include <asm/undwarf.h>
> > +
> >
> >  /*
> >
> 
> Just to make sure I understand this, if we unwind from...
> 
> > @@ -112,6 +114,7 @@ For 32-bit we have the following conventions - kernel is built with
> >         movq %rdx, 12*8+\offset(%rsp)
> >         movq %rsi, 13*8+\offset(%rsp)
> 
> ...here..., will objtool think that rdx and rsi (etc) still live in
> their respective regs, or will it find them in the on-stack data given
> by CFI_REGS?  If the former, how does undwarf deal with the
> corresponding pops?

It will find them in their respective registers, which is fine because
they haven't been clobbered yet.

> 
> >         movq %rdi, 14*8+\offset(%rsp)
> > +       CFI_REGS offset=\offset extra=0

And here it will find them on the stack.

> > @@ -414,6 +424,7 @@ ENTRY(ret_from_fork)
> >  2:
> >         movq    %rsp, %rdi
> >         call    syscall_return_slowpath /* returns with IRQs disabled */
> > +       CFI_REGS
> 
> I'm confused.  syscall_return_slowpath didn't change anything relevant
> to unwinding, right?  What's CFI_REGS here for?

Yes, you're right, this CFI_REGS should be right at the '2' label.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations
  2017-06-01 14:23     ` Josh Poimboeuf
@ 2017-06-01 14:28       ` Josh Poimboeuf
  2017-06-01 14:39         ` Andy Lutomirski
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 14:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds, Jiri Slaby,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 09:23:58AM -0500, Josh Poimboeuf wrote:
> On Thu, Jun 01, 2017 at 07:03:18AM -0700, Andy Lutomirski wrote:
> > On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > Just to make sure I understand this, if we unwind from...
> > 
> > > @@ -112,6 +114,7 @@ For 32-bit we have the following conventions - kernel is built with
> > >         movq %rdx, 12*8+\offset(%rsp)
> > >         movq %rsi, 13*8+\offset(%rsp)
> > 
> > ...here..., will objtool think that rdx and rsi (etc) still live in
> > their respective regs, or will it find them in the on-stack data given
> > by CFI_REGS?  If the former, how does undwarf deal with the
> > corresponding pops?
> 
> It will find them in their respective registers, which is fine because
> they haven't been clobbered yet.

Sorry, I hit send too soon.  Which pops are you referring to?

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations
  2017-06-01 14:28       ` Josh Poimboeuf
@ 2017-06-01 14:39         ` Andy Lutomirski
  2017-06-01 15:01           ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2017-06-01 14:39 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, X86 ML, linux-kernel, live-patching,
	Linus Torvalds, Jiri Slaby, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra

On Thu, Jun 1, 2017 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jun 01, 2017 at 09:23:58AM -0500, Josh Poimboeuf wrote:
>> On Thu, Jun 01, 2017 at 07:03:18AM -0700, Andy Lutomirski wrote:
>> > On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > Just to make sure I understand this, if we unwind from...
>> >
>> > > @@ -112,6 +114,7 @@ For 32-bit we have the following conventions - kernel is built with
>> > >         movq %rdx, 12*8+\offset(%rsp)
>> > >         movq %rsi, 13*8+\offset(%rsp)
>> >
>> > ...here..., will objtool think that rdx and rsi (etc) still live in
>> > their respective regs, or will it find them in the on-stack data given
>> > by CFI_REGS?  If the former, how does undwarf deal with the
>> > corresponding pops?
>>
>> It will find them in their respective registers, which is fine because
>> they haven't been clobbered yet.
>
> Sorry, I hit send too soon.  Which pops are you referring to?
>

If we do push, push, push, CFI_REGS, and then, later, we pop all those
saved regs, how does undwarf figure out that those pops are moving a
saved reg from the stack back to a register?  Is objtool just that
smart, or did I fail to notice an annotation somewhere, or does it not
matter?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints
  2017-06-01 14:16     ` Josh Poimboeuf
@ 2017-06-01 14:40       ` Andy Lutomirski
  2017-06-01 15:02         ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Andy Lutomirski @ 2017-06-01 14:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, X86 ML, linux-kernel, live-patching,
	Linus Torvalds, Jiri Slaby, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra

On Thu, Jun 1, 2017 at 7:16 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jun 01, 2017 at 06:57:24AM -0700, Andy Lutomirski wrote:
>> On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > Some asm (and inline asm) code does special things to the stack which
>> > objtool can't understand.  (Nor can GCC or GNU assembler, for that
>> > matter.)  In such cases we need a facility for the code to provide
>> > annotations, so the unwinder can unwind through it.
>> >
>> > This provides such a facility, in the form of CFI hints.  They're
>> > similar to the GNU assembler .cfi* directives, but they give more
>> > information, and are needed in far fewer places, because objtool can
>> > fill in the blanks by following branches and adjusting the stack pointer
>> > for pushes and pops.
>>
>> Two minor suggestions:
>>
>> Could you prefix these with something other than "CFI_"?  For those of
>> use who have read the binutils manual, using "CFI_" sounds awfully
>> like .cfi_, and people might expect the semantics to be the same.
>
> The intention was that even if this undwarf thing doesn't work out, the
> CFI_ macros could still be used by objtool to generate proper DWARF.
> Would prefixing them with CFI_HINT_ be better?  Or UNWIND_HINT_?

This has nothing to do with the data format or implementation.   I
just think that "CFI_" suggests that they're semantically equivalent
to binutils' .cfi directives.  If they're not, then maybe UNWIND_HINT
is better.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations
  2017-06-01 14:39         ` Andy Lutomirski
@ 2017-06-01 15:01           ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 15:01 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds, Jiri Slaby,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 07:39:38AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 1, 2017 at 7:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jun 01, 2017 at 09:23:58AM -0500, Josh Poimboeuf wrote:
> >> On Thu, Jun 01, 2017 at 07:03:18AM -0700, Andy Lutomirski wrote:
> >> > On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > Just to make sure I understand this, if we unwind from...
> >> >
> >> > > @@ -112,6 +114,7 @@ For 32-bit we have the following conventions - kernel is built with
> >> > >         movq %rdx, 12*8+\offset(%rsp)
> >> > >         movq %rsi, 13*8+\offset(%rsp)
> >> >
> >> > ...here..., will objtool think that rdx and rsi (etc) still live in
> >> > their respective regs, or will it find them in the on-stack data given
> >> > by CFI_REGS?  If the former, how does undwarf deal with the
> >> > corresponding pops?
> >>
> >> It will find them in their respective registers, which is fine because
> >> they haven't been clobbered yet.
> >
> > Sorry, I hit send too soon.  Which pops are you referring to?
> >
> 
> If we do push, push, push, CFI_REGS, and then, later, we pop all those
> saved regs, how does undwarf figure out that those pops are moving a
> saved reg from the stack back to a register?  Is objtool just that
> smart, or did I fail to notice an annotation somewhere, or does it not
> matter?

RESTORE_EXTRA_REGS has an annotation that attempts to do that, though
CFI_REGS ignores the 'extra' arg so there's a bug there.  It should
resolve to a CFI_IRET_REGS annotation with an offset because the
unwinder doesn't care about C regs.

I'll fix that and make the save/restore annotations more symmetrical.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints
  2017-06-01 14:40       ` Andy Lutomirski
@ 2017-06-01 15:02         ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 15:02 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: X86 ML, linux-kernel, live-patching, Linus Torvalds, Jiri Slaby,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 07:40:47AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 1, 2017 at 7:16 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jun 01, 2017 at 06:57:24AM -0700, Andy Lutomirski wrote:
> >> On Wed, May 31, 2017 at 10:44 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > Some asm (and inline asm) code does special things to the stack which
> >> > objtool can't understand.  (Nor can GCC or GNU assembler, for that
> >> > matter.)  In such cases we need a facility for the code to provide
> >> > annotations, so the unwinder can unwind through it.
> >> >
> >> > This provides such a facility, in the form of CFI hints.  They're
> >> > similar to the GNU assembler .cfi* directives, but they give more
> >> > information, and are needed in far fewer places, because objtool can
> >> > fill in the blanks by following branches and adjusting the stack pointer
> >> > for pushes and pops.
> >>
> >> Two minor suggestions:
> >>
> >> Could you prefix these with something other than "CFI_"?  For those of
> >> use who have read the binutils manual, using "CFI_" sounds awfully
> >> like .cfi_, and people might expect the semantics to be the same.
> >
> > The intention was that even if this undwarf thing doesn't work out, the
> > CFI_ macros could still be used by objtool to generate proper DWARF.
> > Would prefixing them with CFI_HINT_ be better?  Or UNWIND_HINT_?
> 
> This has nothing to do with the data format or implementation.   I
> just think that "CFI_" suggests that they're semantically equivalent
> to binutils' .cfi directives.  If they're not, then maybe UNWIND_HINT
> is better.

Ok, I'll go with the UNWIND_HINT_ prefix.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01 13:12       ` Peter Zijlstra
@ 2017-06-01 15:03         ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-01 15:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Jiri Slaby, Ingo Molnar, H. Peter Anvin

On Thu, Jun 01, 2017 at 03:12:04PM +0200, Peter Zijlstra wrote:
> > Based on your previous comment I was thinking I would disable preemption
> > for the entire unwind_next_frame() step, but not *between* steps.  I
> > suppose we could require the unwind caller to disable preemption but I'd
> > like to avoid that if possible.
> 
> Right, keeping it disabled across a frame should be ok I suppose.

But then we'd either have to require the unwind user to explicitly
disable preemption, or we'd need to add a new unwind_end() interface
which the caller would be required to use when they're done unwinding.
Neither is ideal.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 13:58       ` Jiri Slaby
@ 2017-06-02  8:30         ` Jiri Slaby
  0 siblings, 0 replies; 55+ messages in thread
From: Jiri Slaby @ 2017-06-02  8:30 UTC (permalink / raw)
  To: Ingo Molnar, Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, H. Peter Anvin, Peter Zijlstra

On 06/01/2017, 03:58 PM, Jiri Slaby wrote:
> On 06/01/2017, 03:50 PM, Ingo Molnar wrote:
>> That's not what I meant! The speedup comes from (hopefully) being able to disable 
>> CONFIG_FRAME_POINTER, which:
> 
> BTW when you are mentioning this -- my measurements were with FP disabled.
> 
> Is there any reasonably simple-to-use benchmark I could run with FP=y
> and FP=n quickly?

Nevermind, I tried 10000000 stack unwindings several times and ran
netperf too. Both on the same virtual machine. On these microbenchmarks,
the former performs ~ 1.03 times better, the latter ~ 1.3 times. When
Mel measured the difference, it was around 10 % overall using more
sophisticated benchmarks.

With FP=n:

# time echo 10000000 > /dev/test_dwarf
real    0m6.659s
user    0m0.000s
sys     0m6.655s

# for aa in 1 0 0 0 0; do netperf -P $aa ; done
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
localhost () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    47819.30
 87380  16384  16384    10.00    41991.01
 87380  16384  16384    10.00    43607.82
 87380  16384  16384    10.00    42208.44
 87380  16384  16384    10.00    44383.92




With FP=y:

# time echo 10000000 > /dev/test_dwarf
real    0m6.869s
user    0m0.000s
sys     0m6.868s

# for aa in 1 0 0 0 0; do netperf -P $aa ; done
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
localhost () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    37807.90
 87380  16384  16384    10.00    32246.67
 87380  16384  16384    10.00    31358.76
 87380  16384  16384    10.00    32450.00
 87380  16384  16384    10.00    31326.70





thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 14:08       ` Jiri Slaby
@ 2017-06-02 10:40         ` Mel Gorman
  0 siblings, 0 replies; 55+ messages in thread
From: Mel Gorman @ 2017-06-02 10:40 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Ingo Molnar, Josh Poimboeuf, x86, linux-kernel, live-patching,
	Linus Torvalds, Andy Lutomirski, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 01, 2017 at 04:08:25PM +0200, Jiri Slaby wrote:
> Ccing Mel who did proper measurements and can hopefully comment on his
> results.
> 
> On 06/01/2017, 03:50 PM, Ingo Molnar wrote:
> > That's not what I meant! The speedup comes from (hopefully) being able to disable 
> > CONFIG_FRAME_POINTER, which:
> > 
> >  - creates simpler/faster function prologues and epilogues - no managing of RBP 
> >    needed
> > 
> >  - gives one more generic purpose register to work from. This matters less on 
> >    64-bit kernels but it's a small effect.
> > 
> > I've seen numbers of 1-2% of instruction count reduction in common kernel 
> > workloads, which would be pretty significant on well cached workloads.
> 

I didn't preserve the data involved but in a variety of workloads including
netperf, page allocator microbenchmark, pgbench and sqlite, enabling
framepointer introduced overhead of around the 5-10% mark. According
to an internal report I gave at the time, hackbench-thread-sockets was
around the 5% mark and a perf run showed "3.49% more cache misses with
framepointer enabled and 6.59% more cycles". Additional notes I made at
the time although again, without the original data is

---8<---
It looks like a small amount of overhead added everywhere and the size of
the vmlinux files supports that

   text    data     bss     dec     hex   filename
8143072 6480614 11153408 25777094 18953c6 vmlinux/decker/vmlinux-4.8.0-disable-fp
8396698 6480614 11153408 26030720 18d3280 vmlinux/decker/vmlinux-4.8.0-enable-fp

I also took a closer look at the pagealloc microbenchmarks because they
rely on so few functions. Profiles were not always captured due to the
short-lived nature of some of the tests so I looked at batches of 16384
allocation/frees of order-0 pages. Overall it showed 4.46% decline with
framepointer enabled and profiling. 3.89% more cycles and 24.94% more
cache misses.

As before, the framepointer cache miss overhead is not that obvious as
the bulk of samples take place elsewhere -- in this case, in checking
whether pages are buddies when merging. It's slightly clearer in
__rmqueue where 17.9% of cache misses are in the function entry point
with framepointer enabled vs 4.04% with framepointer disabled.
---8<---

Granted, the check was done back in 4.8, but I've no reason to believe
that 4.12 is any different and enabling framepointer does have a quite
substantial hit to workloads that spent a lot of time in the kernel.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 00/10] x86: undwarf unwinder
  2017-06-01 13:25         ` Peter Zijlstra
@ 2017-06-06 14:14           ` Sergey Senozhatsky
  0 siblings, 0 replies; 55+ messages in thread
From: Sergey Senozhatsky @ 2017-06-06 14:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Josh Poimboeuf, Ingo Molnar, x86, linux-kernel, live-patching,
	Linus Torvalds, Andy Lutomirski, Jiri Slaby, H. Peter Anvin,
	Sergey Senozhatsky

On (06/01/17 15:25), Peter Zijlstra wrote:
[..]
> On Thu, Jun 01, 2017 at 07:47:05AM -0500, Josh Poimboeuf wrote:
> 
> > > It doesn't appear to be possible to get anywhere near a frame-pointer
> > > unwinder due to having to do this log(n) lookup for every single
> > > frame.
> > 
> > Hm, is there something faster, yet not substantially bigger?  Hash?
> > Trie?
> 
> Not sure how to make a Hash work with nearest neighbour searches. And a
> trie will only give you a constant speedup over the binary search but
> not an improvement in complexity IIRC.

by the way, as far as I know, there is *a bit* faster bsearch(). basically
there is a way to calculate pivot (middle element) using less instructions.

something like below, perhaps... may be can give some extra performance.
(not really tested. but I believe this is close to what gcc does in
libstdc++).


======

./scripts/bloat-o-meter lib/bsearch.o.old lib/bsearch.o.new
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-24 (-24)
function                                     old     new   delta
bsearch                                      122      98     -24

---
 lib/bsearch.c | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/lib/bsearch.c b/lib/bsearch.c
index e33c179089db..18b445b010c3 100644
--- a/lib/bsearch.c
+++ b/lib/bsearch.c
@@ -33,19 +33,21 @@
 void *bsearch(const void *key, const void *base, size_t num, size_t size,
 	      int (*cmp)(const void *key, const void *elt))
 {
-	size_t start = 0, end = num;
+	const char *pivot;
 	int result;
 
-	while (start < end) {
-		size_t mid = start + (end - start) / 2;
+	while (num > 0) {
+		pivot = base + (num >> 1) * size;
+		result = cmp(key, pivot);
 
-		result = cmp(key, base + mid * size);
-		if (result < 0)
-			end = mid;
-		else if (result > 0)
-			start = mid + 1;
-		else
-			return (void *)base + mid * size;
+		if (result == 0)
+			return (void *)pivot;
+
+		if (result > 0) {
+			base = pivot + size;
+			num--;
+		}
+		num >>= 1;
 	}
 
 	return NULL;
-- 
2.13.1

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 01/10] objtool: move checking code to check.c
  2017-06-01  5:44 ` [RFC PATCH 01/10] objtool: move checking code to check.c Josh Poimboeuf
@ 2017-06-14  7:22   ` Jiri Slaby
  0 siblings, 0 replies; 55+ messages in thread
From: Jiri Slaby @ 2017-06-14  7:22 UTC (permalink / raw)
  To: Josh Poimboeuf, x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
> In preparation for the new 'objtool undwarf generate' command, which
> will rely on 'objtool check', move the checking code from
> builtin-check.c to check.c where it can be used by other commands.
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>

Reviewed-by: Jiri Slaby <jslaby@suse.cz>

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist
  2017-06-01  5:44 ` [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist Josh Poimboeuf
@ 2017-06-14  7:24   ` Jiri Slaby
  2017-06-14 13:03     ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-14  7:24 UTC (permalink / raw)
  To: Josh Poimboeuf, x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
...
> --- a/arch/x86/kernel/kprobes/opt.c
> +++ b/arch/x86/kernel/kprobes/opt.c
> @@ -28,6 +28,7 @@
>  #include <linux/kdebug.h>
>  #include <linux/kallsyms.h>
>  #include <linux/ftrace.h>
> +#include <linux/frame.h>
>  
>  #include <asm/text-patching.h>
>  #include <asm/cacheflush.h>
> @@ -94,6 +95,7 @@ static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
>  }
>  
>  asm (
> +			"optprobe_template_func:\n"

Why do you add another symbol here? What's wrong with
optprobe_template_entry?

>  			".global optprobe_template_entry\n"
>  			"optprobe_template_entry:\n"
>  #ifdef CONFIG_X86_64
> @@ -131,7 +133,12 @@ asm (
>  			"	popf\n"
>  #endif
>  			".global optprobe_template_end\n"
> -			"optprobe_template_end:\n");
> +			"optprobe_template_end:\n"
> +			".type optprobe_template_func, @function\n"
> +			".size optprobe_template_func, .-optprobe_template_func\n");
> +
> +void optprobe_template_func(void);
> +STACK_FRAME_NON_STANDARD(optprobe_template_func);
>  
>  #define TMPL_MOVE_IDX \
>  	((long)&optprobe_template_val - (long)&optprobe_template_entry)

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 04/10] objtool: add undwarf debuginfo generation
  2017-06-01  5:44 ` [RFC PATCH 04/10] objtool: add undwarf debuginfo generation Josh Poimboeuf
@ 2017-06-14  8:42   ` Jiri Slaby
  2017-06-14 13:27     ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-14  8:42 UTC (permalink / raw)
  To: Josh Poimboeuf, x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
...
> index 3fb0747..1ca5d9a 100644
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
...
> +int elf_write_to_file(struct elf *elf, char *outfile)
> +{
> +	int fd;
> +	struct section *sec;
> +	Elf *elfout;
> +	GElf_Ehdr eh, ehout;
> +	Elf_Scn *scn;
> +	Elf_Data *data;
> +	GElf_Shdr sh;
> +
> +	fd = creat(outfile, 0777);

0755 even though it is umasked?

> +	if (fd == -1) {
> +		perror("creat");
> +		return -1;
> +	}
> +
> +	elfout = elf_begin(fd, ELF_C_WRITE, NULL);
> +	if (!elfout) {
> +		perror("elf_begin");
> +		return -1;
> +	}
> +
> +	if (!gelf_newehdr(elfout, gelf_getclass(elf->elf))) {
> +		perror("gelf_newehdr");
> +		return -1;
> +	}
> +
> +	if (!gelf_getehdr(elfout, &ehout)) {

This does not make much sense to do. You memset(0) it below.

> +		perror("gelf_getehdr");
> +		return -1;
> +	}
> +
> +	if (!gelf_getehdr(elf->elf, &eh)) {
> +		perror("gelf_getehdr");
> +		return -1;
> +	}
> +
> +	memset(&ehout, 0, sizeof(ehout));
> +	ehout.e_ident[EI_DATA] = eh.e_ident[EI_DATA];
> +	ehout.e_machine = eh.e_machine;
> +	ehout.e_type = eh.e_type;
> +	ehout.e_version = EV_CURRENT;
> +	ehout.e_shstrndx = find_section_by_name(elf, ".shstrtab")->idx;
> +
> +	list_for_each_entry(sec, &elf->sections, list) {
> +		if (sec->idx == 0)
> +			continue;
> +
> +		scn = elf_newscn(elfout);
> +		if (!scn) {
> +			perror("elf_newscn");
> +			return -1;
> +		}
> +
> +		data = elf_newdata(scn);
> +		if (!data) {
> +			perror("elf_newdata");
> +			return -1;
> +		}
> +
> +		if (!elf_flagdata(data, ELF_C_SET, ELF_F_DIRTY)) {
> +			perror("elf_flagdata");
> +			return -1;
> +		}

There is not much point setting DIRTY flag here. elf_newdata does so.

> +		data->d_type = sec->data->d_type;
> +		data->d_buf = sec->data->d_buf;
> +		data->d_size = sec->data->d_size;
> +
> +		if(!gelf_getshdr(scn, &sh)) {
> +			perror("gelf_getshdr");
> +			return -1;
> +		}

This does not make much sense to do again. You overwrite the content
right away:

> +		sh = sec->sh;
> +
> +		if (!gelf_update_shdr(scn, &sh)) {
> +			perror("gelf_update_shdr");
> +			return -1;
> +		}
> +	}
> +
> +	if (!gelf_update_ehdr(elfout, &ehout)) {
> +		perror("gelf_update_ehdr");
> +		return -1;
> +	}
> +
> +	if (elf_update(elfout, ELF_C_WRITE) < 0) {
> +		perror("elf_update");
> +		return -1;
> +	}

elf_end() + close() ?

> +
> +	return 0;
> +}
> +
>  void elf_close(struct elf *elf)
>  {
>  	struct section *sec, *tmpsec;

...

> --- /dev/null
> +++ b/tools/objtool/undwarf.c
> @@ -0,0 +1,308 @@
...
> +int undwarf_dump(const char *_objname)
> +{
> +	struct elf *elf;
> +	struct section *sec;
> +	struct rela *rela;
> +	struct undwarf *undwarf;
> +	int nr, i;
> +
> +	objname = _objname;
> +
> +	elf = elf_open(objname);
> +	if (!elf) {
> +		WARN("error reading elf file %s\n", objname);
> +		return 1;
> +	}
> +
> +	sec = find_section_by_name(elf, ".undwarf");
> +	if (!sec || !sec->rela)
> +		return 0;
> +
> +	nr = sec->len / sizeof(*undwarf);
> +	for (i = 0; i < nr; i++) {
...
> +	}

elf_close() ?

> +
> +	return 0;
> +}

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
  2017-06-01 11:05   ` Peter Zijlstra
  2017-06-01 12:13   ` Peter Zijlstra
@ 2017-06-14 11:45   ` Jiri Slaby
  2017-06-14 13:44     ` Josh Poimboeuf
  2 siblings, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-14 11:45 UTC (permalink / raw)
  To: Josh Poimboeuf, x86
  Cc: linux-kernel, live-patching, Linus Torvalds, Andy Lutomirski,
	Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
> --- /dev/null
> +++ b/arch/x86/kernel/unwind_undwarf.c
> @@ -0,0 +1,402 @@
...
> +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> +		    struct pt_regs *regs, unsigned long *first_frame)
> +{
> +	memset(state, 0, sizeof(*state));
> +	state->task = task;
> +
> +	if (regs) {
> +		if (user_mode(regs)) {
> +			state->stack_info.type = STACK_TYPE_UNKNOWN;
> +			return;
> +		}
> +
> +		state->ip = regs->ip;
> +		state->sp = kernel_stack_pointer(regs);
> +		state->bp = regs->bp;
> +		state->regs = regs;
> +
> +	} else if (task == current) {
> +		register void *__sp asm(_ASM_SP);
> +
> +		asm volatile("lea (%%rip), %0\n\t"
> +			     "mov %%rsp, %1\n\t"
> +			     "mov %%rbp, %2\n\t"
> +			     : "=r" (state->ip), "=r" (state->sp),
> +			       "=r" (state->bp), "+r" (__sp));

Maybe I don't understand this completely, but what is __sp used for here?

> +		state->regs = NULL;
> +
> +	} else {

In DWARF unwinder, we also used to do here:

+#ifdef CONFIG_SMP
+       } else if (task->on_cpu) {
+               return;
+#endif
        } else {

> +		struct inactive_task_frame *frame = (void *)task->thread.sp;

Since there is no inactive_task_frame for tasks currently running (on
other CPUs). At least this always held in the past.

Though, the test is indeed racy.

> +		state->ip = frame->ret_addr;
> +		state->sp = task->thread.sp;
> +		state->bp = frame->bp;
> +		state->regs = NULL;
> +	}

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist
  2017-06-14  7:24   ` Jiri Slaby
@ 2017-06-14 13:03     ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-14 13:03 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Wed, Jun 14, 2017 at 09:24:39AM +0200, Jiri Slaby wrote:
> On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
> ...
> > --- a/arch/x86/kernel/kprobes/opt.c
> > +++ b/arch/x86/kernel/kprobes/opt.c
> > @@ -28,6 +28,7 @@
> >  #include <linux/kdebug.h>
> >  #include <linux/kallsyms.h>
> >  #include <linux/ftrace.h>
> > +#include <linux/frame.h>
> >  
> >  #include <asm/text-patching.h>
> >  #include <asm/cacheflush.h>
> > @@ -94,6 +95,7 @@ static void synthesize_set_arg1(kprobe_opcode_t *addr, unsigned long val)
> >  }
> >  
> >  asm (
> > +			"optprobe_template_func:\n"
> 
> Why do you add another symbol here? What's wrong with
> optprobe_template_entry?

I tried to do that, but the STACK_FRAME_NON_STANDARD macro needs a
function, and optprobe_template_entry is defined elsewhere as a data
symbol with a type of kprobe_opcode_t.  So I had to wrap the asm code
inside a function.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 04/10] objtool: add undwarf debuginfo generation
  2017-06-14  8:42   ` Jiri Slaby
@ 2017-06-14 13:27     ` Josh Poimboeuf
  2017-06-22  7:47       ` Jiri Slaby
  0 siblings, 1 reply; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-14 13:27 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Wed, Jun 14, 2017 at 10:42:19AM +0200, Jiri Slaby wrote:
> On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
> ...
> > index 3fb0747..1ca5d9a 100644
> > --- a/tools/objtool/elf.c
> > +++ b/tools/objtool/elf.c
> ...
> > +int elf_write_to_file(struct elf *elf, char *outfile)
> > +{
> > +	int fd;
> > +	struct section *sec;
> > +	Elf *elfout;
> > +	GElf_Ehdr eh, ehout;
> > +	Elf_Scn *scn;
> > +	Elf_Data *data;
> > +	GElf_Shdr sh;
> > +
> > +	fd = creat(outfile, 0777);
> 
> 0755 even though it is umasked?
> 
> > +	if (fd == -1) {
> > +		perror("creat");
> > +		return -1;
> > +	}
> > +
> > +	elfout = elf_begin(fd, ELF_C_WRITE, NULL);
> > +	if (!elfout) {
> > +		perror("elf_begin");
> > +		return -1;
> > +	}
> > +
> > +	if (!gelf_newehdr(elfout, gelf_getclass(elf->elf))) {
> > +		perror("gelf_newehdr");
> > +		return -1;
> > +	}
> > +
> > +	if (!gelf_getehdr(elfout, &ehout)) {
> 
> This does not make much sense to do. You memset(0) it below.
> 
> > +		perror("gelf_getehdr");
> > +		return -1;
> > +	}
> > +
> > +	if (!gelf_getehdr(elf->elf, &eh)) {
> > +		perror("gelf_getehdr");
> > +		return -1;
> > +	}
> > +
> > +	memset(&ehout, 0, sizeof(ehout));
> > +	ehout.e_ident[EI_DATA] = eh.e_ident[EI_DATA];
> > +	ehout.e_machine = eh.e_machine;
> > +	ehout.e_type = eh.e_type;
> > +	ehout.e_version = EV_CURRENT;
> > +	ehout.e_shstrndx = find_section_by_name(elf, ".shstrtab")->idx;
> > +
> > +	list_for_each_entry(sec, &elf->sections, list) {
> > +		if (sec->idx == 0)
> > +			continue;
> > +
> > +		scn = elf_newscn(elfout);
> > +		if (!scn) {
> > +			perror("elf_newscn");
> > +			return -1;
> > +		}
> > +
> > +		data = elf_newdata(scn);
> > +		if (!data) {
> > +			perror("elf_newdata");
> > +			return -1;
> > +		}
> > +
> > +		if (!elf_flagdata(data, ELF_C_SET, ELF_F_DIRTY)) {
> > +			perror("elf_flagdata");
> > +			return -1;
> > +		}
> 
> There is not much point setting DIRTY flag here. elf_newdata does so.
> 
> > +		data->d_type = sec->data->d_type;
> > +		data->d_buf = sec->data->d_buf;
> > +		data->d_size = sec->data->d_size;
> > +
> > +		if(!gelf_getshdr(scn, &sh)) {
> > +			perror("gelf_getshdr");
> > +			return -1;
> > +		}
> 
> This does not make much sense to do again. You overwrite the content
> right away:
> 
> > +		sh = sec->sh;
> > +
> > +		if (!gelf_update_shdr(scn, &sh)) {
> > +			perror("gelf_update_shdr");
> > +			return -1;
> > +		}
> > +	}
> > +
> > +	if (!gelf_update_ehdr(elfout, &ehout)) {
> > +		perror("gelf_update_ehdr");
> > +		return -1;
> > +	}
> > +
> > +	if (elf_update(elfout, ELF_C_WRITE) < 0) {
> > +		perror("elf_update");
> > +		return -1;
> > +	}
> 
> elf_end() + close() ?
> 
> > +
> > +	return 0;
> > +}
> > +
> >  void elf_close(struct elf *elf)
> >  {
> >  	struct section *sec, *tmpsec;
> 
> ...
> 
> > --- /dev/null
> > +++ b/tools/objtool/undwarf.c
> > @@ -0,0 +1,308 @@
> ...
> > +int undwarf_dump(const char *_objname)
> > +{
> > +	struct elf *elf;
> > +	struct section *sec;
> > +	struct rela *rela;
> > +	struct undwarf *undwarf;
> > +	int nr, i;
> > +
> > +	objname = _objname;
> > +
> > +	elf = elf_open(objname);
> > +	if (!elf) {
> > +		WARN("error reading elf file %s\n", objname);
> > +		return 1;
> > +	}
> > +
> > +	sec = find_section_by_name(elf, ".undwarf");
> > +	if (!sec || !sec->rela)
> > +		return 0;
> > +
> > +	nr = sec->len / sizeof(*undwarf);
> > +	for (i = 0; i < nr; i++) {
> ...
> > +	}
> 
> elf_close() ?
> 
> > +
> > +	return 0;
> > +}

I agree with all your comments, will fix them all.  Thanks for the
review.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 10/10] x86/unwind: add undwarf unwinder
  2017-06-14 11:45   ` Jiri Slaby
@ 2017-06-14 13:44     ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-14 13:44 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Wed, Jun 14, 2017 at 01:45:41PM +0200, Jiri Slaby wrote:
> On 06/01/2017, 07:44 AM, Josh Poimboeuf wrote:
> > --- /dev/null
> > +++ b/arch/x86/kernel/unwind_undwarf.c
> > @@ -0,0 +1,402 @@
> ...
> > +void __unwind_start(struct unwind_state *state, struct task_struct *task,
> > +		    struct pt_regs *regs, unsigned long *first_frame)
> > +{
> > +	memset(state, 0, sizeof(*state));
> > +	state->task = task;
> > +
> > +	if (regs) {
> > +		if (user_mode(regs)) {
> > +			state->stack_info.type = STACK_TYPE_UNKNOWN;
> > +			return;
> > +		}
> > +
> > +		state->ip = regs->ip;
> > +		state->sp = kernel_stack_pointer(regs);
> > +		state->bp = regs->bp;
> > +		state->regs = regs;
> > +
> > +	} else if (task == current) {
> > +		register void *__sp asm(_ASM_SP);
> > +
> > +		asm volatile("lea (%%rip), %0\n\t"
> > +			     "mov %%rsp, %1\n\t"
> > +			     "mov %%rbp, %2\n\t"
> > +			     : "=r" (state->ip), "=r" (state->sp),
> > +			       "=r" (state->bp), "+r" (__sp));
> 
> Maybe I don't understand this completely, but what is __sp used for here?

This tells gcc "if this function saves the frame pointer, make sure it's
saved before inserting this inline asm."

But on second thought, it shouldn't be needed.  Either way it can use
the undwarf data to find the previous bp.  I'm struggling to remember
why I thought this was needed in the first place...

> > +		state->regs = NULL;
> > +
> > +	} else {
> 
> In DWARF unwinder, we also used to do here:
> 
> +#ifdef CONFIG_SMP
> +       } else if (task->on_cpu) {
> +               return;
> +#endif
>         } else {
> 
> > +		struct inactive_task_frame *frame = (void *)task->thread.sp;
> 
> Since there is no inactive_task_frame for tasks currently running (on
> other CPUs). At least this always held in the past.
> 
> Though, the test is indeed racy.

Yeah, it's indeed racy, but it's probably a good idea to add the check
anyway.  There are other checks to prevent going off the rails, but we
should try to detect it early when we can.  The frame pointer unwinder
could probably use a similar check.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 04/10] objtool: add undwarf debuginfo generation
  2017-06-14 13:27     ` Josh Poimboeuf
@ 2017-06-22  7:47       ` Jiri Slaby
  2017-06-22 12:49         ` Josh Poimboeuf
  0 siblings, 1 reply; 55+ messages in thread
From: Jiri Slaby @ 2017-06-22  7:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On 06/14/2017, 03:27 PM, Josh Poimboeuf wrote:
> I agree with all your comments, will fix them all.  Thanks for the
> review.

This is not the correct way:
++      if (flags & O_WRONLY)
++              cmd = ELF_C_WRITE;
++      else if (flags & O_RDWR)
++              cmd = ELF_C_RDWR;
++      else
++              cmd = ELF_C_READ_MMAP;

For this particular codeflow, it works, but it should be:
(flags & O_ACCMODE) == O_WRONLY
                    == O_RDWR

thanks,
-- 
js
suse labs

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [RFC PATCH 04/10] objtool: add undwarf debuginfo generation
  2017-06-22  7:47       ` Jiri Slaby
@ 2017-06-22 12:49         ` Josh Poimboeuf
  0 siblings, 0 replies; 55+ messages in thread
From: Josh Poimboeuf @ 2017-06-22 12:49 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: x86, linux-kernel, live-patching, Linus Torvalds,
	Andy Lutomirski, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Jun 22, 2017 at 09:47:46AM +0200, Jiri Slaby wrote:
> On 06/14/2017, 03:27 PM, Josh Poimboeuf wrote:
> > I agree with all your comments, will fix them all.  Thanks for the
> > review.
> 
> This is not the correct way:
> ++      if (flags & O_WRONLY)
> ++              cmd = ELF_C_WRITE;
> ++      else if (flags & O_RDWR)
> ++              cmd = ELF_C_RDWR;
> ++      else
> ++              cmd = ELF_C_READ_MMAP;
> 
> For this particular codeflow, it works, but it should be:
> (flags & O_ACCMODE) == O_WRONLY
>                     == O_RDWR

Ok, thanks.  I see you started reviewing v2 early ;-)  It's almost ready
to post, will send the patches next week.

-- 
Josh

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2017-06-22 12:49 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-01  5:44 [RFC PATCH 00/10] x86: undwarf unwinder Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 01/10] objtool: move checking code to check.c Josh Poimboeuf
2017-06-14  7:22   ` Jiri Slaby
2017-06-01  5:44 ` [RFC PATCH 02/10] objtool, x86: add several functions and files to the objtool whitelist Josh Poimboeuf
2017-06-14  7:24   ` Jiri Slaby
2017-06-14 13:03     ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 03/10] objtool: stack validation 2.0 Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 04/10] objtool: add undwarf debuginfo generation Josh Poimboeuf
2017-06-14  8:42   ` Jiri Slaby
2017-06-14 13:27     ` Josh Poimboeuf
2017-06-22  7:47       ` Jiri Slaby
2017-06-22 12:49         ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 05/10] objtool, x86: add facility for asm code to provide CFI hints Josh Poimboeuf
2017-06-01 13:57   ` Andy Lutomirski
2017-06-01 14:16     ` Josh Poimboeuf
2017-06-01 14:40       ` Andy Lutomirski
2017-06-01 15:02         ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 06/10] x86/entry: add CFI hint undwarf annotations Josh Poimboeuf
2017-06-01 14:03   ` Andy Lutomirski
2017-06-01 14:23     ` Josh Poimboeuf
2017-06-01 14:28       ` Josh Poimboeuf
2017-06-01 14:39         ` Andy Lutomirski
2017-06-01 15:01           ` Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 07/10] x86/asm: add CFI hint annotations to sync_core() Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 08/10] extable: rename 'sortextable' script to 'sorttable' Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 09/10] extable: add undwarf table sorting ability to sorttable script Josh Poimboeuf
2017-06-01  5:44 ` [RFC PATCH 10/10] x86/unwind: add undwarf unwinder Josh Poimboeuf
2017-06-01 11:05   ` Peter Zijlstra
2017-06-01 12:26     ` Josh Poimboeuf
2017-06-01 12:47       ` Jiri Slaby
2017-06-01 13:02         ` Josh Poimboeuf
2017-06-01 13:42         ` Peter Zijlstra
2017-06-01 13:10       ` Peter Zijlstra
2017-06-01 12:13   ` Peter Zijlstra
2017-06-01 12:36     ` Josh Poimboeuf
2017-06-01 13:12       ` Peter Zijlstra
2017-06-01 15:03         ` Josh Poimboeuf
2017-06-14 11:45   ` Jiri Slaby
2017-06-14 13:44     ` Josh Poimboeuf
2017-06-01  6:08 ` [RFC PATCH 00/10] x86: " Ingo Molnar
2017-06-01 11:58   ` Josh Poimboeuf
2017-06-01 12:17     ` Peter Zijlstra
2017-06-01 12:33       ` Jiri Slaby
2017-06-01 12:52         ` Josh Poimboeuf
2017-06-01 12:57           ` Jiri Slaby
2017-06-01 12:47       ` Josh Poimboeuf
2017-06-01 13:25         ` Peter Zijlstra
2017-06-06 14:14           ` Sergey Senozhatsky
2017-06-01 13:50         ` Andy Lutomirski
2017-06-01 13:50     ` Ingo Molnar
2017-06-01 13:58       ` Jiri Slaby
2017-06-02  8:30         ` Jiri Slaby
2017-06-01 14:05       ` Josh Poimboeuf
2017-06-01 14:08       ` Jiri Slaby
2017-06-02 10:40         ` Mel Gorman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.