linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] objtool: Add support for Arm64
@ 2019-04-09 13:52 Raphael Gault
  2019-04-09 13:52 ` [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support Raphael Gault
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

Hi,

As of now, objtool only supports the x86_64 architecture but the
groundwork has already been done in order to add support for other
architecture without too much effort.

This series of patches adds support for the arm64 architecture
based on the Armv8.5 Architecture Reference Manual.

* Patch 1 adapts the existing code to be able to add support for other
  architecture.
* Patch 2 provide implementation of the required function for the arm64
  architecture.
* Patch 3 adapts the checking of the stack state for the arm64
  architecture.
* Patch 4 & 5 fix some warning objtool raised in some particular
  functions of ~/arch/arm64/kernel/sleep.S. Patch 4 add a macro to
  signal that some function should be ignored by objtool. 
* Patch 6 enables stack validation for arm64.

Theses patches should provide support for the main cases and behaviour.
However a few corner cases are not yet handled by objtool:

* In the `~/arch/arm64/crypto/` directory, I noticed that some plain
  data are sometimes stored in the `.text` section causing objtool to mistake
  this for instructions and trying (and failing) to interprete them.  If someone
  could explain to me why we store data directly in the .text section I would
  appreciate it.

* In the support for arm32 architecture such as in `~/arch/arm64/kernel/kuser32.S`
  some A32 instructions are used but such instructions are not understood by
  objtool causing a warning.

I also have a few unclear points I would like to bring to your
attention:

* For x86_64, when looking for a symbol relocation with explicit
  addend, objtool systematically adds a +4 offset to the addend.
  I don't understand why even if I have a feeling it is related
  to the type of relacation.

* I currently don't have a clear understanding about how switch-tables
  are generated on arm64 and how to retrieve them (based on relocations).

Please provide me with any feedback and comments as well on the content
than the style of these patches.

Thanks,

Raphael

->

Raphael Gault (6):
  objtool: Refactor code to make it more suitable for multiple
    architecture support
  objtool: arm64: Add required implementation for supporting the aarch64
    architecture in objtool.
  objtool: arm64: Adapt the stack frame checks and the section analysis
    for the arm architecture
  arm64: assembler: Add macro to annotate asm function having non
    standard stack-frame.
  arm64: sleep: Add stack frame setup for __cpu_supsend_enter
  objtool: arm64: Enable stack validation for arm64

 arch/arm64/Kconfig                            |    1 +
 arch/arm64/include/asm/assembler.h            |   18 +
 arch/arm64/kernel/sleep.S                     |    4 +
 tools/objtool/Build                           |    1 -
 tools/objtool/arch.h                          |   11 +
 tools/objtool/arch/arm64/Build                |    6 +
 tools/objtool/arch/arm64/bit_operations.c     |   65 +
 tools/objtool/arch/arm64/decode.c             | 2870 +++++++++++++++++
 .../objtool/arch/arm64/include/arch_special.h |   44 +
 .../arch/arm64/include/asm/orc_types.h        |  109 +
 .../arch/arm64/include/bit_operations.h       |   22 +
 tools/objtool/arch/arm64/include/cfi.h        |   76 +
 .../objtool/arch/arm64/include/insn_decode.h  |  219 ++
 tools/objtool/arch/arm64/orc_gen.c            |   40 +
 tools/objtool/arch/x86/Build                  |    1 +
 tools/objtool/arch/x86/decode.c               |  111 +
 tools/objtool/arch/x86/include/arch_special.h |   35 +
 tools/objtool/{ => arch/x86/include}/cfi.h    |    0
 tools/objtool/{ => arch/x86}/orc_gen.c        |   10 +-
 tools/objtool/check.c                         |  209 +-
 tools/objtool/check.h                         |    1 +
 tools/objtool/elf.c                           |    3 +-
 tools/objtool/orc.h                           |    4 +-
 tools/objtool/special.c                       |   18 +-
 24 files changed, 3740 insertions(+), 138 deletions(-)
 create mode 100644 tools/objtool/arch/arm64/Build
 create mode 100644 tools/objtool/arch/arm64/bit_operations.c
 create mode 100644 tools/objtool/arch/arm64/decode.c
 create mode 100644 tools/objtool/arch/arm64/include/arch_special.h
 create mode 100644 tools/objtool/arch/arm64/include/asm/orc_types.h
 create mode 100644 tools/objtool/arch/arm64/include/bit_operations.h
 create mode 100644 tools/objtool/arch/arm64/include/cfi.h
 create mode 100644 tools/objtool/arch/arm64/include/insn_decode.h
 create mode 100644 tools/objtool/arch/arm64/orc_gen.c
 create mode 100644 tools/objtool/arch/x86/include/arch_special.h
 rename tools/objtool/{ => arch/x86/include}/cfi.h (100%)
 rename tools/objtool/{ => arch/x86}/orc_gen.c (96%)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
@ 2019-04-09 13:52 ` Raphael Gault
  2019-04-23 20:13   ` Josh Poimboeuf
  2019-04-09 13:52 ` [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool Raphael Gault
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

The jump destination and relocation offset used previously are only reliable
on x86_64 architecture. We abstract these computations by calling arch-dependant
implementation.

The control flow information and register macro definitions were based on
the x86_64 architecture but should be abstract so that each architecture
can define the correct values for the registers, especially the registers
related to the stack frame (Frame Pointer, Stack Pointer and possibly
Return Address).

The ORC unwinder is only supported on x86 at the moment and should thus be
in the x86 architecture code. In order not to break the whole structure in
case another architecture decides to support the ORC unwinder via objtool
we choose to let the implementation be done in the architecture dependant
code.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 tools/objtool/Build                           |   1 -
 tools/objtool/arch.h                          |   9 ++
 tools/objtool/arch/x86/Build                  |   1 +
 tools/objtool/arch/x86/decode.c               | 106 ++++++++++++++++
 tools/objtool/arch/x86/include/arch_special.h |  35 ++++++
 tools/objtool/{ => arch/x86/include}/cfi.h    |   0
 tools/objtool/{ => arch/x86}/orc_gen.c        |  10 +-
 tools/objtool/check.c                         | 114 ++----------------
 tools/objtool/check.h                         |   1 +
 tools/objtool/orc.h                           |   4 +-
 tools/objtool/special.c                       |  18 +--
 11 files changed, 173 insertions(+), 126 deletions(-)
 create mode 100644 tools/objtool/arch/x86/include/arch_special.h
 rename tools/objtool/{ => arch/x86/include}/cfi.h (100%)
 rename tools/objtool/{ => arch/x86}/orc_gen.c (96%)

diff --git a/tools/objtool/Build b/tools/objtool/Build
index 749becdf5b90..ec925d49565a 100644
--- a/tools/objtool/Build
+++ b/tools/objtool/Build
@@ -2,7 +2,6 @@ objtool-y += arch/$(SRCARCH)/
 objtool-y += builtin-check.o
 objtool-y += builtin-orc.o
 objtool-y += check.o
-objtool-y += orc_gen.o
 objtool-y += orc_dump.o
 objtool-y += elf.o
 objtool-y += special.o
diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index b0d7dc3d71b5..0eff166ca613 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -22,6 +22,7 @@
 #include <linux/list.h>
 #include "elf.h"
 #include "cfi.h"
+#include "orc.h"
 
 #define INSN_JUMP_CONDITIONAL	1
 #define INSN_JUMP_UNCONDITIONAL	2
@@ -70,6 +71,8 @@ struct stack_op {
 	struct op_src src;
 };
 
+struct instruction;
+
 void arch_initial_func_cfi_state(struct cfi_state *state);
 
 int arch_decode_instruction(struct elf *elf, struct section *sec,
@@ -79,4 +82,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
 
 bool arch_callee_saved_reg(unsigned char reg);
 
+int arch_orc_read_unwind_hints(struct objtool_file *file);
+
+unsigned long arch_compute_jump_destination(struct instruction *insn);
+
+unsigned long arch_compute_rela_sym_offset(int addend);
+
 #endif /* _ARCH_H */
diff --git a/tools/objtool/arch/x86/Build b/tools/objtool/arch/x86/Build
index b998412c017d..74015be53ef0 100644
--- a/tools/objtool/arch/x86/Build
+++ b/tools/objtool/arch/x86/Build
@@ -1,4 +1,5 @@
 objtool-y += decode.o
+objtool-y += orc_gen.o
 
 inat_tables_script = arch/x86/tools/gen-insn-attr-x86.awk
 inat_tables_maps = arch/x86/lib/x86-opcode-map.txt
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 540a209b78ab..1af7b4996307 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -23,6 +23,8 @@
 #include "lib/inat.c"
 #include "lib/insn.c"
 
+
+#include "../../check.h"
 #include "../../elf.h"
 #include "../../arch.h"
 #include "../../warn.h"
@@ -78,6 +80,105 @@ bool arch_callee_saved_reg(unsigned char reg)
 	}
 }
 
+unsigned long arch_compute_rela_sym_offset(int addend)
+{
+	return addend + 4;
+}
+
+int arch_orc_read_unwind_hints(struct objtool_file *file)
+{
+	struct section *sec, *relasec;
+	struct rela *rela;
+	struct unwind_hint *hint;
+	struct instruction *insn;
+	struct cfi_reg *cfa;
+	int i;
+
+	sec = find_section_by_name(file->elf, ".discard.unwind_hints");
+	if (!sec)
+		return 0;
+
+	relasec = sec->rela;
+	if (!relasec) {
+		WARN("missing .rela.discard.unwind_hints section");
+		return -1;
+	}
+
+	if (sec->len % sizeof(struct unwind_hint)) {
+		WARN("struct unwind_hint size mismatch");
+		return -1;
+	}
+
+	file->hints = true;
+
+	for (i = 0; i < sec->len / sizeof(struct unwind_hint); i++) {
+		hint = (struct unwind_hint *)sec->data->d_buf + i;
+
+		rela = find_rela_by_dest(sec, i * sizeof(*hint));
+		if (!rela) {
+			WARN("can't find rela for unwind_hints[%d]", i);
+			return -1;
+		}
+
+		insn = find_insn(file, rela->sym->sec, rela->addend);
+		if (!insn) {
+			WARN("can't find insn for unwind_hints[%d]", i);
+			return -1;
+		}
+
+		cfa = &insn->state.cfa;
+
+		if (hint->type == UNWIND_HINT_TYPE_SAVE) {
+			insn->save = true;
+			continue;
+
+		} else if (hint->type == UNWIND_HINT_TYPE_RESTORE) {
+			insn->restore = true;
+			insn->hint = true;
+			continue;
+		}
+
+		insn->hint = true;
+
+		switch (hint->sp_reg) {
+		case ORC_REG_UNDEFINED:
+			cfa->base = CFI_UNDEFINED;
+			break;
+		case ORC_REG_SP:
+			cfa->base = CFI_SP;
+			break;
+		case ORC_REG_BP:
+			cfa->base = CFI_BP;
+			break;
+		case ORC_REG_SP_INDIRECT:
+			cfa->base = CFI_SP_INDIRECT;
+			break;
+		case ORC_REG_R10:
+			cfa->base = CFI_R10;
+			break;
+		case ORC_REG_R13:
+			cfa->base = CFI_R13;
+			break;
+		case ORC_REG_DI:
+			cfa->base = CFI_DI;
+			break;
+		case ORC_REG_DX:
+			cfa->base = CFI_DX;
+			break;
+		default:
+			WARN_FUNC("unsupported unwind_hint sp base reg %d",
+				  insn->sec, insn->offset, hint->sp_reg);
+			return -1;
+		}
+
+		cfa->offset = hint->sp_offset;
+		insn->state.type = hint->type;
+		insn->state.end = hint->end;
+	}
+
+	return 0;
+}
+
 int arch_decode_instruction(struct elf *elf, struct section *sec,
 			    unsigned long offset, unsigned int maxlen,
 			    unsigned int *len, unsigned char *type,
@@ -494,3 +595,8 @@ void arch_initial_func_cfi_state(struct cfi_state *state)
 	state->regs[16].base = CFI_CFA;
 	state->regs[16].offset = -8;
 }
+
+unsigned long arch_compute_jump_destination(struct instruction *insn)
+{
+	return insn->offset + insn->len + insn->immediate;
+}
diff --git a/tools/objtool/arch/x86/include/arch_special.h b/tools/objtool/arch/x86/include/arch_special.h
new file mode 100644
index 000000000000..bd91b1096359
--- /dev/null
+++ b/tools/objtool/arch/x86/include/arch_special.h
@@ -0,0 +1,35 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define EX_ENTRY_SIZE		12
+#define EX_ORIG_OFFSET		0
+#define EX_NEW_OFFSET		4
+
+#define JUMP_ENTRY_SIZE		16
+#define JUMP_ORIG_OFFSET	0
+#define JUMP_NEW_OFFSET		4
+
+#define ALT_ENTRY_SIZE		13
+#define ALT_ORIG_OFFSET		0
+#define ALT_NEW_OFFSET		4
+#define ALT_FEATURE_OFFSET	8
+#define ALT_ORIG_LEN_OFFSET	10
+#define ALT_NEW_LEN_OFFSET	11
+
+#define IGNORE_SHF_EXEC_FLAG	0
+
+#define JUMP_DYNAMIC_IS_SWITCH_TABLE	0
+
+#define X86_FEATURE_POPCNT (4*32+23)
diff --git a/tools/objtool/cfi.h b/tools/objtool/arch/x86/include/cfi.h
similarity index 100%
rename from tools/objtool/cfi.h
rename to tools/objtool/arch/x86/include/cfi.h
diff --git a/tools/objtool/orc_gen.c b/tools/objtool/arch/x86/orc_gen.c
similarity index 96%
rename from tools/objtool/orc_gen.c
rename to tools/objtool/arch/x86/orc_gen.c
index 3f98dcfbc177..aaa5ca31a8f4 100644
--- a/tools/objtool/orc_gen.c
+++ b/tools/objtool/arch/x86/orc_gen.c
@@ -18,11 +18,11 @@
 #include <stdlib.h>
 #include <string.h>
 
-#include "orc.h"
-#include "check.h"
-#include "warn.h"
+#include "../../orc.h"
+#include "../../check.h"
+#include "../../warn.h"
 
-int create_orc(struct objtool_file *file)
+int arch_create_orc(struct objtool_file *file)
 {
 	struct instruction *insn;
 
@@ -128,7 +128,7 @@ static int create_orc_entry(struct section *u_sec, struct section *ip_relasec,
 	return 0;
 }
 
-int create_orc_sections(struct objtool_file *file)
+int arch_create_orc_sections(struct objtool_file *file)
 {
 	struct instruction *insn, *prev_insn;
 	struct section *sec, *u_sec, *ip_relasec;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 0414a0d52262..17fcd8c8f9c1 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -507,7 +507,7 @@ static int add_jump_destinations(struct objtool_file *file)
 					       insn->len);
 		if (!rela) {
 			dest_sec = insn->sec;
-			dest_off = insn->offset + insn->len + insn->immediate;
+			dest_off = arch_compute_jump_destination(insn);
 		} else if (rela->sym->type == STT_SECTION) {
 			dest_sec = rela->sym->sec;
 			dest_off = rela->addend + 4;
@@ -587,7 +587,7 @@ static int add_call_destinations(struct objtool_file *file)
 		rela = find_rela_by_dest_range(insn->sec, insn->offset,
 					       insn->len);
 		if (!rela) {
-			dest_off = insn->offset + insn->len + insn->immediate;
+			dest_off = arch_compute_jump_destination(insn);
 			insn->call_dest = find_symbol_by_offset(insn->sec,
 								dest_off);
 
@@ -600,14 +600,19 @@ static int add_call_destinations(struct objtool_file *file)
 			}
 
 		} else if (rela->sym->type == STT_SECTION) {
+			/*
+			 * the original x86_64 code adds 4 to the rela->addend
+			 * which is not needed on arm64 architecture.
+			 */
+			dest_off = arch_compute_rela_sym_offset(rela->addend);
 			insn->call_dest = find_symbol_by_offset(rela->sym->sec,
-								rela->addend+4);
+								dest_off);
 			if (!insn->call_dest ||
 			    insn->call_dest->type != STT_FUNC) {
-				WARN_FUNC("can't find call dest symbol at %s+0x%x",
+				WARN_FUNC("can't find call dest symbol at %s+0x%lx",
 					  insn->sec, insn->offset,
 					  rela->sym->sec->name,
-					  rela->addend + 4);
+					  dest_off);
 				return -1;
 			}
 		} else
@@ -1073,99 +1078,6 @@ static int add_switch_table_alts(struct objtool_file *file)
 	return 0;
 }
 
-static int read_unwind_hints(struct objtool_file *file)
-{
-	struct section *sec, *relasec;
-	struct rela *rela;
-	struct unwind_hint *hint;
-	struct instruction *insn;
-	struct cfi_reg *cfa;
-	int i;
-
-	sec = find_section_by_name(file->elf, ".discard.unwind_hints");
-	if (!sec)
-		return 0;
-
-	relasec = sec->rela;
-	if (!relasec) {
-		WARN("missing .rela.discard.unwind_hints section");
-		return -1;
-	}
-
-	if (sec->len % sizeof(struct unwind_hint)) {
-		WARN("struct unwind_hint size mismatch");
-		return -1;
-	}
-
-	file->hints = true;
-
-	for (i = 0; i < sec->len / sizeof(struct unwind_hint); i++) {
-		hint = (struct unwind_hint *)sec->data->d_buf + i;
-
-		rela = find_rela_by_dest(sec, i * sizeof(*hint));
-		if (!rela) {
-			WARN("can't find rela for unwind_hints[%d]", i);
-			return -1;
-		}
-
-		insn = find_insn(file, rela->sym->sec, rela->addend);
-		if (!insn) {
-			WARN("can't find insn for unwind_hints[%d]", i);
-			return -1;
-		}
-
-		cfa = &insn->state.cfa;
-
-		if (hint->type == UNWIND_HINT_TYPE_SAVE) {
-			insn->save = true;
-			continue;
-
-		} else if (hint->type == UNWIND_HINT_TYPE_RESTORE) {
-			insn->restore = true;
-			insn->hint = true;
-			continue;
-		}
-
-		insn->hint = true;
-
-		switch (hint->sp_reg) {
-		case ORC_REG_UNDEFINED:
-			cfa->base = CFI_UNDEFINED;
-			break;
-		case ORC_REG_SP:
-			cfa->base = CFI_SP;
-			break;
-		case ORC_REG_BP:
-			cfa->base = CFI_BP;
-			break;
-		case ORC_REG_SP_INDIRECT:
-			cfa->base = CFI_SP_INDIRECT;
-			break;
-		case ORC_REG_R10:
-			cfa->base = CFI_R10;
-			break;
-		case ORC_REG_R13:
-			cfa->base = CFI_R13;
-			break;
-		case ORC_REG_DI:
-			cfa->base = CFI_DI;
-			break;
-		case ORC_REG_DX:
-			cfa->base = CFI_DX;
-			break;
-		default:
-			WARN_FUNC("unsupported unwind_hint sp base reg %d",
-				  insn->sec, insn->offset, hint->sp_reg);
-			return -1;
-		}
-
-		cfa->offset = hint->sp_offset;
-		insn->state.type = hint->type;
-		insn->state.end = hint->end;
-	}
-
-	return 0;
-}
 
 static int read_retpoline_hints(struct objtool_file *file)
 {
@@ -1259,7 +1171,7 @@ static int decode_sections(struct objtool_file *file)
 	if (ret)
 		return ret;
 
-	ret = read_unwind_hints(file);
+	ret = arch_orc_read_unwind_hints(file);
 	if (ret)
 		return ret;
 
@@ -2237,11 +2149,11 @@ int check(const char *_objname, bool orc)
 	}
 
 	if (orc) {
-		ret = create_orc(&file);
+		ret = arch_create_orc(&file);
 		if (ret < 0)
 			goto out;
 
-		ret = create_orc_sections(&file);
+		ret = arch_create_orc_sections(&file);
 		if (ret < 0)
 			goto out;
 
diff --git a/tools/objtool/check.h b/tools/objtool/check.h
index e6e8a655b556..f8bad59575e5 100644
--- a/tools/objtool/check.h
+++ b/tools/objtool/check.h
@@ -23,6 +23,7 @@
 #include "cfi.h"
 #include "arch.h"
 #include "orc.h"
+#include "arch_special.h"
 #include <linux/hashtable.h>
 
 struct insn_state {
diff --git a/tools/objtool/orc.h b/tools/objtool/orc.h
index b0e92a6d0903..1aa02dce3bca 100644
--- a/tools/objtool/orc.h
+++ b/tools/objtool/orc.h
@@ -22,8 +22,8 @@
 
 struct objtool_file;
 
-int create_orc(struct objtool_file *file);
-int create_orc_sections(struct objtool_file *file);
+int arch_create_orc(struct objtool_file *file);
+int arch_create_orc_sections(struct objtool_file *file);
 
 int orc_dump(const char *objname);
 
diff --git a/tools/objtool/special.c b/tools/objtool/special.c
index 50af4e1274b3..787a123391ec 100644
--- a/tools/objtool/special.c
+++ b/tools/objtool/special.c
@@ -25,23 +25,7 @@
 
 #include "special.h"
 #include "warn.h"
-
-#define EX_ENTRY_SIZE		12
-#define EX_ORIG_OFFSET		0
-#define EX_NEW_OFFSET		4
-
-#define JUMP_ENTRY_SIZE		16
-#define JUMP_ORIG_OFFSET	0
-#define JUMP_NEW_OFFSET		4
-
-#define ALT_ENTRY_SIZE		13
-#define ALT_ORIG_OFFSET		0
-#define ALT_NEW_OFFSET		4
-#define ALT_FEATURE_OFFSET	8
-#define ALT_ORIG_LEN_OFFSET	10
-#define ALT_NEW_LEN_OFFSET	11
-
-#define X86_FEATURE_POPCNT (4*32+23)
+#include "arch_special.h"
 
 struct special_entry {
 	const char *sec;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool.
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
  2019-04-09 13:52 ` [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support Raphael Gault
@ 2019-04-09 13:52 ` Raphael Gault
  2019-04-09 16:20   ` Peter Zijlstra
  2019-04-23 20:18   ` Josh Poimboeuf
  2019-04-09 13:52 ` [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture Raphael Gault
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

Provide implementation for the arch-dependent functions that are called by the main check
function of objtool.
The ORC unwinder is not yet supported by the arm64 architecture so we only provide a dummy
interface for now.
The decoding of the instruction is split into classes and subclasses as described into
the Instruction Encoding in the ArmV8.5 Architecture Reference Manual.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 tools/objtool/arch/arm64/Build                |    6 +
 tools/objtool/arch/arm64/bit_operations.c     |   65 +
 tools/objtool/arch/arm64/decode.c             | 2843 +++++++++++++++++
 .../objtool/arch/arm64/include/arch_special.h |   44 +
 .../arch/arm64/include/asm/orc_types.h        |  109 +
 .../arch/arm64/include/bit_operations.h       |   22 +
 tools/objtool/arch/arm64/include/cfi.h        |   76 +
 .../objtool/arch/arm64/include/insn_decode.h  |  219 ++
 tools/objtool/arch/arm64/orc_gen.c            |   40 +
 9 files changed, 3424 insertions(+)
 create mode 100644 tools/objtool/arch/arm64/Build
 create mode 100644 tools/objtool/arch/arm64/bit_operations.c
 create mode 100644 tools/objtool/arch/arm64/decode.c
 create mode 100644 tools/objtool/arch/arm64/include/arch_special.h
 create mode 100644 tools/objtool/arch/arm64/include/asm/orc_types.h
 create mode 100644 tools/objtool/arch/arm64/include/bit_operations.h
 create mode 100644 tools/objtool/arch/arm64/include/cfi.h
 create mode 100644 tools/objtool/arch/arm64/include/insn_decode.h
 create mode 100644 tools/objtool/arch/arm64/orc_gen.c

diff --git a/tools/objtool/arch/arm64/Build b/tools/objtool/arch/arm64/Build
new file mode 100644
index 000000000000..deac494a30d3
--- /dev/null
+++ b/tools/objtool/arch/arm64/Build
@@ -0,0 +1,6 @@
+objtool-y += decode.o
+objtool-y += orc_gen.o
+objtool-y += bit_operations.o
+
+
+CFLAGS_decode.o += -I$(OUTPUT)arch/arm64/lib
diff --git a/tools/objtool/arch/arm64/bit_operations.c b/tools/objtool/arch/arm64/bit_operations.c
new file mode 100644
index 000000000000..02429ede1519
--- /dev/null
+++ b/tools/objtool/arch/arm64/bit_operations.c
@@ -0,0 +1,65 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include "bit_operations.h"
+
+#include "../../warn.h"
+
+u64 replicate(u64 x, int size, int n)
+{
+	u64 ret = 0;
+	while (n >= 0) {
+		ret = (ret | x) << size;
+		n--;
+	}
+	return ret | x;
+}
+
+u64 ror(u64 x, int size, int shift)
+{
+	int m = shift % size;
+	if (shift == 0)
+		return x;
+	return ZERO_EXTEND((x >> m) | (x << (size - m)), size);
+}
+
+int highest_set_bit(u32 x)
+{
+	int i;
+	for (i = 31; i >= 0; i--, x <<= 1)
+		if (x & 0x80000000)
+			return i;
+	return 0;
+}
+
+/* imms and immr are both 6 bit long */
+__uint128_t decode_bit_masks(unsigned char N, unsigned char imms,
+			unsigned char immr, bool immediate)
+{
+	u64 tmask, wmask;
+	u32 diff, S, R, esize, welem, telem;
+	unsigned char levels = 0, len = 0;
+
+	len = highest_set_bit((N << 6) | ((~imms) & ONES(6)));
+	levels = ZERO_EXTEND(ONES(len), 6);
+
+	if (immediate && ((imms & levels) == levels)) {
+		WARN("unknown instruction");
+		return -1;
+	}
+
+	S = imms & levels;
+	R = immr & levels;
+	diff = ZERO_EXTEND(S - R, 6);
+
+	esize = 1 << len;
+	diff = diff & ONES(len);
+
+	welem = ZERO_EXTEND(ONES(S + 1), esize);
+	telem = ZERO_EXTEND(ONES(diff + 1), esize);
+
+	wmask = replicate(ror(welem, esize, R), esize, 64 / esize);
+	tmask = replicate(telem, esize, 64 / esize);
+
+	return ((__uint128_t)wmask << 64) | tmask;
+
+}
diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
new file mode 100644
index 000000000000..0feb3ae3af5d
--- /dev/null
+++ b/tools/objtool/arch/arm64/decode.c
@@ -0,0 +1,2843 @@
+/*
+ * Copyright (C) 2019 Raphael Gault <raphael.gault@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+
+#include "insn_decode.h"
+#include "cfi.h"
+#include "bit_operations.h"
+
+#include "../../check.h"
+#include "../../arch.h"
+#include "../../elf.h"
+#include "../../warn.h"
+
+
+/*static int (*arm_decode_class)(u32 instr, unsigned int *len, unsigned char *type,
+				unsigned long *immediate, struct stack_op *op);*/
+static arm_decode_class aarch64_insn_class_decode_table[] = {
+	[INSN_RESERVED]			= arm_decode_reserved,
+	[INSN_UNKNOWN]			= arm_decode_unknown,
+	[INSN_SVE_ENC]			= arm_decode_sve_encoding,
+	[INSN_UNALLOC]			= arm_decode_unknown,
+	[INSN_LD_ST_4]			= arm_decode_ld_st,
+	[INSN_DP_REG_5]			= arm_decode_dp_reg,
+	[INSN_LD_ST_6]			= arm_decode_ld_st,
+	[INSN_DP_SIMD_7]		= arm_decode_dp_simd,
+	[0b1000 ... INSN_DP_IMM]	= arm_decode_dp_imm,
+	[0b1010 ... INSN_BRANCH]	= arm_decode_br_sys,
+	[INSN_LD_ST_C]			= arm_decode_ld_st,
+	[INSN_DP_REG_D]			= arm_decode_dp_reg,
+	[INSN_LD_ST_E]			= arm_decode_ld_st,
+	[INSN_DP_SIMD_F]		= arm_decode_dp_simd,
+};
+
+static arm_decode_class aarch64_insn_dp_imm_decode_table[] = {
+	[0 ... INSN_PCREL]	= arm_decode_pcrel,
+	[INSN_ADD_SUB]		= arm_decode_add_sub,
+	[INSN_ADD_TAG]		= arm_decode_add_sub_tags,
+	[INSN_LOGICAL]		= arm_decode_logical,
+	[INSN_MOVE_WIDE]	= arm_decode_move_wide,
+	[INSN_BITFIELD]		= arm_decode_bitfield,
+	[INSN_EXTRACT]		= arm_decode_extract,
+};
+
+bool arch_callee_saved_reg(unsigned char reg)
+{
+	switch(reg){
+		case CFI_R19:
+		case CFI_R20:
+		case CFI_R21:
+		case CFI_R22:
+		case CFI_R23:
+		case CFI_R24:
+		case CFI_R25:
+		case CFI_R26:
+		case CFI_R27:
+		case CFI_R28:
+		case CFI_FP:
+		case CFI_R30:
+			return true;
+		default:
+			return false;
+	}
+}
+
+
+void arch_initial_func_cfi_state(struct cfi_state *state)
+{
+	int i;
+
+	for (i = 0; i < CFI_NUM_REGS; i++) {
+		state->regs[i].base = CFI_UNDEFINED;
+		state->regs[i].offset = 0;
+	}
+
+	/* initial CFA (call frame address) */
+	state->cfa.base = CFI_CFA;
+	state->cfa.offset = 0;
+
+	/* initial RA (return address) */
+	state->regs[CFI_LR].base = CFI_CFA;
+	state->regs[CFI_LR].offset = -8;
+
+
+}
+
+unsigned long arch_compute_rela_sym_offset(int addend)
+{
+	return addend;
+}
+
+static int is_arm64(struct elf *elf)
+{
+	switch(elf->ehdr.e_machine){
+		case EM_AARCH64: //0xB7
+			return 1;
+		default:
+			WARN("unexpected ELF machine type %x", elf->ehdr.e_machine);
+			return 0;
+	}
+}
+
+/*
+ * Arm A64 Instruction set' decode groups (based on op0 bits[28:25]):
+ * Ob0000 - Reserved
+ * 0b0001/0b001x - Unallocated
+ * 0b100x - Data Processing -- Immediate
+ * 0b101x - Branch, Exception Gen., System Instructions.
+ * 0bx1x0 - Loads and Stores
+ * 0bx101 - Data Processing -- Registers
+ * 0bx111 - Data Processing -- Scalar Floating-Points, Advanced SIMD
+ */
+
+int arch_decode_instruction(struct elf *elf, struct section *sec,
+			unsigned long offset, unsigned int maxlen,
+			unsigned int *len, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	int arm64 = 0, ret = 0;
+	u32 insn = 0;
+
+	*len = 4;
+	*immediate = 0;
+
+	op->dest.type = 0;
+	op->dest.reg = 0;
+	op->dest.offset = 0;
+	op->src.type = 0;
+	op->src.reg = 0;
+	op->src.offset = 0;
+
+	//test architucture (make sure it is arm64)
+	arm64 = is_arm64(elf);
+	if (arm64 != 1)
+		return -1;
+
+	//retrieve instruction (from sec->data->offset)
+	insn = *(u32*)(sec->data->d_buf + offset);
+
+	//dispatch according to encoding classes
+	ret = aarch64_insn_class_decode_table[(insn >> 25) & 0xf](insn, type,
+							immediate, op);
+	/*
+	* For instruction that do operations on multiple registers at a time,
+	* like store/load of pairs of registers, we decompose the instruction
+	* into several individual instruction to be able to track the state.
+	* This is useful for PUSH/POP kind of scenarios.
+	* We thus set the len at 0 so that we re-decode the same instruction
+	* again until we have gone to every steps.
+	*/
+	if (ret == INSN_COMPOSED) {
+		*len = 0;
+		ret = 0;
+	}
+	return ret;
+}
+
+int arm_decode_unknown(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	*type = 0;
+	return 0;
+}
+
+int arm_decode_dp_simd(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_reserved(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	*immediate = instr & ONES(16);
+	*type = INSN_BUG;
+	return 0;
+}
+
+int arm_decode_dp_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	return aarch64_insn_dp_imm_decode_table[(instr >> 23) & 0x7](instr,
+							type, immediate, op);
+
+}
+
+int arm_decode_pcrel(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char rd = 0, page = 0;
+	u32 immhi = 0, immlo = 0;
+
+	page = EXTRACT_BIT(instr, 31);
+	rd = instr & 0x1F;
+	immhi = (instr >> 5) & ONES(19);
+	immlo = (instr >> 29) & ONES(2);
+
+	*immediate = SIGN_EXTEND((immhi << 2) | immlo, 21);
+
+	if (page)
+		*immediate = SIGN_EXTEND(*immediate << 12, 33);
+
+	*type = INSN_OTHER;
+	op->src.reg = ADR_SOURCE;
+	op->dest.reg = rd;
+
+	return 0;
+}
+
+int arm_decode_add_sub(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned long imm12 = 0, imm = 0;
+	unsigned char sf = 0, sh = 0, S = 0, op_bit = 0;
+	unsigned char rn = 0, rd = 0;
+
+	S = EXTRACT_BIT(instr, 29);
+	op_bit = EXTRACT_BIT(instr, 30);
+	sf = EXTRACT_BIT(instr, 31);
+	sh = EXTRACT_BIT(instr, 22);
+	rd = instr & ONES(5);
+	rn = (instr >> 5) & ONES(5);
+	imm12 = (instr >> 10) & ONES(12);
+	imm = ZERO_EXTEND(imm12 << (sh * 12), (sf + 1) * 32);
+
+	*type = INSN_OTHER;
+
+	if ((!S && rd == CFI_SP) || rn == CFI_SP) {
+		*type = INSN_STACK;
+		op->dest.type = OP_DEST_REG;
+		op->dest.offset = 0;
+		op->dest.reg = rd;
+		op->src.type = imm12 ? OP_SRC_ADD: OP_SRC_REG;
+		op->src.offset = op_bit ? -1 * imm : imm;
+		op->src.reg = rn;
+	}
+	return 0;
+}
+
+int arm_decode_add_sub_tags(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char decode_field = 0, rn = 0, rd = 0, uimm6 = 0;
+
+	decode_field = (instr >> 29) & ONES(3);
+	rd = instr & ONES(5);
+	rn = (instr >> 5) & ONES(5);
+	uimm6 = (instr >> 16) & ONES(6);
+
+	*immediate = uimm6;
+	*type = INSN_OTHER;
+
+#define ADDG_DECODE	4
+#define SUBG_DECODE	5
+	if (decode_field != ADDG_DECODE && decode_field != SUBG_DECODE)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+#undef ADDG_DECODE
+#undef SUBG_DECODE
+	op->dest.type = OP_DEST_REG;
+	op->dest.offset = 0;
+	op->dest.reg = rd;
+	op->src.type = OP_SRC_ADD;
+	op->src.offset = 0;
+	op->src.reg = rn;
+
+	if (rd == CFI_SP)
+		*type = INSN_STACK;
+
+	return 0;
+}
+
+int arm_decode_logical(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, opc = 0, N = 0;
+	unsigned char imms = 0, immr = 0, rn = 0, rd = 0;
+
+	rd = instr & ONES(5);
+	rn = (instr >> 5) & ONES(5);
+
+	imms = (instr >> 10) & ONES(6);
+	immr = (instr >> 16) & ONES(6);
+
+	N = EXTRACT_BIT(instr, 22);
+	opc = (instr >> 29) & ONES(2);
+	sf = EXTRACT_BIT(instr, 31);
+
+
+	if (N == 1 && sf == 0)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	*immediate = (decode_bit_masks(N, imms, immr, true) >> 64);
+#define ANDS_DECODE	0b11
+	if (opc == ANDS_DECODE)
+		return 0;
+#undef ANDS_DECODE
+	if (rd == CFI_SP) {
+		*type = INSN_STACK;
+		op->dest.type = OP_DEST_REG;
+		op->dest.offset = 0;
+		op->dest.reg = CFI_SP;
+
+		op->src.type = OP_SRC_AND;
+		op->src.offset = 0;
+		op->src.reg = rn;
+	}
+
+	return 0;
+}
+
+int arm_decode_move_wide(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm16 = 0;
+	unsigned char hw = 0, opc = 0, sf = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	opc = (instr >> 29) & ONES(2);
+	hw = (instr >> 21) & ONES(2);
+	imm16 = (instr >> 5) & ONES(16);
+
+	if ((sf == 0 && (hw & 0x2)) || opc == 0x1)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	*immediate = imm16;
+
+	return 0;
+}
+
+
+int arm_decode_bitfield(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, opc = 0, N = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	opc = (instr >> 29) & ONES(2);
+	N = EXTRACT_BIT(instr, 22);
+
+	if ((opc == 0x3) || (sf != N))
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+
+	return 0;
+}
+
+int arm_decode_extract(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, op21 = 0, N = 0, o0 = 0;
+	unsigned char imms = 0;
+	unsigned char decode_field = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	op21 = (instr >> 29) & ONES(2);
+	N = EXTRACT_BIT(instr, 22);
+	o0 = EXTRACT_BIT(instr, 21);
+	imms = (instr >> 10) & ONES(6);
+
+	decode_field = (sf << 4) | (op21 << 2) | (N << 1) | o0;
+	*type = INSN_OTHER;
+	*immediate = imms;
+
+	if ((decode_field == 0 && !EXTRACT_BIT(imms, 5))
+		|| decode_field == 0b10010)
+		return 0;
+
+	return arm_decode_unknown(instr, type, immediate, op);
+}
+
+static struct aarch64_insn_decoder br_sys_decoder[] = {
+	{
+		.mask = 0b1111000000000000000000,
+		.value = 0b0100000000000000000000,
+		.decode_func = arm_decode_br_cond_imm,
+	},
+	{
+		.mask = 0b1111100000000000000000,
+		.value = 0b1100000000000000000000,
+		.decode_func = arm_decode_except_gen,
+	},
+	{
+		.mask = 0b1111111111111111111111,
+		.value = 0b1100100000011001011111,
+		.decode_func = arm_decode_hints,
+	},
+	{
+		.mask = 0b1111111111111111100000,
+		.value = 0b1100100000011001100000,
+		.decode_func = arm_decode_barriers,
+	},
+	{
+		.mask = 0b1111111111000111100000,
+		.value = 0b1100100000000010000000,
+		.decode_func = arm_decode_pstate,
+	},
+	{
+		.mask = 0b1111111011000000000000,
+		.value = 0b1100100001000000000000,
+		.decode_func = arm_decode_system_insn,
+	},
+	{
+		.mask = 0b1111111010000000000000,
+		.value = 0b1100100010000000000000,
+		.decode_func = arm_decode_system_regs,
+	},
+	{
+		.mask = 0b1111000000000000000000,
+		.value = 0b1101000000000000000000,
+		.decode_func = arm_decode_br_uncond_reg,
+	},
+	{
+		.mask = 0b0110000000000000000000,
+		.value = 0b0000000000000000000000,
+		.decode_func = arm_decode_br_uncond_imm,
+	},
+	{
+		.mask = 0b0111000000000000000000,
+		.value = 0b0010000000000000000000,
+		.decode_func = arm_decode_br_comp_imm,
+	},
+	{
+		.mask = 0b0111000000000000000000,
+		.value = 0b0011000000000000000000,
+		.decode_func = arm_decode_br_tst_imm,
+	},
+
+};
+
+int arm_decode_br_sys(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 decode_field = 0, op1 = 0;
+	unsigned char op0 = 0, op2 = 0;
+	int i = 0;
+
+	op0 = (instr >> 29) & ONES(3);
+	op1 = (instr >> 12) & ONES(14);
+	op2 = instr & ONES(5);
+
+	decode_field = op0;
+	decode_field = (decode_field << 19) | (op1 << 5) | op2;
+
+	for (i = 0; i < ARRAY_SIZE(br_sys_decoder); i++) {
+		if ((decode_field & br_sys_decoder[i].mask) == br_sys_decoder[i].value) {
+			return br_sys_decoder[i].decode_func(instr,
+							type, immediate, op);
+		}
+	}
+
+	return arm_decode_unknown(instr, type, immediate, op);
+}
+
+
+int arm_decode_br_cond_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char o0 = 0, o1 = 0;
+	u32 imm19;
+
+	o0 = EXTRACT_BIT(instr, 4);
+	o1 = EXTRACT_BIT(instr, 24);
+	imm19 = (instr >> 5) & ONES(19);
+
+	*immediate = SIGN_EXTEND(imm19 << 2, 19);
+
+	if ((o1 << 1) | o0)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_JUMP_CONDITIONAL;
+
+	return 0;
+
+}
+
+static struct aarch64_insn_decoder except_gen_decoder[] = {
+	{
+		.mask = 0b00000100,
+		.value = 0b00000100,
+	},
+	{
+		.mask = 0b00001000,
+		.value = 0b00001000,
+	},
+	{
+		.mask = 0b00010000,
+		.value = 0b00010000,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b00000000,
+	},
+	{
+		.mask = 0b11111101,
+		.value = 0b00100001,
+	},
+	{
+		.mask = 0b11111110,
+		.value = 0b00100010,
+	},
+	{
+		.mask = 0b11111101,
+		.value = 0b01000001,
+	},
+	{
+		.mask = 0b11111110,
+		.value = 0b01000010,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b01100001,
+	},
+	{
+		.mask = 0b11111110,
+		.value = 0b01100010,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b10000000,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b10100000,
+	},
+	{
+		.mask = 0b11111100,
+		.value = 0b11000000,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b11100001,
+	},
+	{
+		.mask = 0b11111110,
+		.value = 0b11100010,
+	},
+};
+
+int arm_decode_except_gen(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm16 = 0;
+	unsigned char opc = 0, op2 = 0, LL = 0, decode_field = 0;
+	int i = 0;
+
+	imm16 = (instr >> 5) & ONES(16);
+	opc = (instr >> 21) & ONES(3);
+	op2 = (instr >> 2) & ONES(3);
+	LL = instr & ONES(2);
+	decode_field = (opc << 5) | (op2 << 2) | LL;
+
+	for (i = 0; i < ARRAY_SIZE(except_gen_decoder); i++) {
+		if ((decode_field & except_gen_decoder[i].mask)
+				== except_gen_decoder[i].value) {
+			return arm_decode_unknown(instr, type, immediate, op);
+		}
+	}
+
+#define INSN_SVC	0b00000001
+#define INSN_HVC	0b00000010
+#define INSN_SMC	0b00000011
+#define INSN_BRK	0b00100000
+#define INSN_HLT	0b01000000
+#define INSN_DCPS1	0b10100001
+#define INSN_DCPS2	0b10100010
+#define INSN_DCPS3	0b10100011
+
+	switch(decode_field){
+		case INSN_SVC:
+		case INSN_HVC:
+		case INSN_SMC:
+			*immediate = imm16;
+			*type = INSN_CONTEXT_SWITCH;
+			return 0;
+		case INSN_BRK:
+			if (imm16 == 0x800)
+				*type = INSN_BUG;
+			else if (imm16 == 0x100 || imm16 >= 0x900)
+				*type = INSN_CONTEXT_SWITCH;
+			else
+				*type = INSN_OTHER;
+			return 0;
+		case INSN_HLT:
+		case INSN_DCPS1:
+		case INSN_DCPS2:
+		case INSN_DCPS3:
+			*immediate = imm16;
+			*type = INSN_OTHER;
+			return 0;
+		default:
+			return arm_decode_unknown(instr, type, immediate, op);
+	}
+
+#undef INSN_SVC
+#undef INSN_HVC
+#undef INSN_SMC
+#undef INSN_BRK
+#undef INSN_HLT
+#undef INSN_DCPS1
+#undef INSN_DCPS2
+#undef INSN_DCPS3
+}
+
+int arm_decode_hints(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	*type = INSN_NOP;
+	return 0;
+}
+
+int arm_decode_barriers(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	/* TODO:check unallocated */
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_pstate(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	/* TODO:check unallocated */
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_system_insn(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	/* TODO:check unallocated */
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_system_regs(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	/* TODO:check unallocated */
+	*type = INSN_OTHER;
+	return 0;
+}
+
+
+static struct aarch64_insn_decoder ret_decoder[] = {
+	/*
+	 * RET, RETAA, RETAB
+	 */
+	{
+		.mask = 0b1111111111111110000011111,
+		.value = 0b0010111110000000000000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111111111111111,
+		.value = 0b0010111110000101111111111,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111111111111111,
+		.value = 0b0010111110000111111111111,
+		.decode_func = NULL,
+	},
+};
+
+static struct aarch64_insn_decoder br_decoder[] = {
+	/*
+	 * BR, BRAA, BRAAZ, BRAB, BRABZ
+	 */
+	{
+		.mask = 0b1111111111111110000011111,
+		.value = 0b0000111110000000000000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000011111,
+		.value = 0b0000111110000100000011111,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000011111,
+		.value = 0b0000111110000110000011111,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000000000,
+		.value = 0b1000111110000100000000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000000000,
+		.value = 0b1000111110000110000000000,
+		.decode_func = NULL,
+	},
+};
+
+#define INSN_DRPS_FIELD		0b0101111110000001111100000
+#define INSN_DRPS_MASK		0b1111111111111111111111111
+
+static struct aarch64_insn_decoder ct_sw_decoder[] = {
+	/*
+	 * ERET, ERETAA, ERETAB
+	 */
+	{
+		.mask = INSN_DRPS_MASK,
+		.value = 0b0100111110000001111100000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = INSN_DRPS_MASK,
+		.value = 0b0100111110000101111111111,
+		.decode_func = NULL,
+	},
+	{
+		.mask = INSN_DRPS_MASK,
+		.value = 0b0100111110000111111111111,
+		.decode_func = NULL,
+	},
+};
+
+
+static struct aarch64_insn_decoder call_decoder[] = {
+	/*
+	 * BLR, BLRAA, BLRAAZ, BLRAB, BLRABZ
+	 */
+	{
+		.mask = 0b1111111111111110000011111,
+		.value =  0b0001111110000000000000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000011111,
+		.value = 0b0001111110000100000011111,
+		.decode_func = NULL,
+	},
+	{
+		0b1111111111111110000011111,
+		0b0001111110000110000011111,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000000000,
+		.value = 0b1001111110000100000000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b1111111111111110000000000,
+		.value = 0b1001111110000110000000000,
+		.decode_func = NULL,
+	},
+};
+
+int arm_decode_br_uncond_reg(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+
+	u32 decode_field = 0;
+	int i = 0;
+
+	decode_field = instr & ONES(25);
+	*type = 0;
+	for (i = 0; i < ARRAY_SIZE(br_decoder); i++) {
+		if ((decode_field & br_decoder[i].mask) == br_decoder[i].value)
+			*type = INSN_JUMP_DYNAMIC;
+	}
+	for (i = 0; i < ARRAY_SIZE(call_decoder); i++) {
+		if ((decode_field & call_decoder[i].value) == call_decoder[i].value)
+			*type = INSN_CALL_DYNAMIC;
+	}
+	for (i = 0; i < ARRAY_SIZE(ret_decoder); i++) {
+		if ((decode_field & ret_decoder[i].mask) == ret_decoder[i].value)
+			*type = INSN_RETURN;
+	}
+	for (i = 0; i < ARRAY_SIZE(ct_sw_decoder); i++) {
+		if ((decode_field & ct_sw_decoder[i].mask) == ct_sw_decoder[i].value)
+			*type = INSN_CONTEXT_SWITCH;
+	}
+	if ((decode_field & INSN_DRPS_MASK) == INSN_DRPS_FIELD)
+		*type = INSN_OTHER;
+	if (*type == 0)
+		return arm_decode_unknown(instr, type, immediate, op);
+	return 0;
+}
+#undef INSN_DRPS_FIELD
+#undef INSN_DRPS_MASK
+
+int arm_decode_br_uncond_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char decode_field = 0;
+	u32 imm26 = 0;
+
+	decode_field = EXTRACT_BIT(instr, 31);
+	imm26 = instr & ONES(26);
+
+	*immediate = SIGN_EXTEND(imm26 << 2, 28);
+	if (decode_field == 0)
+		*type = INSN_JUMP_UNCONDITIONAL;
+	else
+		*type = INSN_CALL;
+
+	return 0;
+
+}
+
+int arm_decode_br_comp_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm19 = (instr >> 5) & ONES(19);
+
+	*immediate = SIGN_EXTEND(imm19 << 2, 21);
+	*type = INSN_JUMP_CONDITIONAL;
+	return 0;
+}
+
+int arm_decode_br_tst_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm14 = (instr >> 5) & ONES(14);
+
+	*immediate = SIGN_EXTEND(imm14 << 2, 16);
+	*type = INSN_JUMP_CONDITIONAL;
+	return 0;
+}
+
+static struct aarch64_insn_decoder ld_st_decoder[] = {
+	{
+		.mask = 0b101111111111100,
+		.value = 0b000010000000000,
+		.decode_func = arm_decode_adv_simd_mult,
+	},
+	{
+		.mask = 0b101111110000000,
+		.value = 0b000010100000000,
+		.decode_func = arm_decode_adv_simd_mult_post,
+	},
+	{
+		.mask = 0b101111101111100,
+		.value = 0b000011000000000,
+		.decode_func = arm_decode_adv_simd_single,
+	},
+	{
+		.mask = 0b101111100000000,
+		.value = 0b000011100000000,
+		.decode_func = arm_decode_adv_simd_single_post,
+	},
+	{
+		.mask = 0b111111010000000,
+		.value = 0b110101010000000,
+		.decode_func = arm_decode_ld_st_mem_tags,
+	},
+	{
+		.mask = 0b001111000000000,
+		.value = 0b000000000000000,
+		.decode_func = arm_decode_ld_st_exclusive,
+	},
+	{
+		.mask = 0b001111010000011,
+		.value = 0b000101000000000,
+		.decode_func = arm_decode_ldapr_stlr_unsc_imm,
+	},
+	{
+		.mask = 0b001101000000000,
+		.value = 0b000100000000000,
+		.decode_func = arm_decode_ld_regs_literal,
+	},
+	{
+		.mask = 0b001101100000000,
+		.value = 0b001000000000000,
+		.decode_func = arm_decode_ld_st_noalloc_pair_off,
+	},
+	{
+		.mask = 0b001101100000000,
+		.value = 0b001000100000000,
+		.decode_func = arm_decode_ld_st_regs_pair_post,
+	},
+	{
+		.mask = 0b001101100000000,
+		.value = 0b001001000000000,
+		.decode_func = arm_decode_ld_st_regs_pair_off,
+	},
+	{
+		.mask = 0b001101100000000,
+		.value = 0b001001100000000,
+		.decode_func = arm_decode_ld_st_regs_pair_pre,
+	},
+	{
+		.mask = 0b001101010000011,
+		.value = 0b001100000000000,
+		.decode_func = arm_decode_ld_st_regs_unsc_imm,
+	},
+	{
+		.mask = 0b001101010000011,
+		.value = 0b001100000000001,
+		.decode_func = arm_decode_ld_st_imm_post,
+	},
+	{
+		.mask = 0b001101010000011,
+		.value = 0b001100000000010,
+		.decode_func = arm_decode_ld_st_imm_unpriv,
+	},
+	{
+		.mask = 0b001101010000011,
+		.value = 0b001100000000011,
+		.decode_func = arm_decode_ld_st_imm_pre,
+	},
+	{
+		.mask = 0b001101010000011,
+		.value = 0b001100010000000,
+		.decode_func = arm_decode_atomic,
+	},
+	{
+		.mask = 0b001101010000011,
+		.value = 0b001100010000010,
+		.decode_func = arm_decode_ld_st_regs_off,
+	},
+	{
+		.mask = 0b001101010000001,
+		.value = 0b001100010000001,
+		.decode_func = arm_decode_ld_st_regs_pac,
+	},
+	{
+		.mask = 0b001101000000000,
+		.value = 0b001101000000000,
+		.decode_func = arm_decode_ld_st_regs_unsigned,
+	},
+};
+
+int arm_decode_ld_st(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 decode_field = 0;
+	int i = 0;
+	unsigned char op0 = 0, op1 = 0, op2 = 0, op3 = 0, op4 = 0;
+
+	op0 = (instr >> 28) & ONES(4);
+	op1 = EXTRACT_BIT(instr, 26);
+	op2 = (instr >> 23) & ONES(2);
+	op3 = (instr >> 16) & ONES(6);
+	op4 = (instr >> 10) & ONES(2);
+	decode_field = (op0 << 3) | (op1 << 2) | op2;
+	decode_field = (decode_field << 8) | (op3 << 2) | op4;
+
+	for (i = 0; i < ARRAY_SIZE(ld_st_decoder); i++) {
+		if ((decode_field & ld_st_decoder[i].mask) == ld_st_decoder[i].value) {
+			return ld_st_decoder[i].decode_func(instr, type, immediate, op);
+		}
+	}
+	return arm_decode_unknown(instr, type, immediate, op);
+}
+
+static int adv_simd_mult_fields[] = {
+	0b00000,
+	0b00010,
+	0b00100,
+	0b00110,
+	0b00111,
+	0b01000,
+	0b01010,
+	0b10000,
+	0b10010,
+	0b10100,
+	0b10110,
+	0b10111,
+	0b11000,
+	0b11010,
+};
+
+int arm_decode_adv_simd_mult(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char L = 0, opcode = 0, rn = 0, rt = 0;
+	unsigned char decode_field = 0;
+	int i = 0;
+
+	L = EXTRACT_BIT(instr, 22);
+	opcode = (instr >> 12) & ONES(4);
+
+	decode_field = (L << 4) | opcode;
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+	*type = INSN_OTHER;
+
+
+	for (i = 0; i < ARRAY_SIZE(adv_simd_mult_fields); i++) {
+		if ((decode_field & 0b11111) == adv_simd_mult_fields[i]) {
+			if (rn != 31)
+				return 0;
+			*type = INSN_STACK;
+		}
+	}
+	if (*type != INSN_STACK)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	if (!L) {
+		op->dest.type = OP_DEST_REG_INDIRECT;
+		op->dest.reg = CFI_SP;
+		op->dest.offset = 0;
+		op->src.type = OP_SRC_REG;
+		op->src.reg = rt;
+		op->src.offset = 0;
+	}
+	else {
+		op->src.type = OP_SRC_REG_INDIRECT;
+		op->src.reg = CFI_SP;
+		op->src.offset = 0;
+		op->dest.type = OP_SRC_REG;
+		op->dest.reg = rt;
+		op->dest.offset = 0;
+	}
+
+	return 0;
+}
+
+int arm_decode_adv_simd_mult_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	/* same opcode as for the no offset variant */
+	unsigned char rm = 0;
+	int ret = 0;
+	rm = (instr >> 16) & ONES(5);
+
+	ret = arm_decode_adv_simd_mult(instr, type, immediate, op);
+
+	/*
+	* This is actually irrelevent if the offset is given by a register
+	* however there is no way to know the offset value from the encoding
+	* in such a case.
+	*/
+	if (op->dest.type == OP_DEST_REG_INDIRECT)
+		op->dest.offset = rm;
+	if (op->src.type == OP_SRC_REG_INDIRECT)
+		op->src.offset = rm;
+	return ret;
+}
+
+
+static struct aarch64_insn_decoder simd_single_decoder[] = {
+	{
+		.mask = 0b11111000,
+		.value = 0b00000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111000,
+		.value = 0b00001000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b00010000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b00011000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b00100000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b00100001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b00101000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b00101001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111000,
+		.value = 0b01000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111000,
+		.value = 0b01001000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b01010000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b01011000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b01100000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b01100001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b01101000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b01101001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111000,
+		.value = 0b10000000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111000,
+		.value = 0b10001000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b10010000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b10011000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b10100000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b10100001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b10101000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b10101001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111100,
+		.value = 0b10110000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111100,
+		.value = 0b10111000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11000000,
+		.value = 0b11111000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111000,
+		.value = 0b11001000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b11010000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b11011000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b11100000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b11100001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b11101000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b11101001,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111100,
+		.value = 0b11110000,
+		.decode_func = NULL,
+	},
+	{
+		.mask = 0b11111100,
+		.value = 0b11111000,
+		.decode_func = NULL,
+	},
+};
+
+int arm_decode_adv_simd_single(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char L = 0, R = 0, S = 0, opcode = 0, size = 0;
+	unsigned char rn = 0, rt = 0, dfield = 0;
+	int i = 0;
+
+	L = EXTRACT_BIT(instr, 22);
+	R = EXTRACT_BIT(instr, 21);
+	S = EXTRACT_BIT(instr, 12);
+	opcode = (instr >> 13) & ONES(3);
+	size = (instr >> 10) & ONES(2);
+
+	dfield = (L << 7) | (R << 6) | (opcode << 3) | (S << 2) | size;
+
+	*type = INSN_OTHER;
+	rn = (instr << 5) & ONES(5);
+
+	for (i = 0; i < ARRAY_SIZE(simd_single_decoder); i++) {
+		if ((dfield & simd_single_decoder[i].mask) == simd_single_decoder[i].value) {
+			if (rn != CFI_SP)
+				return 0;
+			*type = INSN_STACK;
+		}
+	}
+
+	if (*type == INSN_OTHER)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	rt = instr & ONES(5);
+	if (!L) {
+		op->dest.type = OP_DEST_REG_INDIRECT;
+		op->dest.reg = CFI_SP;
+		op->dest.offset = 0;
+		op->src.type = OP_SRC_REG;
+		op->src.reg = rt;
+		op->src.offset = 0;
+	}
+	else {
+		op->src.type = OP_SRC_REG_INDIRECT;
+		op->src.reg = CFI_SP;
+		op->src.offset = 0;
+		op->dest.type = OP_DEST_REG;
+		op->dest.reg = rt;
+		op->dest.offset = 0;
+	}
+	return 0;
+}
+
+int arm_decode_adv_simd_single_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	/* same opcode as for the no offset variant */
+	unsigned char rm = 0;
+	int ret = 0;
+	rm = (instr >> 16) & ONES(5);
+
+	ret = arm_decode_adv_simd_single(instr, type, immediate, op);
+
+	/*
+	* This is actually irrelevent if the offset is given by a register
+	* however there is no way to know the offset value from the encoding
+	* in such a case.
+	*/
+	if (op->dest.type == OP_DEST_REG_INDIRECT)
+		op->dest.offset = rm;
+	if (op->src.type == OP_SRC_REG_INDIRECT)
+		op->src.offset = rm;
+	return ret;
+}
+
+int arm_decode_ld_st_mem_tags(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm9 = 0;
+	unsigned char opc = 0, op2 = 0, rn = 0, rt = 0, decode_field = 0;
+
+	imm9 = (instr >> 12) & ONES(9);
+	opc = (instr >> 22) & ONES(2);
+	op2 = (instr >> 10) & ONES(2);
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(6);
+
+	decode_field = (opc << 2) | op2;
+
+	if (decode_field == 0x0
+		|| (decode_field == 0x8 && imm9 != 0)
+		|| (decode_field == 0xC && imm9 != 0)) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+
+	if (rn != CFI_SP) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+	*type = INSN_STACK;
+	*immediate = imm9;
+
+	/*
+	* Offset should normally be shifted to the
+	* left of LOG2_TAG_GRANULE
+	*/
+	switch (decode_field) {
+		case 1:
+		case 5:
+		case 9:
+		case 13:
+			/* post index */
+		case 3:
+		case 7:
+		case 8:
+		case 11:
+		case 15:
+			/* pre index */
+			op->dest.reg = CFI_SP;
+			op->dest.type = OP_DEST_PUSH;
+			op->dest.offset = SIGN_EXTEND(imm9, 9);
+			op->src.reg = rt;
+			op->src.type = OP_SRC_REG;
+			op->src.offset = 0;
+			return 0;
+		case 2:
+		case 6:
+		case 10:
+		case 14:
+			/* store */
+			op->dest.reg = CFI_SP;
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.offset = SIGN_EXTEND(imm9, 9);
+			op->src.reg = rt;
+			op->src.type = OP_SRC_REG;
+			op->src.offset = 0;
+			return 0;
+		case 4:
+		case 12:
+			/* load */
+			op->src.reg = CFI_SP;
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.offset = SIGN_EXTEND(imm9, 9);
+			op->dest.reg = rt;
+			op->dest.type = OP_DEST_REG;
+			op->dest.offset = 0;
+			return 0;
+	}
+
+	return -1;
+
+}
+
+#define ST_EXCL_UNALLOC_1 0b001010
+#define ST_EXCL_UNALLOC_2 0b000010
+
+#define LDXRB		0b000100
+#define LDAXRB		0b000101
+#define LDLARB		0b001100
+#define LDARB		0b001101
+#define LDXRH		0b010100
+#define LDAXRH		0b010101
+#define LDLARH		0b011100
+#define LDARH		0b011101
+#define LDXR		0b100100
+#define LDAXR		0b100101
+#define LDXP		0b100110
+#define LDAXP		0b100111
+#define LDLAR		0b101100
+#define LDAR		0b101101
+#define LDXR_64		0b110100
+#define LDAXR_64	0b110101
+#define LDXP_64		0b110110
+#define LDAXP_64	0b110111
+#define LDLAR_64	0b111100
+#define LDAR_64		0b111101
+
+#define LD_EXCL_NUMBER	20
+
+static int ld_excl_masks[] = {
+	LDXRB,
+	LDAXRB,
+	LDLARB,
+	LDARB,
+	LDXRH,
+	LDAXRH,
+	LDLARH,
+	LDARH,
+	LDXR,
+	LDAXR,
+	LDXP,
+	LDAXP,
+	LDLAR,
+	LDAR,
+	LDXR_64,
+	LDAXR_64,
+	LDXP_64,
+	LDAXP_64,
+	LDLAR_64,
+	LDAR_64,
+};
+
+int arm_decode_ld_st_exclusive(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char size = 0, o2 = 0, L = 0, o1 = 0, o0 = 0;
+	unsigned char rt = 0, rt2 = 0, rn = 0;
+	unsigned char decode_field = 0;
+	int i = 0;
+
+	size = (instr >> 30) & ONES(2);
+	o2 = EXTRACT_BIT(instr, 23);
+	L = EXTRACT_BIT(instr, 22);
+	o1 = EXTRACT_BIT(instr, 21);
+	o0 = EXTRACT_BIT(instr, 15);
+
+	rt2 = (instr >> 10) & ONES(5);
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+
+	decode_field = (size << 4) | (o2 << 3) | (L << 2) | (o1 << 1) | o0;
+
+	if ((decode_field & ST_EXCL_UNALLOC_1) == ST_EXCL_UNALLOC_1
+		|| (decode_field & 0b101010) == ST_EXCL_UNALLOC_2){
+		if (rt2 != 31) {
+			return arm_decode_unknown(instr, type, immediate, op);
+		}
+	}
+
+	if (rn != 31) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+
+	*type = INSN_STACK;
+	for (i = 0; i < LD_EXCL_NUMBER; i++) {
+		if ((decode_field & 0b111111) == ld_excl_masks[i]) {
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = 0;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = rt;
+			op->dest.offset = 0;
+			return 0;
+		}
+	}
+
+	op->dest.type = OP_DEST_REG_INDIRECT;
+	op->dest.reg = CFI_SP;
+	op->dest.offset = 0;
+	op->src.type = OP_SRC_REG;
+	op->src.reg = rt;
+	op->src.offset = 0;
+
+	return 0;
+}
+#undef ST_EXCL_UNALLOC_1
+#undef ST_EXCL_UNALLOC_2
+
+#undef LD_EXCL_NUMBER
+
+#undef LDXRB
+#undef LDAXRB
+#undef LDLARB
+#undef LDARB
+#undef LDXRH
+#undef LDAXRH
+#undef LDLARH
+#undef LDARH
+#undef LDXR
+#undef LDAXR
+#undef LDXP
+#undef LDAXP
+#undef LDLAR
+#undef LDAR
+#undef LDXR_64
+#undef LDAXR_64
+#undef LDXP_64
+#undef LDAXP_64
+#undef LDLAR_64
+#undef LDAR_64
+
+
+int arm_decode_ldapr_stlr_unsc_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm9 = 0;
+	unsigned char size = 0, opc = 0, rn = 0, rt = 0, decode_field = 0;
+
+	imm9 = (instr >> 12) & ONES(9);
+	size = (instr >> 30) & ONES(2);
+	opc = (instr >> 22) & ONES(2);
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+
+	decode_field = (size << 2) | opc;
+	if (decode_field == 0xB
+		|| decode_field == 0xE
+		|| decode_field == 0xF) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+
+	if (rn != 31) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+	*type = INSN_STACK;
+	*immediate = imm9;
+	switch (decode_field) {
+		case 1:
+		case 2:
+		case 3:
+		case 5:
+		case 6:
+		case 7:
+		case 9:
+		case 10:
+		case 13:
+			/* load */
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = SIGN_EXTEND(imm9, 9);
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = rt;
+			op->dest.offset = 0;
+			break;
+		default:
+			/* store */
+			op->dest.type = OP_SRC_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = SIGN_EXTEND(imm9, 9);
+			op->src.type = OP_SRC_REG;
+			op->src.reg = rt;
+			op->src.offset = 0;
+			break;
+	}
+
+	return 0;
+}
+
+int arm_decode_ld_regs_literal(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char opc = 0, V = 0;
+	opc = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+
+	if (((opc << 1) | V) == 0x7)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_ld_st_noalloc_pair_off(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+
+	unsigned char opc = 0, V = 0, L = 0;
+	unsigned char decode_field = 0;
+
+	opc = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	L = EXTRACT_BIT(instr, 22);
+
+	decode_field = (opc << 2) | (V << 1) | L;
+
+	if (decode_field == 0x4 || decode_field == 0x5
+		|| decode_field >= 12) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+	return arm_decode_ld_st_regs_pair_off(instr, type, immediate, op);
+}
+
+/*
+ * We use this to decompose the load/store of pairs
+ * into two distinct instructions so that we can track
+ * the update of the stack as if several push/pop were
+ * done consecutively
+ */
+int arm_decode_ld_st_regs_pair_off(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char opc = 0, V = 0, L = 0, bit = 0;
+	unsigned char imm7 = 0, rt2 = 0, rt = 0, rn = 0;
+	unsigned char decode_field = 0;
+	int scale = 0;
+
+	static struct insn_decode_state state = {
+		.is_composed_insn = false,
+		.current_reg_num = 0,
+		.insn_regs_num = 2,
+		.insn_type = 0,
+		.immediate = 0,
+		.curr_offset = 0,
+		.op = {
+			.dest = {
+				.type = 0,
+				.reg = 0,
+				.offset = 0
+			},
+			.src = {
+				.type = 0,
+				.reg = 0,
+				.offset = 0
+			},
+		},
+		.regs = {0, 0},
+	};
+
+	if (state.is_composed_insn) {
+		*op = state.op;
+		if (op->dest.type == OP_DEST_REG_INDIRECT) {
+			op->src.reg = state.regs[state.insn_regs_num
+						- (++state.current_reg_num)];
+			op->dest.offset = state.curr_offset;
+		}
+		else {
+			op->dest.reg = state.regs[state.current_reg_num++];
+			op->src.offset = state.curr_offset;
+		}
+
+		*type = state.insn_type;
+		*immediate = state.immediate / state.insn_regs_num;
+		if (state.current_reg_num >= state.insn_regs_num) {
+			state.is_composed_insn = false;
+			return 0;
+		}
+		return INSN_COMPOSED;
+	}
+
+	opc = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	L = EXTRACT_BIT(instr, 22);
+	imm7 = (instr >> 15) & ONES(7);
+	rt2 = (instr >> 10) & ONES(5);
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+	bit = EXTRACT_BIT(opc, 1);
+	scale = 2 + bit;
+
+	decode_field = (opc << 2) | (V << 1) | L;
+
+	if (decode_field >= 0xC)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*immediate = (SIGN_EXTEND(imm7, 7)) << scale;
+
+	if (rn != CFI_SP) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+
+	*type = INSN_STACK;
+
+	state.is_composed_insn = true;
+	state.current_reg_num = 1;
+	state.insn_regs_num = 2;
+	state.insn_type = *type;
+	state.immediate = *immediate;
+	state.curr_offset = 0;
+	state.regs[0] = rt;
+	state.regs[1] = rt2;
+
+	switch (decode_field) {
+		case 1:
+		case 3:
+		case 5:
+		case 7:
+		case 9:
+		case 11:
+			/* load */
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = 0;
+			state.curr_offset = 8;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = state.regs[0];
+			op->dest.offset = 0;
+			break;
+		default:
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = 8;
+			state.curr_offset = 0;
+			op->src.type = OP_SRC_REG;
+			op->src.reg = state.regs[1];
+			op->src.offset = 0;
+			/* store */
+	}
+	state.op = *op;
+	return INSN_COMPOSED;
+}
+
+int arm_decode_ld_st_regs_pair_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	int ret = 0;
+
+	ret = arm_decode_ld_st_regs_pair_off(instr, type, immediate, op);
+	if (ret < 0 || *type == INSN_OTHER)
+		return ret;
+	if (op->dest.type == OP_DEST_REG_INDIRECT) {
+		op->dest.type = OP_DEST_PUSH;
+		op->dest.reg = CFI_SP;
+	}
+
+	if (op->src.type == OP_SRC_REG_INDIRECT) {
+		op->src.type = OP_SRC_POP;
+		op->src.reg = CFI_SP;
+	}
+
+	return ret;
+}
+
+int arm_decode_ld_st_regs_pair_pre(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	return arm_decode_ld_st_regs_pair_post(instr, type, immediate, op);
+}
+
+int arm_decode_ld_st_regs_unsc_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	u32 imm9 = 0;
+	unsigned char size = 0, V = 0, opc = 0, rn = 0, rt = 0;
+	unsigned char decode_field = 0;
+
+	size = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	opc = (instr >> 22) & ONES(2);
+
+	imm9 = (instr >> 12) & ONES(9);
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+
+	decode_field = (size << 2) | (V << 2) | opc;
+
+	switch (decode_field) {
+		case 0b01110:
+		case 0b01111:
+		case 0b11110:
+		case 0b11111:
+		case 0b10011:
+		case 0b11011:
+		case 0b10110:
+		case 0b10111:
+			return arm_decode_unknown(instr, type, immediate, op);
+		case 26:
+			/* prefetch */
+			*type = INSN_OTHER;
+			return 0;
+		case 1:
+		case 2:
+		case 3:
+		case 5:
+		case 7:
+		case 9:
+		case 10:
+		case 11:
+		case 13:
+		case 17:
+		case 18:
+		case 21:
+		case 25:
+		case 29:
+			/* load */
+			if (rn != CFI_SP) {
+				*type = INSN_OTHER;
+				return 0;
+			}
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = SIGN_EXTEND(imm9, 9);
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = rt;
+			op->dest.offset = 0;
+			break;
+		default:
+			if (rn != CFI_SP) {
+				*type = INSN_OTHER;
+				return 0;
+			}
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = SIGN_EXTEND(imm9, 9);
+			op->src.type = OP_DEST_REG;
+			op->src.reg = rt;
+			op->src.offset = 0;
+			break;
+	}
+
+	*type = INSN_STACK;
+	return 0;
+}
+
+static struct aarch64_insn_decoder ld_unsig_unalloc_decoder[] = {
+	{
+		.mask = 0b01110,
+		.value = 0b01110,
+	},
+	{
+		.mask = 0b10111,
+		.value = 0b10011,
+	},
+	{
+		.mask = 0b10110,
+		.value = 0b10110,
+	},
+};
+
+int arm_decode_ld_st_regs_unsigned(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char size = 0, V = 0, opc = 0, rn = 0, rt = 0;
+	unsigned char decode_field = 0;
+	u32 imm12 = 0;
+	int i = 0;
+
+	size = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	opc = (instr >> 22) & ONES(2);
+
+	decode_field = (size << 3) | (V << 2) | opc;
+	for (i = 0; i < ARRAY_SIZE(ld_unsig_unalloc_decoder); i++) {
+		if ((decode_field & ld_unsig_unalloc_decoder[i].mask)
+				== ld_unsig_unalloc_decoder[i].value) {
+			return arm_decode_unknown(instr, type,
+						immediate, op);
+		}
+	}
+
+	imm12 = (instr >> 10) & ONES(12);
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+
+	if (rn != CFI_SP || decode_field == 26) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+
+	*type = INSN_STACK;
+
+	switch (decode_field) {
+		case 1:
+		case 2:
+		case 3:
+		case 5:
+		case 7:
+		case 9:
+		case 10:
+		case 11:
+		case 13:
+		case 17:
+		case 18:
+		case 21:
+		case 25:
+			/* load */
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = imm12;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = rt;
+			op->dest.offset = 0;
+			break;
+		default: /* store */
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = imm12;
+			op->src.type = OP_DEST_REG;
+			op->src.reg = rt;
+			op->src.offset = 0;
+	}
+
+	return 0;
+
+}
+
+int arm_decode_ld_st_imm_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char size = 0, V = 0, opc = 0;
+	unsigned char decode_field = 0;
+	u32 imm9 = 0;
+	int ret = 0;
+
+	size = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	opc = (instr >> 22) & ONES(2);
+
+	imm9 = (instr >> 12) & ONES(9);
+
+	decode_field = (size << 2) | (V << 2) | opc;
+
+	if (decode_field == 0b11010)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	ret = arm_decode_ld_st_regs_unsigned(instr, type, immediate, op);
+	if (ret < 0 || *type == INSN_OTHER)
+		return ret;
+
+	if (op->dest.type == OP_DEST_REG_INDIRECT) {
+		op->dest.type = OP_DEST_PUSH;
+		op->dest.reg = CFI_SP;
+		op->dest.offset = SIGN_EXTEND(imm9, 9);
+	}
+
+	if (op->src.type == OP_SRC_REG_INDIRECT) {
+		op->src.type = OP_SRC_POP;
+		op->src.reg = CFI_SP;
+		op->src.offset = SIGN_EXTEND(imm9, 9);
+	}
+
+	return 0;
+}
+
+int arm_decode_ld_st_imm_pre(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	return arm_decode_ld_st_imm_post(instr, type, immediate, op);
+}
+
+#define LD_UNPR_UNALLOC_1 0b10011
+#define LD_UNPR_UNALLOC_2 0b11010
+int arm_decode_ld_st_imm_unpriv(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char size = 0, V = 0, opc = 0, rn = 0, rt = 0;
+	unsigned char decode_field = 0;
+	u32 imm9 = 0;
+
+	size = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	opc = (instr >> 22) & ONES(2);
+
+	imm9 = (instr >> 12) & ONES(9);
+
+	decode_field = (size << 3) | (V << 2) | opc;
+	if (V == 1
+		|| (decode_field & 0b10111) == LD_UNPR_UNALLOC_1
+		|| (decode_field & 0b11111) == LD_UNPR_UNALLOC_2) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+#undef LD_UNPR_UNALLOC_1
+#undef LD_UNPR_UNALLOC_2
+
+	if (rn != CFI_SP) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+	*type = INSN_STACK;
+
+	switch(decode_field) {
+		case 1:
+		case 2:
+		case 3:
+		case 9:
+		case 10:
+		case 11:
+		case 17:
+		case 18:
+		case 25:
+			/* load */
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = SIGN_EXTEND(imm9, 9);
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = rt;
+			op->dest.offset = 0;
+			break;
+		default:
+			/* store */
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = SIGN_EXTEND(imm9, 9);
+			op->src.type = OP_DEST_REG;
+			op->src.reg = rt;
+			op->src.offset = 0;
+			break;
+	}
+	return 0;
+
+}
+
+static struct aarch64_insn_decoder atom_unallocs_decoder[] = {
+	{
+		.mask = 0b1001111,
+		.value = 0b0001001,
+	},
+	{
+		.mask = 0b1001110,
+		.value = 0b0001010,
+	},
+	{
+		.mask = 0b1001111,
+		.value = 0b0001101,
+	},
+	{
+		.mask = 0b1001110,
+		.value = 0b0001110,
+	},
+	{
+		.mask = 0b1101111,
+		.value = 0b0001100,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0111100,
+	},
+	{
+		.mask = 0b1000000,
+		.value = 0b1000000,
+	},
+};
+
+int arm_decode_atomic(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char V = 0, A = 0, R = 0, o3 = 0, opc = 0;
+	unsigned char rn = 0, rt = 0;
+	unsigned char decode_field = 0;
+	int i = 0;
+
+	V = EXTRACT_BIT(instr, 26);
+	A = EXTRACT_BIT(instr, 23);
+	R = EXTRACT_BIT(instr, 22);
+	o3 = EXTRACT_BIT(instr, 15);
+	opc = (instr >> 12) & ONES(3);
+
+	decode_field = (V << 6) | (A << 5) | (R << 4) | (o3 << 3) | opc;
+
+	for (i = 0; i < ARRAY_SIZE(atom_unallocs_decoder); i++) {
+		if ((decode_field & atom_unallocs_decoder[i].mask)
+				== atom_unallocs_decoder[i].value) {
+			return arm_decode_unknown(instr,
+						type, immediate, op);
+		}
+	}
+
+	rn = (instr >> 5) & ONES(5);
+	rt = instr & ONES(5);
+
+	if (rn != CFI_SP) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+	*type = INSN_STACK;
+	op->src.reg = CFI_SP;
+	op->src.type = OP_DEST_REG_INDIRECT;
+	op->src.offset = 0;
+	op->dest.type = OP_DEST_REG;
+	op->dest.reg = rt;
+	op->dest.offset = 0;
+
+	return 0;
+
+}
+
+int arm_decode_ld_st_regs_off(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char size = 0, V = 0, opc = 0, option = 0;
+	unsigned char rm = 0, rn = 0, rt = 0;
+	unsigned char decode_field = 0;
+
+	size = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	opc = (instr >> 22) & ONES(2);
+	option = (instr >> 13) & ONES(3);
+
+#define LD_ROFF_UNALLOC_1	0b01110
+#define LD_ROFF_UNALLOC_2	0b10110
+#define LD_ROFF_UNALLOC_3	0b10011
+	decode_field = (size << 3) | (V << 2) | opc;
+	if (!EXTRACT_BIT(option, 1)
+		|| (decode_field & LD_ROFF_UNALLOC_1) == LD_ROFF_UNALLOC_1
+		|| (decode_field & LD_ROFF_UNALLOC_2) == LD_ROFF_UNALLOC_2
+		|| (decode_field & 0b10111) == LD_ROFF_UNALLOC_3) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+#undef LD_ROFF_UNALLOC_1
+#undef LD_ROFF_UNALLOC_2
+#undef LD_ROFF_UNALLOC_3
+
+	rn = (instr >> 5) & ONES(5);
+
+#define LD_ROFF_PRFM	0b11010
+	if (rn != CFI_SP || decode_field == LD_ROFF_PRFM) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+#undef LD_ROFF_PRFM
+
+	rt = instr & ONES(5);
+	rm = (instr >> 16) & ONES(5);
+
+	switch (decode_field & ONES(3)) {
+		case 0b001:
+		case 0b010:
+		case 0b011:
+		case 0b101:
+		case 0b111:
+			/* load */
+			op->src.type = OP_SRC_REG_INDIRECT;
+			op->src.reg = CFI_SP;
+			op->src.offset = rm;
+			op->dest.type = OP_DEST_REG;
+			op->dest.reg = rt;
+			op->dest.offset = 0;
+			break;
+		default:
+			/* store */
+			op->dest.type = OP_DEST_REG_INDIRECT;
+			op->dest.reg = CFI_SP;
+			op->dest.offset = rm;
+			op->src.type = OP_DEST_REG;
+			op->src.reg = rt;
+			op->src.offset = 0;
+			break;
+
+	}
+
+	return 0;
+}
+
+int arm_decode_ld_st_regs_pac(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char size = 0, V = 0, W = 0, S = 0;
+	unsigned char rn = 0, rt = 0;
+	u32 imm9 = 0, s10 = 0;
+
+	size = (instr >> 30) & ONES(2);
+	V = EXTRACT_BIT(instr, 26);
+	W = EXTRACT_BIT(instr, 11);
+
+	if (size != 3 || V == 1) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+
+	rn = (instr >> 5) & ONES(5);
+
+	if (rn != CFI_SP) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+
+	S = EXTRACT_BIT(instr, 22);
+	s10 = (S << 9) | imm9;
+
+	op->dest.reg = rt;
+	op->dest.type = OP_DEST_REG;
+	op->dest.offset = 0;
+	op->src.offset = (SIGN_EXTEND(s10, 9) << 3);
+	if (W) { /* pre-indexed/writeback */
+		op->src.type = OP_SRC_POP;
+		op->src.reg = CFI_SP;
+	}
+	else {
+		op->src.type = OP_SRC_REG_INDIRECT;
+		op->src.reg = CFI_SP;
+	}
+
+	return 0;
+}
+
+
+static struct aarch64_insn_decoder dp_reg_decoder[] = {
+	{
+		.mask = 0b111111000000,
+		.value = 0b010110000000,
+		.decode_func = arm_decode_dp_reg_2src,
+	},
+	{
+		.mask = 0b111111000000,
+		.value = 0b110110000000,
+		.decode_func = arm_decode_dp_reg_1src,
+	},
+	{
+		.mask = 0b011000000000,
+		.value = 0b000000000000,
+		.decode_func = arm_decode_dp_reg_logi,
+	},
+	{
+		.mask = 0b011001000000,
+		.value = 0b001000000000,
+		.decode_func = arm_decode_dp_reg_adds,
+	},
+	{
+		.mask = 0b011001000000,
+		.value = 0b001001000000,
+		.decode_func = arm_decode_dp_reg_adde,
+	},
+	{
+		.mask = 0b011111111111,
+		.value = 0b010000000000,
+		.decode_func = arm_decode_dp_reg_addc,
+	},
+	{
+		.mask = 0b011111011111,
+		.value = 0b010000000001,
+		.decode_func = arm_decode_dp_reg_rota,
+	},
+	{
+		.mask = 0b011111001111,
+		.value = 0b010000000010,
+		.decode_func = arm_decode_dp_reg_eval,
+	},
+	{
+		.mask = 0b011111000010,
+		.value = 0b010010000000,
+		.decode_func = arm_decode_dp_reg_cmpr,
+	},
+	{
+		.mask = 0b011111000010,
+		.value = 0b010010000010,
+		.decode_func = arm_decode_dp_reg_cmpi,
+	},
+	{
+		.mask = 0b011111000000,
+		.value = 0b010100000000,
+		.decode_func = arm_decode_dp_reg_csel,
+	},
+	{
+		.mask = 0b011000000000,
+		.value = 0b011000000000,
+		.decode_func = arm_decode_dp_reg_3src,
+	},
+};
+
+int arm_decode_dp_reg(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char op0 = 0, op1 = 0, op2 = 0, op3 = 0;
+	u32 decode_field = 0;
+	int i = 0;
+
+	op0 = EXTRACT_BIT(instr, 30);
+	op1 = EXTRACT_BIT(instr, 28);
+	op2 = (instr >> 21) & ONES(4);
+	op3 = (instr >> 10) & ONES(6);
+	decode_field = (op0 << 5) | (op1 << 4) | op2;
+	decode_field = (decode_field << 6) | op3;
+
+	for (i = 0; i < ARRAY_SIZE(dp_reg_decoder); i++) {
+		if ((decode_field & dp_reg_decoder[i].mask)
+				== dp_reg_decoder[i].value) {
+			return dp_reg_decoder[i].decode_func(instr, type,
+							immediate, op);
+		}
+	}
+	return arm_decode_unknown(instr, type, immediate, op);
+}
+
+static struct aarch64_insn_decoder dp_reg_2src_decoder[] = {
+	{
+		.mask = 0b00111111,
+		.value = 0b00000001,
+	},
+	{
+		.mask = 0b00111000,
+		.value = 0b00011000,
+	},
+	{
+		.mask = 0b00100000,
+		.value = 0b00100000,
+	},
+	{
+		.mask = 0b01111111,
+		.value = 0b00000101,
+	},
+	{
+		.mask = 0b01111100,
+		.value = 0b00001100,
+	},
+	{
+		.mask = 0b01111110,
+		.value = 0b01000010,
+	},
+	{
+		.mask = 0b01111100,
+		.value = 0b01000100,
+	},
+	{
+		.mask = 0b01111000,
+		.value = 0b01001000,
+	},
+	{
+		.mask = 0b01110000,
+		.value = 0b01010000,
+	},
+	{
+		.mask = 0b10111111,
+		.value = 0b00000000,
+	},
+	{
+		.mask = 0b11111111,
+		.value = 0b00000100,
+	},
+	{
+		.mask = 0b11111110,
+		.value = 0b00000110,
+	},
+	{
+		.mask = 0b11111011,
+		.value = 0b00010011,
+	},
+	{
+		.mask = 0b11111001,
+		.value = 0b10010000,
+	},
+	{
+		.mask = 0b11111010,
+		.value = 0b10010000,
+	},
+};
+
+static int dp_reg_2src_stack_fields[] = {
+	0b10000000,
+	0b10000100,
+	0b10000101,
+	0b10001100,
+	0b11000000,
+};
+
+int arm_decode_dp_reg_2src(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, S = 0, opcode = 0, rn = 0, rd = 0;
+	unsigned char decode_field = 0;
+	int i = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	S = EXTRACT_BIT(instr, 29);
+	opcode = (instr >> 10) & ONES(6);
+
+	decode_field = (sf << 7) | (S << 6) | opcode;
+
+	for (i = 0; i < ARRAY_SIZE(dp_reg_2src_decoder); i++) {
+		if ((decode_field & dp_reg_2src_decoder[i].mask)
+				== dp_reg_2src_decoder[i].value) {
+			return arm_decode_unknown(\
+					instr, type, immediate, op);
+		}
+	}
+
+	*type = 0;
+	for (i = 0; i < ARRAY_SIZE(dp_reg_2src_stack_fields); i++) {
+		if (opcode == dp_reg_2src_stack_fields[i]) {
+			*type = INSN_OTHER;
+			break;
+		}
+	}
+	if (*type == 0) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+
+	rn = (instr >> 5) & ONES(5);
+	rd = instr & ONES(5);
+
+#define IRG_OPCODE	0b10000100
+	if ((rn != CFI_SP && opcode != IRG_OPCODE)
+		|| (opcode == IRG_OPCODE && rd != CFI_SP
+			&& rn != CFI_SP)) {
+		*type = INSN_OTHER;
+		return 0;
+	}
+#undef IRG_OPCODE
+
+	*type = INSN_STACK;
+	op->dest.reg = rd;
+	op->dest.type = OP_DEST_REG;
+	op->dest.offset = 0;
+
+	op->src.reg = rn;
+	op->src.type = OP_DEST_REG;
+	op->src.offset = 0;
+
+	return 0;
+}
+
+
+static struct aarch64_insn_decoder dp_reg_1src_decoder[] = {
+	{
+		.mask = 0b0000000001000,
+		.value = 0b0000000001000,
+	},
+	{
+		.mask = 0b0000000010000,
+		.value = 0b0000000010000,
+	},
+	{
+		.mask = 0b0000000100000,
+		.value = 0b0000000100000,
+	},
+	{
+		.mask = 0b0000001000000,
+		.value = 0b0000001000000,
+	},
+	{
+		.mask = 0b0000010000000,
+		.value = 0b0000010000000,
+	},
+	{
+		.mask = 0b0000100000000,
+		.value = 0b0000100000000,
+	},
+	{
+		.mask = 0b0001000000000,
+		.value = 0b0001000000000,
+	},
+	{
+		.mask = 0b0010000000000,
+		.value = 0b0010000000000,
+	},
+	{
+		.mask = 0b0111111111110,
+		.value = 0b0000000000110,
+	},
+	{
+		.mask = 0b0100000000000,
+		.value = 0b0100000000000,
+	},
+};
+
+int arm_decode_dp_reg_1src(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, S = 0, opcode2 = 0, opcode = 0;
+	u32 decode_field = 0;
+	int i = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	S = EXTRACT_BIT(instr, 29);
+	opcode2 = (instr >> 16) & ONES(5);
+	opcode = (instr >> 10) & ONES(6);
+
+	decode_field = (sf << 6) | (S << 5) | opcode2;
+	decode_field = (decode_field << 6) | opcode;
+
+	for (i = 0; i < ARRAY_SIZE(dp_reg_1src_decoder); i++) {
+		if ((decode_field & dp_reg_1src_decoder[i].mask) ==
+				dp_reg_1src_decoder[i].value) {
+			return arm_decode_unknown(instr, type, immediate, op);
+		}
+	}
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_dp_reg_logi(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, imm6 = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	imm6 = (instr >> 10) & ONES(6);
+
+	if (imm6 >= 0b100000 && !sf)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_dp_reg_adds(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, shift = 0, imm6 = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	shift = (instr >> 22) & ONES(2);
+	imm6 = (instr >> 10) & ONES(6);
+
+	if ((imm6 >= 0b100000 && !sf) || shift == 0b11)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_dp_reg_adde(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+
+	unsigned char S = 0, opt = 0, imm3 = 0, rn = 0, rd = 0;
+
+	S = EXTRACT_BIT(instr, 29);
+	opt = (instr >> 22) & ONES(2);
+	imm3 = (instr >> 10) & ONES(3);
+	rn = (instr >> 5) & ONES(5);
+	rd = instr & ONES(5);
+
+	if (opt != 0 || imm3 >= 0b101)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	if (rd == CFI_SP && S == 0) {
+		*type = INSN_STACK;
+		op->dest.reg = CFI_SP;
+		op->dest.type = OP_DEST_REG;
+		op->src.type = OP_SRC_ADD;
+		op->src.reg = rn;
+
+		return 0;
+	}
+	*type = INSN_OTHER;
+	return 0;
+}
+
+
+int arm_decode_dp_reg_addc(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_dp_reg_rota(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, S = 0, op_bit = 0, o2 = 0;
+	unsigned char decode_field = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	op_bit = EXTRACT_BIT(instr, 30);
+	S = EXTRACT_BIT(instr, 29);
+	o2 = EXTRACT_BIT(instr, 4);
+
+	decode_field = (sf << 3) | (op_bit << 2) | (S << 1) | o2;
+
+	if (decode_field != 0b1010)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	return 0;
+}
+
+
+int arm_decode_dp_reg_eval(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, S = 0, op_bit = 0, o3 = 0, sz = 0;
+	unsigned char opcode2 = 0, mask = 0;
+	u32 decode_field = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	op_bit = EXTRACT_BIT(instr, 30);
+	S = EXTRACT_BIT(instr, 29);
+	sz = EXTRACT_BIT(instr, 14);
+	o3 = EXTRACT_BIT(instr, 4);
+
+	opcode2 = (instr >> 15) & ONES(6);
+	mask = instr & ONES(4);
+
+	decode_field = (sf << 2) | (op_bit << 1) | S;
+	decode_field = (decode_field << 12) | (opcode2 << 6) | (sz << 5);
+	decode_field |= (o3 << 4) | mask;
+
+#define DP_EVAL_SETF_1	0b001000000001101
+#define DP_EVAL_SETF_2	0b001000000101101
+
+	if (decode_field != DP_EVAL_SETF_1
+		&& decode_field != DP_EVAL_SETF_2) {
+		return arm_decode_unknown(instr, type, immediate, op);
+	}
+
+	*type = INSN_OTHER;
+	return 0;
+#undef DP_EVAL_SETF_1
+#undef DP_EVAL_SETF_2
+}
+
+int arm_decode_dp_reg_cmpr(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char S = 0, o2 = 0, o3 = 0;
+
+	S = EXTRACT_BIT(instr, 29);
+	o2 = EXTRACT_BIT(instr, 10);
+	o3 = EXTRACT_BIT(instr, 4);
+
+	if (!S || o2 || o3)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_dp_reg_csel(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char S = 0, op2 = 0;
+
+	S = EXTRACT_BIT(instr, 29);
+	op2 = (instr >> 10) & ONES(2);
+
+	if (S || op2 >= 0b10)
+		return arm_decode_unknown(instr, type, immediate, op);
+
+	*type = INSN_OTHER;
+	return 0;
+}
+
+int arm_decode_dp_reg_cmpi(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	return arm_decode_dp_reg_cmpr(instr, type, immediate, op);
+}
+
+static int dp_reg_3src_fields[] = {
+};
+
+static struct aarch64_insn_decoder dp_reg_3src_decoder[] = {
+	{
+		.mask = 0b0111111,
+		.value = 0b0000101,
+	},
+	{
+		.mask = 0b0111110,
+		.value = 0b0000110,
+	},
+	{
+		.mask = 0b0111110,
+		.value = 0b0001000,
+	},
+	{
+		.mask = 0b0111111,
+		.value = 0b0001101,
+	},
+	{
+		.mask = 0b0111110,
+		.value = 0b0001110,
+	},
+	{
+		.mask = 0b0110000,
+		.value = 0b0010000,
+	},
+	{
+		.mask = 0b0100000,
+		.value = 0b0100000,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0000010,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0000011,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0000100,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0001010,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0001011,
+	},
+	{
+		.mask = 0b1111111,
+		.value = 0b0001100,
+	},
+};
+
+int arm_decode_dp_reg_3src(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	unsigned char sf = 0, op54 = 0, op31 = 0, o0 = 0;
+	unsigned char decode_field = 0;
+	int i = 0;
+
+	sf = EXTRACT_BIT(instr, 31);
+	op54 = (instr >> 29) & ONES(2);
+	op31 = (instr >> 21) & ONES(3);
+	o0 = EXTRACT_BIT(instr, 15);
+
+	decode_field = (sf << 6) | (op54 << 4) | (op31 << 1) | o0;
+
+	for (i = 0; i < ARRAY_SIZE(dp_reg_3src_fields); i++) {
+		if ((decode_field & dp_reg_3src_decoder[i].mask)
+				== dp_reg_3src_decoder[i].value) {
+			return arm_decode_unknown(instr, type, immediate, op);
+		}
+	}
+
+	*type = INSN_OTHER;
+	return 0;
+
+}
+unsigned long arch_compute_jump_destination(struct instruction *insn)
+{
+	return insn->offset + insn->immediate;
+}
+
+static struct aarch64_insn_decoder sve_enc_decoder[] = {
+	{
+		.mask = 0b1111010000111000,
+		.value = 0b0000010000011000,
+	},
+	{
+		.mask = 0b1111110000111000,
+		.value = 0b0001110000000000,
+	},
+	{
+		.mask = 0b1111010000110000,
+		.value = 0b0011010000010000,
+	},
+	{
+		.mask = 0b1111011100111000,
+		.value = 0b0011010100101000,
+	},
+	{
+		.mask = 0b1111011000110000,
+		.value = 0b0011011000100000,
+	},
+	{
+		.mask = 0b1111010000100000,
+		.value = 0b0100000000100000,
+	},
+	{
+		.mask = 0b1111000000000000,
+		.value = 0b0101000000000000,
+	},
+	{
+		.mask = 0b1111011111111000,
+		.value = 0b0110000000101000,
+	},
+	{
+		.mask = 0b1111011111110000,
+		.value = 0b0110000000110000,
+	},
+	{
+		.mask = 0b1111011111100000,
+		.value = 0b0110000001100000,
+	},
+	{
+		.mask = 0b1111011110100000,
+		.value = 0b0110000010100000,
+	},
+	{
+		.mask = 0b1111011100100000,
+		.value = 0b0110000100100000,
+	},
+	{
+		.mask = 0b1111011000100000,
+		.value = 0b0110001000100000,
+	},
+	{
+		.mask = 0b1111010000110110,
+		.value = 0b0110010000000010,
+	},
+	{
+		.mask = 0b1111010000111111,
+		.value = 0b0110010000001001,
+	},
+	{
+		.mask = 0b1111010000111100,
+		.value = 0b0110010000001100,
+	},
+	{
+		.mask = 0b1111010000110000,
+		.value = 0b0110010000010000,
+	},
+	{
+		.mask = 0b1111010000100000,
+		.value = 0b0110010000100000,
+	},
+	{
+		.mask = 0b1111011100111100,
+		.value = 0b0111000100001000,
+	},
+};
+
+/*
+ * Since these instructions are optional (not present on all arm processors)
+ * we consider that they will never be used to save/restore stack frame.
+ */
+int arm_decode_sve_encoding(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op)
+{
+	int i = 0;
+	unsigned char op0 = 0, op1 = 0, op2 = 0, op3 = 0;
+	u32 decode_field = 0;
+
+	op0 = (instr >> 29) & ONES(3);
+	op1 = (instr >> 23) & ONES(2);
+	op2 = (instr >> 17) & ONES(5);
+	op3 = (instr >> 10) & ONES(6);
+
+	decode_field = (op0 << 2) | op1;
+	decode_field = (decode_field << 5) | op2;
+	decode_field = (decode_field << 6) | op3;
+
+	for (i = 0; i < ARRAY_SIZE(sve_enc_decoder); i++) {
+		if ((decode_field & sve_enc_decoder[i].mask)
+			== sve_enc_decoder[i].value)
+			return arm_decode_unknown(instr, type, immediate, op);
+	}
+
+	*type = INSN_OTHER;
+
+	return 0;
+}
diff --git a/tools/objtool/arch/arm64/include/arch_special.h b/tools/objtool/arch/arm64/include/arch_special.h
new file mode 100644
index 000000000000..54bcce4c58c0
--- /dev/null
+++ b/tools/objtool/arch/arm64/include/arch_special.h
@@ -0,0 +1,44 @@
+/*
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define EX_ENTRY_SIZE		8
+#define EX_ORIG_OFFSET		0
+#define EX_NEW_OFFSET		4
+
+#define JUMP_ENTRY_SIZE		16
+#define JUMP_ORIG_OFFSET	0
+#define JUMP_NEW_OFFSET		4
+
+#define ALT_ENTRY_SIZE		12
+#define ALT_ORIG_OFFSET		0
+#define ALT_NEW_OFFSET		4
+#define ALT_FEATURE_OFFSET	8
+#define ALT_ORIG_LEN_OFFSET	10
+#define ALT_NEW_LEN_OFFSET	11
+
+/*
+ * On arm64 the .altinstr_replacement is not always marked
+ * as containing executable instruction. But we still want
+ * to process it so we ignore the SHF_EXEC flag
+ */
+#define IGNORE_SHF_EXEC_FLAG	1
+
+/*
+ * The jump table detection is not the same on arm64 so for
+ * now we just detect if it is a dynamic jump (br <Xn> insn)
+ */
+#define JUMP_DYNAMIC_IS_SWITCH_TABLE	1
+
+#define X86_FEATURE_POPCNT (4*32+23)
diff --git a/tools/objtool/arch/arm64/include/asm/orc_types.h b/tools/objtool/arch/arm64/include/asm/orc_types.h
new file mode 100644
index 000000000000..46f516dd80ce
--- /dev/null
+++ b/tools/objtool/arch/arm64/include/asm/orc_types.h
@@ -0,0 +1,109 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _ORC_TYPES_H
+#define _ORC_TYPES_H
+
+#include <linux/types.h>
+#include <linux/compiler.h>
+
+/*
+ * The ORC_REG_* registers are base registers which are used to find other
+ * registers on the stack.
+ *
+ * ORC_REG_PREV_SP, also known as DWARF Call Frame Address (CFA), is the
+ * address of the previous frame: the caller's SP before it called the current
+ * function.
+ *
+ * ORC_REG_UNDEFINED means the corresponding register's value didn't change in
+ * the current frame.
+ *
+ * The most commonly used base registers are SP and BP -- which the previous SP
+ * is usually based on -- and PREV_SP and UNDEFINED -- which the previous BP is
+ * usually based on.
+ *
+ * The rest of the base registers are needed for special cases like entry code
+ * and GCC realigned stacks.
+ */
+#define ORC_REG_UNDEFINED		0
+#define ORC_REG_PREV_SP			1
+#define ORC_REG_DX			2
+#define ORC_REG_DI			3
+#define ORC_REG_BP			4
+#define ORC_REG_SP			5
+#define ORC_REG_R10			6
+#define ORC_REG_R13			7
+#define ORC_REG_BP_INDIRECT		8
+#define ORC_REG_SP_INDIRECT		9
+#define ORC_REG_MAX			15
+
+/*
+ * ORC_TYPE_CALL: Indicates that sp_reg+sp_offset resolves to PREV_SP (the
+ * caller's SP right before it made the call).  Used for all callable
+ * functions, i.e. all C code and all callable asm functions.
+ *
+ * ORC_TYPE_REGS: Used in entry code to indicate that sp_reg+sp_offset points
+ * to a fully populated pt_regs from a syscall, interrupt, or exception.
+ *
+ * ORC_TYPE_REGS_IRET: Used in entry code to indicate that sp_reg+sp_offset
+ * points to the iret return frame.
+ *
+ * The UNWIND_HINT macros are used only for the unwind_hint struct.  They
+ * aren't used in struct orc_entry due to size and complexity constraints.
+ * Objtool converts them to real types when it converts the hints to orc
+ * entries.
+ */
+#define ORC_TYPE_CALL			0
+#define ORC_TYPE_REGS			1
+#define ORC_TYPE_REGS_IRET		2
+#define UNWIND_HINT_TYPE_SAVE		3
+#define UNWIND_HINT_TYPE_RESTORE	4
+
+#ifndef __ASSEMBLY__
+/*
+ * This struct is more or less a vastly simplified version of the DWARF Call
+ * Frame Information standard.  It contains only the necessary parts of DWARF
+ * CFI, simplified for ease of access by the in-kernel unwinder.  It tells the
+ * unwinder how to find the previous SP and BP (and sometimes entry regs) on
+ * the stack for a given code address.  Each instance of the struct corresponds
+ * to one or more code locations.
+ */
+struct orc_entry {
+	s16		sp_offset;
+	s16		bp_offset;
+	unsigned	sp_reg:4;
+	unsigned	bp_reg:4;
+	unsigned	type:2;
+	unsigned	end:1;
+} __packed;
+
+/*
+ * This struct is used by asm and inline asm code to manually annotate the
+ * location of registers on the stack for the ORC unwinder.
+ *
+ * Type can be either ORC_TYPE_* or UNWIND_HINT_TYPE_*.
+ */
+struct unwind_hint {
+	u32		ip;
+	s16		sp_offset;
+	u8		sp_reg;
+	u8		type;
+	u8		end;
+};
+#endif /* __ASSEMBLY__ */
+
+#endif /* _ORC_TYPES_H */
diff --git a/tools/objtool/arch/arm64/include/bit_operations.h b/tools/objtool/arch/arm64/include/bit_operations.h
new file mode 100644
index 000000000000..bdfa9d183995
--- /dev/null
+++ b/tools/objtool/arch/arm64/include/bit_operations.h
@@ -0,0 +1,22 @@
+#ifndef _BIT_OPERATIONS_H
+#define _BIT_OPERATIONS_H
+
+#include <stdint.h>
+#include <stdbool.h>
+#include <linux/types.h>
+
+#define ONES(N) 		(((__uint128_t)1 << (N)) - 1)
+#define ZERO_EXTEND(X, N)	((X) & ONES(N))
+#define EXTRACT_BIT(X, N)	(((X) >> (N)) & ONES(1))
+#define SIGN_EXTEND(X, N)	((((unsigned long) -1 + (EXTRACT_BIT(X, N - 1) ^ 1)) << N) | X)
+
+u64 replicate(u64 x, int size, int n);
+
+u64 ror(u64 x, int size, int shift);
+
+int highest_set_bit(u32 x);
+
+__uint128_t decode_bit_masks(unsigned char N, unsigned char imms,
+			unsigned char immr, bool immediate);
+
+#endif /* _BIT_OPERATIONS_H */
diff --git a/tools/objtool/arch/arm64/include/cfi.h b/tools/objtool/arch/arm64/include/cfi.h
new file mode 100644
index 000000000000..8084b95543b8
--- /dev/null
+++ b/tools/objtool/arch/arm64/include/cfi.h
@@ -0,0 +1,76 @@
+/*
+ * Copyright (C) 2015-2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _OBJTOOL_CFI_H
+#define _OBJTOOL_CFI_H
+
+#define CFI_UNDEFINED		-1
+#define CFI_CFA			-2
+#define CFI_SP_INDIRECT		-3
+#define CFI_BP_INDIRECT		-4
+
+#define CFI_R0			0
+#define CFI_R1			1
+#define CFI_R2			2
+#define CFI_R3			3
+#define CFI_R4			4
+#define CFI_R5			5
+#define CFI_R6			6
+#define CFI_R7			7
+#define CFI_R8			8
+#define CFI_R9			9
+#define CFI_R10			10
+#define CFI_R11			11
+#define CFI_R12			12
+#define CFI_R13			13
+#define CFI_R14			14
+#define CFI_R15			15
+#define CFI_R16			16
+#define CFI_R17			17
+#define CFI_R18			18
+#define CFI_R19			19
+#define CFI_R20			20
+#define CFI_R21			21
+#define CFI_R22			22
+#define CFI_R23			23
+#define CFI_R24			24
+#define CFI_R25			25
+#define CFI_R26			26
+#define CFI_R27			27
+#define CFI_R28			28
+#define CFI_R29			29
+#define CFI_FP			CFI_R29
+#define CFI_BP			CFI_FP
+#define CFI_R30			30
+#define CFI_LR			CFI_R30
+#define CFI_SP			31
+
+
+
+#define CFI_NUM_REGS		32
+
+struct cfi_reg {
+	int base;
+	int offset;
+};
+
+struct cfi_state {
+	struct cfi_reg cfa;
+	struct cfi_reg regs[CFI_NUM_REGS];
+};
+
+#endif /* _OBJTOOL_CFI_H */
diff --git a/tools/objtool/arch/arm64/include/insn_decode.h b/tools/objtool/arch/arm64/include/insn_decode.h
new file mode 100644
index 000000000000..b4229cbe3ae1
--- /dev/null
+++ b/tools/objtool/arch/arm64/include/insn_decode.h
@@ -0,0 +1,219 @@
+/*
+ * Copyright (C) 2019 Raphael Gault <raphael.gault@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef _ARM_INSN_DECODE_H
+#define _ARM_INSN_DECODE_H
+
+#include "../../../arch.h"
+
+#define INSN_RESERVED	0b0000
+#define INSN_UNKNOWN	0b0001
+#define INSN_SVE_ENC	0b0010
+#define INSN_UNALLOC	0b0011
+#define INSN_DP_IMM	0b1001	//0x100x
+#define INSN_BRANCH	0b1011	//0x101x
+#define INSN_LD_ST_4	0b0100	//0bx1x0
+#define INSN_LD_ST_6	0b0110	//0bx1x0
+#define INSN_LD_ST_C	0b1100	//0bx1x0
+#define INSN_LD_ST_E	0b1110	//0bx1x0
+#define INSN_DP_REG_5	0b0101	//0bx101
+#define INSN_DP_REG_D	0b1101	//0bx101
+#define INSN_DP_SIMD_7	0b0111	//0bx111
+#define INSN_DP_SIMD_F	0b1111	//0bx111
+
+#define INSN_PCREL	0b001	//0b00x
+#define INSN_ADD_SUB	0b010
+#define INSN_ADD_TAG	0b011
+#define INSN_LOGICAL	0b100
+#define INSN_MOVE_WIDE	0b101
+#define INSN_BITFIELD	0b110
+#define INSN_EXTRACT	0b111
+
+#define INSN_BR_UNCOND_IMM_L	0b0001
+#define INSN_CP_BR_IMM_L	0b0010
+#define INSN_TST_BR_IMM_L	0b0011
+#define INSN_BR_COND_IMM	0b0100
+#define INSN_BR_UNKNOWN_IMM	0b0111
+#define INSN_BR_UNCOND_IMM_H	0b1001
+#define INSN_CP_BR_IMM_H	0b1010
+#define INSN_TST_BR_IMM_H	0b1011
+#define INSN_BR_SYS_NO_IMM	0b1101
+
+#define INSN_OP1_HINTS		0b01000000110010
+#define INSN_OP1_BARRIERS	0b01000000110011
+
+#define COMPOSED_INSN_REGS_NUM	2
+#define INSN_COMPOSED	1
+
+#define ADR_SOURCE	-1
+
+
+struct insn_decode_state {
+	bool is_composed_insn;
+	unsigned char current_reg_num;
+	unsigned char insn_regs_num;
+	unsigned char insn_type;
+	unsigned long immediate;
+	unsigned long curr_offset;
+	struct stack_op op;
+	unsigned char regs[COMPOSED_INSN_REGS_NUM];
+};
+
+typedef int (*arm_decode_class)(u32 instr, unsigned char *type,
+				unsigned long *immediate, struct stack_op *op);
+
+struct aarch64_insn_decoder {
+	u32 mask;
+	u32 value;
+	arm_decode_class decode_func;
+};
+
+/* arm64 instruction classes */
+int arm_decode_reserved(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_sve_encoding(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_br_sys(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_simd(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_unknown(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+
+
+/* arm64 data processing -- immediate subclasses */
+int arm_decode_pcrel(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_add_sub(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_add_sub_tags(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_logical(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_move_wide(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_bitfield(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_extract(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+
+/* arm64 branch, exception generation, system insn subclasses */
+int arm_decode_br_uncond_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_br_comp_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_br_tst_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_br_cond_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+#if 0
+int arm_decode_br_sys_no_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+#endif
+
+int arm_decode_br_uncond_reg(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+
+int arm_decode_br_reg(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_except_gen(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_hints(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_barriers(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_pstate(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_system_insn(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_system_regs(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+
+/* arm64 load/store instructions */
+int arm_decode_adv_simd_mult(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_adv_simd_mult_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_adv_simd_single(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_adv_simd_single_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_mem_tags(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ldapr_stlr_unsc_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_regs_literal(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_noalloc_pair_off(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_pair_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_pair_off(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_pair_pre(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_unsc_imm(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_imm_post(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_imm_unpriv(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_imm_pre(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_atomic(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_off(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_pac(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_ld_st_regs_unsigned(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+
+int arm_decode_ld_st_exclusive(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+
+/* arm64 data processing -- registers instructions */
+int arm_decode_dp_reg_1src(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_2src(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_3src(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_adde(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_cmpi(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_eval(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_cmpr(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_rota(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_csel(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_addc(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_adds(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+int arm_decode_dp_reg_logi(u32 instr, unsigned char *type,
+			unsigned long *immediate, struct stack_op *op);
+#endif /* _ARM_INSN_DECODE_H */
diff --git a/tools/objtool/arch/arm64/orc_gen.c b/tools/objtool/arch/arm64/orc_gen.c
new file mode 100644
index 000000000000..81383d34a743
--- /dev/null
+++ b/tools/objtool/arch/arm64/orc_gen.c
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <stdlib.h>
+#include <string.h>
+
+#include "../../orc.h"
+#include "../../check.h"
+#include "../../warn.h"
+
+int arch_create_orc(struct objtool_file *file)
+{
+	WARN("arm64 architecture does not yet support orc");
+	return -1;
+}
+
+int arch_create_orc_sections(struct objtool_file *file)
+{
+	WARN("arm64 architecture does not yet support orc");
+	return -1;
+}
+
+int arch_orc_read_unwind_hints(struct objtool_file *file)
+{
+	return 0;
+}
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
  2019-04-09 13:52 ` [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support Raphael Gault
  2019-04-09 13:52 ` [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool Raphael Gault
@ 2019-04-09 13:52 ` Raphael Gault
  2019-04-09 16:12   ` Peter Zijlstra
                     ` (2 more replies)
  2019-04-09 13:52 ` [RFC 4/6] arm64: assembler: Add macro to annotate asm function having non standard stack-frame Raphael Gault
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

Since the way the initial stack frame when entering a function is different that what is done
in the x86_64 architecture, we need to add some more check to support the different cases.
As opposed as for x86_64, the return address is not stored by the call instruction but is instead
loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
operations are needed to set it up correctly. All the different combinations need to be
taken into account.

On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
but we still need to process it.

Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able
to identify in which case we are when looking for it.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 tools/objtool/arch.h              |  2 +
 tools/objtool/arch/arm64/decode.c | 27 +++++++++
 tools/objtool/arch/x86/decode.c   |  5 ++
 tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
 tools/objtool/elf.c               |  3 +-
 5 files changed, 120 insertions(+), 12 deletions(-)

diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index 0eff166ca613..f3bef3f2cef3 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
 
 unsigned long arch_compute_rela_sym_offset(int addend);
 
+bool arch_is_insn_sibling_call(struct instruction *insn);
+
 #endif /* _ARCH_H */
diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
index 0feb3ae3af5d..8b293eae2b38 100644
--- a/tools/objtool/arch/arm64/decode.c
+++ b/tools/objtool/arch/arm64/decode.c
@@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
 	return addend;
 }
 
+/*
+ * In order to know if we are in presence of a sibling
+ * call and not in presence of a switch table we look
+ * back at the previous instructions and see if we are
+ * jumping inside the same function that we are already
+ * in.
+ */
+bool arch_is_insn_sibling_call(struct instruction *insn)
+{
+	struct instruction *prev;
+	struct list_head *l;
+	struct symbol *sym;
+	list_for_each_prev(l, &insn->list) {
+		prev = (void *)l;
+		if (!prev->func
+			|| prev->func->pfunc != insn->func->pfunc)
+			return false;
+		if (prev->stack_op.src.reg != ADR_SOURCE)
+			continue;
+		sym = find_symbol_containing(insn->sec, insn->immediate);
+		if (!sym || sym->type != STT_FUNC
+			|| sym->pfunc != insn->func->pfunc)
+			return true;
+		break;
+	}
+	return true;
+}
 static int is_arm64(struct elf *elf)
 {
 	switch(elf->ehdr.e_machine){
diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
index 1af7b4996307..88c3d99c76be 100644
--- a/tools/objtool/arch/x86/decode.c
+++ b/tools/objtool/arch/x86/decode.c
@@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
 	return addend + 4;
 }
 
+bool arch_is_insn_sibling_call(struct instruction *insn)
+{
+	return true;
+}
+
 int arch_orc_read_unwind_hints(struct objtool_file *file)
 {
 	struct section *sec, *relasec;
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 17fcd8c8f9c1..fa6106214318 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
 	unsigned long offset;
 	struct instruction *insn;
 	int ret;
+	static int composed_insn = 0;
 
 	for_each_sec(file, sec) {
 
-		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
+		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
+			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
 			continue;
 
 		if (strcmp(sec->name, ".altinstr_replacement") &&
@@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
 				WARN_FUNC("invalid instruction type %d",
 					  insn->sec, insn->offset, insn->type);
 				ret = -1;
-				goto err;
+				free(insn);
+				continue;
 			}
-
-			hash_add(file->insn_hash, &insn->hash, insn->offset);
+			/*
+			 * For arm64 architecture, we sometime split instructions so that
+			 * we can track the state evolution (i.e. load/store of pairs of registers).
+			 * We thus need to take both into account and not erase the previous ones.
+			 */
+			if (composed_insn > 0)
+				hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
+			else
+				hash_add(file->insn_hash, &insn->hash, insn->offset);
+			if (insn->len == 0)
+				composed_insn++;
+			else
+				composed_insn = 0;
 			list_add_tail(&insn->list, &file->insn_list);
 		}
 
@@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file)
 			dest_off = arch_compute_jump_destination(insn);
 		} else if (rela->sym->type == STT_SECTION) {
 			dest_sec = rela->sym->sec;
-			dest_off = rela->addend + 4;
+			dest_off = arch_compute_rela_sym_offset(rela->addend);
 		} else if (rela->sym->sec->idx) {
 			dest_sec = rela->sym->sec;
-			dest_off = rela->sym->sym.st_value + rela->addend + 4;
+			dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend);
 		} else if (strstr(rela->sym->name, "_indirect_thunk_")) {
 			/*
 			 * Retpoline jumps are really dynamic jumps in
@@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file,
 		last_orig_insn = insn;
 	}
 
-	if (next_insn_same_sec(file, last_orig_insn)) {
+	if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) {
 		fake_jump = malloc(sizeof(*fake_jump));
 		if (!fake_jump) {
 			WARN("malloc failed");
@@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file,
 		if (find_symbol_containing(rodata_sec, table_offset))
 			continue;
 
+		/*
+		 * If we are on arm64 architecture, we now that we
+		 * are in presence of a switch table thanks to
+		 * the `br <Xn>` insn. but we can't retrieve it yet.
+		 * So we just ignore unreachable for this file.
+		 */
+		if (JUMP_DYNAMIC_IS_SWITCH_TABLE) {
+			file->ignore_unreachables = true;
+			return NULL;
+		}
+
 		rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
 		if (rodata_rela) {
 			/*
@@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base,
 
 static void restore_reg(struct insn_state *state, unsigned char reg)
 {
-	state->regs[reg].base = CFI_UNDEFINED;
-	state->regs[reg].offset = 0;
+	state->regs[reg].base = initial_func_cfi.regs[reg].base;
+	state->regs[reg].offset = initial_func_cfi.regs[reg].offset;
 }
 
 /*
@@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 
 				/* add imm, %rsp */
 				state->stack_size -= op->src.offset;
-				if (cfa->base == CFI_SP)
+				if (cfa->base == CFI_SP) {
 					cfa->offset -= op->src.offset;
+					if (state->stack_size == 0
+							&& initial_func_cfi.cfa.base == CFI_CFA) {
+						cfa->base = CFI_CFA;
+						cfa->offset = 0;
+					}
+				}
+				/*
+				 * on arm64 the save/restore of sp into fp is not automatic
+				 * and the first one can be done without the other so we
+				 * need to be careful not to invalidate the stack frame in such
+				 * cases.
+				 */
+				else if (cfa->base == CFI_BP) {
+					if (state->stack_size == 0
+							&& initial_func_cfi.cfa.base == CFI_CFA) {
+						cfa->base = CFI_CFA;
+						cfa->offset = 0;
+						restore_reg(state, CFI_BP);
+					}
+				}
+				else if (cfa->base == CFI_CFA) {
+					cfa->base = CFI_SP;
+					if (state->stack_size >= 16)
+						cfa->offset = 16;
+				}
 				break;
 			}
 
@@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 				break;
 			}
 
+			if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
+			    cfa->base == CFI_SP &&
+			    regs[CFI_BP].base == CFI_CFA &&
+			    regs[CFI_BP].offset == -cfa->offset) {
+
+				/* mov %rsp, %rbp */
+				cfa->base = op->dest.reg;
+				state->bp_scratch = false;
+				break;
+			}
 			if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
 
 				/* drap: lea disp(%rsp), %drap */
@@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 			state->stack_size -= 8;
 			if (cfa->base == CFI_SP)
 				cfa->offset -= 8;
+			if (cfa->base == CFI_SP && cfa->offset == 0
+					&& initial_func_cfi.cfa.base == CFI_CFA)
+				cfa->base = CFI_CFA;
 
 			break;
 
@@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
 
 	case OP_DEST_PUSH:
 		state->stack_size += 8;
+		if (cfa->base == CFI_CFA)
+			cfa->base = CFI_SP;
 		if (cfa->base == CFI_SP)
 			cfa->offset += 8;
 
@@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 	insn = first;
 	sec = insn->sec;
 
-	if (insn->alt_group && list_empty(&insn->alts)) {
+	if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) {
 		WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
 			  sec, insn->offset);
 		return 1;
@@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
 		case INSN_JUMP_DYNAMIC:
 			if (func && list_empty(&insn->alts) &&
 			    has_modified_stack_frame(&state)) {
+				/*
+				 * On arm64 `br <Xn>` insn can be used for switch-tables
+				 * but it cannot be distinguished in itself from a sibling
+				 * call thus we need to have a look at the previous instructions
+				 * to determine which it is
+				 */
+				if (!arch_is_insn_sibling_call(insn))
+					break;
 				WARN_FUNC("sibling call from callable instruction with modified stack frame",
 					  sec, insn->offset);
 				return 1;
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index b8f3cca8e58b..136f9b9fb1d1 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset)
 	struct symbol *sym;
 
 	list_for_each_entry(sym, &sec->symbol_list, list)
-		if (sym->type != STT_SECTION &&
+		if (sym->type != STT_NOTYPE &&
+		    sym->type != STT_SECTION &&
 		    sym->offset == offset)
 			return sym;
 
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC 4/6] arm64: assembler: Add macro to annotate asm function having non standard stack-frame.
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
                   ` (2 preceding siblings ...)
  2019-04-09 13:52 ` [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture Raphael Gault
@ 2019-04-09 13:52 ` Raphael Gault
  2019-04-24 10:44   ` Julien Thierry
  2019-04-09 13:52 ` [RFC 5/6] arm64: sleep: Add stack frame setup for __cpu_supsend_enter Raphael Gault
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 arch/arm64/include/asm/assembler.h | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
index 4feb6119c3c9..636a07a7eb76 100644
--- a/arch/arm64/include/asm/assembler.h
+++ b/arch/arm64/include/asm/assembler.h
@@ -748,4 +748,22 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
 .Lyield_out_\@ :
 	.endm
 
+
+#ifdef	CONFIG_STACK_VALIDATION
+	/*
+	 * This macro is the arm64 assembler equivalent of the
+	 * macro STACK_FRAME_NON_STANDARD define at
+	 * ~/include/linux/frame.h
+	 */
+	.macro	asm_stack_frame_non_standard	func
+	.pushsection ".discard.func_stack_frame_non_standard"
+	.8byte	\func
+	.popsection
+	.endm
+#else
+	.macro	asm_stack_frame_non_standard	func
+	.endm
+#endif
+
+
 #endif	/* __ASM_ASSEMBLER_H */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC 5/6] arm64: sleep: Add stack frame setup for __cpu_supsend_enter
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
                   ` (3 preceding siblings ...)
  2019-04-09 13:52 ` [RFC 4/6] arm64: assembler: Add macro to annotate asm function having non standard stack-frame Raphael Gault
@ 2019-04-09 13:52 ` Raphael Gault
  2019-04-23 20:37   ` Josh Poimboeuf
  2019-04-09 13:52 ` [RFC 6/6] objtool: arm64: Enable stack validation for arm64 Raphael Gault
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

Annotate cpu_resume and _cpu_resume to silence objtool warning
about non-standard stack frame.

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 arch/arm64/kernel/sleep.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
index 3e53ffa07994..eb434525fe82 100644
--- a/arch/arm64/kernel/sleep.S
+++ b/arch/arm64/kernel/sleep.S
@@ -90,6 +90,7 @@ ENTRY(__cpu_suspend_enter)
 	str	x0, [x1]
 	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
 	stp	x29, lr, [sp, #-16]!
+	mov	x29, sp
 	bl	cpu_do_suspend
 	ldp	x29, lr, [sp], #16
 	mov	x0, #1
@@ -146,3 +147,6 @@ ENTRY(_cpu_resume)
 	mov	x0, #0
 	ret
 ENDPROC(_cpu_resume)
+
+	asm_stack_frame_non_standard cpu_resume
+	asm_stack_frame_non_standard _cpu_resume
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC 6/6] objtool: arm64: Enable stack validation for arm64
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
                   ` (4 preceding siblings ...)
  2019-04-09 13:52 ` [RFC 5/6] arm64: sleep: Add stack frame setup for __cpu_supsend_enter Raphael Gault
@ 2019-04-09 13:52 ` Raphael Gault
  2019-04-09 14:57 ` [PATCH 0/6] objtool: Add support for Arm64 Josh Poimboeuf
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Raphael Gault @ 2019-04-09 13:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon, julien.thierry,
	Raphael Gault

Signed-off-by: Raphael Gault <raphael.gault@arm.com>
---
 arch/arm64/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4168d366127..314ca1a3ea70 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -151,6 +151,7 @@ config ARM64
 	select HAVE_RCU_TABLE_INVALIDATE
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
+	select HAVE_STACK_VALIDATION
 	select HAVE_SYSCALL_TRACEPOINTS
 	select HAVE_KPROBES
 	select HAVE_KRETPROBES
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
                   ` (5 preceding siblings ...)
  2019-04-09 13:52 ` [RFC 6/6] objtool: arm64: Enable stack validation for arm64 Raphael Gault
@ 2019-04-09 14:57 ` Josh Poimboeuf
  2019-04-09 17:43 ` Ard Biesheuvel
  2019-04-23 21:09 ` Josh Poimboeuf
  8 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-09 14:57 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, catalin.marinas,
	will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 02:52:37PM +0100, Raphael Gault wrote:
> Hi,
> 
> As of now, objtool only supports the x86_64 architecture but the
> groundwork has already been done in order to add support for other
> architecture without too much effort.

Hi Raphael,

This was a pleasant surprise in my inbox.  From a quick glance I'm
actually surprised at how "easy" it looks, though I can tell it was
quite a big effort.  Doing the first arch port is always the hardest.
Kudos!  I will give it a proper review soon.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 13:52 ` [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture Raphael Gault
@ 2019-04-09 16:12   ` Peter Zijlstra
  2019-04-09 16:24     ` Mark Rutland
  2019-04-23 20:36   ` Josh Poimboeuf
  2019-04-24 10:36   ` Julien Thierry
  2 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2019-04-09 16:12 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, jpoimboe, catalin.marinas,
	will.deacon, julien.thierry


I'm just doing my initial read-through,.. however

On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;

could you please not format code like that. Operators go at the end of
the line, and continuation should match the indentation of the opening
paren. So the above would look like:

> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
> +		    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;

You appear to be doing that quit consistently, and it is against style.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool.
  2019-04-09 13:52 ` [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool Raphael Gault
@ 2019-04-09 16:20   ` Peter Zijlstra
  2019-04-23 20:18   ` Josh Poimboeuf
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-04-09 16:20 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, jpoimboe, catalin.marinas,
	will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 02:52:39PM +0100, Raphael Gault wrote:

> The decoding of the instruction is split into classes and subclasses as described into
> the Instruction Encoding in the ArmV8.5 Architecture Reference Manual.

>  tools/objtool/arch/arm64/decode.c             | 2843 +++++++++++++++++

Oh man, I hope you generated much of that from the ARM64 Instruction Set
Architecture XML files [*]. Otherwise that's been a lot of typing.

Anyway, since you now have that, would it make sense to use this same
decoder in your kernel tree? I found at least one partial decoder in
arch/arm64/kernel/probes/decode-insn.c, but I suspect you have more.

(Note kprobes is how x86 initially grew its instruction decoder IIRC)


* https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 16:12   ` Peter Zijlstra
@ 2019-04-09 16:24     ` Mark Rutland
  2019-04-09 16:27       ` Julien Thierry
  0 siblings, 1 reply; 36+ messages in thread
From: Mark Rutland @ 2019-04-09 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Raphael Gault, linux-kernel, linux-arm-kernel, jpoimboe,
	catalin.marinas, will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
> 
> I'm just doing my initial read-through,.. however
> 
> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> > +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> > +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
> >  			continue;
> 
> could you please not format code like that. Operators go at the end of
> the line, and continuation should match the indentation of the opening
> paren. So the above would look like:
> 
> > +		if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
> > +		    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
> >  			continue;
> 
> You appear to be doing that quit consistently, and it is against style.

Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
this. You can run it over a list of patches, so for a patch series you
can run:

 $ ./scripts/checkpatch.pl *.patch

... and hopefully most of the output will be reasonable.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 16:24     ` Mark Rutland
@ 2019-04-09 16:27       ` Julien Thierry
  2019-04-09 16:33         ` Raphaël Gault
  0 siblings, 1 reply; 36+ messages in thread
From: Julien Thierry @ 2019-04-09 16:27 UTC (permalink / raw)
  To: Mark Rutland, Peter Zijlstra
  Cc: Raphael Gault, linux-kernel, linux-arm-kernel, jpoimboe,
	catalin.marinas, will.deacon



On 09/04/2019 17:24, Mark Rutland wrote:
> On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
>>
>> I'm just doing my initial read-through,.. however
>>
>> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
>>> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
>>> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>  			continue;
>>
>> could you please not format code like that. Operators go at the end of
>> the line, and continuation should match the indentation of the opening
>> paren. So the above would look like:
>>
>>> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
>>> +		    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>  			continue;
>>
>> You appear to be doing that quit consistently, and it is against style.
> 
> Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
> this. You can run it over a list of patches, so for a patch series you
> can run:
> 
>  $ ./scripts/checkpatch.pl *.patch
> 
> ... and hopefully most of the output will be reasonable.
> 

For this particular case, checkpatch only warns about it if you pass it
"--strict" option. So in general it might be useful to include this
option at least for the first pass at including large pieces of code.

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 16:27       ` Julien Thierry
@ 2019-04-09 16:33         ` Raphaël Gault
  0 siblings, 0 replies; 36+ messages in thread
From: Raphaël Gault @ 2019-04-09 16:33 UTC (permalink / raw)
  To: Julien Thierry, Mark Rutland, Peter Zijlstra
  Cc: linux-kernel, linux-arm-kernel, jpoimboe, Catalin Marinas, Will Deacon

Hi,

On 4/9/19 5:27 PM, Julien Thierry wrote:
>
>
> On 09/04/2019 17:24, Mark Rutland wrote:
>> On Tue, Apr 09, 2019 at 06:12:04PM +0200, Peter Zijlstra wrote:
>>>
>>> I'm just doing my initial read-through,.. however
>>>
>>> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
>>>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR)
>>>> +&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>>   continue;
>>>
>>> could you please not format code like that. Operators go at the end of
>>> the line, and continuation should match the indentation of the opening
>>> paren. So the above would look like:
>>>
>>>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR) &&
>>>> +    (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>>>   continue;
>>>
>>> You appear to be doing that quit consistently, and it is against style.

Thank you for these remarks, I will correct this!

>>
>> Raphael, as a heads-up, ./scripts/checkpatch.pl can catch issues like
>> this. You can run it over a list of patches, so for a patch series you
>> can run:
>>
>>   $ ./scripts/checkpatch.pl *.patch
>>
>> ... and hopefully most of the output will be reasonable.
>>
>
> For this particular case, checkpatch only warns about it if you pass it
> "--strict" option. So in general it might be useful to include this
> option at least for the first pass at including large pieces of code.
>

Indeed that sounds usefull, thanks,

> Cheers,
>

Cheers,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
                   ` (6 preceding siblings ...)
  2019-04-09 14:57 ` [PATCH 0/6] objtool: Add support for Arm64 Josh Poimboeuf
@ 2019-04-09 17:43 ` Ard Biesheuvel
  2019-04-10  3:37   ` Josh Poimboeuf
  2019-04-23 21:09 ` Josh Poimboeuf
  8 siblings, 1 reply; 36+ messages in thread
From: Ard Biesheuvel @ 2019-04-09 17:43 UTC (permalink / raw)
  To: Raphael Gault
  Cc: Linux Kernel Mailing List, linux-arm-kernel, Julien Thierry,
	Peter Zijlstra, Catalin Marinas, Will Deacon, Josh Poimboeuf

On Tue, 9 Apr 2019 at 06:53, Raphael Gault <raphael.gault@arm.com> wrote:
>
> Hi,
>
> As of now, objtool only supports the x86_64 architecture but the
> groundwork has already been done in order to add support for other
> architecture without too much effort.
>
> This series of patches adds support for the arm64 architecture
> based on the Armv8.5 Architecture Reference Manual.
>

I think it makes sense to clarify *why* we want this on arm64. Also,
we should identify things that objtool does today that maybe we don't
want on arm64, rather than buy into all of it by default.

> * Patch 1 adapts the existing code to be able to add support for other
>   architecture.
> * Patch 2 provide implementation of the required function for the arm64
>   architecture.
> * Patch 3 adapts the checking of the stack state for the arm64
>   architecture.
> * Patch 4 & 5 fix some warning objtool raised in some particular
>   functions of ~/arch/arm64/kernel/sleep.S. Patch 4 add a macro to
>   signal that some function should be ignored by objtool.
> * Patch 6 enables stack validation for arm64.
>
> Theses patches should provide support for the main cases and behaviour.
> However a few corner cases are not yet handled by objtool:
>
> * In the `~/arch/arm64/crypto/` directory, I noticed that some plain
>   data are sometimes stored in the `.text` section causing objtool to mistake
>   this for instructions and trying (and failing) to interprete them.  If someone
>   could explain to me why we store data directly in the .text section I would
>   appreciate it.
>

Just for convenience, basically. Not specifying a section at all when
emitting a literal amounts to putting it into .text, and a a bonus, it
is guaranteed to be in range for a ADR instructions (which is not the
case when the code is built into the kernel rather than enabled as a
module)

Due to Spectre I have already updated some of the code so larger
tables are moved into .rodata (since literal data might contain binary
sequences that are usable as spectre gadgets), but some work remains
to be done.

> * In the support for arm32 architecture such as in `~/arch/arm64/kernel/kuser32.S`
>   some A32 instructions are used but such instructions are not understood by
>   objtool causing a warning.
>
> I also have a few unclear points I would like to bring to your
> attention:
>
> * For x86_64, when looking for a symbol relocation with explicit
>   addend, objtool systematically adds a +4 offset to the addend.
>   I don't understand why even if I have a feeling it is related
>   to the type of relacation.
>
> * I currently don't have a clear understanding about how switch-tables
>   are generated on arm64 and how to retrieve them (based on relocations).
>
> Please provide me with any feedback and comments as well on the content
> than the style of these patches.
>
> Thanks,
>
> Raphael
>
> ->
>
> Raphael Gault (6):
>   objtool: Refactor code to make it more suitable for multiple
>     architecture support
>   objtool: arm64: Add required implementation for supporting the aarch64
>     architecture in objtool.
>   objtool: arm64: Adapt the stack frame checks and the section analysis
>     for the arm architecture
>   arm64: assembler: Add macro to annotate asm function having non
>     standard stack-frame.
>   arm64: sleep: Add stack frame setup for __cpu_supsend_enter
>   objtool: arm64: Enable stack validation for arm64
>
>  arch/arm64/Kconfig                            |    1 +
>  arch/arm64/include/asm/assembler.h            |   18 +
>  arch/arm64/kernel/sleep.S                     |    4 +
>  tools/objtool/Build                           |    1 -
>  tools/objtool/arch.h                          |   11 +
>  tools/objtool/arch/arm64/Build                |    6 +
>  tools/objtool/arch/arm64/bit_operations.c     |   65 +
>  tools/objtool/arch/arm64/decode.c             | 2870 +++++++++++++++++
>  .../objtool/arch/arm64/include/arch_special.h |   44 +
>  .../arch/arm64/include/asm/orc_types.h        |  109 +
>  .../arch/arm64/include/bit_operations.h       |   22 +
>  tools/objtool/arch/arm64/include/cfi.h        |   76 +
>  .../objtool/arch/arm64/include/insn_decode.h  |  219 ++
>  tools/objtool/arch/arm64/orc_gen.c            |   40 +
>  tools/objtool/arch/x86/Build                  |    1 +
>  tools/objtool/arch/x86/decode.c               |  111 +
>  tools/objtool/arch/x86/include/arch_special.h |   35 +
>  tools/objtool/{ => arch/x86/include}/cfi.h    |    0
>  tools/objtool/{ => arch/x86}/orc_gen.c        |   10 +-
>  tools/objtool/check.c                         |  209 +-
>  tools/objtool/check.h                         |    1 +
>  tools/objtool/elf.c                           |    3 +-
>  tools/objtool/orc.h                           |    4 +-
>  tools/objtool/special.c                       |   18 +-
>  24 files changed, 3740 insertions(+), 138 deletions(-)
>  create mode 100644 tools/objtool/arch/arm64/Build
>  create mode 100644 tools/objtool/arch/arm64/bit_operations.c
>  create mode 100644 tools/objtool/arch/arm64/decode.c
>  create mode 100644 tools/objtool/arch/arm64/include/arch_special.h
>  create mode 100644 tools/objtool/arch/arm64/include/asm/orc_types.h
>  create mode 100644 tools/objtool/arch/arm64/include/bit_operations.h
>  create mode 100644 tools/objtool/arch/arm64/include/cfi.h
>  create mode 100644 tools/objtool/arch/arm64/include/insn_decode.h
>  create mode 100644 tools/objtool/arch/arm64/orc_gen.c
>  create mode 100644 tools/objtool/arch/x86/include/arch_special.h
>  rename tools/objtool/{ => arch/x86/include}/cfi.h (100%)
>  rename tools/objtool/{ => arch/x86}/orc_gen.c (96%)
>
> --
> 2.17.1
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-09 17:43 ` Ard Biesheuvel
@ 2019-04-10  3:37   ` Josh Poimboeuf
  2019-04-10  7:20     ` Julien Thierry
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-10  3:37 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Raphael Gault, Linux Kernel Mailing List, linux-arm-kernel,
	Julien Thierry, Peter Zijlstra, Catalin Marinas, Will Deacon

On Tue, Apr 09, 2019 at 10:43:18AM -0700, Ard Biesheuvel wrote:
> On Tue, 9 Apr 2019 at 06:53, Raphael Gault <raphael.gault@arm.com> wrote:
> >
> > Hi,
> >
> > As of now, objtool only supports the x86_64 architecture but the
> > groundwork has already been done in order to add support for other
> > architecture without too much effort.
> >
> > This series of patches adds support for the arm64 architecture
> > based on the Armv8.5 Architecture Reference Manual.
> >
> 
> I think it makes sense to clarify *why* we want this on arm64. Also,
> we should identify things that objtool does today that maybe we don't
> want on arm64, rather than buy into all of it by default.

Agreed, the "why" should at least be in the cover letter.  From my
perspective, the "why" includes:

- Live patching - objtool stack validation is the foundation for a
  reliable unwinder

- ORC unwinder - benefits include presumed improved overall performance
  from disabling frame pointers, and the ability to unwind across
  interrupts and exceptions

- PeterZ's new uaccess validation?

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-10  3:37   ` Josh Poimboeuf
@ 2019-04-10  7:20     ` Julien Thierry
  0 siblings, 0 replies; 36+ messages in thread
From: Julien Thierry @ 2019-04-10  7:20 UTC (permalink / raw)
  To: Josh Poimboeuf, Ard Biesheuvel
  Cc: Raphael Gault, Linux Kernel Mailing List, linux-arm-kernel,
	Peter Zijlstra, Catalin Marinas, Will Deacon



On 10/04/2019 04:37, Josh Poimboeuf wrote:
> On Tue, Apr 09, 2019 at 10:43:18AM -0700, Ard Biesheuvel wrote:
>> On Tue, 9 Apr 2019 at 06:53, Raphael Gault <raphael.gault@arm.com> wrote:
>>>
>>> Hi,
>>>
>>> As of now, objtool only supports the x86_64 architecture but the
>>> groundwork has already been done in order to add support for other
>>> architecture without too much effort.
>>>
>>> This series of patches adds support for the arm64 architecture
>>> based on the Armv8.5 Architecture Reference Manual.
>>>
>>
>> I think it makes sense to clarify *why* we want this on arm64. Also,
>> we should identify things that objtool does today that maybe we don't
>> want on arm64, rather than buy into all of it by default.
> 
> Agreed, the "why" should at least be in the cover letter.  From my
> perspective, the "why" includes:
> 
> - Live patching - objtool stack validation is the foundation for a
>   reliable unwinder
> 

Yes, as I understand Live patching is a work in progress for arm64.
Having objtool to provide more guarantees would be nice.

> - ORC unwinder - benefits include presumed improved overall performance
>   from disabling frame pointers, and the ability to unwind across
>   interrupts and exceptions
> 

I'm unsure this will be part of the plan. I believe so far arm64 code
heavily relies on the presence of frame pointers. It's also part of the
Aarch64 Procedure Call Standard.

But who knows.

> - PeterZ's new uaccess validation?
> 

Yes, we've reverted twice our implementation of user_access_begin/end()
on arm64 because of the issue of potentially calling sleeping functions.

Once we have the base of objtool work, this would be one of the next
work items.

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support
  2019-04-09 13:52 ` [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support Raphael Gault
@ 2019-04-23 20:13   ` Josh Poimboeuf
  2019-04-24 16:11     ` Raphael Gault
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-23 20:13 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, catalin.marinas,
	will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 02:52:38PM +0100, Raphael Gault wrote:
> The jump destination and relocation offset used previously are only reliable
> on x86_64 architecture. We abstract these computations by calling arch-dependant

"dependent"

> implementation.
> 
> The control flow information and register macro definitions were based on
> the x86_64 architecture but should be abstract so that each architecture
> can define the correct values for the registers, especially the registers
> related to the stack frame (Frame Pointer, Stack Pointer and possibly
> Return Address).
> 
> The ORC unwinder is only supported on x86 at the moment and should thus be
> in the x86 architecture code. In order not to break the whole structure in
> case another architecture decides to support the ORC unwinder via objtool
> we choose to let the implementation be done in the architecture dependant
> code.

It's good practice to put each logical change into a separate patch.
That will also make the patches easier to review.

For example, each of the above three paragraphs should be a separate
commit.

Also it's a good idea to run the patches through checkpatch.pl (though
its warnings should be taken with a grain of salt).

> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> ---
>  tools/objtool/Build                           |   1 -
>  tools/objtool/arch.h                          |   9 ++
>  tools/objtool/arch/x86/Build                  |   1 +
>  tools/objtool/arch/x86/decode.c               | 106 ++++++++++++++++
>  tools/objtool/arch/x86/include/arch_special.h |  35 ++++++
>  tools/objtool/{ => arch/x86/include}/cfi.h    |   0
>  tools/objtool/{ => arch/x86}/orc_gen.c        |  10 +-
>  tools/objtool/check.c                         | 114 ++----------------
>  tools/objtool/check.h                         |   1 +
>  tools/objtool/orc.h                           |   4 +-
>  tools/objtool/special.c                       |  18 +--
>  11 files changed, 173 insertions(+), 126 deletions(-)
>  create mode 100644 tools/objtool/arch/x86/include/arch_special.h
>  rename tools/objtool/{ => arch/x86/include}/cfi.h (100%)
>  rename tools/objtool/{ => arch/x86}/orc_gen.c (96%)
> 
> diff --git a/tools/objtool/Build b/tools/objtool/Build
> index 749becdf5b90..ec925d49565a 100644
> --- a/tools/objtool/Build
> +++ b/tools/objtool/Build
> @@ -2,7 +2,6 @@ objtool-y += arch/$(SRCARCH)/
>  objtool-y += builtin-check.o
>  objtool-y += builtin-orc.o
>  objtool-y += check.o
> -objtool-y += orc_gen.o

I'm not sure whether moving ORC out to the arch-specific code is the
best option.  I expect a lot of the ORC code to be generic.  But this
might be ok for now, until we get another ORC implementation.

Another possibility would be to make weak versions of the orc functions
somewhere (check.c?) and only compile the generic orc_gen.c on arches
which support it.  Then we could abstract out the arch-specific ORC bits
later.

>  objtool-y += orc_dump.o

orc_dump.o doesn't need to be built on arm64.  The "orc dump" option
should fail accordingly.

>  objtool-y += elf.o
>  objtool-y += special.o
> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
> index b0d7dc3d71b5..0eff166ca613 100644
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -22,6 +22,7 @@
>  #include <linux/list.h>
>  #include "elf.h"
>  #include "cfi.h"
> +#include "orc.h"
>  
>  #define INSN_JUMP_CONDITIONAL	1
>  #define INSN_JUMP_UNCONDITIONAL	2
> @@ -70,6 +71,8 @@ struct stack_op {
>  	struct op_src src;
>  };
>  
> +struct instruction;
> +
>  void arch_initial_func_cfi_state(struct cfi_state *state);
>  
>  int arch_decode_instruction(struct elf *elf, struct section *sec,
> @@ -79,4 +82,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
>  
>  bool arch_callee_saved_reg(unsigned char reg);
>  
> +int arch_orc_read_unwind_hints(struct objtool_file *file);
> +
> +unsigned long arch_compute_jump_destination(struct instruction *insn);
> +
> +unsigned long arch_compute_rela_sym_offset(int addend);

arch_dest_rela_addend_offset() might be a more descriptive name.  Also
it might be simpler to just make it an arch-specific macro which is 0 on
arm64 and 4 on x86.

"compute" is implied, it can probably be removed from the names to make
them a little more concise.

> +
>  #endif /* _ARCH_H */
> diff --git a/tools/objtool/arch/x86/Build b/tools/objtool/arch/x86/Build
> index b998412c017d..74015be53ef0 100644
> --- a/tools/objtool/arch/x86/Build
> +++ b/tools/objtool/arch/x86/Build
> @@ -1,4 +1,5 @@
>  objtool-y += decode.o
> +objtool-y += orc_gen.o
>  
>  inat_tables_script = arch/x86/tools/gen-insn-attr-x86.awk
>  inat_tables_maps = arch/x86/lib/x86-opcode-map.txt
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 540a209b78ab..1af7b4996307 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -23,6 +23,8 @@
>  #include "lib/inat.c"
>  #include "lib/insn.c"
>  
> +
> +#include "../../check.h"
>  #include "../../elf.h"
>  #include "../../arch.h"
>  #include "../../warn.h"
> @@ -78,6 +80,105 @@ bool arch_callee_saved_reg(unsigned char reg)
>  	}
>  }
>  
> +unsigned long arch_compute_rela_sym_offset(int addend)
> +{
> +	return addend + 4;
> +}
> +
> +int arch_orc_read_unwind_hints(struct objtool_file *file)

I think this function would be better suited for orc_gen.c (which could
be renamed to just orc.c).

> diff --git a/tools/objtool/arch/x86/include/arch_special.h b/tools/objtool/arch/x86/include/arch_special.h
> new file mode 100644
> index 000000000000..bd91b1096359
> --- /dev/null
> +++ b/tools/objtool/arch/x86/include/arch_special.h
> @@ -0,0 +1,35 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */

This needs the standard header macro guards, e.g.

#ifndef _X86_ARCH_SPECIAL_H
#define _X86_ARCH_SPECIAL_H

> +
> +#define EX_ENTRY_SIZE		12
> +#define EX_ORIG_OFFSET		0
> +#define EX_NEW_OFFSET		4
> +
> +#define JUMP_ENTRY_SIZE		16
> +#define JUMP_ORIG_OFFSET	0
> +#define JUMP_NEW_OFFSET		4
> +
> +#define ALT_ENTRY_SIZE		13
> +#define ALT_ORIG_OFFSET		0
> +#define ALT_NEW_OFFSET		4
> +#define ALT_FEATURE_OFFSET	8
> +#define ALT_ORIG_LEN_OFFSET	10
> +#define ALT_NEW_LEN_OFFSET	11
> +
> +#define IGNORE_SHF_EXEC_FLAG	0
> +
> +#define JUMP_DYNAMIC_IS_SWITCH_TABLE	0

These flags should be added with the commit which actually uses them.

Also "arch_special.h" is specific to special section parsing, so I'm
thinking these two macros don't really belong here.  Or maybe the header
file could be renamed to something more generic.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool.
  2019-04-09 13:52 ` [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool Raphael Gault
  2019-04-09 16:20   ` Peter Zijlstra
@ 2019-04-23 20:18   ` Josh Poimboeuf
  2019-04-24 16:16     ` Raphael Gault
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-23 20:18 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, catalin.marinas,
	will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 02:52:39PM +0100, Raphael Gault wrote:
> Provide implementation for the arch-dependent functions that are called by the main check
> function of objtool.
> The ORC unwinder is not yet supported by the arm64 architecture so we only provide a dummy
> interface for now.
> The decoding of the instruction is split into classes and subclasses as described into
> the Instruction Encoding in the ArmV8.5 Architecture Reference Manual.

Where did the code for the decoder come from?  Was it written from
scratch?

> diff --git a/tools/objtool/arch/arm64/include/arch_special.h b/tools/objtool/arch/arm64/include/arch_special.h
> new file mode 100644
> index 000000000000..54bcce4c58c0
> --- /dev/null
> +++ b/tools/objtool/arch/arm64/include/arch_special.h
> @@ -0,0 +1,44 @@
> +/*
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +

Needs a header guard.

> +#define EX_ENTRY_SIZE		8
> +#define EX_ORIG_OFFSET		0
> +#define EX_NEW_OFFSET		4
> +
> +#define JUMP_ENTRY_SIZE		16
> +#define JUMP_ORIG_OFFSET	0
> +#define JUMP_NEW_OFFSET		4
> +
> +#define ALT_ENTRY_SIZE		12
> +#define ALT_ORIG_OFFSET		0
> +#define ALT_NEW_OFFSET		4
> +#define ALT_FEATURE_OFFSET	8
> +#define ALT_ORIG_LEN_OFFSET	10
> +#define ALT_NEW_LEN_OFFSET	11
> +
> +/*
> + * On arm64 the .altinstr_replacement is not always marked
> + * as containing executable instruction. But we still want
> + * to process it so we ignore the SHF_EXEC flag
> + */
> +#define IGNORE_SHF_EXEC_FLAG	1
> +
> +/*
> + * The jump table detection is not the same on arm64 so for
> + * now we just detect if it is a dynamic jump (br <Xn> insn)
> + */
> +#define JUMP_DYNAMIC_IS_SWITCH_TABLE	1

Same as for x86, these flags should be added in the same patch which
uses them.

> +
> +#define X86_FEATURE_POPCNT (4*32+23)
> diff --git a/tools/objtool/arch/arm64/include/asm/orc_types.h b/tools/objtool/arch/arm64/include/asm/orc_types.h
> new file mode 100644
> index 000000000000..46f516dd80ce
> --- /dev/null
> +++ b/tools/objtool/arch/arm64/include/asm/orc_types.h
> @@ -0,0 +1,109 @@
> +/*
> + * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version 2
> + * of the License, or (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#ifndef _ORC_TYPES_H
> +#define _ORC_TYPES_H
> +
> +#include <linux/types.h>
> +#include <linux/compiler.h>
> +
> +/*
> + * The ORC_REG_* registers are base registers which are used to find other
> + * registers on the stack.
> + *
> + * ORC_REG_PREV_SP, also known as DWARF Call Frame Address (CFA), is the
> + * address of the previous frame: the caller's SP before it called the current
> + * function.
> + *
> + * ORC_REG_UNDEFINED means the corresponding register's value didn't change in
> + * the current frame.
> + *
> + * The most commonly used base registers are SP and BP -- which the previous SP
> + * is usually based on -- and PREV_SP and UNDEFINED -- which the previous BP is
> + * usually based on.
> + *
> + * The rest of the base registers are needed for special cases like entry code
> + * and GCC realigned stacks.
> + */
> +#define ORC_REG_UNDEFINED		0
> +#define ORC_REG_PREV_SP			1
> +#define ORC_REG_DX			2
> +#define ORC_REG_DI			3
> +#define ORC_REG_BP			4
> +#define ORC_REG_SP			5
> +#define ORC_REG_R10			6
> +#define ORC_REG_R13			7
> +#define ORC_REG_BP_INDIRECT		8
> +#define ORC_REG_SP_INDIRECT		9
> +#define ORC_REG_MAX			15
> +
> +/*
> + * ORC_TYPE_CALL: Indicates that sp_reg+sp_offset resolves to PREV_SP (the
> + * caller's SP right before it made the call).  Used for all callable
> + * functions, i.e. all C code and all callable asm functions.
> + *
> + * ORC_TYPE_REGS: Used in entry code to indicate that sp_reg+sp_offset points
> + * to a fully populated pt_regs from a syscall, interrupt, or exception.
> + *
> + * ORC_TYPE_REGS_IRET: Used in entry code to indicate that sp_reg+sp_offset
> + * points to the iret return frame.
> + *
> + * The UNWIND_HINT macros are used only for the unwind_hint struct.  They
> + * aren't used in struct orc_entry due to size and complexity constraints.
> + * Objtool converts them to real types when it converts the hints to orc
> + * entries.
> + */
> +#define ORC_TYPE_CALL			0
> +#define ORC_TYPE_REGS			1
> +#define ORC_TYPE_REGS_IRET		2
> +#define UNWIND_HINT_TYPE_SAVE		3
> +#define UNWIND_HINT_TYPE_RESTORE	4
> +
> +#ifndef __ASSEMBLY__
> +/*
> + * This struct is more or less a vastly simplified version of the DWARF Call
> + * Frame Information standard.  It contains only the necessary parts of DWARF
> + * CFI, simplified for ease of access by the in-kernel unwinder.  It tells the
> + * unwinder how to find the previous SP and BP (and sometimes entry regs) on
> + * the stack for a given code address.  Each instance of the struct corresponds
> + * to one or more code locations.
> + */
> +struct orc_entry {
> +	s16		sp_offset;
> +	s16		bp_offset;
> +	unsigned	sp_reg:4;
> +	unsigned	bp_reg:4;
> +	unsigned	type:2;
> +	unsigned	end:1;
> +} __packed;
> +
> +/*
> + * This struct is used by asm and inline asm code to manually annotate the
> + * location of registers on the stack for the ORC unwinder.
> + *
> + * Type can be either ORC_TYPE_* or UNWIND_HINT_TYPE_*.
> + */
> +struct unwind_hint {
> +	u32		ip;
> +	s16		sp_offset;
> +	u8		sp_reg;
> +	u8		type;
> +	u8		end;
> +};
> +#endif /* __ASSEMBLY__ */
> +
> +#endif /* _ORC_TYPES_H */

It seems odd to have the above header file in arm64 code, since it
doesn't implement ORC.  Is it really needed?

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 13:52 ` [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture Raphael Gault
  2019-04-09 16:12   ` Peter Zijlstra
@ 2019-04-23 20:36   ` Josh Poimboeuf
  2019-04-24 16:32     ` Raphael Gault
  2019-04-24 10:36   ` Julien Thierry
  2 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-23 20:36 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, catalin.marinas,
	will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
> Since the way the initial stack frame when entering a function is different that what is done
> in the x86_64 architecture, we need to add some more check to support the different cases.
> As opposed as for x86_64, the return address is not stored by the call instruction but is instead
> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
> operations are needed to set it up correctly. All the different combinations need to be
> taken into account.
> 
> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
> but we still need to process it.
> 
> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able
> to identify in which case we are when looking for it.
> 
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> ---
>  tools/objtool/arch.h              |  2 +
>  tools/objtool/arch/arm64/decode.c | 27 +++++++++
>  tools/objtool/arch/x86/decode.c   |  5 ++
>  tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
>  tools/objtool/elf.c               |  3 +-
>  5 files changed, 120 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
> index 0eff166ca613..f3bef3f2cef3 100644
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
>  
>  unsigned long arch_compute_rela_sym_offset(int addend);
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn);
> +
>  #endif /* _ARCH_H */
> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> index 0feb3ae3af5d..8b293eae2b38 100644
> --- a/tools/objtool/arch/arm64/decode.c
> +++ b/tools/objtool/arch/arm64/decode.c
> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend;
>  }
>  
> +/*
> + * In order to know if we are in presence of a sibling
> + * call and not in presence of a switch table we look
> + * back at the previous instructions and see if we are
> + * jumping inside the same function that we are already
> + * in.
> + */
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	struct instruction *prev;
> +	struct list_head *l;
> +	struct symbol *sym;
> +	list_for_each_prev(l, &insn->list) {
> +		prev = (void *)l;
> +		if (!prev->func
> +			|| prev->func->pfunc != insn->func->pfunc)
> +			return false;
> +		if (prev->stack_op.src.reg != ADR_SOURCE)
> +			continue;
> +		sym = find_symbol_containing(insn->sec, insn->immediate);
> +		if (!sym || sym->type != STT_FUNC
> +			|| sym->pfunc != insn->func->pfunc)
> +			return true;
> +		break;
> +	}
> +	return true;
> +}

I get the feeling there might be a better way to do this, but I can't
figure out what this function is actually doing.  It looks like it
searches backwards in the function for an instruction which has
stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
it do anything with the instruction after it finds it?

>  static int is_arm64(struct elf *elf)
>  {
>  	switch(elf->ehdr.e_machine){
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 1af7b4996307..88c3d99c76be 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend + 4;
>  }
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	return true;
> +}

All x86 instructions are sibling calls?

> +
>  int arch_orc_read_unwind_hints(struct objtool_file *file)
>  {
>  	struct section *sec, *relasec;
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 17fcd8c8f9c1..fa6106214318 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
>  	unsigned long offset;
>  	struct instruction *insn;
>  	int ret;
> +	static int composed_insn = 0;
>  
>  	for_each_sec(file, sec) {
>  
> -		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;

We should just fix the root cause instead, presumably:

diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
index b9f8d787eea9..e9e6b81e3eb4 100644
--- a/arch/arm64/include/asm/alternative.h
+++ b/arch/arm64/include/asm/alternative.h
@@ -71,7 +71,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
 	ALTINSTR_ENTRY(feature,cb)					\
 	".popsection\n"							\
 	" .if " __stringify(cb) " == 0\n"				\
-	".pushsection .altinstr_replacement, \"a\"\n"			\
+	".pushsection .altinstr_replacement, \"ax\"\n"			\
 	"663:\n\t"							\
 	newinstr "\n"							\
 	"664:\n\t"							\


>  
>  		if (strcmp(sec->name, ".altinstr_replacement") &&
> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
>  				WARN_FUNC("invalid instruction type %d",
>  					  insn->sec, insn->offset, insn->type);
>  				ret = -1;
> -				goto err;
> +				free(insn);
> +				continue;

What's the purpose of this change?  If it's really needed then it looks
like it should be a separate patch.

>  			}
> -
> -			hash_add(file->insn_hash, &insn->hash, insn->offset);
> +			/*
> +			 * For arm64 architecture, we sometime split instructions so that
> +			 * we can track the state evolution (i.e. load/store of pairs of registers).
> +			 * We thus need to take both into account and not erase the previous ones.
> +			 */

Ew...  Is this an architectural thing, or just a quirk of the arm64
decoder?

> +			if (composed_insn > 0)
> +				hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
> +			else
> +				hash_add(file->insn_hash, &insn->hash, insn->offset);
> +			if (insn->len == 0)
> +				composed_insn++;
> +			else
> +				composed_insn = 0;
>  			list_add_tail(&insn->list, &file->insn_list);
>  		}
>  
> @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file)
>  			dest_off = arch_compute_jump_destination(insn);
>  		} else if (rela->sym->type == STT_SECTION) {
>  			dest_sec = rela->sym->sec;
> -			dest_off = rela->addend + 4;
> +			dest_off = arch_compute_rela_sym_offset(rela->addend);
>  		} else if (rela->sym->sec->idx) {
>  			dest_sec = rela->sym->sec;
> -			dest_off = rela->sym->sym.st_value + rela->addend + 4;
> +			dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend);
>  		} else if (strstr(rela->sym->name, "_indirect_thunk_")) {
>  			/*
>  			 * Retpoline jumps are really dynamic jumps in
> @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file,
>  		last_orig_insn = insn;
>  	}
>  
> -	if (next_insn_same_sec(file, last_orig_insn)) {
> +	if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) {

This might belong in a separate patch which explains the reason for the
change.

>  		fake_jump = malloc(sizeof(*fake_jump));
>  		if (!fake_jump) {
>  			WARN("malloc failed");
> @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file,
>  		if (find_symbol_containing(rodata_sec, table_offset))
>  			continue;
>  
> +		/*
> +		 * If we are on arm64 architecture, we now that we

"know"

> +		 * are in presence of a switch table thanks to
> +		 * the `br <Xn>` insn. but we can't retrieve it yet.
> +		 * So we just ignore unreachable for this file.
> +		 */
> +		if (JUMP_DYNAMIC_IS_SWITCH_TABLE) {
> +			file->ignore_unreachables = true;
> +			return NULL;
> +		}
> +

I think this means switch table reading is not yet supported?  If so
then maybe the flag should be called SWITCH_TABLE_NOT_SUPPORTED.

But really this needs to be fixed anyway before merging the code.

>  		rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
>  		if (rodata_rela) {
>  			/*
> @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base,
>  
>  static void restore_reg(struct insn_state *state, unsigned char reg)
>  {
> -	state->regs[reg].base = CFI_UNDEFINED;
> -	state->regs[reg].offset = 0;
> +	state->regs[reg].base = initial_func_cfi.regs[reg].base;
> +	state->regs[reg].offset = initial_func_cfi.regs[reg].offset;
>  }
>  
>  /*
> @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  
>  				/* add imm, %rsp */
>  				state->stack_size -= op->src.offset;
> -				if (cfa->base == CFI_SP)
> +				if (cfa->base == CFI_SP) {
>  					cfa->offset -= op->src.offset;
> +					if (state->stack_size == 0
> +							&& initial_func_cfi.cfa.base == CFI_CFA) {
> +						cfa->base = CFI_CFA;
> +						cfa->offset = 0;
> +					}
> +				}
> +				/*
> +				 * on arm64 the save/restore of sp into fp is not automatic
> +				 * and the first one can be done without the other so we
> +				 * need to be careful not to invalidate the stack frame in such
> +				 * cases.
> +				 */
> +				else if (cfa->base == CFI_BP) {
> +					if (state->stack_size == 0
> +							&& initial_func_cfi.cfa.base == CFI_CFA) {
> +						cfa->base = CFI_CFA;
> +						cfa->offset = 0;
> +						restore_reg(state, CFI_BP);
> +					}
> +				}
> +				else if (cfa->base == CFI_CFA) {
> +					cfa->base = CFI_SP;
> +					if (state->stack_size >= 16)
> +						cfa->offset = 16;
> +				}
>  				break;
>  			}
>  
> @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  				break;
>  			}
>  
> +			if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
> +			    cfa->base == CFI_SP &&
> +			    regs[CFI_BP].base == CFI_CFA &&
> +			    regs[CFI_BP].offset == -cfa->offset) {
> +
> +				/* mov %rsp, %rbp */
> +				cfa->base = op->dest.reg;
> +				state->bp_scratch = false;
> +				break;
> +			}
>  			if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
>  
>  				/* drap: lea disp(%rsp), %drap */
> @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  			state->stack_size -= 8;
>  			if (cfa->base == CFI_SP)
>  				cfa->offset -= 8;
> +			if (cfa->base == CFI_SP && cfa->offset == 0
> +					&& initial_func_cfi.cfa.base == CFI_CFA)
> +				cfa->base = CFI_CFA;
>  
>  			break;
>  
> @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>  
>  	case OP_DEST_PUSH:
>  		state->stack_size += 8;
> +		if (cfa->base == CFI_CFA)
> +			cfa->base = CFI_SP;
>  		if (cfa->base == CFI_SP)
>  			cfa->offset += 8;
>  
> @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>  	insn = first;
>  	sec = insn->sec;
>  
> -	if (insn->alt_group && list_empty(&insn->alts)) {
> +	if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) {

Why?  This looks like another one that might belong in a separate patch.

>  		WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
>  			  sec, insn->offset);
>  		return 1;
> @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>  		case INSN_JUMP_DYNAMIC:
>  			if (func && list_empty(&insn->alts) &&
>  			    has_modified_stack_frame(&state)) {
> +				/*
> +				 * On arm64 `br <Xn>` insn can be used for switch-tables
> +				 * but it cannot be distinguished in itself from a sibling
> +				 * call thus we need to have a look at the previous instructions
> +				 * to determine which it is
> +				 */
> +				if (!arch_is_insn_sibling_call(insn))
> +					break;
>  				WARN_FUNC("sibling call from callable instruction with modified stack frame",
>  					  sec, insn->offset);
>  				return 1;
> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
> index b8f3cca8e58b..136f9b9fb1d1 100644
> --- a/tools/objtool/elf.c
> +++ b/tools/objtool/elf.c
> @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset)
>  	struct symbol *sym;
>  
>  	list_for_each_entry(sym, &sec->symbol_list, list)
> -		if (sym->type != STT_SECTION &&
> +		if (sym->type != STT_NOTYPE &&
> +		    sym->type != STT_SECTION &&

Why?  Another potential separate patch.

>  		    sym->offset == offset)
>  			return sym;
>  
> -- 
> 2.17.1
> 

-- 
Josh

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC 5/6] arm64: sleep: Add stack frame setup for __cpu_supsend_enter
  2019-04-09 13:52 ` [RFC 5/6] arm64: sleep: Add stack frame setup for __cpu_supsend_enter Raphael Gault
@ 2019-04-23 20:37   ` Josh Poimboeuf
  0 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-23 20:37 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, catalin.marinas,
	will.deacon, julien.thierry

In $SUBJECT, s/supsend/suspend/.

On Tue, Apr 09, 2019 at 02:52:42PM +0100, Raphael Gault wrote:
> Annotate cpu_resume and _cpu_resume to silence objtool warning
> about non-standard stack frame.
> 
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> ---
>  arch/arm64/kernel/sleep.S | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/kernel/sleep.S b/arch/arm64/kernel/sleep.S
> index 3e53ffa07994..eb434525fe82 100644
> --- a/arch/arm64/kernel/sleep.S
> +++ b/arch/arm64/kernel/sleep.S
> @@ -90,6 +90,7 @@ ENTRY(__cpu_suspend_enter)
>  	str	x0, [x1]
>  	add	x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
>  	stp	x29, lr, [sp, #-16]!
> +	mov	x29, sp

Why is it changing the actual code?  The patch description indicates
that it's only adding annotations.

>  	bl	cpu_do_suspend
>  	ldp	x29, lr, [sp], #16
>  	mov	x0, #1
> @@ -146,3 +147,6 @@ ENTRY(_cpu_resume)
>  	mov	x0, #0
>  	ret
>  ENDPROC(_cpu_resume)
> +
> +	asm_stack_frame_non_standard cpu_resume
> +	asm_stack_frame_non_standard _cpu_resume
> -- 
> 2.17.1
> 

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
                   ` (7 preceding siblings ...)
  2019-04-09 17:43 ` Ard Biesheuvel
@ 2019-04-23 21:09 ` Josh Poimboeuf
  2019-04-24 16:08   ` Raphael Gault
  8 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-23 21:09 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, catalin.marinas,
	will.deacon, julien.thierry

On Tue, Apr 09, 2019 at 02:52:37PM +0100, Raphael Gault wrote:
> Hi,
> 
> As of now, objtool only supports the x86_64 architecture but the
> groundwork has already been done in order to add support for other
> architecture without too much effort.
> 
> This series of patches adds support for the arm64 architecture
> based on the Armv8.5 Architecture Reference Manual.
> 
> * Patch 1 adapts the existing code to be able to add support for other
>   architecture.
> * Patch 2 provide implementation of the required function for the arm64
>   architecture.
> * Patch 3 adapts the checking of the stack state for the arm64
>   architecture.
> * Patch 4 & 5 fix some warning objtool raised in some particular
>   functions of ~/arch/arm64/kernel/sleep.S. Patch 4 add a macro to
>   signal that some function should be ignored by objtool. 
> * Patch 6 enables stack validation for arm64.
> 
> Theses patches should provide support for the main cases and behaviour.
> However a few corner cases are not yet handled by objtool:
> 
> * In the `~/arch/arm64/crypto/` directory, I noticed that some plain
>   data are sometimes stored in the `.text` section causing objtool to mistake
>   this for instructions and trying (and failing) to interprete them.  If someone
>   could explain to me why we store data directly in the .text section I would
>   appreciate it.

I haven't looked, but it should probably be moved to .rodata.  We had
cases like that for x86.

> * In the support for arm32 architecture such as in `~/arch/arm64/kernel/kuser32.S`
>   some A32 instructions are used but such instructions are not understood by
>   objtool causing a warning.
> 
> I also have a few unclear points I would like to bring to your
> attention:
> 
> * For x86_64, when looking for a symbol relocation with explicit
>   addend, objtool systematically adds a +4 offset to the addend.
>   I don't understand why even if I have a feeling it is related
>   to the type of relacation.

This is because of how relative call/jump addresses are implemented on
x86.  It calculates the call/jump destination by adding the encoded offset
to the *ending* address of the instruction, rather than to the address
of the encoded offset itself.

For example:

  119ca0:       e8 00 00 00 00          callq  119ca5 <__ia32_sys_sched_rr_get_interval+0x5>
                        119ca1: R_X86_64_PC32   __fentry__-0x4

This instruction is a call to the __fentry__ function.  The rela addend
is the address of the destination function (__fentry__) minus 4.  After
applying the relocation, it resolves to:

ffffffff81002010:       e8 eb f7 9f 00          callq  ffffffff81a01800 <__fentry__>

The destination address is "0x9ff7eb", which is indeed __fentry__ - 4.

x86 expects it to be that way, because the x86 CPU adds the offset to
the *end* of the instruction: ffffffff81002015 (last digit is 5, not 0).

0xffffffff81002015 + 0x9ff7eb = 0xfffffff81a01800, which is indeed the
address of __fentry__.

And there's always a 4-byte gap between the relocation target and the
end of the instruction, so the rela addend always has the -4.  So when
reading the relocation we have to add the 4 bytes back to get the actual
destination address.

> * I currently don't have a clear understanding about how switch-tables
>   are generated on arm64 and how to retrieve them (based on relocations).

This is indeed a bit tricky on x86.

> Please provide me with any feedback and comments as well on the content
> than the style of these patches.

Overall it looks like a great start.  I added some per-patch comments.

Has the cross-compile path been tested?  Specifically, compiling for a
arm64 target on an x86 host?  In other words, objtool would be an x86
binary which reads arm64 objects.  I imagine that will be a semi-common
use case.  Objtool already supports cross-compiling for an x86-64 target
on an x86-32 host (and also a powerpc host IIRC), so it should be
do-able in theory, and it might "just work".

For the next version, please base it on the -tip tree, as that's where
all the latest objtool changes are.  Peter refactored some code which
has some minor conflicts with yours.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-09 13:52 ` [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture Raphael Gault
  2019-04-09 16:12   ` Peter Zijlstra
  2019-04-23 20:36   ` Josh Poimboeuf
@ 2019-04-24 10:36   ` Julien Thierry
  2 siblings, 0 replies; 36+ messages in thread
From: Julien Thierry @ 2019-04-24 10:36 UTC (permalink / raw)
  To: Raphael Gault, linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon

Hi Raphaël,

I think you could split the patch in at least 3:

- Handling the split instructions
- Handling the jump offset
- Dynamic jumps/switch table


On 09/04/2019 14:52, Raphael Gault wrote:
> Since the way the initial stack frame when entering a function is different that what is done

Nit: "different from what is done"

> in the x86_64 architecture, we need to add some more check to support the different cases.
> As opposed as for x86_64, the return address is not stored by the call instruction but is instead
> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
> operations are needed to set it up correctly. All the different combinations need to be
> taken into account.
> 
> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
> but we still need to process it.
> 
> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able

Nit: also

> to identify in which case we are when looking for it.
> 
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
> ---
>  tools/objtool/arch.h              |  2 +
>  tools/objtool/arch/arm64/decode.c | 27 +++++++++
>  tools/objtool/arch/x86/decode.c   |  5 ++
>  tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
>  tools/objtool/elf.c               |  3 +-
>  5 files changed, 120 insertions(+), 12 deletions(-)
> 
> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
> index 0eff166ca613..f3bef3f2cef3 100644
> --- a/tools/objtool/arch.h
> +++ b/tools/objtool/arch.h
> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
>  
>  unsigned long arch_compute_rela_sym_offset(int addend);
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn);
> +
>  #endif /* _ARCH_H */
> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> index 0feb3ae3af5d..8b293eae2b38 100644
> --- a/tools/objtool/arch/arm64/decode.c
> +++ b/tools/objtool/arch/arm64/decode.c
> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend;
>  }
>  
> +/*
> + * In order to know if we are in presence of a sibling
> + * call and not in presence of a switch table we look
> + * back at the previous instructions and see if we are
> + * jumping inside the same function that we are already
> + * in.
> + */
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	struct instruction *prev;
> +	struct list_head *l;
> +	struct symbol *sym;
> +	list_for_each_prev(l, &insn->list) {
> +		prev = (void *)l;

Please use list_entry() instead of casting, this only happens to work
because list is the first member of the struct instruction.

> +		if (!prev->func
> +			|| prev->func->pfunc != insn->func->pfunc)
> +			return false;
> +		if (prev->stack_op.src.reg != ADR_SOURCE)
> +			continue;
> +		sym = find_symbol_containing(insn->sec, insn->immediate);
> +		if (!sym || sym->type != STT_FUNC
> +			|| sym->pfunc != insn->func->pfunc)
> +			return true;
> +		break;
> +	}
> +	return true;
> +}
>  static int is_arm64(struct elf *elf)
>  {
>  	switch(elf->ehdr.e_machine){
> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
> index 1af7b4996307..88c3d99c76be 100644
> --- a/tools/objtool/arch/x86/decode.c
> +++ b/tools/objtool/arch/x86/decode.c
> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>  	return addend + 4;
>  }
>  
> +bool arch_is_insn_sibling_call(struct instruction *insn)
> +{
> +	return true;

The naming and what the function returns does seem to be really fitting.
Makes it seem like every x86 instruction is a sibling call, which is
unlikely to be the case.

> +}
> +
>  int arch_orc_read_unwind_hints(struct objtool_file *file)
>  {
>  	struct section *sec, *relasec;
> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
> index 17fcd8c8f9c1..fa6106214318 100644
> --- a/tools/objtool/check.c
> +++ b/tools/objtool/check.c
> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
>  	unsigned long offset;
>  	struct instruction *insn;
>  	int ret;
> +	static int composed_insn = 0;
>  
>  	for_each_sec(file, sec) {
>  
> -		if (!(sec->sh.sh_flags & SHF_EXECINSTR))
> +		if (!(sec->sh.sh_flags & SHF_EXECINSTR)
> +			&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>  			continue;
>  
>  		if (strcmp(sec->name, ".altinstr_replacement") &&
> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
>  				WARN_FUNC("invalid instruction type %d",
>  					  insn->sec, insn->offset, insn->type);
>  				ret = -1;
> -				goto err;
> +				free(insn);
> +				continue;
>  			}
> -
> -			hash_add(file->insn_hash, &insn->hash, insn->offset);
> +			/*
> +			 * For arm64 architecture, we sometime split instructions so that
> +			 * we can track the state evolution (i.e. load/store of pairs of registers).
> +			 * We thus need to take both into account and not erase the previous ones.
> +			 */
> +			if (composed_insn > 0)
> +				hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
> +			else
> +				hash_add(file->insn_hash, &insn->hash, insn->offset);

composed_insn has no reason to be negative, right? So this if is
equivalent to:

	hash_add(file->insn_hash, &insn->hash,
		 insn->offset + composed_insn);


Also, that means that this only works because all arm64 instructions are
4 bytes long. Otherwise you could have an overlap between the "second"
part of the composed instruction and the instruction that follows it.

It feels a bit too hackish for the arch independent code.

If we want to use that trick, maybe it should be the arm64 decode that
should return a length of 1 or 2 for composed insn, and when decoding an
instruction that isn't 4-bytes aligned, we would know to look 2 bytes
before to find what we were decoding, then return (4 - len) so that we
end up on the next instruction on the next itertation.

This way we don't have to change anything to the decode loop.

Also, I've got the impression that this hash table is most often use to
find the instruction at the starting offset of a function. Meaning it is
unlikely we'll end up looking up that composed instruction. Might be
worth checking whether this is the case and if so, maybe we can just add
the one real instruction to the hash table and focus on the instruction
list for our instruction splitting.

> +			if (insn->len == 0)
> +				composed_insn++;
> +			else
> +				composed_insn = 0;
>  			list_add_tail(&insn->list, &file->insn_list);
>  		}
>  

Thanks,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 4/6] arm64: assembler: Add macro to annotate asm function having non standard stack-frame.
  2019-04-09 13:52 ` [RFC 4/6] arm64: assembler: Add macro to annotate asm function having non standard stack-frame Raphael Gault
@ 2019-04-24 10:44   ` Julien Thierry
  0 siblings, 0 replies; 36+ messages in thread
From: Julien Thierry @ 2019-04-24 10:44 UTC (permalink / raw)
  To: Raphael Gault, linux-kernel, linux-arm-kernel
  Cc: jpoimboe, peterz, catalin.marinas, will.deacon

Hi Raphaël,

On 09/04/2019 14:52, Raphael Gault wrote:
> Signed-off-by: Raphael Gault <raphael.gault@arm.com>

Even there is not much to say, we include a commit message to explain
what the patch does and/or why we want it.

> ---
>  arch/arm64/include/asm/assembler.h | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h
> index 4feb6119c3c9..636a07a7eb76 100644
> --- a/arch/arm64/include/asm/assembler.h
> +++ b/arch/arm64/include/asm/assembler.h
> @@ -748,4 +748,22 @@ USER(\label, ic	ivau, \tmp2)			// invalidate I line PoU
>  .Lyield_out_\@ :
>  	.endm
>  
> +
> +#ifdef	CONFIG_STACK_VALIDATION
> +	/*
> +	 * This macro is the arm64 assembler equivalent of the
> +	 * macro STACK_FRAME_NON_STANDARD define at
> +	 * ~/include/linux/frame.h
> +	 */
> +	.macro	asm_stack_frame_non_standard	func
> +	.pushsection ".discard.func_stack_frame_non_standard"
> +	.8byte	\func
> +	.popsection
> +	.endm
> +#else
> +	.macro	asm_stack_frame_non_standard	func
> +	.endm
> +#endif
> +
> +

This can be simplified as:

	.macro asm_stack_frame_non_standard func
#ifdef CONFIG_STACK_VALIDATION
	[...]
#endif
	.endm

>  #endif	/* __ASM_ASSEMBLER_H */
> 

Cheers,

-- 
Julien Thierry

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-23 21:09 ` Josh Poimboeuf
@ 2019-04-24 16:08   ` Raphael Gault
  2019-04-24 16:14     ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-24 16:08 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

Hi Josh,

On 4/23/19 10:09 PM, Josh Poimboeuf wrote:
> On Tue, Apr 09, 2019 at 02:52:37PM +0100, Raphael Gault wrote:
>> Hi,
>>
>> As of now, objtool only supports the x86_64 architecture but the
>> groundwork has already been done in order to add support for other
>> architecture without too much effort.
>>
>> This series of patches adds support for the arm64 architecture
>> based on the Armv8.5 Architecture Reference Manual.
>>
>> * Patch 1 adapts the existing code to be able to add support for other
>>    architecture.
>> * Patch 2 provide implementation of the required function for the arm64
>>    architecture.
>> * Patch 3 adapts the checking of the stack state for the arm64
>>    architecture.
>> * Patch 4 & 5 fix some warning objtool raised in some particular
>>    functions of ~/arch/arm64/kernel/sleep.S. Patch 4 add a macro to
>>    signal that some function should be ignored by objtool.
>> * Patch 6 enables stack validation for arm64.
>>
>> Theses patches should provide support for the main cases and behaviour.
>> However a few corner cases are not yet handled by objtool:
>>
>> * In the `~/arch/arm64/crypto/` directory, I noticed that some plain
>>    data are sometimes stored in the `.text` section causing objtool to mistake
>>    this for instructions and trying (and failing) to interprete them.  If someone
>>    could explain to me why we store data directly in the .text section I would
>>    appreciate it.
>
> I haven't looked, but it should probably be moved to .rodata.  We had
> cases like that for x86.
>
>> * In the support for arm32 architecture such as in `~/arch/arm64/kernel/kuser32.S`
>>    some A32 instructions are used but such instructions are not understood by
>>    objtool causing a warning.
>>
>> I also have a few unclear points I would like to bring to your
>> attention:
>>
>> * For x86_64, when looking for a symbol relocation with explicit
>>    addend, objtool systematically adds a +4 offset to the addend.
>>    I don't understand why even if I have a feeling it is related
>>    to the type of relacation.
>
> This is because of how relative call/jump addresses are implemented on
> x86.  It calculates the call/jump destination by adding the encoded offset
> to the *ending* address of the instruction, rather than to the address
> of the encoded offset itself.
>
> For example:
>
>    119ca0:       e8 00 00 00 00          callq  119ca5 <__ia32_sys_sched_rr_get_interval+0x5>
>                          119ca1: R_X86_64_PC32   __fentry__-0x4
>
> This instruction is a call to the __fentry__ function.  The rela addend
> is the address of the destination function (__fentry__) minus 4.  After
> applying the relocation, it resolves to:
>
> ffffffff81002010:       e8 eb f7 9f 00          callq  ffffffff81a01800 <__fentry__>
>
> The destination address is "0x9ff7eb", which is indeed __fentry__ - 4.
>
> x86 expects it to be that way, because the x86 CPU adds the offset to
> the *end* of the instruction: ffffffff81002015 (last digit is 5, not 0).
>
> 0xffffffff81002015 + 0x9ff7eb = 0xfffffff81a01800, which is indeed the
> address of __fentry__.
>
> And there's always a 4-byte gap between the relocation target and the
> end of the instruction, so the rela addend always has the -4.  So when
> reading the relocation we have to add the 4 bytes back to get the actual
> destination address.
>

Thank you for the explanation!

>> * I currently don't have a clear understanding about how switch-tables
>>    are generated on arm64 and how to retrieve them (based on relocations).
>
> This is indeed a bit tricky on x86.
>
>> Please provide me with any feedback and comments as well on the content
>> than the style of these patches.
>
> Overall it looks like a great start.  I added some per-patch comments.
>

Thank you, I will address them.

> Has the cross-compile path been tested?  Specifically, compiling for a
> arm64 target on an x86 host?  In other words, objtool would be an x86
> binary which reads arm64 objects.  I imagine that will be a semi-common
> use case.  Objtool already supports cross-compiling for an x86-64 target
> on an x86-32 host (and also a powerpc host IIRC), so it should be
> do-able in theory, and it might "just work".
>

It has indeed been tested to build for arm64 target on a x86 host and it
works fine.

> For the next version, please base it on the -tip tree, as that's where
> all the latest objtool changes are.  Peter refactored some code which
> has some minor conflicts with yours.
>

Thanks for letting me know about this. Just a quick precision, when you
talk about the -tip tree, are you talking about this tree ?
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support
  2019-04-23 20:13   ` Josh Poimboeuf
@ 2019-04-24 16:11     ` Raphael Gault
  2019-04-24 16:17       ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-24 16:11 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On 4/23/19 9:13 PM, Josh Poimboeuf wrote:
> On Tue, Apr 09, 2019 at 02:52:38PM +0100, Raphael Gault wrote:
>> The jump destination and relocation offset used previously are only reliable
>> on x86_64 architecture. We abstract these computations by calling arch-dependant
>
> "dependent"
>
>> implementation.
>>
>> The control flow information and register macro definitions were based on
>> the x86_64 architecture but should be abstract so that each architecture
>> can define the correct values for the registers, especially the registers
>> related to the stack frame (Frame Pointer, Stack Pointer and possibly
>> Return Address).
>>
>> The ORC unwinder is only supported on x86 at the moment and should thus be
>> in the x86 architecture code. In order not to break the whole structure in
>> case another architecture decides to support the ORC unwinder via objtool
>> we choose to let the implementation be done in the architecture dependant
>> code.
>
> It's good practice to put each logical change into a separate patch.
> That will also make the patches easier to review.
>

Indeed this will be better.

> For example, each of the above three paragraphs should be a separate
> commit.
>
> Also it's a good idea to run the patches through checkpatch.pl (though
> its warnings should be taken with a grain of salt).
>

Yes, I realised after the comment from Peter that I needed to correct
quite a few things, thank you.

>> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
>> ---
>>   tools/objtool/Build                           |   1 -
>>   tools/objtool/arch.h                          |   9 ++
>>   tools/objtool/arch/x86/Build                  |   1 +
>>   tools/objtool/arch/x86/decode.c               | 106 ++++++++++++++++
>>   tools/objtool/arch/x86/include/arch_special.h |  35 ++++++
>>   tools/objtool/{ => arch/x86/include}/cfi.h    |   0
>>   tools/objtool/{ => arch/x86}/orc_gen.c        |  10 +-
>>   tools/objtool/check.c                         | 114 ++----------------
>>   tools/objtool/check.h                         |   1 +
>>   tools/objtool/orc.h                           |   4 +-
>>   tools/objtool/special.c                       |  18 +--
>>   11 files changed, 173 insertions(+), 126 deletions(-)
>>   create mode 100644 tools/objtool/arch/x86/include/arch_special.h
>>   rename tools/objtool/{ => arch/x86/include}/cfi.h (100%)
>>   rename tools/objtool/{ => arch/x86}/orc_gen.c (96%)
>>
>> diff --git a/tools/objtool/Build b/tools/objtool/Build
>> index 749becdf5b90..ec925d49565a 100644
>> --- a/tools/objtool/Build
>> +++ b/tools/objtool/Build
>> @@ -2,7 +2,6 @@ objtool-y += arch/$(SRCARCH)/
>>   objtool-y += builtin-check.o
>>   objtool-y += builtin-orc.o
>>   objtool-y += check.o
>> -objtool-y += orc_gen.o
>
> I'm not sure whether moving ORC out to the arch-specific code is the
> best option.  I expect a lot of the ORC code to be generic.  But this
> might be ok for now, until we get another ORC implementation.
>
> Another possibility would be to make weak versions of the orc functions
> somewhere (check.c?) and only compile the generic orc_gen.c on arches
> which support it.  Then we could abstract out the arch-specific ORC bits
> later.
>
>>   objtool-y += orc_dump.o
>
> orc_dump.o doesn't need to be built on arm64.  The "orc dump" option
> should fail accordingly.
>
>>   objtool-y += elf.o
>>   objtool-y += special.o
>> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
>> index b0d7dc3d71b5..0eff166ca613 100644
>> --- a/tools/objtool/arch.h
>> +++ b/tools/objtool/arch.h
>> @@ -22,6 +22,7 @@
>>   #include <linux/list.h>
>>   #include "elf.h"
>>   #include "cfi.h"
>> +#include "orc.h"
>>
>>   #define INSN_JUMP_CONDITIONAL1
>>   #define INSN_JUMP_UNCONDITIONAL2
>> @@ -70,6 +71,8 @@ struct stack_op {
>>   struct op_src src;
>>   };
>>
>> +struct instruction;
>> +
>>   void arch_initial_func_cfi_state(struct cfi_state *state);
>>
>>   int arch_decode_instruction(struct elf *elf, struct section *sec,
>> @@ -79,4 +82,10 @@ int arch_decode_instruction(struct elf *elf, struct section *sec,
>>
>>   bool arch_callee_saved_reg(unsigned char reg);
>>
>> +int arch_orc_read_unwind_hints(struct objtool_file *file);
>> +
>> +unsigned long arch_compute_jump_destination(struct instruction *insn);
>> +
>> +unsigned long arch_compute_rela_sym_offset(int addend);
>
> arch_dest_rela_addend_offset() might be a more descriptive name.  Also
> it might be simpler to just make it an arch-specific macro which is 0 on
> arm64 and 4 on x86.
>
> "compute" is implied, it can probably be removed from the names to make
> them a little more concise.
>

I am more in favor of the functions, I have to admit I don't know which
is really better but in any event I will rename them.

>> +
>>   #endif /* _ARCH_H */
>> diff --git a/tools/objtool/arch/x86/Build b/tools/objtool/arch/x86/Build
>> index b998412c017d..74015be53ef0 100644
>> --- a/tools/objtool/arch/x86/Build
>> +++ b/tools/objtool/arch/x86/Build
>> @@ -1,4 +1,5 @@
>>   objtool-y += decode.o
>> +objtool-y += orc_gen.o
>>
>>   inat_tables_script = arch/x86/tools/gen-insn-attr-x86.awk
>>   inat_tables_maps = arch/x86/lib/x86-opcode-map.txt
>> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
>> index 540a209b78ab..1af7b4996307 100644
>> --- a/tools/objtool/arch/x86/decode.c
>> +++ b/tools/objtool/arch/x86/decode.c
>> @@ -23,6 +23,8 @@
>>   #include "lib/inat.c"
>>   #include "lib/insn.c"
>>
>> +
>> +#include "../../check.h"
>>   #include "../../elf.h"
>>   #include "../../arch.h"
>>   #include "../../warn.h"
>> @@ -78,6 +80,105 @@ bool arch_callee_saved_reg(unsigned char reg)
>>   }
>>   }
>>
>> +unsigned long arch_compute_rela_sym_offset(int addend)
>> +{
>> +return addend + 4;
>> +}
>> +
>> +int arch_orc_read_unwind_hints(struct objtool_file *file)
>
> I think this function would be better suited for orc_gen.c (which could
> be renamed to just orc.c).
>

That is indeed true. I did that for the arm64 side so for consistency it
should be the same here.

>> diff --git a/tools/objtool/arch/x86/include/arch_special.h b/tools/objtool/arch/x86/include/arch_special.h
>> new file mode 100644
>> index 000000000000..bd91b1096359
>> --- /dev/null
>> +++ b/tools/objtool/arch/x86/include/arch_special.h
>> @@ -0,0 +1,35 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version 2
>> + * of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>
> This needs the standard header macro guards, e.g.
>
> #ifndef _X86_ARCH_SPECIAL_H
> #define _X86_ARCH_SPECIAL_H
>

I will correct this! Thank you.
>> +
>> +#define EX_ENTRY_SIZE12
>> +#define EX_ORIG_OFFSET0
>> +#define EX_NEW_OFFSET4
>> +
>> +#define JUMP_ENTRY_SIZE16
>> +#define JUMP_ORIG_OFFSET0
>> +#define JUMP_NEW_OFFSET4
>> +
>> +#define ALT_ENTRY_SIZE13
>> +#define ALT_ORIG_OFFSET0
>> +#define ALT_NEW_OFFSET4
>> +#define ALT_FEATURE_OFFSET8
>> +#define ALT_ORIG_LEN_OFFSET10
>> +#define ALT_NEW_LEN_OFFSET11
>> +
>> +#define IGNORE_SHF_EXEC_FLAG0
>> +
>> +#define JUMP_DYNAMIC_IS_SWITCH_TABLE0
>
> These flags should be added with the commit which actually uses them.
>
> Also "arch_special.h" is specific to special section parsing, so I'm
> thinking these two macros don't really belong here.  Or maybe the header
> file could be renamed to something more generic.
>

My approach was indeed wrong. In the next version I'll get rid of those
macros which are inelegant and unrelated to x86.

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/6] objtool: Add support for Arm64
  2019-04-24 16:08   ` Raphael Gault
@ 2019-04-24 16:14     ` Josh Poimboeuf
  0 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-24 16:14 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On Wed, Apr 24, 2019 at 04:08:15PM +0000, Raphael Gault wrote:
> > For the next version, please base it on the -tip tree, as that's where
> > all the latest objtool changes are.  Peter refactored some code which
> > has some minor conflicts with yours.
> >
> 
> Thanks for letting me know about this. Just a quick precision, when you
> talk about the -tip tree, are you talking about this tree ?
> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/

Indeed.  The git link I use is:

  git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git

And generally the master branch of that tree seems to be a good one to
use.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool.
  2019-04-23 20:18   ` Josh Poimboeuf
@ 2019-04-24 16:16     ` Raphael Gault
  2019-04-24 16:23       ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-24 16:16 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On 4/23/19 9:18 PM, Josh Poimboeuf wrote:
> On Tue, Apr 09, 2019 at 02:52:39PM +0100, Raphael Gault wrote:
>> Provide implementation for the arch-dependent functions that are called by the main check
>> function of objtool.
>> The ORC unwinder is not yet supported by the arm64 architecture so we only provide a dummy
>> interface for now.
>> The decoding of the instruction is split into classes and subclasses as described into
>> the Instruction Encoding in the ArmV8.5 Architecture Reference Manual.
>
> Where did the code for the decoder come from?  Was it written from
> scratch?
>

This decoder was indeed written from scratch based on the armv8 ARM. The
automatic generation idea hasn't really been discussed yet.

>> diff --git a/tools/objtool/arch/arm64/include/arch_special.h b/tools/objtool/arch/arm64/include/arch_special.h
>> new file mode 100644
>> index 000000000000..54bcce4c58c0
>> --- /dev/null
>> +++ b/tools/objtool/arch/arm64/include/arch_special.h
>> @@ -0,0 +1,44 @@
>> +/*
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version 2
>> + * of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>
> Needs a header guard.
>
>> +#define EX_ENTRY_SIZE8
>> +#define EX_ORIG_OFFSET0
>> +#define EX_NEW_OFFSET4
>> +
>> +#define JUMP_ENTRY_SIZE16
>> +#define JUMP_ORIG_OFFSET0
>> +#define JUMP_NEW_OFFSET4
>> +
>> +#define ALT_ENTRY_SIZE12
>> +#define ALT_ORIG_OFFSET0
>> +#define ALT_NEW_OFFSET4
>> +#define ALT_FEATURE_OFFSET8
>> +#define ALT_ORIG_LEN_OFFSET10
>> +#define ALT_NEW_LEN_OFFSET11
>> +
>> +/*
>> + * On arm64 the .altinstr_replacement is not always marked
>> + * as containing executable instruction. But we still want
>> + * to process it so we ignore the SHF_EXEC flag
>> + */
>> +#define IGNORE_SHF_EXEC_FLAG1
>> +
>> +/*
>> + * The jump table detection is not the same on arm64 so for
>> + * now we just detect if it is a dynamic jump (br <Xn> insn)
>> + */
>> +#define JUMP_DYNAMIC_IS_SWITCH_TABLE1
>
> Same as for x86, these flags should be added in the same patch which
> uses them.
>

This will disappear in the next version, thanks.

>> +
>> +#define X86_FEATURE_POPCNT (4*32+23)
>> diff --git a/tools/objtool/arch/arm64/include/asm/orc_types.h b/tools/objtool/arch/arm64/include/asm/orc_types.h
>> new file mode 100644
>> index 000000000000..46f516dd80ce
>> --- /dev/null
>> +++ b/tools/objtool/arch/arm64/include/asm/orc_types.h
>> @@ -0,0 +1,109 @@
>> +/*
>> + * Copyright (C) 2017 Josh Poimboeuf <jpoimboe@redhat.com>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version 2
>> + * of the License, or (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#ifndef _ORC_TYPES_H
>> +#define _ORC_TYPES_H
>> +
>> +#include <linux/types.h>
>> +#include <linux/compiler.h>
>> +
>> +/*
>> + * The ORC_REG_* registers are base registers which are used to find other
>> + * registers on the stack.
>> + *
>> + * ORC_REG_PREV_SP, also known as DWARF Call Frame Address (CFA), is the
>> + * address of the previous frame: the caller's SP before it called the current
>> + * function.
>> + *
>> + * ORC_REG_UNDEFINED means the corresponding register's value didn't change in
>> + * the current frame.
>> + *
>> + * The most commonly used base registers are SP and BP -- which the previous SP
>> + * is usually based on -- and PREV_SP and UNDEFINED -- which the previous BP is
>> + * usually based on.
>> + *
>> + * The rest of the base registers are needed for special cases like entry code
>> + * and GCC realigned stacks.
>> + */
>> +#define ORC_REG_UNDEFINED0
>> +#define ORC_REG_PREV_SP1
>> +#define ORC_REG_DX2
>> +#define ORC_REG_DI3
>> +#define ORC_REG_BP4
>> +#define ORC_REG_SP5
>> +#define ORC_REG_R106
>> +#define ORC_REG_R137
>> +#define ORC_REG_BP_INDIRECT8
>> +#define ORC_REG_SP_INDIRECT9
>> +#define ORC_REG_MAX15
>> +
>> +/*
>> + * ORC_TYPE_CALL: Indicates that sp_reg+sp_offset resolves to PREV_SP (the
>> + * caller's SP right before it made the call).  Used for all callable
>> + * functions, i.e. all C code and all callable asm functions.
>> + *
>> + * ORC_TYPE_REGS: Used in entry code to indicate that sp_reg+sp_offset points
>> + * to a fully populated pt_regs from a syscall, interrupt, or exception.
>> + *
>> + * ORC_TYPE_REGS_IRET: Used in entry code to indicate that sp_reg+sp_offset
>> + * points to the iret return frame.
>> + *
>> + * The UNWIND_HINT macros are used only for the unwind_hint struct.  They
>> + * aren't used in struct orc_entry due to size and complexity constraints.
>> + * Objtool converts them to real types when it converts the hints to orc
>> + * entries.
>> + */
>> +#define ORC_TYPE_CALL0
>> +#define ORC_TYPE_REGS1
>> +#define ORC_TYPE_REGS_IRET2
>> +#define UNWIND_HINT_TYPE_SAVE3
>> +#define UNWIND_HINT_TYPE_RESTORE4
>> +
>> +#ifndef __ASSEMBLY__
>> +/*
>> + * This struct is more or less a vastly simplified version of the DWARF Call
>> + * Frame Information standard.  It contains only the necessary parts of DWARF
>> + * CFI, simplified for ease of access by the in-kernel unwinder.  It tells the
>> + * unwinder how to find the previous SP and BP (and sometimes entry regs) on
>> + * the stack for a given code address.  Each instance of the struct corresponds
>> + * to one or more code locations.
>> + */
>> +struct orc_entry {
>> +s16sp_offset;
>> +s16bp_offset;
>> +unsignedsp_reg:4;
>> +unsignedbp_reg:4;
>> +unsignedtype:2;
>> +unsignedend:1;
>> +} __packed;
>> +
>> +/*
>> + * This struct is used by asm and inline asm code to manually annotate the
>> + * location of registers on the stack for the ORC unwinder.
>> + *
>> + * Type can be either ORC_TYPE_* or UNWIND_HINT_TYPE_*.
>> + */
>> +struct unwind_hint {
>> +u32ip;
>> +s16sp_offset;
>> +u8sp_reg;
>> +u8type;
>> +u8end;
>> +};
>> +#endif /* __ASSEMBLY__ */
>> +
>> +#endif /* _ORC_TYPES_H */
>
> It seems odd to have the above header file in arm64 code, since it
> doesn't implement ORC.  Is it really needed?
>

The unwind_hint part can safely be removed. However the orc_entry seems
to be needed since the struct instruction comports a struct orc_entry
field. I have chosen to let it here as well but maybe a better approach
is possible.

Cheers,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support
  2019-04-24 16:11     ` Raphael Gault
@ 2019-04-24 16:17       ` Josh Poimboeuf
  0 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-24 16:17 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On Wed, Apr 24, 2019 at 04:11:57PM +0000, Raphael Gault wrote:
> On 4/23/19 9:13 PM, Josh Poimboeuf wrote:
> > arch_dest_rela_addend_offset() might be a more descriptive name.  Also
> > it might be simpler to just make it an arch-specific macro which is 0 on
> > arm64 and 4 on x86.
> >
> > "compute" is implied, it can probably be removed from the names to make
> > them a little more concise.
> >
> 
> I am more in favor of the functions, I have to admit I don't know which
> is really better but in any event I will rename them.

If you prefer functions, that's fine with me.  I don't have a strong
preference.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool.
  2019-04-24 16:16     ` Raphael Gault
@ 2019-04-24 16:23       ` Josh Poimboeuf
  0 siblings, 0 replies; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-24 16:23 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On Wed, Apr 24, 2019 at 04:16:26PM +0000, Raphael Gault wrote:
> On 4/23/19 9:18 PM, Josh Poimboeuf wrote:
> > On Tue, Apr 09, 2019 at 02:52:39PM +0100, Raphael Gault wrote:
> >> Provide implementation for the arch-dependent functions that are called by the main check
> >> function of objtool.
> >> The ORC unwinder is not yet supported by the arm64 architecture so we only provide a dummy
> >> interface for now.
> >> The decoding of the instruction is split into classes and subclasses as described into
> >> the Instruction Encoding in the ArmV8.5 Architecture Reference Manual.
> >
> > Where did the code for the decoder come from?  Was it written from
> > scratch?
> >
> 
> This decoder was indeed written from scratch based on the armv8 ARM. The
> automatic generation idea hasn't really been discussed yet.

Ok.  That's a lot of code.  Hopefully ARM folks can review it closely.

> >> +#ifndef __ASSEMBLY__
> >> +/*
> >> + * This struct is more or less a vastly simplified version of the DWARF Call
> >> + * Frame Information standard.  It contains only the necessary parts of DWARF
> >> + * CFI, simplified for ease of access by the in-kernel unwinder.  It tells the
> >> + * unwinder how to find the previous SP and BP (and sometimes entry regs) on
> >> + * the stack for a given code address.  Each instance of the struct corresponds
> >> + * to one or more code locations.
> >> + */
> >> +struct orc_entry {
> >> +s16sp_offset;
> >> +s16bp_offset;
> >> +unsignedsp_reg:4;
> >> +unsignedbp_reg:4;
> >> +unsignedtype:2;
> >> +unsignedend:1;
> >> +} __packed;
> >> +
> >> +/*
> >> + * This struct is used by asm and inline asm code to manually annotate the
> >> + * location of registers on the stack for the ORC unwinder.
> >> + *
> >> + * Type can be either ORC_TYPE_* or UNWIND_HINT_TYPE_*.
> >> + */
> >> +struct unwind_hint {
> >> +u32ip;
> >> +s16sp_offset;
> >> +u8sp_reg;
> >> +u8type;
> >> +u8end;
> >> +};
> >> +#endif /* __ASSEMBLY__ */
> >> +
> >> +#endif /* _ORC_TYPES_H */
> >
> > It seems odd to have the above header file in arm64 code, since it
> > doesn't implement ORC.  Is it really needed?
> >
> 
> The unwind_hint part can safely be removed. However the orc_entry seems
> to be needed since the struct instruction comports a struct orc_entry
> field. I have chosen to let it here as well but maybe a better approach
> is possible.

Ideally we can figure out a way to decouple 'struct instruction' from
'struct orc_entry'.  But yes, I think your approach is fine for now.

Though I think using an arch-independent header file would be better, to
avoid creating duplicated (dead) code.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-23 20:36   ` Josh Poimboeuf
@ 2019-04-24 16:32     ` Raphael Gault
  2019-04-24 16:56       ` Josh Poimboeuf
  0 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-24 16:32 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

Hi,

As Julien also pointed out I realise that similarly to the first patch
of this series, this patch should be split into smaller ones. I will do
this for the next version.

On 4/23/19 9:36 PM, Josh Poimboeuf wrote:
> On Tue, Apr 09, 2019 at 02:52:40PM +0100, Raphael Gault wrote:
>> Since the way the initial stack frame when entering a function is different that what is done
>> in the x86_64 architecture, we need to add some more check to support the different cases.
>> As opposed as for x86_64, the return address is not stored by the call instruction but is instead
>> loaded in a register. The initial stack frame is thus empty when entering a function and 2 push
>> operations are needed to set it up correctly. All the different combinations need to be
>> taken into account.
>>
>> On arm64, the .altinstr_replacement section is not flagged as containing executable instructions
>> but we still need to process it.
>>
>> Switch tables are alse stored in a different way on arm64 than on x86_64 so we need to be able
>> to identify in which case we are when looking for it.
>>
>> Signed-off-by: Raphael Gault <raphael.gault@arm.com>
>> ---
>>   tools/objtool/arch.h              |  2 +
>>   tools/objtool/arch/arm64/decode.c | 27 +++++++++
>>   tools/objtool/arch/x86/decode.c   |  5 ++
>>   tools/objtool/check.c             | 95 +++++++++++++++++++++++++++----
>>   tools/objtool/elf.c               |  3 +-
>>   5 files changed, 120 insertions(+), 12 deletions(-)
>>
>> diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
>> index 0eff166ca613..f3bef3f2cef3 100644
>> --- a/tools/objtool/arch.h
>> +++ b/tools/objtool/arch.h
>> @@ -88,4 +88,6 @@ unsigned long arch_compute_jump_destination(struct instruction *insn);
>>
>>   unsigned long arch_compute_rela_sym_offset(int addend);
>>
>> +bool arch_is_insn_sibling_call(struct instruction *insn);
>> +
>>   #endif /* _ARCH_H */
>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>> index 0feb3ae3af5d..8b293eae2b38 100644
>> --- a/tools/objtool/arch/arm64/decode.c
>> +++ b/tools/objtool/arch/arm64/decode.c
>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>   return addend;
>>   }
>>
>> +/*
>> + * In order to know if we are in presence of a sibling
>> + * call and not in presence of a switch table we look
>> + * back at the previous instructions and see if we are
>> + * jumping inside the same function that we are already
>> + * in.
>> + */
>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>> +{
>> +struct instruction *prev;
>> +struct list_head *l;
>> +struct symbol *sym;
>> +list_for_each_prev(l, &insn->list) {
>> +prev = (void *)l;
>> +if (!prev->func
>> +|| prev->func->pfunc != insn->func->pfunc)
>> +return false;
>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>> +continue;
>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>> +if (!sym || sym->type != STT_FUNC
>> +|| sym->pfunc != insn->func->pfunc)
>> +return true;
>> +break;
>> +}
>> +return true;
>> +}
>
> I get the feeling there might be a better way to do this, but I can't
> figure out what this function is actually doing.  It looks like it
> searches backwards in the function for an instruction which has
> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
> it do anything with the instruction after it finds it?
>

I will indeed try to make it better.

>>   static int is_arm64(struct elf *elf)
>>   {
>>   switch(elf->ehdr.e_machine){
>> diff --git a/tools/objtool/arch/x86/decode.c b/tools/objtool/arch/x86/decode.c
>> index 1af7b4996307..88c3d99c76be 100644
>> --- a/tools/objtool/arch/x86/decode.c
>> +++ b/tools/objtool/arch/x86/decode.c
>> @@ -85,6 +85,11 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>   return addend + 4;
>>   }
>>
>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>> +{
>> +return true;
>> +}
>
> All x86 instructions are sibling calls?
>

The sementic here is indeed wrong. I implemented it like that on the x86
side because at the place it is called from in check.c we already know
that we are in presence of a sibling calls when on x86. I will improve
these bits on the next iteration.

>> +
>>   int arch_orc_read_unwind_hints(struct objtool_file *file)
>>   {
>>   struct section *sec, *relasec;
>> diff --git a/tools/objtool/check.c b/tools/objtool/check.c
>> index 17fcd8c8f9c1..fa6106214318 100644
>> --- a/tools/objtool/check.c
>> +++ b/tools/objtool/check.c
>> @@ -261,10 +261,12 @@ static int decode_instructions(struct objtool_file *file)
>>   unsigned long offset;
>>   struct instruction *insn;
>>   int ret;
>> +static int composed_insn = 0;
>>
>>   for_each_sec(file, sec) {
>>
>> -if (!(sec->sh.sh_flags & SHF_EXECINSTR))
>> +if (!(sec->sh.sh_flags & SHF_EXECINSTR)
>> +&& (strcmp(sec->name, ".altinstr_replacement") || !IGNORE_SHF_EXEC_FLAG))
>>   continue;
>
> We should just fix the root cause instead, presumably:
>
> diff --git a/arch/arm64/include/asm/alternative.h b/arch/arm64/include/asm/alternative.h
> index b9f8d787eea9..e9e6b81e3eb4 100644
> --- a/arch/arm64/include/asm/alternative.h
> +++ b/arch/arm64/include/asm/alternative.h
> @@ -71,7 +71,7 @@ static inline void apply_alternatives_module(void *start, size_t length) { }
>   ALTINSTR_ENTRY(feature,cb)\
>   ".popsection\n"\
>   " .if " __stringify(cb) " == 0\n"\
> -".pushsection .altinstr_replacement, \"a\"\n"\
> +".pushsection .altinstr_replacement, \"ax\"\n"\
>   "663:\n\t"\
>   newinstr "\n"\
>   "664:\n\t"\
>
>

I will take your advice and suggest this in the next version as well.

>>
>>   if (strcmp(sec->name, ".altinstr_replacement") &&
>> @@ -297,10 +299,22 @@ static int decode_instructions(struct objtool_file *file)
>>   WARN_FUNC("invalid instruction type %d",
>>     insn->sec, insn->offset, insn->type);
>>   ret = -1;
>> -goto err;
>> +free(insn);
>> +continue;
>
> What's the purpose of this change?  If it's really needed then it looks
> like it should be a separate patch.
>
>>   }
>> -
>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
>> +/*
>> + * For arm64 architecture, we sometime split instructions so that
>> + * we can track the state evolution (i.e. load/store of pairs of registers).
>> + * We thus need to take both into account and not erase the previous ones.
>> + */
>
> Ew...  Is this an architectural thing, or just a quirk of the arm64
> decoder?
>

The motivation for this is to simulate the two consecutive operations
that would be executed on x86 but are done in one on arm64. This is
strictly a decoder related quirk. I don't know if there is a better way
to do it without modifying the struct op_src and struct instruction.

>> +if (composed_insn > 0)
>> +hash_add(file->insn_hash, &insn->hash, insn->offset + composed_insn);
>> +else
>> +hash_add(file->insn_hash, &insn->hash, insn->offset);
>> +if (insn->len == 0)
>> +composed_insn++;
>> +else
>> +composed_insn = 0;
>>   list_add_tail(&insn->list, &file->insn_list);
>>   }
>>
>> @@ -510,10 +524,10 @@ static int add_jump_destinations(struct objtool_file *file)
>>   dest_off = arch_compute_jump_destination(insn);
>>   } else if (rela->sym->type == STT_SECTION) {
>>   dest_sec = rela->sym->sec;
>> -dest_off = rela->addend + 4;
>> +dest_off = arch_compute_rela_sym_offset(rela->addend);
>>   } else if (rela->sym->sec->idx) {
>>   dest_sec = rela->sym->sec;
>> -dest_off = rela->sym->sym.st_value + rela->addend + 4;
>> +dest_off = rela->sym->sym.st_value + arch_compute_rela_sym_offset(rela->addend);
>>   } else if (strstr(rela->sym->name, "_indirect_thunk_")) {
>>   /*
>>    * Retpoline jumps are really dynamic jumps in
>> @@ -663,7 +677,7 @@ static int handle_group_alt(struct objtool_file *file,
>>   last_orig_insn = insn;
>>   }
>>
>> -if (next_insn_same_sec(file, last_orig_insn)) {
>> +if (last_orig_insn && next_insn_same_sec(file, last_orig_insn)) {
>
> This might belong in a separate patch which explains the reason for the
> change.
>

I did this because I encountered situations where the last_orig_insn was
NULL because of the offset check perform in the preceding loop. This
caused a null dereference to occur. But it definitely should be split up.

>>   fake_jump = malloc(sizeof(*fake_jump));
>>   if (!fake_jump) {
>>   WARN("malloc failed");
>> @@ -976,6 +990,17 @@ static struct rela *find_switch_table(struct objtool_file *file,
>>   if (find_symbol_containing(rodata_sec, table_offset))
>>   continue;
>>
>> +/*
>> + * If we are on arm64 architecture, we now that we
>
> "know"
>
>> + * are in presence of a switch table thanks to
>> + * the `br <Xn>` insn. but we can't retrieve it yet.
>> + * So we just ignore unreachable for this file.
>> + */
>> +if (JUMP_DYNAMIC_IS_SWITCH_TABLE) {
>> +file->ignore_unreachables = true;
>> +return NULL;
>> +}
>> +
>
> I think this means switch table reading is not yet supported?  If so
> then maybe the flag should be called SWITCH_TABLE_NOT_SUPPORTED.
>
> But really this needs to be fixed anyway before merging the code.
>

This was indeed an ugly way to do this. I will propose a better solution
for this part in the next iteration.

>>   rodata_rela = find_rela_by_dest(rodata_sec, table_offset);
>>   if (rodata_rela) {
>>   /*
>> @@ -1258,8 +1283,8 @@ static void save_reg(struct insn_state *state, unsigned char reg, int base,
>>
>>   static void restore_reg(struct insn_state *state, unsigned char reg)
>>   {
>> -state->regs[reg].base = CFI_UNDEFINED;
>> -state->regs[reg].offset = 0;
>> +state->regs[reg].base = initial_func_cfi.regs[reg].base;
>> +state->regs[reg].offset = initial_func_cfi.regs[reg].offset;
>>   }
>>
>>   /*
>> @@ -1415,8 +1440,33 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>
>>   /* add imm, %rsp */
>>   state->stack_size -= op->src.offset;
>> -if (cfa->base == CFI_SP)
>> +if (cfa->base == CFI_SP) {
>>   cfa->offset -= op->src.offset;
>> +if (state->stack_size == 0
>> +&& initial_func_cfi.cfa.base == CFI_CFA) {
>> +cfa->base = CFI_CFA;
>> +cfa->offset = 0;
>> +}
>> +}
>> +/*
>> + * on arm64 the save/restore of sp into fp is not automatic
>> + * and the first one can be done without the other so we
>> + * need to be careful not to invalidate the stack frame in such
>> + * cases.
>> + */
>> +else if (cfa->base == CFI_BP) {
>> +if (state->stack_size == 0
>> +&& initial_func_cfi.cfa.base == CFI_CFA) {
>> +cfa->base = CFI_CFA;
>> +cfa->offset = 0;
>> +restore_reg(state, CFI_BP);
>> +}
>> +}
>> +else if (cfa->base == CFI_CFA) {
>> +cfa->base = CFI_SP;
>> +if (state->stack_size >= 16)
>> +cfa->offset = 16;
>> +}
>>   break;
>>   }
>>
>> @@ -1427,6 +1477,16 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>   break;
>>   }
>>
>> +if (op->src.reg == CFI_SP && op->dest.reg == CFI_BP &&
>> +    cfa->base == CFI_SP &&
>> +    regs[CFI_BP].base == CFI_CFA &&
>> +    regs[CFI_BP].offset == -cfa->offset) {
>> +
>> +/* mov %rsp, %rbp */
>> +cfa->base = op->dest.reg;
>> +state->bp_scratch = false;
>> +break;
>> +}
>>   if (op->src.reg == CFI_SP && cfa->base == CFI_SP) {
>>
>>   /* drap: lea disp(%rsp), %drap */
>> @@ -1518,6 +1578,9 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>   state->stack_size -= 8;
>>   if (cfa->base == CFI_SP)
>>   cfa->offset -= 8;
>> +if (cfa->base == CFI_SP && cfa->offset == 0
>> +&& initial_func_cfi.cfa.base == CFI_CFA)
>> +cfa->base = CFI_CFA;
>>
>>   break;
>>
>> @@ -1557,6 +1620,8 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
>>
>>   case OP_DEST_PUSH:
>>   state->stack_size += 8;
>> +if (cfa->base == CFI_CFA)
>> +cfa->base = CFI_SP;
>>   if (cfa->base == CFI_SP)
>>   cfa->offset += 8;
>>
>> @@ -1728,7 +1793,7 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>>   insn = first;
>>   sec = insn->sec;
>>
>> -if (insn->alt_group && list_empty(&insn->alts)) {
>> +if (!insn->visited && insn->alt_group && list_empty(&insn->alts)) {
>
> Why?  This looks like another one that might belong in a separate patch.
>

Indeed it belongs to a separate patch. I did this because I encountered
a situation where one of the alternative instructions (in
.altinstr_replacement) jumps at the offset of an instruction which is
replaced as well. I thus considered that if one of the instruction of
the group is replaced then all instructions of the group should be
replaced and thus this would never happen at execution.

>>   WARN_FUNC("don't know how to handle branch to middle of alternative instruction group",
>>     sec, insn->offset);
>>   return 1;
>> @@ -1871,6 +1936,14 @@ static int validate_branch(struct objtool_file *file, struct instruction *first,
>>   case INSN_JUMP_DYNAMIC:
>>   if (func && list_empty(&insn->alts) &&
>>       has_modified_stack_frame(&state)) {
>> +/*
>> + * On arm64 `br <Xn>` insn can be used for switch-tables
>> + * but it cannot be distinguished in itself from a sibling
>> + * call thus we need to have a look at the previous instructions
>> + * to determine which it is
>> + */
>> +if (!arch_is_insn_sibling_call(insn))
>> +break;
>>   WARN_FUNC("sibling call from callable instruction with modified stack frame",
>>     sec, insn->offset);
>>   return 1;
>> diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
>> index b8f3cca8e58b..136f9b9fb1d1 100644
>> --- a/tools/objtool/elf.c
>> +++ b/tools/objtool/elf.c
>> @@ -74,7 +74,8 @@ struct symbol *find_symbol_by_offset(struct section *sec, unsigned long offset)
>>   struct symbol *sym;
>>
>>   list_for_each_entry(sym, &sec->symbol_list, list)
>> -if (sym->type != STT_SECTION &&
>> +if (sym->type != STT_NOTYPE &&
>> +    sym->type != STT_SECTION &&
>
> Why?  Another potential separate patch.
>

This is again a special case I encountered where several symbols can be
found with the same offset so the sole condition of the offset wasn't
enough to distinguish the correct destination. I thus added some extra
check on the type of the symbol.

>>       sym->offset == offset)
>>   return sym;
>>
>> --
>> 2.17.1
>>
>

Please fill free to comment back on my answers. I would really like to
get you opinion on some of the special cases I mentioned.

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-24 16:32     ` Raphael Gault
@ 2019-04-24 16:56       ` Josh Poimboeuf
  2019-04-25  8:12         ` Raphael Gault
  0 siblings, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-24 16:56 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
> >> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> >> index 0feb3ae3af5d..8b293eae2b38 100644
> >> --- a/tools/objtool/arch/arm64/decode.c
> >> +++ b/tools/objtool/arch/arm64/decode.c
> >> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
> >>   return addend;
> >>   }
> >>
> >> +/*
> >> + * In order to know if we are in presence of a sibling
> >> + * call and not in presence of a switch table we look
> >> + * back at the previous instructions and see if we are
> >> + * jumping inside the same function that we are already
> >> + * in.
> >> + */
> >> +bool arch_is_insn_sibling_call(struct instruction *insn)
> >> +{
> >> +struct instruction *prev;
> >> +struct list_head *l;
> >> +struct symbol *sym;
> >> +list_for_each_prev(l, &insn->list) {
> >> +prev = (void *)l;
> >> +if (!prev->func
> >> +|| prev->func->pfunc != insn->func->pfunc)
> >> +return false;
> >> +if (prev->stack_op.src.reg != ADR_SOURCE)
> >> +continue;
> >> +sym = find_symbol_containing(insn->sec, insn->immediate);
> >> +if (!sym || sym->type != STT_FUNC
> >> +|| sym->pfunc != insn->func->pfunc)
> >> +return true;
> >> +break;
> >> +}
> >> +return true;
> >> +}
> >
> > I get the feeling there might be a better way to do this, but I can't
> > figure out what this function is actually doing.  It looks like it
> > searches backwards in the function for an instruction which has
> > stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
> > it do anything with the instruction after it finds it?
> >
> 
> I will indeed try to make it better.

I still don't quite get what it's trying to accomplish, but I wonder if
there's some kind of tracking you can add in validate_branch() to keep
track of whatever you're looking for, leading up to the indirect jump.

> >> -hash_add(file->insn_hash, &insn->hash, insn->offset);
> >> +/*
> >> + * For arm64 architecture, we sometime split instructions so that
> >> + * we can track the state evolution (i.e. load/store of pairs of registers).
> >> + * We thus need to take both into account and not erase the previous ones.
> >> + */
> >
> > Ew...  Is this an architectural thing, or just a quirk of the arm64
> > decoder?
> >
> 
> The motivation for this is to simulate the two consecutive operations
> that would be executed on x86 but are done in one on arm64. This is
> strictly a decoder related quirk. I don't know if there is a better way
> to do it without modifying the struct op_src and struct instruction.

Ah.  Which ops are those?  Hopefully we can find a better way to
represent that with a single instruction.  Adding fake instructions is
fragile.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-24 16:56       ` Josh Poimboeuf
@ 2019-04-25  8:12         ` Raphael Gault
  2019-04-25  8:33           ` Peter Zijlstra
  2019-04-25 16:25           ` Josh Poimboeuf
  0 siblings, 2 replies; 36+ messages in thread
From: Raphael Gault @ 2019-04-25  8:12 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

Hi Josh,

On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>>>> index 0feb3ae3af5d..8b293eae2b38 100644
>>>> --- a/tools/objtool/arch/arm64/decode.c
>>>> +++ b/tools/objtool/arch/arm64/decode.c
>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>>>    return addend;
>>>>    }
>>>>
>>>> +/*
>>>> + * In order to know if we are in presence of a sibling
>>>> + * call and not in presence of a switch table we look
>>>> + * back at the previous instructions and see if we are
>>>> + * jumping inside the same function that we are already
>>>> + * in.
>>>> + */
>>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>>>> +{
>>>> +struct instruction *prev;
>>>> +struct list_head *l;
>>>> +struct symbol *sym;
>>>> +list_for_each_prev(l, &insn->list) {
>>>> +prev = (void *)l;
>>>> +if (!prev->func
>>>> +|| prev->func->pfunc != insn->func->pfunc)
>>>> +return false;
>>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>>>> +continue;
>>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>>>> +if (!sym || sym->type != STT_FUNC
>>>> +|| sym->pfunc != insn->func->pfunc)
>>>> +return true;
>>>> +break;
>>>> +}
>>>> +return true;
>>>> +}
>>>
>>> I get the feeling there might be a better way to do this, but I can't
>>> figure out what this function is actually doing.  It looks like it
>>> searches backwards in the function for an instruction which has
>>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
>>> it do anything with the instruction after it finds it?
>>>
>>
>> I will indeed try to make it better.
>
> I still don't quite get what it's trying to accomplish, but I wonder if
> there's some kind of tracking you can add in validate_branch() to keep
> track of whatever you're looking for, leading up to the indirect jump.
>

The motivation behind this is that the `br <Xn>` instruction is a
dynamic jump (jump to the address contained in the provided register).
This instruction is used for sibling calls but can also be used for
switch table. I use this to differentiate these two cases from one another:

Generally the `adr/adrp` instruction is used prior to `br` in order to
load the address into the register. What I do here is go back throught
the instructions and try to identify if the address loaded.

I also thought of implementing some sort of tracking in validate branch
because it could be useful for identifying the switch tables as well.
But it seemed to me like a major change in the sementic of this tool:
indeed, from my perspective I would have to track the state of the
registers and I don't know if we want to do that.

>>>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
>>>> +/*
>>>> + * For arm64 architecture, we sometime split instructions so that
>>>> + * we can track the state evolution (i.e. load/store of pairs of registers).
>>>> + * We thus need to take both into account and not erase the previous ones.
>>>> + */
>>>
>>> Ew...  Is this an architectural thing, or just a quirk of the arm64
>>> decoder?
>>>
>>
>> The motivation for this is to simulate the two consecutive operations
>> that would be executed on x86 but are done in one on arm64. This is
>> strictly a decoder related quirk. I don't know if there is a better way
>> to do it without modifying the struct op_src and struct instruction.
>
> Ah.  Which ops are those?  Hopefully we can find a better way to
> represent that with a single instruction.  Adding fake instructions is
> fragile.
>

Those are the load/store of pairs of registers, mainly stp/ldp. Those
are often use in the function prologues/epilogues to save/restore the
stack pointers and frame pointers however it can be used with any
register pair.

The idea to add a new instruction could work but I would need to extend
the `struct op_src` as well I think.

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-25  8:12         ` Raphael Gault
@ 2019-04-25  8:33           ` Peter Zijlstra
  2019-04-25 16:25           ` Josh Poimboeuf
  1 sibling, 0 replies; 36+ messages in thread
From: Peter Zijlstra @ 2019-04-25  8:33 UTC (permalink / raw)
  To: Raphael Gault
  Cc: Josh Poimboeuf, linux-kernel, linux-arm-kernel, Catalin Marinas,
	Will Deacon, Julien Thierry

On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
> The motivation behind this is that the `br <Xn>` instruction is a
> dynamic jump (jump to the address contained in the provided register).
> This instruction is used for sibling calls but can also be used for
> switch table. I use this to differentiate these two cases from one another:
> 
> Generally the `adr/adrp` instruction is used prior to `br` in order to
> load the address into the register. What I do here is go back throught
> the instructions and try to identify if the address loaded.

Yikes, be very careful with simply going back on the instruction stream.

The problem case would be something like:

	adr	...
	b	1f

	...

	b	...
1:	br	...

In that case, simply going backwards from 1 will not yield the desired
result.

At some point I did a pass storing all the fwd jumps in their
destination and used that to 'rewind' the instruction stream, but that
wasn't very pretty either.

The best way might be to, while doing validate_branch() keep a state
table of all most recent ADR(P) instructions and when encountering BR
check that state to see what it really is.

We currently don't do dynamic insn->type, but it shouldn't be too hard
(famous last words of course).

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-25  8:12         ` Raphael Gault
  2019-04-25  8:33           ` Peter Zijlstra
@ 2019-04-25 16:25           ` Josh Poimboeuf
  2019-04-30 12:20             ` Raphael Gault
  1 sibling, 1 reply; 36+ messages in thread
From: Josh Poimboeuf @ 2019-04-25 16:25 UTC (permalink / raw)
  To: Raphael Gault
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
> Hi Josh,
> 
> On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
> > On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
> >>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
> >>>> index 0feb3ae3af5d..8b293eae2b38 100644
> >>>> --- a/tools/objtool/arch/arm64/decode.c
> >>>> +++ b/tools/objtool/arch/arm64/decode.c
> >>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
> >>>>    return addend;
> >>>>    }
> >>>>
> >>>> +/*
> >>>> + * In order to know if we are in presence of a sibling
> >>>> + * call and not in presence of a switch table we look
> >>>> + * back at the previous instructions and see if we are
> >>>> + * jumping inside the same function that we are already
> >>>> + * in.
> >>>> + */
> >>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
> >>>> +{
> >>>> +struct instruction *prev;
> >>>> +struct list_head *l;
> >>>> +struct symbol *sym;
> >>>> +list_for_each_prev(l, &insn->list) {
> >>>> +prev = (void *)l;
> >>>> +if (!prev->func
> >>>> +|| prev->func->pfunc != insn->func->pfunc)
> >>>> +return false;
> >>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
> >>>> +continue;
> >>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
> >>>> +if (!sym || sym->type != STT_FUNC
> >>>> +|| sym->pfunc != insn->func->pfunc)
> >>>> +return true;
> >>>> +break;
> >>>> +}
> >>>> +return true;
> >>>> +}
> >>>
> >>> I get the feeling there might be a better way to do this, but I can't
> >>> figure out what this function is actually doing.  It looks like it
> >>> searches backwards in the function for an instruction which has
> >>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
> >>> it do anything with the instruction after it finds it?
> >>>
> >>
> >> I will indeed try to make it better.
> >
> > I still don't quite get what it's trying to accomplish, but I wonder if
> > there's some kind of tracking you can add in validate_branch() to keep
> > track of whatever you're looking for, leading up to the indirect jump.
> >
> 
> The motivation behind this is that the `br <Xn>` instruction is a
> dynamic jump (jump to the address contained in the provided register).
> This instruction is used for sibling calls but can also be used for
> switch table. I use this to differentiate these two cases from one another:
> 
> Generally the `adr/adrp` instruction is used prior to `br` in order to
> load the address into the register. What I do here is go back throught
> the instructions and try to identify if the address loaded.
> 
> I also thought of implementing some sort of tracking in validate branch
> because it could be useful for identifying the switch tables as well.
> But it seemed to me like a major change in the sementic of this tool:
> indeed, from my perspective I would have to track the state of the
> registers and I don't know if we want to do that.

I don't have much time to look at this today (and I'll be out next
week), but we had a similar problem in x86.  See the comments above
find_switch_table(), particularly #3.  Does that function not work for
the arm64 case?

> >>>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
> >>>> +/*
> >>>> + * For arm64 architecture, we sometime split instructions so that
> >>>> + * we can track the state evolution (i.e. load/store of pairs of registers).
> >>>> + * We thus need to take both into account and not erase the previous ones.
> >>>> + */
> >>>
> >>> Ew...  Is this an architectural thing, or just a quirk of the arm64
> >>> decoder?
> >>>
> >>
> >> The motivation for this is to simulate the two consecutive operations
> >> that would be executed on x86 but are done in one on arm64. This is
> >> strictly a decoder related quirk. I don't know if there is a better way
> >> to do it without modifying the struct op_src and struct instruction.
> >
> > Ah.  Which ops are those?  Hopefully we can find a better way to
> > represent that with a single instruction.  Adding fake instructions is
> > fragile.
> >
> 
> Those are the load/store of pairs of registers, mainly stp/ldp. Those
> are often use in the function prologues/epilogues to save/restore the
> stack pointers and frame pointers however it can be used with any
> register pair.
> 
> The idea to add a new instruction could work but I would need to extend
> the `struct op_src` as well I think.

Again I don't have much time to look at it, but I do think that changing
op_src/dest to allow for the stp/ldp instructions would work better than
inserting a fake instruction to emulate x86.

Or another idea would be to associate multiple stack_ops with a single
instruction.

-- 
Josh

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-25 16:25           ` Josh Poimboeuf
@ 2019-04-30 12:20             ` Raphael Gault
  2019-05-01 15:09               ` Raphael Gault
  0 siblings, 1 reply; 36+ messages in thread
From: Raphael Gault @ 2019-04-30 12:20 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: linux-kernel, linux-arm-kernel, peterz, Catalin Marinas,
	Will Deacon, Julien Thierry

Hi Josh,

On 4/25/19 5:25 PM, Josh Poimboeuf wrote:
> On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
>> Hi Josh,
>>
>> On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
>>> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
>>>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>>>>>> index 0feb3ae3af5d..8b293eae2b38 100644
>>>>>> --- a/tools/objtool/arch/arm64/decode.c
>>>>>> +++ b/tools/objtool/arch/arm64/decode.c
>>>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>>>>>     return addend;
>>>>>>     }
>>>>>>
>>>>>> +/*
>>>>>> + * In order to know if we are in presence of a sibling
>>>>>> + * call and not in presence of a switch table we look
>>>>>> + * back at the previous instructions and see if we are
>>>>>> + * jumping inside the same function that we are already
>>>>>> + * in.
>>>>>> + */
>>>>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>>>>>> +{
>>>>>> +struct instruction *prev;
>>>>>> +struct list_head *l;
>>>>>> +struct symbol *sym;
>>>>>> +list_for_each_prev(l, &insn->list) {
>>>>>> +prev = (void *)l;
>>>>>> +if (!prev->func
>>>>>> +|| prev->func->pfunc != insn->func->pfunc)
>>>>>> +return false;
>>>>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>>>>>> +continue;
>>>>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>>>>>> +if (!sym || sym->type != STT_FUNC
>>>>>> +|| sym->pfunc != insn->func->pfunc)
>>>>>> +return true;
>>>>>> +break;
>>>>>> +}
>>>>>> +return true;
>>>>>> +}
>>>>>
>>>>> I get the feeling there might be a better way to do this, but I can't
>>>>> figure out what this function is actually doing.  It looks like it
>>>>> searches backwards in the function for an instruction which has
>>>>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
>>>>> it do anything with the instruction after it finds it?
>>>>>
>>>>
>>>> I will indeed try to make it better.
>>>
>>> I still don't quite get what it's trying to accomplish, but I wonder if
>>> there's some kind of tracking you can add in validate_branch() to keep
>>> track of whatever you're looking for, leading up to the indirect jump.
>>>
>>
>> The motivation behind this is that the `br <Xn>` instruction is a
>> dynamic jump (jump to the address contained in the provided register).
>> This instruction is used for sibling calls but can also be used for
>> switch table. I use this to differentiate these two cases from one another:
>>
>> Generally the `adr/adrp` instruction is used prior to `br` in order to
>> load the address into the register. What I do here is go back throught
>> the instructions and try to identify if the address loaded.
>>
>> I also thought of implementing some sort of tracking in validate branch
>> because it could be useful for identifying the switch tables as well.
>> But it seemed to me like a major change in the sementic of this tool:
>> indeed, from my perspective I would have to track the state of the
>> registers and I don't know if we want to do that.
>
> I don't have much time to look at this today (and I'll be out next
> week), but we had a similar problem in x86.  See the comments above
> find_switch_table(), particularly #3.  Does that function not work for
> the arm64 case?
>

Honestly, I don't have a full understanding of how the switch tables are
handled on arm64. All I know is that I've identified a case in which it
doesn't work (and I get an unreachable instruction warning).
When trying to figure out how the switch tables work on arm64 and how
objtool is retrieving them (on x86 at least) I realised that you look
for 2 relocations :
- One from (.rela).text which refers to the .rodata section
- One from (.rela).rodata which refers somewhere else.
On the case I identified the second relocation doesn't exist thus the
function doesn't find the switch table.

Again since I do not have a good understanding about this I am not able
to say if it is a corner case or not.

>>>>>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
>>>>>> +/*
>>>>>> + * For arm64 architecture, we sometime split instructions so that
>>>>>> + * we can track the state evolution (i.e. load/store of pairs of registers).
>>>>>> + * We thus need to take both into account and not erase the previous ones.
>>>>>> + */
>>>>>
>>>>> Ew...  Is this an architectural thing, or just a quirk of the arm64
>>>>> decoder?
>>>>>
>>>>
>>>> The motivation for this is to simulate the two consecutive operations
>>>> that would be executed on x86 but are done in one on arm64. This is
>>>> strictly a decoder related quirk. I don't know if there is a better way
>>>> to do it without modifying the struct op_src and struct instruction.
>>>
>>> Ah.  Which ops are those?  Hopefully we can find a better way to
>>> represent that with a single instruction.  Adding fake instructions is
>>> fragile.
>>>
>>
>> Those are the load/store of pairs of registers, mainly stp/ldp. Those
>> are often use in the function prologues/epilogues to save/restore the
>> stack pointers and frame pointers however it can be used with any
>> register pair.
>>
>> The idea to add a new instruction could work but I would need to extend
>> the `struct op_src` as well I think.
>
> Again I don't have much time to look at it, but I do think that changing
> op_src/dest to allow for the stp/ldp instructions would work better than
> inserting a fake instruction to emulate x86.
>
> Or another idea would be to associate multiple stack_ops with a single
> instruction.
>

I haven't looked at it in depth yet but I will try to figure out a good
way to represent those instructions on a more proper manner.

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture
  2019-04-30 12:20             ` Raphael Gault
@ 2019-05-01 15:09               ` Raphael Gault
  0 siblings, 0 replies; 36+ messages in thread
From: Raphael Gault @ 2019-05-01 15:09 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Julien Thierry, peterz, Catalin Marinas, Will Deacon,
	linux-kernel, linux-arm-kernel

Hi Josh,

On 4/30/19 1:20 PM, Raphael Gault wrote:
> Hi Josh,
>
> On 4/25/19 5:25 PM, Josh Poimboeuf wrote:
>> On Thu, Apr 25, 2019 at 08:12:24AM +0000, Raphael Gault wrote:
>>> Hi Josh,
>>>
>>> On 4/24/19 5:56 PM, Josh Poimboeuf wrote:
>>>> On Wed, Apr 24, 2019 at 04:32:44PM +0000, Raphael Gault wrote:
>>>>>>> diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
>>>>>>> index 0feb3ae3af5d..8b293eae2b38 100644
>>>>>>> --- a/tools/objtool/arch/arm64/decode.c
>>>>>>> +++ b/tools/objtool/arch/arm64/decode.c
>>>>>>> @@ -105,6 +105,33 @@ unsigned long arch_compute_rela_sym_offset(int addend)
>>>>>>>      return addend;
>>>>>>>      }
>>>>>>>
>>>>>>> +/*
>>>>>>> + * In order to know if we are in presence of a sibling
>>>>>>> + * call and not in presence of a switch table we look
>>>>>>> + * back at the previous instructions and see if we are
>>>>>>> + * jumping inside the same function that we are already
>>>>>>> + * in.
>>>>>>> + */
>>>>>>> +bool arch_is_insn_sibling_call(struct instruction *insn)
>>>>>>> +{
>>>>>>> +struct instruction *prev;
>>>>>>> +struct list_head *l;
>>>>>>> +struct symbol *sym;
>>>>>>> +list_for_each_prev(l, &insn->list) {
>>>>>>> +prev = (void *)l;
>>>>>>> +if (!prev->func
>>>>>>> +|| prev->func->pfunc != insn->func->pfunc)
>>>>>>> +return false;
>>>>>>> +if (prev->stack_op.src.reg != ADR_SOURCE)
>>>>>>> +continue;
>>>>>>> +sym = find_symbol_containing(insn->sec, insn->immediate);
>>>>>>> +if (!sym || sym->type != STT_FUNC
>>>>>>> +|| sym->pfunc != insn->func->pfunc)
>>>>>>> +return true;
>>>>>>> +break;
>>>>>>> +}
>>>>>>> +return true;
>>>>>>> +}
>>>>>>
>>>>>> I get the feeling there might be a better way to do this, but I can't
>>>>>> figure out what this function is actually doing.  It looks like it
>>>>>> searches backwards in the function for an instruction which has
>>>>>> stack_op.src.reg != ADR_SOURCE -- what does that mean?  And why doesn't
>>>>>> it do anything with the instruction after it finds it?
>>>>>>
>>>>>
>>>>> I will indeed try to make it better.
>>>>
>>>> I still don't quite get what it's trying to accomplish, but I wonder if
>>>> there's some kind of tracking you can add in validate_branch() to keep
>>>> track of whatever you're looking for, leading up to the indirect jump.
>>>>
>>>
>>> The motivation behind this is that the `br <Xn>` instruction is a
>>> dynamic jump (jump to the address contained in the provided register).
>>> This instruction is used for sibling calls but can also be used for
>>> switch table. I use this to differentiate these two cases from one another:
>>>
>>> Generally the `adr/adrp` instruction is used prior to `br` in order to
>>> load the address into the register. What I do here is go back throught
>>> the instructions and try to identify if the address loaded.
>>>
>>> I also thought of implementing some sort of tracking in validate branch
>>> because it could be useful for identifying the switch tables as well.
>>> But it seemed to me like a major change in the sementic of this tool:
>>> indeed, from my perspective I would have to track the state of the
>>> registers and I don't know if we want to do that.
>>
>> I don't have much time to look at this today (and I'll be out next
>> week), but we had a similar problem in x86.  See the comments above
>> find_switch_table(), particularly #3.  Does that function not work for
>> the arm64 case?
>>
>
> Honestly, I don't have a full understanding of how the switch tables are
> handled on arm64. All I know is that I've identified a case in which it
> doesn't work (and I get an unreachable instruction warning).
> When trying to figure out how the switch tables work on arm64 and how
> objtool is retrieving them (on x86 at least) I realised that you look
> for 2 relocations :
> - One from (.rela).text which refers to the .rodata section
> - One from (.rela).rodata which refers somewhere else.
> On the case I identified the second relocation doesn't exist thus the
> function doesn't find the switch table.
>
> Again since I do not have a good understanding about this I am not able
> to say if it is a corner case or not.
>
>>>>>>> -hash_add(file->insn_hash, &insn->hash, insn->offset);
>>>>>>> +/*
>>>>>>> + * For arm64 architecture, we sometime split instructions so that
>>>>>>> + * we can track the state evolution (i.e. load/store of pairs of registers).
>>>>>>> + * We thus need to take both into account and not erase the previous ones.
>>>>>>> + */
>>>>>>
>>>>>> Ew...  Is this an architectural thing, or just a quirk of the arm64
>>>>>> decoder?
>>>>>>
>>>>>
>>>>> The motivation for this is to simulate the two consecutive operations
>>>>> that would be executed on x86 but are done in one on arm64. This is
>>>>> strictly a decoder related quirk. I don't know if there is a better way
>>>>> to do it without modifying the struct op_src and struct instruction.
>>>>
>>>> Ah.  Which ops are those?  Hopefully we can find a better way to
>>>> represent that with a single instruction.  Adding fake instructions is
>>>> fragile.
>>>>
>>>
>>> Those are the load/store of pairs of registers, mainly stp/ldp. Those
>>> are often use in the function prologues/epilogues to save/restore the
>>> stack pointers and frame pointers however it can be used with any
>>> register pair.
>>>
>>> The idea to add a new instruction could work but I would need to extend
>>> the `struct op_src` as well I think.
>>
>> Again I don't have much time to look at it, but I do think that changing
>> op_src/dest to allow for the stp/ldp instructions would work better than
>> inserting a fake instruction to emulate x86.
>>
>> Or another idea would be to associate multiple stack_ops with a single
>> instruction.
>>
>
> I haven't looked at it in depth yet but I will try to figure out a good
> way to represent those instructions on a more proper manner.

I wanted to get your thoughts on the solution I found without waiting
for v2. If it's too much trouble reviewing it now I'll wait for the v2.

I added a field to the struct stack_op in order to have access to an
extra register. This way I can provide the extra register to the instruction
in order to use it later.

diff --git a/tools/objtool/arch.h b/tools/objtool/arch.h
index 52599ebd89fb..ae9ae25b3bdc 100644
--- a/tools/objtool/arch.h
+++ b/tools/objtool/arch.h
@@ -51,6 +51,7 @@ struct op_dest {
  int offset;
  };

+
  enum op_src_type {
  OP_SRC_REG,
  OP_SRC_REG_INDIRECT,
@@ -66,9 +67,16 @@ struct op_src {
  int offset;
  };

+struct op_extra {
+unsigned char used;
+unsigned char reg;
+int offset;
+};
+
  struct stack_op {
  struct op_dest dest;
  struct op_src src;
+struct op_extra extra;
  };

  struct instruction;
diff --git a/tools/objtool/arch/arm64/decode.c b/tools/objtool/arch/arm64/decode.c
index 17b5d59f16ad..86ad37c8397c 100644
--- a/tools/objtool/arch/arm64/decode.c
+++ b/tools/objtool/arch/arm64/decode.c
@@ -1743,23 +1673,26 @@ int arm_decode_ld_st_regs_pair_off(u32 instr, unsigned char *type,
  op->src.type = OP_SRC_REG_INDIRECT;
  op->src.reg = CFI_SP;
  op->src.offset = 0;
-state.curr_offset = 8;
  op->dest.type = OP_DEST_REG;
-op->dest.reg = state.regs[0];
+op->dest.reg = rt;
  op->dest.offset = 0;
+op->extra.used = 1;
+op->extra.reg = rt2;
+op->extra.offset = 8;
  break;
  default:
  op->dest.type = OP_DEST_REG_INDIRECT;
  op->dest.reg = CFI_SP;
  op->dest.offset = 8;
-state.curr_offset = 0;
  op->src.type = OP_SRC_REG;
-op->src.reg = state.regs[1];
+op->src.reg = rt2;
  op->src.offset = 0;
+op->extra.used = 1;
+op->extra.reg = rt;
+op->extra.offset = 0;
  /* store */
  }
-state.op = *op;
-return INSN_COMPOSED;
+return 0;
  }

  int arm_decode_ld_st_regs_pair_post(u32 instr, unsigned char *type,
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 5ddb25414de5..df1fb6ce1e8f 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1580,6 +1569,18 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
      initial_func_cfi.cfa.base == CFI_CFA)
  cfa->base = CFI_CFA;

+if (op->extra.used) {
+if (regs[op->extra.reg].offset == -state->stack_size)
+restore_reg(state, op->extra.reg);
+state->stack_size -= 8;
+if (cfa->base == CFI_SP)
+cfa->offset -= 8;
+if (cfa->base == CFI_SP &&
+    cfa->offset == 0 &&
+    initial_func_cfi.cfa.base == CFI_CFA)
+cfa->base = CFI_CFA;
+}
+
  break;

  case OP_SRC_REG_INDIRECT:
@@ -1598,12 +1599,22 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
  /* drap: mov disp(%rbp), %reg */
  restore_reg(state, op->dest.reg);

+if (op->extra.used &&
+    op->src.reg == CFI_BP &&
+    op->extra.offset == regs[op->extra.reg].offset)
+restore_reg(state, op->extra.reg);
+
  } else if (op->src.reg == cfa->base &&
      op->src.offset == regs[op->dest.reg].offset + cfa->offset) {

  /* mov disp(%rbp), %reg */
  /* mov disp(%rsp), %reg */
  restore_reg(state, op->dest.reg);
+
+if (op->extra.used &&
+    op->src.reg == cfa->base &&
+    op->extra.offset == regs[op->extra.reg].offset + cfa->offset)
+restore_reg(state, op->extra.reg);
  }

  break;
@@ -1653,6 +1664,21 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
  save_reg(state, op->src.reg, CFI_CFA, -state->stack_size);
  }

+if (op->extra.used) {
+state->stack_size += 8;
+if (cfa->base == CFI_CFA)
+cfa->base = CFI_SP;
+if (cfa->base == CFI_SP)
+cfa->offset += 8;
+if (!state->drap ||
+    (!(op->extra.reg == cfa->base &&
+       op->extra.reg == state->drap_reg) &&
+     !(op->extra.reg == CFI_BP &&
+       cfa->base == state->drap_reg) &&
+     regs[op->extra.reg].base == CFI_UNDEFINED))
+save_reg(state, op->extra.reg, CFI_CFA,
+ -state->stack_size);
+}
  /* detect when asm code uses rbp as a scratch register */
  if (!no_fp && insn->func && op->src.reg == CFI_BP &&
      cfa->base != CFI_BP)
@@ -1671,11 +1697,19 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
  /* save drap offset so we know when to restore it */
  state->drap_offset = op->dest.offset;
  }
+if (op->extra.used && op->extra.reg == cfa->base &&
+    op->extra.reg == state->drap_reg) {
+cfa->base = CFI_BP_INDIRECT;
+cfa->offset = op->extra.offset;
+}

  else if (regs[op->src.reg].base == CFI_UNDEFINED) {

  /* drap: mov reg, disp(%rbp) */
  save_reg(state, op->src.reg, CFI_BP, op->dest.offset);
+if (op->extra.used)
+save_reg(state, op->extra.reg, CFI_BP,
+ op->extra.offset);
  }

  } else if (op->dest.reg == cfa->base) {
@@ -1684,8 +1718,12 @@ static int update_insn_state(struct instruction *insn, struct insn_state *state)
  /* mov reg, disp(%rsp) */
  save_reg(state, op->src.reg, CFI_CFA,
   op->dest.offset - state->cfa.offset);
+if (op->extra.used)
+save_reg(state, op->extra.reg, CFI_CFA,
+ op->extra.offset - state->cfa.offset);
  }

+
  break;

  case OP_DEST_LEAVE:

Thanks,

--
Raphael Gault
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply related	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-05-01 15:09 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-09 13:52 [PATCH 0/6] objtool: Add support for Arm64 Raphael Gault
2019-04-09 13:52 ` [RFC 1/6] objtool: Refactor code to make it more suitable for multiple architecture support Raphael Gault
2019-04-23 20:13   ` Josh Poimboeuf
2019-04-24 16:11     ` Raphael Gault
2019-04-24 16:17       ` Josh Poimboeuf
2019-04-09 13:52 ` [RFC 2/6] objtool: arm64: Add required implementation for supporting the aarch64 architecture in objtool Raphael Gault
2019-04-09 16:20   ` Peter Zijlstra
2019-04-23 20:18   ` Josh Poimboeuf
2019-04-24 16:16     ` Raphael Gault
2019-04-24 16:23       ` Josh Poimboeuf
2019-04-09 13:52 ` [RFC 3/6] objtool: arm64: Adapt the stack frame checks and the section analysis for the arm architecture Raphael Gault
2019-04-09 16:12   ` Peter Zijlstra
2019-04-09 16:24     ` Mark Rutland
2019-04-09 16:27       ` Julien Thierry
2019-04-09 16:33         ` Raphaël Gault
2019-04-23 20:36   ` Josh Poimboeuf
2019-04-24 16:32     ` Raphael Gault
2019-04-24 16:56       ` Josh Poimboeuf
2019-04-25  8:12         ` Raphael Gault
2019-04-25  8:33           ` Peter Zijlstra
2019-04-25 16:25           ` Josh Poimboeuf
2019-04-30 12:20             ` Raphael Gault
2019-05-01 15:09               ` Raphael Gault
2019-04-24 10:36   ` Julien Thierry
2019-04-09 13:52 ` [RFC 4/6] arm64: assembler: Add macro to annotate asm function having non standard stack-frame Raphael Gault
2019-04-24 10:44   ` Julien Thierry
2019-04-09 13:52 ` [RFC 5/6] arm64: sleep: Add stack frame setup for __cpu_supsend_enter Raphael Gault
2019-04-23 20:37   ` Josh Poimboeuf
2019-04-09 13:52 ` [RFC 6/6] objtool: arm64: Enable stack validation for arm64 Raphael Gault
2019-04-09 14:57 ` [PATCH 0/6] objtool: Add support for Arm64 Josh Poimboeuf
2019-04-09 17:43 ` Ard Biesheuvel
2019-04-10  3:37   ` Josh Poimboeuf
2019-04-10  7:20     ` Julien Thierry
2019-04-23 21:09 ` Josh Poimboeuf
2019-04-24 16:08   ` Raphael Gault
2019-04-24 16:14     ` Josh Poimboeuf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).