All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/2] ARM/arm64: module-plt: split core and init PLT sections
@ 2017-02-21 22:12 Ard Biesheuvel
  2017-02-21 22:12 ` [PATCH v2 1/2] ARM: module: " Ard Biesheuvel
  2017-02-21 22:12 ` [PATCH v2 2/2] arm64: " Ard Biesheuvel
  0 siblings, 2 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2017-02-21 22:12 UTC (permalink / raw)
  To: linux-arm-kernel

This fixes a thinko on my part in both the ARM and the arm64 implementations
of module PLTs. What I failed to realise is that the core module sections and
the init sections are allocated independently, which means they could end up
further away from each other than the range of a branch instruction. This
implies that they cannot share a single array of PLT entries, and so this
series splits them into core and init PLT sections. To complicate matters
further, this also means that .init.text code may require PLT entries to branch
into the .text section of the same module.

For ARM, this is actually a revert of commit 35fa91eed817 ("ARM: kernel:
merge core and init PLTs") [although the patch in this series is not a
straight revert.] This means that the issue is a regression, and the patch
should probably go to -stable.

For arm64, the likelihood of this issue ever occurring is very small, due
to the 128 MB range of its branch instructions. Also, the arm64 version of
the code was never correct, so it is not a regression. But for correctness,
it is fixed in the same way as for ARM.

Build tested on ARM+arm64, and runtime tested on arm64.

v2: deal with init PLT entries referring to .text symbols of the same module
    fix warning in ARM code

Ard Biesheuvel (2):
  ARM: module: split core and init PLT sections
  arm64: module: split core and init PLT sections

 arch/arm/include/asm/module.h   |   9 +-
 arch/arm/kernel/module-plts.c   |  87 +++++++++++-----
 arch/arm/kernel/module.lds      |   1 +
 arch/arm64/include/asm/module.h |   9 +-
 arch/arm64/kernel/module-plts.c | 108 ++++++++++++--------
 arch/arm64/kernel/module.c      |   2 +-
 arch/arm64/kernel/module.lds    |   1 +
 7 files changed, 140 insertions(+), 77 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] ARM: module: split core and init PLT sections
  2017-02-21 22:12 [PATCH v2 0/2] ARM/arm64: module-plt: split core and init PLT sections Ard Biesheuvel
@ 2017-02-21 22:12 ` Ard Biesheuvel
  2017-02-22 11:29   ` Angus Clark
  2017-04-25 22:30   ` [v2,1/2] " Florian Fainelli
  2017-02-21 22:12 ` [PATCH v2 2/2] arm64: " Ard Biesheuvel
  1 sibling, 2 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2017-02-21 22:12 UTC (permalink / raw)
  To: linux-arm-kernel

Since commit 35fa91eed817 ("ARM: kernel: merge core and init PLTs"),
the ARM module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.

However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This puts the PLT entries out of reach of the relocated branch
instructions, defeating the whole purpose of PLTs.

So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section.

Fixes: 35fa91eed817 ("ARM: kernel: merge core and init PLTs")
Reported-by: Angus Clark <angus@angusclark.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/include/asm/module.h |  9 +-
 arch/arm/kernel/module-plts.c | 87 ++++++++++++++------
 arch/arm/kernel/module.lds    |  1 +
 3 files changed, 68 insertions(+), 29 deletions(-)

diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
index 464748b9fd7d..ed2319663a1e 100644
--- a/arch/arm/include/asm/module.h
+++ b/arch/arm/include/asm/module.h
@@ -18,13 +18,18 @@ enum {
 };
 #endif
 
+struct mod_plt_sec {
+	struct elf32_shdr	*plt;
+	int			plt_count;
+};
+
 struct mod_arch_specific {
 #ifdef CONFIG_ARM_UNWIND
 	struct unwind_table *unwind[ARM_SEC_MAX];
 #endif
 #ifdef CONFIG_ARM_MODULE_PLTS
-	struct elf32_shdr   *plt;
-	int		    plt_count;
+	struct mod_plt_sec	core;
+	struct mod_plt_sec	init;
 #endif
 };
 
diff --git a/arch/arm/kernel/module-plts.c b/arch/arm/kernel/module-plts.c
index 3a5cba90c971..3d0c2e4dda1d 100644
--- a/arch/arm/kernel/module-plts.c
+++ b/arch/arm/kernel/module-plts.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2014-2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -31,9 +31,17 @@ struct plt_entries {
 	u32	lit[PLT_ENT_COUNT];
 };
 
+static bool in_init(const struct module *mod, unsigned long loc)
+{
+	return loc - (u32)mod->init_layout.base < mod->init_layout.size;
+}
+
 u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
 {
-	struct plt_entries *plt = (struct plt_entries *)mod->arch.plt->sh_addr;
+	struct mod_plt_sec *pltsec = !in_init(mod, loc) ? &mod->arch.core :
+							  &mod->arch.init;
+
+	struct plt_entries *plt = (struct plt_entries *)pltsec->plt->sh_addr;
 	int idx = 0;
 
 	/*
@@ -41,9 +49,9 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
 	 * relocations are sorted, this will be the last entry we allocated.
 	 * (if one exists).
 	 */
-	if (mod->arch.plt_count > 0) {
-		plt += (mod->arch.plt_count - 1) / PLT_ENT_COUNT;
-		idx = (mod->arch.plt_count - 1) % PLT_ENT_COUNT;
+	if (pltsec->plt_count > 0) {
+		plt += (pltsec->plt_count - 1) / PLT_ENT_COUNT;
+		idx = (pltsec->plt_count - 1) % PLT_ENT_COUNT;
 
 		if (plt->lit[idx] == val)
 			return (u32)&plt->ldr[idx];
@@ -53,8 +61,8 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
 			plt++;
 	}
 
-	mod->arch.plt_count++;
-	BUG_ON(mod->arch.plt_count * PLT_ENT_SIZE > mod->arch.plt->sh_size);
+	pltsec->plt_count++;
+	BUG_ON(pltsec->plt_count * PLT_ENT_SIZE > pltsec->plt->sh_size);
 
 	if (!idx)
 		/* Populate a new set of entries */
@@ -129,7 +137,7 @@ static bool duplicate_rel(Elf32_Addr base, const Elf32_Rel *rel, int num)
 
 /* Count how many PLT entries we may need */
 static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
-			       const Elf32_Rel *rel, int num)
+			       const Elf32_Rel *rel, int num, Elf32_Word dstidx)
 {
 	unsigned int ret = 0;
 	const Elf32_Sym *s;
@@ -144,13 +152,17 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
 		case R_ARM_THM_JUMP24:
 			/*
 			 * We only have to consider branch targets that resolve
-			 * to undefined symbols. This is not simply a heuristic,
-			 * it is a fundamental limitation, since the PLT itself
-			 * is part of the module, and needs to be within range
-			 * as well, so modules can never grow beyond that limit.
+			 * to symbols that are defined in a different section.
+			 * This is not simply a heuristic, it is a fundamental
+			 * limitation, since there is no guaranteed way to emit
+			 * PLT entries sufficiently close to the branch if the
+			 * section size exceeds the range of a branch
+			 * instruction. So ignore relocations against defined
+			 * symbols if they live in the same section as the
+			 * relocation target.
 			 */
 			s = syms + ELF32_R_SYM(rel[i].r_info);
-			if (s->st_shndx != SHN_UNDEF)
+			if (s->st_shndx == dstidx)
 				break;
 
 			/*
@@ -161,7 +173,12 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
 			 * So we need to support them, but there is no need to
 			 * take them into consideration when trying to optimize
 			 * this code. So let's only check for duplicates when
-			 * the addend is zero.
+			 * the addend is zero. (Note that calls into the core
+			 * module via init PLT entries could involve section
+			 * relative symbol references with non-zero addends, for
+			 * which we may end up emitting duplicates, but the init
+			 * PLT is released along with the rest of the .init
+			 * region as soon as module loading completes.)
 			 */
 			if (!is_zero_addend_relocation(base, rel + i) ||
 			    !duplicate_rel(base, rel, i))
@@ -174,7 +191,8 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
 int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 			      char *secstrings, struct module *mod)
 {
-	unsigned long plts = 0;
+	unsigned long core_plts = 0;
+	unsigned long init_plts = 0;
 	Elf32_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
 	Elf32_Sym *syms = NULL;
 
@@ -184,13 +202,15 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 	 */
 	for (s = sechdrs; s < sechdrs_end; ++s) {
 		if (strcmp(".plt", secstrings + s->sh_name) == 0)
-			mod->arch.plt = s;
+			mod->arch.core.plt = s;
+		else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
+			mod->arch.init.plt = s;
 		else if (s->sh_type == SHT_SYMTAB)
 			syms = (Elf32_Sym *)s->sh_addr;
 	}
 
-	if (!mod->arch.plt) {
-		pr_err("%s: module PLT section missing\n", mod->name);
+	if (!mod->arch.core.plt || !mod->arch.init.plt) {
+		pr_err("%s: module PLT section(s) missing\n", mod->name);
 		return -ENOEXEC;
 	}
 	if (!syms) {
@@ -213,16 +233,29 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 		/* sort by type and symbol index */
 		sort(rels, numrels, sizeof(Elf32_Rel), cmp_rel, NULL);
 
-		plts += count_plts(syms, dstsec->sh_addr, rels, numrels);
+		if (strncmp(secstrings + dstsec->sh_name, ".init", 5) != 0)
+			core_plts += count_plts(syms, dstsec->sh_addr, rels,
+						numrels, s->sh_info);
+		else
+			init_plts += count_plts(syms, dstsec->sh_addr, rels,
+						numrels, s->sh_info);
 	}
 
-	mod->arch.plt->sh_type = SHT_NOBITS;
-	mod->arch.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
-	mod->arch.plt->sh_addralign = L1_CACHE_BYTES;
-	mod->arch.plt->sh_size = round_up(plts * PLT_ENT_SIZE,
-					  sizeof(struct plt_entries));
-	mod->arch.plt_count = 0;
-
-	pr_debug("%s: plt=%x\n", __func__, mod->arch.plt->sh_size);
+	mod->arch.core.plt->sh_type = SHT_NOBITS;
+	mod->arch.core.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.core.plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core.plt->sh_size = round_up(core_plts * PLT_ENT_SIZE,
+					       sizeof(struct plt_entries));
+	mod->arch.core.plt_count = 0;
+
+	mod->arch.init.plt->sh_type = SHT_NOBITS;
+	mod->arch.init.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.init.plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.init.plt->sh_size = round_up(init_plts * PLT_ENT_SIZE,
+					       sizeof(struct plt_entries));
+	mod->arch.init.plt_count = 0;
+
+	pr_debug("%s: plt=%x, init.plt=%x\n", __func__,
+		 mod->arch.core.plt->sh_size, mod->arch.init.plt->sh_size);
 	return 0;
 }
diff --git a/arch/arm/kernel/module.lds b/arch/arm/kernel/module.lds
index 05881e2b414c..eacb5c67f61e 100644
--- a/arch/arm/kernel/module.lds
+++ b/arch/arm/kernel/module.lds
@@ -1,3 +1,4 @@
 SECTIONS {
 	.plt : { BYTE(0) }
+	.init.plt : { BYTE(0) }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] arm64: module: split core and init PLT sections
  2017-02-21 22:12 [PATCH v2 0/2] ARM/arm64: module-plt: split core and init PLT sections Ard Biesheuvel
  2017-02-21 22:12 ` [PATCH v2 1/2] ARM: module: " Ard Biesheuvel
@ 2017-02-21 22:12 ` Ard Biesheuvel
  2017-04-26 11:32   ` Catalin Marinas
  1 sibling, 1 reply; 8+ messages in thread
From: Ard Biesheuvel @ 2017-02-21 22:12 UTC (permalink / raw)
  To: linux-arm-kernel

The arm64 module PLT code allocates all PLT entries in a single core
section, since the overhead of having a separate init PLT section is
not justified by the small number of PLT entries usually required for
init code.

However, the core and init module regions are allocated independently,
and there is a corner case where the core region may be allocated from
the VMALLOC region if the dedicated module region is exhausted, but the
init region, being much smaller, can still be allocated from the module
region. This leads to relocation failures if the distance between those
regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
occur on arm64, but the issue has been observed on ARM, whose module
region is much smaller).

So split the core and init PLT regions, and name the latter ".init.plt"
so it gets allocated along with (and sufficiently close to) the .init
sections that it serves. Also, given that init PLT entries may need to
be emitted for branches that target the core module, modify the logic
that disregards defined symbols to only disregard symbols that are
defined in the same section as the relocated branch instruction.

Since there may now be two PLT entries associated with each entry in
the symbol table, we can no longer hijack the symbol::st_size fields
to record the addresses of PLT entries as we emit them for zero-addend
relocations. So instead, perform an explicit comparison to check for
duplicate entries.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/include/asm/module.h |   9 +-
 arch/arm64/kernel/module-plts.c | 108 ++++++++++++--------
 arch/arm64/kernel/module.c      |   2 +-
 arch/arm64/kernel/module.lds    |   1 +
 4 files changed, 72 insertions(+), 48 deletions(-)

diff --git a/arch/arm64/include/asm/module.h b/arch/arm64/include/asm/module.h
index 06ff7fd9e81f..b6c6fa29fe56 100644
--- a/arch/arm64/include/asm/module.h
+++ b/arch/arm64/include/asm/module.h
@@ -22,14 +22,19 @@
 #define MODULE_ARCH_VERMAGIC	"aarch64"
 
 #ifdef CONFIG_ARM64_MODULE_PLTS
-struct mod_arch_specific {
+struct mod_plt_sec {
 	struct elf64_shdr	*plt;
 	int			plt_num_entries;
 	int			plt_max_entries;
 };
+
+struct mod_arch_specific {
+	struct mod_plt_sec	core;
+	struct mod_plt_sec	init;
+};
 #endif
 
-u64 module_emit_plt_entry(struct module *mod, const Elf64_Rela *rela,
+u64 module_emit_plt_entry(struct module *mod, void *loc, const Elf64_Rela *rela,
 			  Elf64_Sym *sym);
 
 #ifdef CONFIG_RANDOMIZE_BASE
diff --git a/arch/arm64/kernel/module-plts.c b/arch/arm64/kernel/module-plts.c
index 1ce90d8450ae..d05dbe658409 100644
--- a/arch/arm64/kernel/module-plts.c
+++ b/arch/arm64/kernel/module-plts.c
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2014-2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ * Copyright (C) 2014-2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -26,35 +26,21 @@ struct plt_entry {
 	__le32	br;	/* br	x16				*/
 };
 
-u64 module_emit_plt_entry(struct module *mod, const Elf64_Rela *rela,
+static bool in_init(const struct module *mod, void *loc)
+{
+	return (u64)loc - (u64)mod->init_layout.base < mod->init_layout.size;
+}
+
+u64 module_emit_plt_entry(struct module *mod, void *loc, const Elf64_Rela *rela,
 			  Elf64_Sym *sym)
 {
-	struct plt_entry *plt = (struct plt_entry *)mod->arch.plt->sh_addr;
-	int i = mod->arch.plt_num_entries;
+	struct mod_plt_sec *pltsec = !in_init(mod, loc) ? &mod->arch.core :
+							  &mod->arch.init;
+	struct plt_entry *plt = (struct plt_entry *)pltsec->plt->sh_addr;
+	int i = pltsec->plt_num_entries;
 	u64 val = sym->st_value + rela->r_addend;
 
 	/*
-	 * We only emit PLT entries against undefined (SHN_UNDEF) symbols,
-	 * which are listed in the ELF symtab section, but without a type
-	 * or a size.
-	 * So, similar to how the module loader uses the Elf64_Sym::st_value
-	 * field to store the resolved addresses of undefined symbols, let's
-	 * borrow the Elf64_Sym::st_size field (whose value is never used by
-	 * the module loader, even for symbols that are defined) to record
-	 * the address of a symbol's associated PLT entry as we emit it for a
-	 * zero addend relocation (which is the only kind we have to deal with
-	 * in practice). This allows us to find duplicates without having to
-	 * go through the table every time.
-	 */
-	if (rela->r_addend == 0 && sym->st_size != 0) {
-		BUG_ON(sym->st_size < (u64)plt || sym->st_size >= (u64)&plt[i]);
-		return sym->st_size;
-	}
-
-	mod->arch.plt_num_entries++;
-	BUG_ON(mod->arch.plt_num_entries > mod->arch.plt_max_entries);
-
-	/*
 	 * MOVK/MOVN/MOVZ opcode:
 	 * +--------+------------+--------+-----------+-------------+---------+
 	 * | sf[31] | opc[30:29] | 100101 | hw[22:21] | imm16[20:5] | Rd[4:0] |
@@ -72,8 +58,19 @@ u64 module_emit_plt_entry(struct module *mod, const Elf64_Rela *rela,
 		cpu_to_le32(0xd61f0200)
 	};
 
-	if (rela->r_addend == 0)
-		sym->st_size = (u64)&plt[i];
+	/*
+	 * Check if the entry we just created is a duplicate. Given that the
+	 * relocations are sorted, this will be the last entry we allocated.
+	 * (if one exists).
+	 */
+	if (i > 0 &&
+	    plt[i].mov0 == plt[i - 1].mov0 &&
+	    plt[i].mov1 == plt[i - 1].mov1 &&
+	    plt[i].mov2 == plt[i - 1].mov2)
+		return (u64)&plt[i - 1];
+
+	pltsec->plt_num_entries++;
+	BUG_ON(pltsec->plt_num_entries > pltsec->plt_max_entries);
 
 	return (u64)&plt[i];
 }
@@ -104,7 +101,8 @@ static bool duplicate_rel(const Elf64_Rela *rela, int num)
 	return num > 0 && cmp_rela(rela + num, rela + num - 1) == 0;
 }
 
-static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num)
+static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num,
+			       Elf64_Word dstidx)
 {
 	unsigned int ret = 0;
 	Elf64_Sym *s;
@@ -116,13 +114,17 @@ static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num)
 		case R_AARCH64_CALL26:
 			/*
 			 * We only have to consider branch targets that resolve
-			 * to undefined symbols. This is not simply a heuristic,
-			 * it is a fundamental limitation, since the PLT itself
-			 * is part of the module, and needs to be within 128 MB
-			 * as well, so modules can never grow beyond that limit.
+			 * to symbols that are defined in a different section.
+			 * This is not simply a heuristic, it is a fundamental
+			 * limitation, since there is no guaranteed way to emit
+			 * PLT entries sufficiently close to the branch if the
+			 * section size exceeds the range of a branch
+			 * instruction. So ignore relocations against defined
+			 * symbols if they live in the same section as the
+			 * relocation target.
 			 */
 			s = syms + ELF64_R_SYM(rela[i].r_info);
-			if (s->st_shndx != SHN_UNDEF)
+			if (s->st_shndx == dstidx)
 				break;
 
 			/*
@@ -149,7 +151,8 @@ static unsigned int count_plts(Elf64_Sym *syms, Elf64_Rela *rela, int num)
 int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 			      char *secstrings, struct module *mod)
 {
-	unsigned long plt_max_entries = 0;
+	unsigned long core_plts = 0;
+	unsigned long init_plts = 0;
 	Elf64_Sym *syms = NULL;
 	int i;
 
@@ -158,14 +161,16 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 	 * entries. Record the symtab address as well.
 	 */
 	for (i = 0; i < ehdr->e_shnum; i++) {
-		if (strcmp(".plt", secstrings + sechdrs[i].sh_name) == 0)
-			mod->arch.plt = sechdrs + i;
+		if (!strcmp(secstrings + sechdrs[i].sh_name, ".plt"))
+			mod->arch.core.plt = sechdrs + i;
+		else if (!strcmp(secstrings + sechdrs[i].sh_name, ".init.plt"))
+			mod->arch.init.plt = sechdrs + i;
 		else if (sechdrs[i].sh_type == SHT_SYMTAB)
 			syms = (Elf64_Sym *)sechdrs[i].sh_addr;
 	}
 
-	if (!mod->arch.plt) {
-		pr_err("%s: module PLT section missing\n", mod->name);
+	if (!mod->arch.core.plt || !mod->arch.init.plt) {
+		pr_err("%s: module PLT section(s) missing\n", mod->name);
 		return -ENOEXEC;
 	}
 	if (!syms) {
@@ -188,14 +193,27 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
 		/* sort by type, symbol index and addend */
 		sort(rels, numrels, sizeof(Elf64_Rela), cmp_rela, NULL);
 
-		plt_max_entries += count_plts(syms, rels, numrels);
+		if (strncmp(secstrings + dstsec->sh_name, ".init", 5) != 0)
+			core_plts += count_plts(syms, rels, numrels,
+						sechdrs[i].sh_info);
+		else
+			init_plts += count_plts(syms, rels, numrels,
+						sechdrs[i].sh_info);
 	}
 
-	mod->arch.plt->sh_type = SHT_NOBITS;
-	mod->arch.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
-	mod->arch.plt->sh_addralign = L1_CACHE_BYTES;
-	mod->arch.plt->sh_size = plt_max_entries * sizeof(struct plt_entry);
-	mod->arch.plt_num_entries = 0;
-	mod->arch.plt_max_entries = plt_max_entries;
+	mod->arch.core.plt->sh_type = SHT_NOBITS;
+	mod->arch.core.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.core.plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.core.plt->sh_size = (core_plts  + 1) * sizeof(struct plt_entry);
+	mod->arch.core.plt_num_entries = 0;
+	mod->arch.core.plt_max_entries = core_plts;
+
+	mod->arch.init.plt->sh_type = SHT_NOBITS;
+	mod->arch.init.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
+	mod->arch.init.plt->sh_addralign = L1_CACHE_BYTES;
+	mod->arch.init.plt->sh_size = (init_plts + 1) * sizeof(struct plt_entry);
+	mod->arch.init.plt_num_entries = 0;
+	mod->arch.init.plt_max_entries = init_plts;
+
 	return 0;
 }
diff --git a/arch/arm64/kernel/module.c b/arch/arm64/kernel/module.c
index 7f316982ce00..c9a2ab446dc6 100644
--- a/arch/arm64/kernel/module.c
+++ b/arch/arm64/kernel/module.c
@@ -380,7 +380,7 @@ int apply_relocate_add(Elf64_Shdr *sechdrs,
 
 			if (IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) &&
 			    ovf == -ERANGE) {
-				val = module_emit_plt_entry(me, &rel[i], sym);
+				val = module_emit_plt_entry(me, loc, &rel[i], sym);
 				ovf = reloc_insn_imm(RELOC_OP_PREL, loc, val, 2,
 						     26, AARCH64_INSN_IMM_26);
 			}
diff --git a/arch/arm64/kernel/module.lds b/arch/arm64/kernel/module.lds
index 8949f6c6f729..f7c9781a9d48 100644
--- a/arch/arm64/kernel/module.lds
+++ b/arch/arm64/kernel/module.lds
@@ -1,3 +1,4 @@
 SECTIONS {
 	.plt (NOLOAD) : { BYTE(0) }
+	.init.plt (NOLOAD) : { BYTE(0) }
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] ARM: module: split core and init PLT sections
  2017-02-21 22:12 ` [PATCH v2 1/2] ARM: module: " Ard Biesheuvel
@ 2017-02-22 11:29   ` Angus Clark
  2017-02-22 11:30     ` Ard Biesheuvel
  2017-04-25 22:30   ` [v2,1/2] " Florian Fainelli
  1 sibling, 1 reply; 8+ messages in thread
From: Angus Clark @ 2017-02-22 11:29 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ard,

Thanks.  I can confirm v2 works fine on my setup (with a minor change
to backport to a 4.1 kernel).

Cheers,

Angus

On 21 February 2017 at 22:12, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> Since commit 35fa91eed817 ("ARM: kernel: merge core and init PLTs"),
> the ARM module PLT code allocates all PLT entries in a single core
> section, since the overhead of having a separate init PLT section is
> not justified by the small number of PLT entries usually required for
> init code.
>
> However, the core and init module regions are allocated independently,
> and there is a corner case where the core region may be allocated from
> the VMALLOC region if the dedicated module region is exhausted, but the
> init region, being much smaller, can still be allocated from the module
> region. This puts the PLT entries out of reach of the relocated branch
> instructions, defeating the whole purpose of PLTs.
>
> So split the core and init PLT regions, and name the latter ".init.plt"
> so it gets allocated along with (and sufficiently close to) the .init
> sections that it serves. Also, given that init PLT entries may need to
> be emitted for branches that target the core module, modify the logic
> that disregards defined symbols to only disregard symbols that are
> defined in the same section.
>
> Fixes: 35fa91eed817 ("ARM: kernel: merge core and init PLTs")
> Reported-by: Angus Clark <angus@angusclark.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm/include/asm/module.h |  9 +-
>  arch/arm/kernel/module-plts.c | 87 ++++++++++++++------
>  arch/arm/kernel/module.lds    |  1 +
>  3 files changed, 68 insertions(+), 29 deletions(-)
>
> diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
> index 464748b9fd7d..ed2319663a1e 100644
> --- a/arch/arm/include/asm/module.h
> +++ b/arch/arm/include/asm/module.h
> @@ -18,13 +18,18 @@ enum {
>  };
>  #endif
>
> +struct mod_plt_sec {
> +       struct elf32_shdr       *plt;
> +       int                     plt_count;
> +};
> +
>  struct mod_arch_specific {
>  #ifdef CONFIG_ARM_UNWIND
>         struct unwind_table *unwind[ARM_SEC_MAX];
>  #endif
>  #ifdef CONFIG_ARM_MODULE_PLTS
> -       struct elf32_shdr   *plt;
> -       int                 plt_count;
> +       struct mod_plt_sec      core;
> +       struct mod_plt_sec      init;
>  #endif
>  };
>
> diff --git a/arch/arm/kernel/module-plts.c b/arch/arm/kernel/module-plts.c
> index 3a5cba90c971..3d0c2e4dda1d 100644
> --- a/arch/arm/kernel/module-plts.c
> +++ b/arch/arm/kernel/module-plts.c
> @@ -1,5 +1,5 @@
>  /*
> - * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
> + * Copyright (C) 2014-2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
>   *
>   * This program is free software; you can redistribute it and/or modify
>   * it under the terms of the GNU General Public License version 2 as
> @@ -31,9 +31,17 @@ struct plt_entries {
>         u32     lit[PLT_ENT_COUNT];
>  };
>
> +static bool in_init(const struct module *mod, unsigned long loc)
> +{
> +       return loc - (u32)mod->init_layout.base < mod->init_layout.size;
> +}
> +
>  u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
>  {
> -       struct plt_entries *plt = (struct plt_entries *)mod->arch.plt->sh_addr;
> +       struct mod_plt_sec *pltsec = !in_init(mod, loc) ? &mod->arch.core :
> +                                                         &mod->arch.init;
> +
> +       struct plt_entries *plt = (struct plt_entries *)pltsec->plt->sh_addr;
>         int idx = 0;
>
>         /*
> @@ -41,9 +49,9 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
>          * relocations are sorted, this will be the last entry we allocated.
>          * (if one exists).
>          */
> -       if (mod->arch.plt_count > 0) {
> -               plt += (mod->arch.plt_count - 1) / PLT_ENT_COUNT;
> -               idx = (mod->arch.plt_count - 1) % PLT_ENT_COUNT;
> +       if (pltsec->plt_count > 0) {
> +               plt += (pltsec->plt_count - 1) / PLT_ENT_COUNT;
> +               idx = (pltsec->plt_count - 1) % PLT_ENT_COUNT;
>
>                 if (plt->lit[idx] == val)
>                         return (u32)&plt->ldr[idx];
> @@ -53,8 +61,8 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
>                         plt++;
>         }
>
> -       mod->arch.plt_count++;
> -       BUG_ON(mod->arch.plt_count * PLT_ENT_SIZE > mod->arch.plt->sh_size);
> +       pltsec->plt_count++;
> +       BUG_ON(pltsec->plt_count * PLT_ENT_SIZE > pltsec->plt->sh_size);
>
>         if (!idx)
>                 /* Populate a new set of entries */
> @@ -129,7 +137,7 @@ static bool duplicate_rel(Elf32_Addr base, const Elf32_Rel *rel, int num)
>
>  /* Count how many PLT entries we may need */
>  static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
> -                              const Elf32_Rel *rel, int num)
> +                              const Elf32_Rel *rel, int num, Elf32_Word dstidx)
>  {
>         unsigned int ret = 0;
>         const Elf32_Sym *s;
> @@ -144,13 +152,17 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>                 case R_ARM_THM_JUMP24:
>                         /*
>                          * We only have to consider branch targets that resolve
> -                        * to undefined symbols. This is not simply a heuristic,
> -                        * it is a fundamental limitation, since the PLT itself
> -                        * is part of the module, and needs to be within range
> -                        * as well, so modules can never grow beyond that limit.
> +                        * to symbols that are defined in a different section.
> +                        * This is not simply a heuristic, it is a fundamental
> +                        * limitation, since there is no guaranteed way to emit
> +                        * PLT entries sufficiently close to the branch if the
> +                        * section size exceeds the range of a branch
> +                        * instruction. So ignore relocations against defined
> +                        * symbols if they live in the same section as the
> +                        * relocation target.
>                          */
>                         s = syms + ELF32_R_SYM(rel[i].r_info);
> -                       if (s->st_shndx != SHN_UNDEF)
> +                       if (s->st_shndx == dstidx)
>                                 break;
>
>                         /*
> @@ -161,7 +173,12 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>                          * So we need to support them, but there is no need to
>                          * take them into consideration when trying to optimize
>                          * this code. So let's only check for duplicates when
> -                        * the addend is zero.
> +                        * the addend is zero. (Note that calls into the core
> +                        * module via init PLT entries could involve section
> +                        * relative symbol references with non-zero addends, for
> +                        * which we may end up emitting duplicates, but the init
> +                        * PLT is released along with the rest of the .init
> +                        * region as soon as module loading completes.)
>                          */
>                         if (!is_zero_addend_relocation(base, rel + i) ||
>                             !duplicate_rel(base, rel, i))
> @@ -174,7 +191,8 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>  int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>                               char *secstrings, struct module *mod)
>  {
> -       unsigned long plts = 0;
> +       unsigned long core_plts = 0;
> +       unsigned long init_plts = 0;
>         Elf32_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
>         Elf32_Sym *syms = NULL;
>
> @@ -184,13 +202,15 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>          */
>         for (s = sechdrs; s < sechdrs_end; ++s) {
>                 if (strcmp(".plt", secstrings + s->sh_name) == 0)
> -                       mod->arch.plt = s;
> +                       mod->arch.core.plt = s;
> +               else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
> +                       mod->arch.init.plt = s;
>                 else if (s->sh_type == SHT_SYMTAB)
>                         syms = (Elf32_Sym *)s->sh_addr;
>         }
>
> -       if (!mod->arch.plt) {
> -               pr_err("%s: module PLT section missing\n", mod->name);
> +       if (!mod->arch.core.plt || !mod->arch.init.plt) {
> +               pr_err("%s: module PLT section(s) missing\n", mod->name);
>                 return -ENOEXEC;
>         }
>         if (!syms) {
> @@ -213,16 +233,29 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>                 /* sort by type and symbol index */
>                 sort(rels, numrels, sizeof(Elf32_Rel), cmp_rel, NULL);
>
> -               plts += count_plts(syms, dstsec->sh_addr, rels, numrels);
> +               if (strncmp(secstrings + dstsec->sh_name, ".init", 5) != 0)
> +                       core_plts += count_plts(syms, dstsec->sh_addr, rels,
> +                                               numrels, s->sh_info);
> +               else
> +                       init_plts += count_plts(syms, dstsec->sh_addr, rels,
> +                                               numrels, s->sh_info);
>         }
>
> -       mod->arch.plt->sh_type = SHT_NOBITS;
> -       mod->arch.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
> -       mod->arch.plt->sh_addralign = L1_CACHE_BYTES;
> -       mod->arch.plt->sh_size = round_up(plts * PLT_ENT_SIZE,
> -                                         sizeof(struct plt_entries));
> -       mod->arch.plt_count = 0;
> -
> -       pr_debug("%s: plt=%x\n", __func__, mod->arch.plt->sh_size);
> +       mod->arch.core.plt->sh_type = SHT_NOBITS;
> +       mod->arch.core.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
> +       mod->arch.core.plt->sh_addralign = L1_CACHE_BYTES;
> +       mod->arch.core.plt->sh_size = round_up(core_plts * PLT_ENT_SIZE,
> +                                              sizeof(struct plt_entries));
> +       mod->arch.core.plt_count = 0;
> +
> +       mod->arch.init.plt->sh_type = SHT_NOBITS;
> +       mod->arch.init.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
> +       mod->arch.init.plt->sh_addralign = L1_CACHE_BYTES;
> +       mod->arch.init.plt->sh_size = round_up(init_plts * PLT_ENT_SIZE,
> +                                              sizeof(struct plt_entries));
> +       mod->arch.init.plt_count = 0;
> +
> +       pr_debug("%s: plt=%x, init.plt=%x\n", __func__,
> +                mod->arch.core.plt->sh_size, mod->arch.init.plt->sh_size);
>         return 0;
>  }
> diff --git a/arch/arm/kernel/module.lds b/arch/arm/kernel/module.lds
> index 05881e2b414c..eacb5c67f61e 100644
> --- a/arch/arm/kernel/module.lds
> +++ b/arch/arm/kernel/module.lds
> @@ -1,3 +1,4 @@
>  SECTIONS {
>         .plt : { BYTE(0) }
> +       .init.plt : { BYTE(0) }
>  }
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/2] ARM: module: split core and init PLT sections
  2017-02-22 11:29   ` Angus Clark
@ 2017-02-22 11:30     ` Ard Biesheuvel
  0 siblings, 0 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2017-02-22 11:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 22 February 2017 at 11:29, Angus Clark <angus@angusclark.org> wrote:
> Hi Ard,
>
> Thanks.  I can confirm v2 works fine on my setup (with a minor change
> to backport to a 4.1 kernel).
>

Thanks Angus. I will add your Tested-by

-- 
Ard.


> On 21 February 2017 at 22:12, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
>> Since commit 35fa91eed817 ("ARM: kernel: merge core and init PLTs"),
>> the ARM module PLT code allocates all PLT entries in a single core
>> section, since the overhead of having a separate init PLT section is
>> not justified by the small number of PLT entries usually required for
>> init code.
>>
>> However, the core and init module regions are allocated independently,
>> and there is a corner case where the core region may be allocated from
>> the VMALLOC region if the dedicated module region is exhausted, but the
>> init region, being much smaller, can still be allocated from the module
>> region. This puts the PLT entries out of reach of the relocated branch
>> instructions, defeating the whole purpose of PLTs.
>>
>> So split the core and init PLT regions, and name the latter ".init.plt"
>> so it gets allocated along with (and sufficiently close to) the .init
>> sections that it serves. Also, given that init PLT entries may need to
>> be emitted for branches that target the core module, modify the logic
>> that disregards defined symbols to only disregard symbols that are
>> defined in the same section.
>>
>> Fixes: 35fa91eed817 ("ARM: kernel: merge core and init PLTs")
>> Reported-by: Angus Clark <angus@angusclark.org>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> ---
>>  arch/arm/include/asm/module.h |  9 +-
>>  arch/arm/kernel/module-plts.c | 87 ++++++++++++++------
>>  arch/arm/kernel/module.lds    |  1 +
>>  3 files changed, 68 insertions(+), 29 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/module.h b/arch/arm/include/asm/module.h
>> index 464748b9fd7d..ed2319663a1e 100644
>> --- a/arch/arm/include/asm/module.h
>> +++ b/arch/arm/include/asm/module.h
>> @@ -18,13 +18,18 @@ enum {
>>  };
>>  #endif
>>
>> +struct mod_plt_sec {
>> +       struct elf32_shdr       *plt;
>> +       int                     plt_count;
>> +};
>> +
>>  struct mod_arch_specific {
>>  #ifdef CONFIG_ARM_UNWIND
>>         struct unwind_table *unwind[ARM_SEC_MAX];
>>  #endif
>>  #ifdef CONFIG_ARM_MODULE_PLTS
>> -       struct elf32_shdr   *plt;
>> -       int                 plt_count;
>> +       struct mod_plt_sec      core;
>> +       struct mod_plt_sec      init;
>>  #endif
>>  };
>>
>> diff --git a/arch/arm/kernel/module-plts.c b/arch/arm/kernel/module-plts.c
>> index 3a5cba90c971..3d0c2e4dda1d 100644
>> --- a/arch/arm/kernel/module-plts.c
>> +++ b/arch/arm/kernel/module-plts.c
>> @@ -1,5 +1,5 @@
>>  /*
>> - * Copyright (C) 2014 Linaro Ltd. <ard.biesheuvel@linaro.org>
>> + * Copyright (C) 2014-2017 Linaro Ltd. <ard.biesheuvel@linaro.org>
>>   *
>>   * This program is free software; you can redistribute it and/or modify
>>   * it under the terms of the GNU General Public License version 2 as
>> @@ -31,9 +31,17 @@ struct plt_entries {
>>         u32     lit[PLT_ENT_COUNT];
>>  };
>>
>> +static bool in_init(const struct module *mod, unsigned long loc)
>> +{
>> +       return loc - (u32)mod->init_layout.base < mod->init_layout.size;
>> +}
>> +
>>  u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
>>  {
>> -       struct plt_entries *plt = (struct plt_entries *)mod->arch.plt->sh_addr;
>> +       struct mod_plt_sec *pltsec = !in_init(mod, loc) ? &mod->arch.core :
>> +                                                         &mod->arch.init;
>> +
>> +       struct plt_entries *plt = (struct plt_entries *)pltsec->plt->sh_addr;
>>         int idx = 0;
>>
>>         /*
>> @@ -41,9 +49,9 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
>>          * relocations are sorted, this will be the last entry we allocated.
>>          * (if one exists).
>>          */
>> -       if (mod->arch.plt_count > 0) {
>> -               plt += (mod->arch.plt_count - 1) / PLT_ENT_COUNT;
>> -               idx = (mod->arch.plt_count - 1) % PLT_ENT_COUNT;
>> +       if (pltsec->plt_count > 0) {
>> +               plt += (pltsec->plt_count - 1) / PLT_ENT_COUNT;
>> +               idx = (pltsec->plt_count - 1) % PLT_ENT_COUNT;
>>
>>                 if (plt->lit[idx] == val)
>>                         return (u32)&plt->ldr[idx];
>> @@ -53,8 +61,8 @@ u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr val)
>>                         plt++;
>>         }
>>
>> -       mod->arch.plt_count++;
>> -       BUG_ON(mod->arch.plt_count * PLT_ENT_SIZE > mod->arch.plt->sh_size);
>> +       pltsec->plt_count++;
>> +       BUG_ON(pltsec->plt_count * PLT_ENT_SIZE > pltsec->plt->sh_size);
>>
>>         if (!idx)
>>                 /* Populate a new set of entries */
>> @@ -129,7 +137,7 @@ static bool duplicate_rel(Elf32_Addr base, const Elf32_Rel *rel, int num)
>>
>>  /* Count how many PLT entries we may need */
>>  static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>> -                              const Elf32_Rel *rel, int num)
>> +                              const Elf32_Rel *rel, int num, Elf32_Word dstidx)
>>  {
>>         unsigned int ret = 0;
>>         const Elf32_Sym *s;
>> @@ -144,13 +152,17 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>>                 case R_ARM_THM_JUMP24:
>>                         /*
>>                          * We only have to consider branch targets that resolve
>> -                        * to undefined symbols. This is not simply a heuristic,
>> -                        * it is a fundamental limitation, since the PLT itself
>> -                        * is part of the module, and needs to be within range
>> -                        * as well, so modules can never grow beyond that limit.
>> +                        * to symbols that are defined in a different section.
>> +                        * This is not simply a heuristic, it is a fundamental
>> +                        * limitation, since there is no guaranteed way to emit
>> +                        * PLT entries sufficiently close to the branch if the
>> +                        * section size exceeds the range of a branch
>> +                        * instruction. So ignore relocations against defined
>> +                        * symbols if they live in the same section as the
>> +                        * relocation target.
>>                          */
>>                         s = syms + ELF32_R_SYM(rel[i].r_info);
>> -                       if (s->st_shndx != SHN_UNDEF)
>> +                       if (s->st_shndx == dstidx)
>>                                 break;
>>
>>                         /*
>> @@ -161,7 +173,12 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>>                          * So we need to support them, but there is no need to
>>                          * take them into consideration when trying to optimize
>>                          * this code. So let's only check for duplicates when
>> -                        * the addend is zero.
>> +                        * the addend is zero. (Note that calls into the core
>> +                        * module via init PLT entries could involve section
>> +                        * relative symbol references with non-zero addends, for
>> +                        * which we may end up emitting duplicates, but the init
>> +                        * PLT is released along with the rest of the .init
>> +                        * region as soon as module loading completes.)
>>                          */
>>                         if (!is_zero_addend_relocation(base, rel + i) ||
>>                             !duplicate_rel(base, rel, i))
>> @@ -174,7 +191,8 @@ static unsigned int count_plts(const Elf32_Sym *syms, Elf32_Addr base,
>>  int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>>                               char *secstrings, struct module *mod)
>>  {
>> -       unsigned long plts = 0;
>> +       unsigned long core_plts = 0;
>> +       unsigned long init_plts = 0;
>>         Elf32_Shdr *s, *sechdrs_end = sechdrs + ehdr->e_shnum;
>>         Elf32_Sym *syms = NULL;
>>
>> @@ -184,13 +202,15 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>>          */
>>         for (s = sechdrs; s < sechdrs_end; ++s) {
>>                 if (strcmp(".plt", secstrings + s->sh_name) == 0)
>> -                       mod->arch.plt = s;
>> +                       mod->arch.core.plt = s;
>> +               else if (strcmp(".init.plt", secstrings + s->sh_name) == 0)
>> +                       mod->arch.init.plt = s;
>>                 else if (s->sh_type == SHT_SYMTAB)
>>                         syms = (Elf32_Sym *)s->sh_addr;
>>         }
>>
>> -       if (!mod->arch.plt) {
>> -               pr_err("%s: module PLT section missing\n", mod->name);
>> +       if (!mod->arch.core.plt || !mod->arch.init.plt) {
>> +               pr_err("%s: module PLT section(s) missing\n", mod->name);
>>                 return -ENOEXEC;
>>         }
>>         if (!syms) {
>> @@ -213,16 +233,29 @@ int module_frob_arch_sections(Elf_Ehdr *ehdr, Elf_Shdr *sechdrs,
>>                 /* sort by type and symbol index */
>>                 sort(rels, numrels, sizeof(Elf32_Rel), cmp_rel, NULL);
>>
>> -               plts += count_plts(syms, dstsec->sh_addr, rels, numrels);
>> +               if (strncmp(secstrings + dstsec->sh_name, ".init", 5) != 0)
>> +                       core_plts += count_plts(syms, dstsec->sh_addr, rels,
>> +                                               numrels, s->sh_info);
>> +               else
>> +                       init_plts += count_plts(syms, dstsec->sh_addr, rels,
>> +                                               numrels, s->sh_info);
>>         }
>>
>> -       mod->arch.plt->sh_type = SHT_NOBITS;
>> -       mod->arch.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
>> -       mod->arch.plt->sh_addralign = L1_CACHE_BYTES;
>> -       mod->arch.plt->sh_size = round_up(plts * PLT_ENT_SIZE,
>> -                                         sizeof(struct plt_entries));
>> -       mod->arch.plt_count = 0;
>> -
>> -       pr_debug("%s: plt=%x\n", __func__, mod->arch.plt->sh_size);
>> +       mod->arch.core.plt->sh_type = SHT_NOBITS;
>> +       mod->arch.core.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
>> +       mod->arch.core.plt->sh_addralign = L1_CACHE_BYTES;
>> +       mod->arch.core.plt->sh_size = round_up(core_plts * PLT_ENT_SIZE,
>> +                                              sizeof(struct plt_entries));
>> +       mod->arch.core.plt_count = 0;
>> +
>> +       mod->arch.init.plt->sh_type = SHT_NOBITS;
>> +       mod->arch.init.plt->sh_flags = SHF_EXECINSTR | SHF_ALLOC;
>> +       mod->arch.init.plt->sh_addralign = L1_CACHE_BYTES;
>> +       mod->arch.init.plt->sh_size = round_up(init_plts * PLT_ENT_SIZE,
>> +                                              sizeof(struct plt_entries));
>> +       mod->arch.init.plt_count = 0;
>> +
>> +       pr_debug("%s: plt=%x, init.plt=%x\n", __func__,
>> +                mod->arch.core.plt->sh_size, mod->arch.init.plt->sh_size);
>>         return 0;
>>  }
>> diff --git a/arch/arm/kernel/module.lds b/arch/arm/kernel/module.lds
>> index 05881e2b414c..eacb5c67f61e 100644
>> --- a/arch/arm/kernel/module.lds
>> +++ b/arch/arm/kernel/module.lds
>> @@ -1,3 +1,4 @@
>>  SECTIONS {
>>         .plt : { BYTE(0) }
>> +       .init.plt : { BYTE(0) }
>>  }
>> --
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [v2,1/2] ARM: module: split core and init PLT sections
  2017-02-21 22:12 ` [PATCH v2 1/2] ARM: module: " Ard Biesheuvel
  2017-02-22 11:29   ` Angus Clark
@ 2017-04-25 22:30   ` Florian Fainelli
  2017-04-25 22:48     ` Ard Biesheuvel
  1 sibling, 1 reply; 8+ messages in thread
From: Florian Fainelli @ 2017-04-25 22:30 UTC (permalink / raw)
  To: linux-arm-kernel

On 02/21/2017 02:12 PM, Ard Biesheuvel wrote:
> Since commit 35fa91eed817 ("ARM: kernel: merge core and init PLTs"),
> the ARM module PLT code allocates all PLT entries in a single core
> section, since the overhead of having a separate init PLT section is
> not justified by the small number of PLT entries usually required for
> init code.
> 
> However, the core and init module regions are allocated independently,
> and there is a corner case where the core region may be allocated from
> the VMALLOC region if the dedicated module region is exhausted, but the
> init region, being much smaller, can still be allocated from the module
> region. This puts the PLT entries out of reach of the relocated branch
> instructions, defeating the whole purpose of PLTs.
> 
> So split the core and init PLT regions, and name the latter ".init.plt"
> so it gets allocated along with (and sufficiently close to) the .init
> sections that it serves. Also, given that init PLT entries may need to
> be emitted for branches that target the core module, modify the logic
> that disregards defined symbols to only disregard symbols that are
> defined in the same section.
> 
> Fixes: 35fa91eed817 ("ARM: kernel: merge core and init PLTs")
> Reported-by: Angus Clark <angus@angusclark.org>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

Ard, has this been submitted to Russell's patch tracker? If not, can you
do it?

Thanks a bunch!
-- 
Florian

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [v2,1/2] ARM: module: split core and init PLT sections
  2017-04-25 22:30   ` [v2,1/2] " Florian Fainelli
@ 2017-04-25 22:48     ` Ard Biesheuvel
  0 siblings, 0 replies; 8+ messages in thread
From: Ard Biesheuvel @ 2017-04-25 22:48 UTC (permalink / raw)
  To: linux-arm-kernel

On 25 April 2017 at 23:30, Florian Fainelli <f.fainelli@gmail.com> wrote:
> On 02/21/2017 02:12 PM, Ard Biesheuvel wrote:
>> Since commit 35fa91eed817 ("ARM: kernel: merge core and init PLTs"),
>> the ARM module PLT code allocates all PLT entries in a single core
>> section, since the overhead of having a separate init PLT section is
>> not justified by the small number of PLT entries usually required for
>> init code.
>>
>> However, the core and init module regions are allocated independently,
>> and there is a corner case where the core region may be allocated from
>> the VMALLOC region if the dedicated module region is exhausted, but the
>> init region, being much smaller, can still be allocated from the module
>> region. This puts the PLT entries out of reach of the relocated branch
>> instructions, defeating the whole purpose of PLTs.
>>
>> So split the core and init PLT regions, and name the latter ".init.plt"
>> so it gets allocated along with (and sufficiently close to) the .init
>> sections that it serves. Also, given that init PLT entries may need to
>> be emitted for branches that target the core module, modify the logic
>> that disregards defined symbols to only disregard symbols that are
>> defined in the same section.
>>
>> Fixes: 35fa91eed817 ("ARM: kernel: merge core and init PLTs")
>> Reported-by: Angus Clark <angus@angusclark.org>
>> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>
> Ard, has this been submitted to Russell's patch tracker? If not, can you
> do it?
>

It's already queued for v4.12:

b7ede5a1f5905ac394cc8e61712a13e3c5cb7b8f
ARM: 8662/1: module: split core and init PLT sections

Regards,
Ard.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 2/2] arm64: module: split core and init PLT sections
  2017-02-21 22:12 ` [PATCH v2 2/2] arm64: " Ard Biesheuvel
@ 2017-04-26 11:32   ` Catalin Marinas
  0 siblings, 0 replies; 8+ messages in thread
From: Catalin Marinas @ 2017-04-26 11:32 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 21, 2017 at 10:12:57PM +0000, Ard Biesheuvel wrote:
> The arm64 module PLT code allocates all PLT entries in a single core
> section, since the overhead of having a separate init PLT section is
> not justified by the small number of PLT entries usually required for
> init code.
> 
> However, the core and init module regions are allocated independently,
> and there is a corner case where the core region may be allocated from
> the VMALLOC region if the dedicated module region is exhausted, but the
> init region, being much smaller, can still be allocated from the module
> region. This leads to relocation failures if the distance between those
> regions exceeds 128 MB. (In fact, this corner case is highly unlikely to
> occur on arm64, but the issue has been observed on ARM, whose module
> region is much smaller).
> 
> So split the core and init PLT regions, and name the latter ".init.plt"
> so it gets allocated along with (and sufficiently close to) the .init
> sections that it serves. Also, given that init PLT entries may need to
> be emitted for branches that target the core module, modify the logic
> that disregards defined symbols to only disregard symbols that are
> defined in the same section as the relocated branch instruction.
> 
> Since there may now be two PLT entries associated with each entry in
> the symbol table, we can no longer hijack the symbol::st_size fields
> to record the addresses of PLT entries as we emit them for zero-addend
> relocations. So instead, perform an explicit comparison to check for
> duplicate entries.
> 
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>

I forgot about this patch. I'll queue it for 4.12. Thanks.

-- 
Catalin

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-04-26 11:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-21 22:12 [PATCH v2 0/2] ARM/arm64: module-plt: split core and init PLT sections Ard Biesheuvel
2017-02-21 22:12 ` [PATCH v2 1/2] ARM: module: " Ard Biesheuvel
2017-02-22 11:29   ` Angus Clark
2017-02-22 11:30     ` Ard Biesheuvel
2017-04-25 22:30   ` [v2,1/2] " Florian Fainelli
2017-04-25 22:48     ` Ard Biesheuvel
2017-02-21 22:12 ` [PATCH v2 2/2] arm64: " Ard Biesheuvel
2017-04-26 11:32   ` Catalin Marinas

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.