kernel-hardening.lists.openwall.com archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/9] Function Granular KASLR
@ 2020-05-21 16:56 Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 1/9] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
                   ` (10 more replies)
  0 siblings, 11 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

Function Granular Kernel Address Space Layout Randomization (fgkaslr)
---------------------------------------------------------------------

This patch set is an implementation of finer grained kernel address space
randomization. It rearranges your kernel code at load time 
on a per-function level granularity, with only around a second added to
boot time.

Changes in v2:
--------------
* Fix to address i386 build failure
* Allow module reordering patch to be configured separately so that
  arm (or other non-x86_64 arches) can take advantage of module function
  reordering. This support has not be tested by me, but smoke tested by
  Ard Biesheuvel <ardb@kernel.org> on arm.
* Fix build issue when building on arm as reported by
  Ard Biesheuvel <ardb@kernel.org> 
* minor chages for certain checkpatch warnings and review feedback.

Patches to objtool are included because they are dependencies for this
patchset, however they have been submitted by their maintainer separately.

Background
----------
KASLR was merged into the kernel with the objective of increasing the
difficulty of code reuse attacks. Code reuse attacks reused existing code
snippets to get around existing memory protections. They exploit software bugs
which expose addresses of useful code snippets to control the flow of
execution for their own nefarious purposes. KASLR moves the entire kernel
code text as a unit at boot time in order to make addresses less predictable.
The order of the code within the segment is unchanged - only the base address
is shifted. There are a few shortcomings to this algorithm.

1. Low Entropy - there are only so many locations the kernel can fit in. This
   means an attacker could guess without too much trouble.
2. Knowledge of a single address can reveal the offset of the base address,
   exposing all other locations for a published/known kernel image.
3. Info leaks abound.

Finer grained ASLR has been proposed as a way to make ASLR more resistant
to info leaks. It is not a new concept at all, and there are many variations
possible. Function reordering is an implementation of finer grained ASLR
which randomizes the layout of an address space on a function level
granularity. We use the term "fgkaslr" in this document to refer to the
technique of function reordering when used with KASLR, as well as finer grained
KASLR in general.

Proposed Improvement
--------------------
This patch set proposes adding function reordering on top of the existing
KASLR base address randomization. The over-arching objective is incremental
improvement over what we already have. It is designed to work in combination
with the existing solution. The implementation is really pretty simple, and
there are 2 main area where changes occur:

* Build time

GCC has had an option to place functions into individual .text sections for
many years now. This option can be used to implement function reordering at
load time. The final compiled vmlinux retains all the section headers, which
can be used to help find the address ranges of each function. Using this
information and an expanded table of relocation addresses, individual text
sections can be suffled immediately after decompression. Some data tables
inside the kernel that have assumptions about order require re-sorting
after being updated when applying relocations. In order to modify these tables,
a few key symbols are excluded from the objcopy symbol stripping process for
use after shuffling the text segments.

Some highlights from the build time changes to look for:

The top level kernel Makefile was modified to add the gcc flag if it
is supported. Currently, I am applying this flag to everything it is
possible to randomize. Anything that is written in C and is a function is
randomized. Future work could turn off this flags for selected
files or even entire subsystems, although obviously at the cost of security.

The relocs tool is updated to add relative relocations. This information
previously wasn't included because it wasn't necessary when moving the
entire .text segment as a unit. 

A new file was created to contain a list of symbols that objcopy should
keep. We use those symbols at load time as described below.

* Load time

The boot kernel was modified to parse the vmlinux elf file after
decompression to check for our interesting symbols that we kept, and to
look for any .text.* sections to randomize. We then shuffle the sections
and update any tables that need to be updated or resorted. The existing
code which updated relocation addresses was modified to account for not
just a fixed delta from the load address, but the offset that the function
section was moved to. This requires inspection of each address to see if
it was impacted by a randomization. We use a bsearch to make this less
horrible on performance.

In order to hide our new layout, symbols reported through /proc/kallsyms
will be sorted by name alphabetically rather than by address.

Security Considerations
-----------------------
The objective of this patch set is to improve a technology that is already
merged into the kernel (KASLR). This code will not prevent all attacks,
but should instead be considered as one of several tools that can be used.
In particular, this code is meant to make KASLR more effective in the presence
of info leaks.

How much entropy we are adding to the existing entropy of standard KASLR will
depend on a few variables. Firstly and most obviously, the number of functions
that are randomized matters. This implementation keeps the existing .text
section for code that cannot be randomized - for example, because it was
assembly code. The less sections to randomize, the less entropy. In addition,
due to alignment (16 bytes for x86_64), the number of bits in a address that
the attacker needs to guess is reduced, as the lower bits are identical.

Performance Impact
------------------
There are two areas where function reordering can impact performance: boot
time latency, and run time performance.

* Boot time latency
This implementation of finer grained KASLR impacts the boot time of the kernel
in several places. It requires additional parsing of the kernel ELF file to
obtain the section headers of the sections to be randomized. It calls the
random number generator for each section to be randomized to determine that
section's new memory location. It copies the decompressed kernel into a new
area of memory to avoid corruption when laying out the newly randomized
sections. It increases the number of relocations the kernel has to perform at
boot time vs. standard KASLR, and it also requires a lookup on each address
that needs to be relocated to see if it was in a randomized section and needs
to be adjusted by a new offset. Finally, it re-sorts a few data tables that
are required to be sorted by address.

Booting a test VM on a modern, well appointed system showed an increase in
latency of approximately 1 second.

* Run time
The performance impact at run-time of function reordering varies by workload.
Using kcbench, a kernel compilation benchmark, the performance of a kernel
build with finer grained KASLR was about 1% slower than a kernel with standard
KASLR. Analysis with perf showed a slightly higher percentage of 
L1-icache-load-misses. Other workloads were examined as well, with varied
results. Some workloads performed significantly worse under FGKASLR, while
others stayed the same or were mysteriously better. In general, it will
depend on the code flow whether or not finer grained KASLR will impact
your workload, and how the underlying code was designed. Because the layout
changes per boot, each time a system is rebooted the performance of a workload
may change.

Future work could identify hot areas that may not be randomized and either
leave them in the .text section or group them together into a single section
that may be randomized. If grouping things together helps, one other thing to
consider is that if we could identify text blobs that should be grouped together
to benefit a particular code flow, it could be interesting to explore
whether this security feature could be also be used as a performance
feature if you are interested in optimizing your kernel layout for a
particular workload at boot time. Optimizing function layout for a particular
workload has been researched and proven effective - for more information
read the Facebook paper "Optimizing Function Placement for Large-Scale
Data-Center Applications" (see references section below).

Image Size
----------
Adding additional section headers as a result of compiling with
-ffunction-sections will increase the size of the vmlinux ELF file.
With a standard distro config, the resulting vmlinux was increased by
about 3%. The compressed image is also increased due to the header files,
as well as the extra relocations that must be added. You can expect fgkaslr
to increase the size of the compressed image by about 15%.

Memory Usage
------------
fgkaslr increases the amount of heap that is required at boot time,
although this extra memory is released when the kernel has finished
decompression. As a result, it may not be appropriate to use this feature on
systems without much memory.

Building
--------
To enable fine grained KASLR, you need to have the following config options
set (including all the ones you would use to build normal KASLR)

CONFIG_FG_KASLR=y

In addition, fgkaslr is only supported for the X86_64 architecture.

Modules
-------
Modules are randomized similarly to the rest of the kernel by shuffling
the sections at load time prior to moving them into memory. The module must
also have been build with the -ffunction-sections compiler option.

Although fgkaslr for the kernel is only supported for the X86_64 architecture,
it is possible to use fgkaslr with modules on other architectures. To enable
this feature, select

CONFIG_MODULE_FG_KASLR=y

This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.

Disabling
---------
Disabling normal KASLR using the nokaslr command line option also disables
fgkaslr. It is also possible to disable fgkaslr separately by booting with
fgkaslr=off on the commandline.

References
----------
There are a lot of academic papers which explore finer grained ASLR.
This paper in particular contributed the most to my implementation design
as well as my overall understanding of the problem space:

Selfrando: Securing the Tor Browser against De-anonymization Exploits,
M. Conti, S. Crane, T. Frassetto, et al.

For more information on how function layout impacts performance, see:

Optimizing Function Placement for Large-Scale Data-Center Applications,
G. Ottoni, B. Maher

Kees Cook (1):
  x86/boot: Allow a "silent" kaslr random byte fetch

Kristen Carlson Accardi (8):
  objtool: Do not assume order of parent/child functions
  x86: tools/relocs: Support >64K section headers
  x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  x86: Make sure _etext includes function sections
  x86/tools: Add relative relocs for randomized functions
  x86: Add support for function granular KASLR
  kallsyms: Hide layout
  module: Reorder functions

 Documentation/security/fgkaslr.rst       | 155 +++++
 Documentation/security/index.rst         |   1 +
 Makefile                                 |   4 +
 arch/x86/Kconfig                         |  14 +
 arch/x86/Makefile                        |   3 +
 arch/x86/boot/compressed/Makefile        |  10 +-
 arch/x86/boot/compressed/fgkaslr.c       | 823 +++++++++++++++++++++++
 arch/x86/boot/compressed/kaslr.c         |   4 -
 arch/x86/boot/compressed/misc.c          | 109 ++-
 arch/x86/boot/compressed/misc.h          |  34 +
 arch/x86/boot/compressed/utils.c         |  12 +
 arch/x86/boot/compressed/vmlinux.symbols |  17 +
 arch/x86/include/asm/boot.h              |  15 +-
 arch/x86/kernel/vmlinux.lds.S            |  18 +-
 arch/x86/lib/kaslr.c                     |  18 +-
 arch/x86/tools/relocs.c                  | 143 +++-
 arch/x86/tools/relocs.h                  |   4 +-
 arch/x86/tools/relocs_common.c           |  15 +-
 include/asm-generic/vmlinux.lds.h        |   2 +-
 include/uapi/linux/elf.h                 |   1 +
 init/Kconfig                             |  11 +
 kernel/kallsyms.c                        | 138 +++-
 kernel/module.c                          |  81 +++
 tools/objtool/elf.c                      |   8 +-
 24 files changed, 1578 insertions(+), 62 deletions(-)
 create mode 100644 Documentation/security/fgkaslr.rst
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/utils.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols


base-commit: b9bbe6ed63b2b9f2c9ee5cbd0f2c946a2723f4ce
-- 
2.20.1


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 1/9] objtool: Do not assume order of parent/child functions
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 2/9] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, Josh Poimboeuf, Peter Zijlstra
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

If a .cold function is examined prior to it's parent, the link
to the parent/child function can be overwritten when the parent
is examined. Only update pfunc and cfunc if they were previously
nil to prevent this from happening.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/elf.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index c4857fa3f1d1..b998c853a1f0 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -408,7 +408,13 @@ static int read_symbols(struct elf *elf)
 			size_t pnamelen;
 			if (sym->type != STT_FUNC)
 				continue;
-			sym->pfunc = sym->cfunc = sym;
+
+			if (sym->pfunc == NULL)
+				sym->pfunc = sym;
+
+			if (sym->cfunc == NULL)
+				sym->cfunc = sym;
+
 			coldstr = strstr(sym->name, ".cold");
 			if (!coldstr)
 				continue;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 2/9] x86: tools/relocs: Support >64K section headers
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 1/9] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 3/9] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck

While the relocs tool already supports finding the total number
of section headers if vmlinux exceeds 64K sections, it fails to
read the extended symbol table to get section header indexes for symbols,
causing incorrect symbol table indexes to be used when there are > 64K
symbols.

Parse the elf file to read the extended symbol table info, and then
replace all direct references to st_shndx with calls to sym_index(),
which will determine whether the value can be read directly or
whether the value should be pulled out of the extended table.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/tools/relocs.c | 104 ++++++++++++++++++++++++++++++----------
 1 file changed, 78 insertions(+), 26 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..a00dc133f109 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -14,6 +14,10 @@
 static Elf_Ehdr		ehdr;
 static unsigned long	shnum;
 static unsigned int	shstrndx;
+static unsigned int	shsymtabndx;
+static unsigned int	shxsymtabndx;
+
+static int sym_index(Elf_Sym *sym);
 
 struct relocs {
 	uint32_t	*offset;
@@ -32,6 +36,7 @@ struct section {
 	Elf_Shdr       shdr;
 	struct section *link;
 	Elf_Sym        *symtab;
+	Elf32_Word     *xsymtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
 };
@@ -265,7 +270,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
 		name = sym_strtab + sym->st_name;
 	}
 	else {
-		name = sec_name(sym->st_shndx);
+		name = sec_name(sym_index(sym));
 	}
 	return name;
 }
@@ -335,6 +340,23 @@ static uint64_t elf64_to_cpu(uint64_t val)
 #define elf_xword_to_cpu(x)	elf32_to_cpu(x)
 #endif
 
+static int sym_index(Elf_Sym *sym)
+{
+	Elf_Sym *symtab = secs[shsymtabndx].symtab;
+	Elf32_Word *xsymtab = secs[shxsymtabndx].xsymtab;
+	unsigned long offset;
+	int index;
+
+	if (sym->st_shndx != SHN_XINDEX)
+		return sym->st_shndx;
+
+	/* calculate offset of sym from head of table. */
+	offset = (unsigned long) sym - (unsigned long) symtab;
+	index = offset/sizeof(*sym);
+
+	return elf32_to_cpu(xsymtab[index]);
+}
+
 static void read_ehdr(FILE *fp)
 {
 	if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
@@ -468,31 +490,60 @@ static void read_strtabs(FILE *fp)
 static void read_symtabs(FILE *fp)
 {
 	int i,j;
+
 	for (i = 0; i < shnum; i++) {
 		struct section *sec = &secs[i];
-		if (sec->shdr.sh_type != SHT_SYMTAB) {
+		int num_syms;
+
+		switch (sec->shdr.sh_type) {
+		case SHT_SYMTAB_SHNDX:
+			sec->xsymtab = malloc(sec->shdr.sh_size);
+			if (!sec->xsymtab) {
+				die("malloc of %d bytes for xsymtab failed\n",
+					sec->shdr.sh_size);
+			}
+			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+				die("Seek to %d failed: %s\n",
+					sec->shdr.sh_offset, strerror(errno));
+			}
+			if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp)
+					!= sec->shdr.sh_size) {
+				die("Cannot read extended symbol table: %s\n",
+					strerror(errno));
+			}
+			shxsymtabndx = i;
+			continue;
+
+		case SHT_SYMTAB:
+			num_syms = sec->shdr.sh_size/sizeof(Elf_Sym);
+
+			sec->symtab = malloc(sec->shdr.sh_size);
+			if (!sec->symtab) {
+				die("malloc of %d bytes for symtab failed\n",
+					sec->shdr.sh_size);
+			}
+			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+				die("Seek to %d failed: %s\n",
+					sec->shdr.sh_offset, strerror(errno));
+			}
+			if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
+					!= sec->shdr.sh_size) {
+				die("Cannot read symbol table: %s\n",
+					strerror(errno));
+			}
+			for (j = 0; j < num_syms; j++) {
+				Elf_Sym *sym = &sec->symtab[j];
+
+				sym->st_name  = elf_word_to_cpu(sym->st_name);
+				sym->st_value = elf_addr_to_cpu(sym->st_value);
+				sym->st_size  = elf_xword_to_cpu(sym->st_size);
+				sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
+			}
+			shsymtabndx = i;
+			continue;
+
+		default:
 			continue;
-		}
-		sec->symtab = malloc(sec->shdr.sh_size);
-		if (!sec->symtab) {
-			die("malloc of %d bytes for symtab failed\n",
-				sec->shdr.sh_size);
-		}
-		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
-			die("Seek to %d failed: %s\n",
-				sec->shdr.sh_offset, strerror(errno));
-		}
-		if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
-		    != sec->shdr.sh_size) {
-			die("Cannot read symbol table: %s\n",
-				strerror(errno));
-		}
-		for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Sym); j++) {
-			Elf_Sym *sym = &sec->symtab[j];
-			sym->st_name  = elf_word_to_cpu(sym->st_name);
-			sym->st_value = elf_addr_to_cpu(sym->st_value);
-			sym->st_size  = elf_xword_to_cpu(sym->st_size);
-			sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
 		}
 	}
 }
@@ -759,13 +810,14 @@ static void percpu_init(void)
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 {
-	return (sym->st_shndx == per_cpu_shndx) &&
+	int shndx = sym_index(sym);
+
+	return (shndx == per_cpu_shndx) &&
 		strcmp(symname, "__init_begin") &&
 		strcmp(symname, "__per_cpu_load") &&
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
-
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
@@ -1088,7 +1140,7 @@ static int do_reloc_info(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		sec_name(sec->shdr.sh_info),
 		rel_type(ELF_R_TYPE(rel->r_info)),
 		symname,
-		sec_name(sym->st_shndx));
+		sec_name(sym_index(sym)));
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 3/9] x86/boot: Allow a "silent" kaslr random byte fetch
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 1/9] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 2/9] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 16:56 ` [PATCH v2 4/9] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

From: Kees Cook <keescook@chromium.org>

Under earlyprintk, each RNG call produces a debug report line. When
shuffling hundreds of functions, this is not useful information (each
line is identical and tells us nothing new). Instead, allow for a NULL
"purpose" to suppress the debug reporting.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/lib/kaslr.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index a53665116458..2b3eb8c948a3 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -56,11 +56,14 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	unsigned long raw, random = get_boot_seed();
 	bool use_i8254 = true;
 
-	debug_putstr(purpose);
-	debug_putstr(" KASLR using");
+	if (purpose) {
+		debug_putstr(purpose);
+		debug_putstr(" KASLR using");
+	}
 
 	if (has_cpuflag(X86_FEATURE_RDRAND)) {
-		debug_putstr(" RDRAND");
+		if (purpose)
+			debug_putstr(" RDRAND");
 		if (rdrand_long(&raw)) {
 			random ^= raw;
 			use_i8254 = false;
@@ -68,7 +71,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	}
 
 	if (has_cpuflag(X86_FEATURE_TSC)) {
-		debug_putstr(" RDTSC");
+		if (purpose)
+			debug_putstr(" RDTSC");
 		raw = rdtsc();
 
 		random ^= raw;
@@ -76,7 +80,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	}
 
 	if (use_i8254) {
-		debug_putstr(" i8254");
+		if (purpose)
+			debug_putstr(" i8254");
 		random ^= i8254();
 	}
 
@@ -86,7 +91,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	    : "a" (random), "rm" (mix_const));
 	random += raw;
 
-	debug_putstr("...\n");
+	if (purpose)
+		debug_putstr("...\n");
 
 	return random;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 4/9] x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (2 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 3/9] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 20:00   ` Kees Cook
  2020-05-21 16:56 ` [PATCH v2 5/9] x86: Make sure _etext includes function sections Kristen Carlson Accardi
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, Masahiro Yamada, Michal Marek, x86,
	H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck, linux-kbuild

Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
the make file to build with -ffunction-sections if CONFIG_FG_KASLR

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 Makefile         |  4 ++++
 arch/x86/Kconfig | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/Makefile b/Makefile
index 04f5662ae61a..28e515baa824 100644
--- a/Makefile
+++ b/Makefile
@@ -862,6 +862,10 @@ ifdef CONFIG_LIVEPATCH
 KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
 endif
 
+ifdef CONFIG_FG_KASLR
+KBUILD_CFLAGS += -ffunction-sections
+endif
+
 # arch Makefile may override CC so keep this after arch Makefile is included
 NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2d3f963fd6f1..50e83ea57d70 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2183,6 +2183,19 @@ config RANDOMIZE_BASE
 
 	  If unsure, say Y.
 
+config FG_KASLR
+	bool "Function Granular Kernel Address Space Layout Randomization"
+	depends on $(cc-option, -ffunction-sections)
+	depends on RANDOMIZE_BASE && X86_64
+	help
+	  This option improves the randomness of the kernel text
+	  over basic Kernel Address Space Layout Randomization (KASLR)
+	  by reordering the kernel text at boot time. This feature
+	  uses information generated at compile time to re-layout the
+	  kernel text section at boot time at function level granularity.
+
+	  If unsure, say N.
+
 # Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 5/9] x86: Make sure _etext includes function sections
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (3 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 4/9] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 18:11   ` Kees Cook
  2020-05-21 16:56 ` [PATCH v2 6/9] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin, Arnd Bergmann
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck, linux-arch

When using -ffunction-sections to place each function in
it's own text section so it can be randomized at load time, the
linker considers these .text.* sections "orphaned sections", and
will place them after the first similar section (.text). In order
to accurately represent the end of the text section and the
orphaned sections, _etext must be moved so that it is after both
.text and .text.* The text size must also be calculated to
include .text AND .text.*

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/vmlinux.lds.S     | 18 +++++++++++++++++-
 include/asm-generic/vmlinux.lds.h |  2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 1bf7e312361f..044f7528a2f0 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -147,8 +147,24 @@ SECTIONS
 #endif
 	} :text =0xcccc
 
-	/* End of text section, which should occupy whole number of pages */
+#ifdef CONFIG_FG_KASLR
+	/*
+	 * -ffunction-sections creates .text.* sections, which are considered
+	 * "orphan sections" and added after the first similar section (.text).
+	 * Adding this ALIGN statement causes the address of _etext
+	 * to be below that of all the .text.* orphaned sections
+	 */
+	. = ALIGN(PAGE_SIZE);
+#endif
 	_etext = .;
+
+	/*
+	 * the size of the .text section is used to calculate the address
+	 * range for orc lookups. If we just use SIZEOF(.text), we will
+	 * miss all the .text.* sections. Calculate the size using _etext
+	 * and _stext and save the value for later.
+	 */
+	text_size = _etext - _stext;
 	. = ALIGN(PAGE_SIZE);
 
 	X86_ALIGN_RODATA_BEGIN
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 71e387a5fe90..f5baee74854c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -813,7 +813,7 @@
 	. = ALIGN(4);							\
 	.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) {		\
 		orc_lookup = .;						\
-		. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) /	\
+		. += (((text_size + LOOKUP_BLOCK_SIZE - 1) /	\
 			LOOKUP_BLOCK_SIZE) + 1) * 4;			\
 		orc_lookup_end = .;					\
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 6/9] x86/tools: Add relative relocs for randomized functions
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (4 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 5/9] x86: Make sure _etext includes function sections Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 19:54   ` Kees Cook
  2020-05-21 16:56 ` [PATCH v2 7/9] x86: Add support for function granular KASLR Kristen Carlson Accardi
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck

When reordering functions, the relative offsets for relocs that
are either in the randomized sections, or refer to the randomized
sections will need to be adjusted. Add code to detect whether a
reloc satisifies these cases, and if so, add them to the appropriate
reloc list.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/boot/compressed/Makefile |  7 +++-
 arch/x86/tools/relocs.c           | 55 ++++++++++++++++++++++++-------
 arch/x86/tools/relocs.h           |  4 +--
 arch/x86/tools/relocs_common.c    | 15 ++++++---
 4 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 5f7c262bcc99..3a5a004498de 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -117,6 +117,11 @@ $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
 	$(call if_changed,check-and-link-vmlinux)
 
 OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
+
+ifdef CONFIG_FG_KASLR
+	RELOCS_ARGS += --fg-kaslr
+endif
+
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)
 
@@ -124,7 +129,7 @@ targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relo
 
 CMD_RELOCS = arch/x86/tools/relocs
 quiet_cmd_relocs = RELOCS  $@
-      cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
+      cmd_relocs = $(CMD_RELOCS) $(RELOCS_ARGS) $< > $@;$(CMD_RELOCS) $(RELOCS_ARGS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
 	$(call if_changed,relocs)
 
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index a00dc133f109..bf51ff1854ff 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -42,6 +42,8 @@ struct section {
 };
 static struct section *secs;
 
+static int fg_kaslr;
+
 static const char * const sym_regex_kernel[S_NSYMTYPES] = {
 /*
  * Following symbols have been audited. There values are constant and do
@@ -351,8 +353,8 @@ static int sym_index(Elf_Sym *sym)
 		return sym->st_shndx;
 
 	/* calculate offset of sym from head of table. */
-	offset = (unsigned long) sym - (unsigned long) symtab;
-	index = offset/sizeof(*sym);
+	offset = (unsigned long)sym - (unsigned long)symtab;
+	index = offset / sizeof(*sym);
 
 	return elf32_to_cpu(xsymtab[index]);
 }
@@ -500,22 +502,22 @@ static void read_symtabs(FILE *fp)
 			sec->xsymtab = malloc(sec->shdr.sh_size);
 			if (!sec->xsymtab) {
 				die("malloc of %d bytes for xsymtab failed\n",
-					sec->shdr.sh_size);
+				    sec->shdr.sh_size);
 			}
 			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
 				die("Seek to %d failed: %s\n",
-					sec->shdr.sh_offset, strerror(errno));
+				    sec->shdr.sh_offset, strerror(errno));
 			}
 			if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp)
-					!= sec->shdr.sh_size) {
+			    != sec->shdr.sh_size) {
 				die("Cannot read extended symbol table: %s\n",
-					strerror(errno));
+				    strerror(errno));
 			}
 			shxsymtabndx = i;
 			continue;
 
 		case SHT_SYMTAB:
-			num_syms = sec->shdr.sh_size/sizeof(Elf_Sym);
+			num_syms = sec->shdr.sh_size / sizeof(Elf_Sym);
 
 			sec->symtab = malloc(sec->shdr.sh_size);
 			if (!sec->symtab) {
@@ -818,6 +820,32 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+static int is_function_section(struct section *sec)
+{
+	const char *name;
+
+	if (!fg_kaslr)
+		return 0;
+
+	name = sec_name(sec->shdr.sh_info);
+
+	return(!strncmp(name, ".text.", 6));
+}
+
+static int is_randomized_sym(ElfW(Sym) *sym)
+{
+	const char *name;
+
+	if (!fg_kaslr)
+		return 0;
+
+	if (sym->st_shndx > shnum)
+		return 0;
+
+	name = sec_name(sym_index(sym));
+	return(!strncmp(name, ".text.", 6));
+}
+
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
@@ -842,13 +870,17 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 	case R_X86_64_PC32:
 	case R_X86_64_PLT32:
 		/*
-		 * PC relative relocations don't need to be adjusted unless
-		 * referencing a percpu symbol.
+		 * we need to keep pc relative relocations for sections which
+		 * might be randomized, and for the percpu section.
+		 * We also need to keep relocations for any offset which might
+		 * reference an address in a section which has been randomized.
 		 *
 		 * NB: R_X86_64_PLT32 can be treated as R_X86_64_PC32.
 		 */
-		if (is_percpu_sym(sym, symname))
+		if (is_function_section(sec) || is_randomized_sym(sym) ||
+		    is_percpu_sym(sym, symname))
 			add_reloc(&relocs32neg, offset);
+
 		break;
 
 	case R_X86_64_PC64:
@@ -1158,8 +1190,9 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int fgkaslr)
 {
+	fg_kaslr = fgkaslr;
 	regex_init(use_real_mode);
 	read_ehdr(fp);
 	read_shdrs(fp);
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 43c83c0fd22c..05504052c846 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -31,8 +31,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int fg_kaslr);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int fg_kaslr);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index 6634352a20bc..1407db72367a 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -12,14 +12,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
+	    "--fg-kaslr] vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, fg_kaslr;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -30,6 +30,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	fg_kaslr = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -54,6 +55,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--fg-kaslr") == 0) {
+				fg_kaslr = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -75,11 +80,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, fg_kaslr);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, fg_kaslr);
 	fclose(fp);
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 7/9] x86: Add support for function granular KASLR
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (5 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 6/9] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 21:08   ` Kees Cook
  2020-05-21 16:56 ` [PATCH v2 8/9] kallsyms: Hide layout Kristen Carlson Accardi
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, Jonathan Corbet, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck, linux-doc

At boot time, find all the function sections that have separate .text
sections, shuffle them, and then copy them to new locations. Adjust
any relocations accordingly.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 Documentation/security/fgkaslr.rst       | 155 +++++
 Documentation/security/index.rst         |   1 +
 arch/x86/boot/compressed/Makefile        |   3 +
 arch/x86/boot/compressed/fgkaslr.c       | 823 +++++++++++++++++++++++
 arch/x86/boot/compressed/kaslr.c         |   4 -
 arch/x86/boot/compressed/misc.c          | 109 ++-
 arch/x86/boot/compressed/misc.h          |  34 +
 arch/x86/boot/compressed/utils.c         |  12 +
 arch/x86/boot/compressed/vmlinux.symbols |  17 +
 arch/x86/include/asm/boot.h              |  15 +-
 include/uapi/linux/elf.h                 |   1 +
 11 files changed, 1159 insertions(+), 15 deletions(-)
 create mode 100644 Documentation/security/fgkaslr.rst
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/utils.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols

diff --git a/Documentation/security/fgkaslr.rst b/Documentation/security/fgkaslr.rst
new file mode 100644
index 000000000000..94939c62c50d
--- /dev/null
+++ b/Documentation/security/fgkaslr.rst
@@ -0,0 +1,155 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================================
+Function Granular Kernel Address Space Layout Randomization (fgkaslr)
+=====================================================================
+
+:Date: 6 April 2020
+:Author: Kristen Accardi
+
+Kernel Address Space Layout Randomization (KASLR) was merged into the kernel
+with the objective of increasing the difficulty of code reuse attacks. Code
+reuse attacks reused existing code snippets to get around existing memory
+protections. They exploit software bugs which expose addresses of useful code
+snippets to control the flow of execution for their own nefarious purposes.
+KASLR as it was originally implemented moves the entire kernel code text as a
+unit at boot time in order to make addresses less predictable. The order of the
+code within the segment is unchanged - only the base address is shifted. There
+are a few shortcomings to this algorithm.
+
+1. Low Entropy - there are only so many locations the kernel can fit in. This
+   means an attacker could guess without too much trouble.
+2. Knowledge of a single address can reveal the offset of the base address,
+   exposing all other locations for a published/known kernel image.
+3. Info leaks abound.
+
+Finer grained ASLR has been proposed as a way to make ASLR more resistant
+to info leaks. It is not a new concept at all, and there are many variations
+possible. Function reordering is an implementation of finer grained ASLR
+which randomizes the layout of an address space on a function level
+granularity. The term "fgkaslr" is used in this document to refer to the
+technique of function reordering when used with KASLR, as well as finer grained
+KASLR in general.
+
+The objective of this patch set is to improve a technology that is already
+merged into the kernel (KASLR). This code will not prevent all code reuse
+attacks, and should be considered as one of several tools that can be used.
+
+Implementation Details
+======================
+
+The over-arching objective of the fgkaslr implementation is incremental
+improvement over the existing KASLR algorithm. It is designed to work with
+the existing solution, and there are two main area where code changes occur:
+Build time, and Load time.
+
+Build time
+----------
+
+GCC has had an option to place functions into individual .text sections
+for many years now (-ffunction-sections). This option is used to implement
+function reordering at load time. The final compiled vmlinux retains all the
+section headers, which can be used to help find the address ranges of each
+function. Using this information and an expanded table of relocation addresses,
+individual text sections can be shuffled immediately after decompression.
+Some data tables inside the kernel that have assumptions about order
+require sorting after the update. In order to modify these tables,
+a few key symbols from the objcopy symbol stripping process are preserved
+for use after shuffling the text segments.
+
+Load time
+---------
+
+The boot kernel was modified to parse the vmlinux elf file after
+decompression to check for symbols for modifying data tables, and to
+look for any .text.* sections to randomize. The sections are then shuffled,
+and tables are updated or resorted. The existing code which updated relocation
+addresses was modified to account for not just a fixed delta from the load
+address, but the offset that the function section was moved to. This requires
+inspection of each address to see if it was impacted by a randomization.
+
+In order to hide the new layout, symbols reported through /proc/kallsyms will
+be sorted by name alphabetically rather than by address.
+
+Performance Impact
+==================
+
+There are two areas where function reordering can impact performance: boot
+time latency, and run time performance.
+
+Boot time latency
+-----------------
+
+This implementation of finer grained KASLR impacts the boot time of the kernel
+in several places. It requires additional parsing of the kernel ELF file to
+obtain the section headers of the sections to be randomized. It calls the
+random number generator for each section to be randomized to determine that
+section's new memory location. It copies the decompressed kernel into a new
+area of memory to avoid corruption when laying out the newly randomized
+sections. It increases the number of relocations the kernel has to perform at
+boot time vs. standard KASLR, and it also requires a lookup on each address
+that needs to be relocated to see if it was in a randomized section and needs
+to be adjusted by a new offset. Finally, it re-sorts a few data tables that
+are required to be sorted by address.
+
+Booting a test VM on a modern, well appointed system showed an increase in
+latency of approximately 1 second.
+
+Run time
+--------
+
+The performance impact at run-time of function reordering varies by workload.
+Randomly reordering the functions will cause an increase in cache misses
+for some workloads. Some workloads perform significantly worse under FGKASLR,
+while others stay the same or even improve. In general, it will depend on the
+code flow whether or not finer grained KASLR will impact a workload, and how
+the underlying code was designed. Because the layout changes per boot, each
+time a system is rebooted the performance of a workload may change.
+
+Image Size
+==========
+
+fgkaslr increases the size of the kernel binary due to the extra section
+headers that are included, as well as the extra relocations that need to
+be added. You can expect fgkaslr to increase the size of the resulting
+vmlinux by about 3%, and the compressed image (bzImage) by 15%.
+
+Memory Usage
+============
+
+fgkaslr increases the amount of heap that is required at boot time,
+although this extra memory is released when the kernel has finished
+decompression. As a result, it may not be appropriate to use this feature
+on systems without much memory.
+
+Building
+========
+
+To enable fine grained KASLR, you need to have the following config options
+set (including all the ones you would use to build normal KASLR)
+
+``CONFIG_FG_KASLR=y``
+
+fgkaslr for the kernel is only supported for the X86_64 architecture.
+
+Modules
+=======
+
+Modules are randomized similarly to the rest of the kernel by shuffling
+the sections at load time prior to moving them into memory. The module must
+also have been build with the -ffunction-sections compiler option.
+
+Although fgkaslr for the kernel is only supported for the X86_64 architecture,
+it is possible to use fgkaslr with modules on other architectures. To enable
+this feature, select the following config option:
+
+``CONFIG_MODULE_FG_KASLR``
+
+This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
+
+Disabling
+=========
+
+Disabling normal kaslr using the nokaslr command line option also disables
+fgkaslr. In addtion, it is possible to disable fgkaslr separately by booting
+with fgkaslr=off on the commandline.
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index fc503dd689a7..5e370c409ad2 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -7,6 +7,7 @@ Security Documentation
 
    credentials
    IMA-templates
+   fgkaslr
    keys/index
    lsm
    lsm-development
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 3a5a004498de..0ad5a31f1f0a 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -80,10 +80,12 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
+vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/utils.o
 ifdef CONFIG_X86_64
 	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
 	vmlinux-objs-y += $(obj)/pgtable_64.o
+	vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/fgkaslr.o
 endif
 
 vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
@@ -120,6 +122,7 @@ OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 
 ifdef CONFIG_FG_KASLR
 	RELOCS_ARGS += --fg-kaslr
+	OBJCOPYFLAGS += --keep-symbols=$(srctree)/$(src)/vmlinux.symbols
 endif
 
 $(obj)/vmlinux.bin: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/fgkaslr.c b/arch/x86/boot/compressed/fgkaslr.c
new file mode 100644
index 000000000000..451e807de276
--- /dev/null
+++ b/arch/x86/boot/compressed/fgkaslr.c
@@ -0,0 +1,823 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fgkaslr.c
+ *
+ * This contains the routines needed to reorder the kernel text section
+ * at boot time.
+ */
+#include "misc.h"
+#include "error.h"
+#include "pgtable.h"
+#include "../string.h"
+#include "../voffset.h"
+#include <linux/sort.h>
+#include <linux/bsearch.h>
+#include "../../include/asm/extable.h"
+#include "../../include/asm/orc_types.h"
+
+/*
+ * Longest parameter of 'fgkaslr=' is 'off' right now, plus an extra '\0'
+ * for termination.
+ */
+#define MAX_FGKASLR_ARG_LENGTH 4
+int nofgkaslr;
+
+/*
+ * Use normal definitions of mem*() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memcpy
+#undef memset
+#define memzero(s, n)	memset((s), 0, (n))
+#define memmove		memmove
+
+void *memmove(void *dest, const void *src, size_t n);
+
+static unsigned long percpu_start;
+static unsigned long percpu_end;
+
+static long kallsyms_names;
+static long kallsyms_offsets;
+static long kallsyms_num_syms;
+static long kallsyms_relative_base;
+static long kallsyms_markers;
+static long __start___ex_table_addr;
+static long __stop___ex_table_addr;
+static long _stext;
+static long _etext;
+static long _sinittext;
+static long _einittext;
+static long __start_orc_unwind_ip;
+static long __stop_orc_unwind_ip;
+static long __start_orc_unwind;
+
+/* addresses in mapped address space */
+static int *base;
+static u8 *names;
+static unsigned long relative_base;
+static unsigned int *markers_addr;
+
+struct kallsyms_name {
+	u8 len;
+	u8 indecis[256];
+};
+
+struct kallsyms_name *names_table;
+
+struct orc_entry *cur_orc_table;
+int *cur_orc_ip_table;
+
+/*
+ * This is an array of pointers to sections headers for randomized sections
+ */
+Elf64_Shdr **sections;
+
+/*
+ * This is an array of all section headers, randomized or otherwise.
+ */
+Elf64_Shdr *sechdrs;
+
+/*
+ * The number of elements in the randomized section header array (sections)
+ */
+int sections_size;
+
+static bool is_text(long addr)
+{
+	if ((addr >= _stext && addr < _etext) ||
+	    (addr >= _sinittext && addr < _einittext))
+		return true;
+	return false;
+}
+
+bool is_percpu_addr(long pc, long offset)
+{
+	unsigned long ptr;
+	long address;
+
+	address = pc + offset + 4;
+
+	ptr = (unsigned long)address;
+
+	if (ptr >= percpu_start && ptr < percpu_end)
+		return true;
+
+	return false;
+}
+
+static int cmp_section_addr(const void *a, const void *b)
+{
+	unsigned long ptr = (unsigned long)a;
+	Elf64_Shdr *s = *(Elf64_Shdr **)b;
+	unsigned long end = s->sh_addr + s->sh_size;
+
+	if (ptr >= s->sh_addr && ptr < end)
+		return 0;
+
+	if (ptr < s->sh_addr)
+		return -1;
+
+	return 1;
+}
+
+/*
+ * Discover if the address is in a randomized section and if so, adjust
+ * by the saved offset.
+ */
+Elf64_Shdr *adjust_address(long *address)
+{
+	Elf64_Shdr **s;
+	Elf64_Shdr *shdr;
+
+	if (nofgkaslr)
+		return NULL;
+
+	s = bsearch((const void *)*address, sections, sections_size, sizeof(*s),
+		    cmp_section_addr);
+	if (s) {
+		shdr = *s;
+		*address += shdr->sh_offset;
+		return shdr;
+	}
+
+	return NULL;
+}
+
+void adjust_relative_offset(long pc, long *value, Elf64_Shdr *section)
+{
+	Elf64_Shdr *s;
+	long address;
+
+	if (nofgkaslr)
+		return;
+
+	/*
+	 * sometimes we are updating a relative offset that would
+	 * normally be relative to the next instruction (such as a call).
+	 * In this case to calculate the target, you need to add 32bits to
+	 * the pc to get the next instruction value. However, sometimes
+	 * targets are just data that was stored in a table such as ksymtab
+	 * or cpu alternatives. In this case our target is not relative to
+	 * the next instruction.
+	 */
+
+	/*
+	 * Calculate the address that this offset would call.
+	 */
+	if (!is_text(pc))
+		address = pc + *value;
+	else
+		address = pc + *value + 4;
+
+	/*
+	 * if the address is in section that was randomized,
+	 * we need to adjust the offset.
+	 */
+	s = adjust_address(&address);
+	if (s)
+		*value += s->sh_offset;
+
+	/*
+	 * If the PC that this offset was calculated for was in a section
+	 * that has been randomized, the value needs to be adjusted by the
+	 * same amount as the randomized section was adjusted from it's original
+	 * location.
+	 */
+	if (section)
+		*value -= section->sh_offset;
+}
+
+static void kallsyms_swp(void *a, void *b, int size)
+{
+	int idx1, idx2;
+	int temp;
+	struct kallsyms_name name_a;
+
+	/* determine our index into the array */
+	idx1 = (int *)a - base;
+	idx2 = (int *)b - base;
+	temp = base[idx1];
+	base[idx1] = base[idx2];
+	base[idx2] = temp;
+
+	/* also swap the names table */
+	memcpy(&name_a, &names_table[idx1], sizeof(name_a));
+	memcpy(&names_table[idx1], &names_table[idx2],
+	       sizeof(struct kallsyms_name));
+	memcpy(&names_table[idx2], &name_a, sizeof(struct kallsyms_name));
+}
+
+static int kallsyms_cmp(const void *a, const void *b)
+{
+	int addr_a, addr_b;
+	unsigned long uaddr_a, uaddr_b;
+
+	addr_a = *(int *)a;
+	addr_b = *(int *)b;
+
+	if (addr_a >= 0)
+		uaddr_a = addr_a;
+	if (addr_b >= 0)
+		uaddr_b = addr_b;
+
+	if (addr_a < 0)
+		uaddr_a = relative_base - 1 - addr_a;
+	if (addr_b < 0)
+		uaddr_b = relative_base - 1 - addr_b;
+
+	if (uaddr_b > uaddr_a)
+		return -1;
+
+	return 0;
+}
+
+static void deal_with_names(int num_syms)
+{
+	int num_bytes;
+	int i, j;
+	int offset;
+
+	/* we should have num_syms kallsyms_name entries */
+	num_bytes = num_syms * sizeof(*names_table);
+	names_table = malloc(num_syms * sizeof(*names_table));
+	if (!names_table) {
+		debug_putstr("\nbytes requested: ");
+		debug_puthex(num_bytes);
+		error("\nunable to allocate space for names table\n");
+	}
+
+	/* read all the names entries */
+	offset = 0;
+	for (i = 0; i < num_syms; i++) {
+		names_table[i].len = names[offset];
+		offset++;
+		for (j = 0; j < names_table[i].len; j++) {
+			names_table[i].indecis[j] = names[offset];
+			offset++;
+		}
+	}
+}
+
+static void write_sorted_names(int num_syms)
+{
+	int i, j;
+	int offset = 0;
+	unsigned int *markers;
+
+	/*
+	 * we are going to need to regenerate the markers table, which is a
+	 * table of offsets into the compressed stream every 256 symbols.
+	 * this code copied almost directly from scripts/kallsyms.c
+	 */
+	markers = malloc(sizeof(unsigned int) * ((num_syms + 255) / 256));
+	if (!markers) {
+		debug_putstr("\nfailed to allocate heap space of ");
+		debug_puthex(((num_syms + 255) / 256));
+		debug_putstr(" bytes\n");
+		error("Unable to allocate space for markers table");
+	}
+
+	for (i = 0; i < num_syms; i++) {
+		if ((i & 0xFF) == 0)
+			markers[i >> 8] = offset;
+
+		names[offset] = (u8)names_table[i].len;
+		offset++;
+		for (j = 0; j < names_table[i].len; j++) {
+			names[offset] = (u8)names_table[i].indecis[j];
+			offset++;
+		}
+	}
+
+	/* write new markers table over old one */
+	for (i = 0; i < ((num_syms + 255) >> 8); i++)
+		markers_addr[i] = markers[i];
+
+	free(markers);
+	free(names_table);
+}
+
+static void sort_kallsyms(unsigned long map)
+{
+	int num_syms;
+	int i;
+
+	debug_putstr("\nRe-sorting kallsyms...\n");
+
+	num_syms = *(int *)(kallsyms_num_syms + map);
+	base = (int *)(kallsyms_offsets + map);
+	relative_base = *(unsigned long *)(kallsyms_relative_base + map);
+	markers_addr = (unsigned int *)(kallsyms_markers + map);
+	names = (u8 *)(kallsyms_names + map);
+
+	/*
+	 * the kallsyms table was generated prior to any randomization.
+	 * it is a bunch of offsets from "relative base". In order for
+	 * us to check if a symbol has an address that was in a randomized
+	 * section, we need to reconstruct the address to it's original
+	 * value prior to handle_relocations.
+	 */
+	for (i = 0; i < num_syms; i++) {
+		unsigned long addr;
+		int new_base;
+
+		/*
+		 * according to kernel/kallsyms.c, positive offsets are absolute
+		 * values and negative offsets are relative to the base.
+		 */
+		if (base[i] >= 0)
+			addr = base[i];
+		else
+			addr = relative_base - 1 - base[i];
+
+		if (adjust_address(&addr)) {
+			/* here we need to recalcuate the offset */
+			new_base = relative_base - 1 - addr;
+			base[i] = new_base;
+		}
+	}
+
+	/*
+	 * here we need to read in all the kallsyms_names info
+	 * so that we can regenerate it.
+	 */
+	deal_with_names(num_syms);
+
+	sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);
+
+	/* write the newly sorted names table over the old one */
+	write_sorted_names(num_syms);
+}
+
+#include "../../../../lib/extable.c"
+
+static inline unsigned long
+ex_fixup_handler(const struct exception_table_entry *x)
+{
+	return ((unsigned long)&x->handler + x->handler);
+}
+
+static inline unsigned long
+ex_fixup_addr(const struct exception_table_entry *x)
+{
+	return ((unsigned long)&x->fixup + x->fixup);
+}
+
+static void update_ex_table(unsigned long map)
+{
+	struct exception_table_entry *start_ex_table =
+		(struct exception_table_entry *)(__start___ex_table_addr + map);
+	struct exception_table_entry *stop_ex_table =
+		(struct exception_table_entry *)(__stop___ex_table_addr + map);
+	int num_entries =
+		(__stop___ex_table_addr - __start___ex_table_addr) /
+		sizeof(struct exception_table_entry);
+	int i;
+
+	debug_putstr("\nUpdating exception table...");
+	for (i = 0; i < num_entries; i++) {
+		unsigned long insn = ex_to_insn(&start_ex_table[i]);
+		unsigned long fixup = ex_fixup_addr(&start_ex_table[i]);
+		unsigned long handler = ex_fixup_handler(&start_ex_table[i]);
+		unsigned long addr;
+		Elf64_Shdr *s;
+
+		/* check each address to see if it needs adjusting */
+		addr = insn - map;
+		s = adjust_address(&addr);
+		if (s)
+			start_ex_table[i].insn += s->sh_offset;
+
+		addr = fixup - map;
+		s = adjust_address(&addr);
+		if (s)
+			start_ex_table[i].fixup += s->sh_offset;
+
+		addr = handler - map;
+		s = adjust_address(&addr);
+		if (s)
+			start_ex_table[i].handler += s->sh_offset;
+	}
+}
+
+static void sort_ex_table(unsigned long map)
+{
+	struct exception_table_entry *start_ex_table =
+		(struct exception_table_entry *)(__start___ex_table_addr + map);
+	struct exception_table_entry *stop_ex_table =
+		(struct exception_table_entry *)(__stop___ex_table_addr + map);
+
+	debug_putstr("\nRe-sorting exception table...");
+
+	sort_extable(start_ex_table, stop_ex_table);
+}
+
+static inline unsigned long orc_ip(const int *ip)
+{
+	return (unsigned long)ip + *ip;
+}
+
+static void orc_sort_swap(void *_a, void *_b, int size)
+{
+	struct orc_entry *orc_a, *orc_b;
+	struct orc_entry orc_tmp;
+	int *a = _a, *b = _b, tmp;
+	int delta = _b - _a;
+
+	/* Swap the .orc_unwind_ip entries: */
+	tmp = *a;
+	*a = *b + delta;
+	*b = tmp - delta;
+
+	/* Swap the corresponding .orc_unwind entries: */
+	orc_a = cur_orc_table + (a - cur_orc_ip_table);
+	orc_b = cur_orc_table + (b - cur_orc_ip_table);
+	orc_tmp = *orc_a;
+	*orc_a = *orc_b;
+	*orc_b = orc_tmp;
+}
+
+static int orc_sort_cmp(const void *_a, const void *_b)
+{
+	struct orc_entry *orc_a;
+	const int *a = _a, *b = _b;
+	unsigned long a_val = orc_ip(a);
+	unsigned long b_val = orc_ip(b);
+
+	if (a_val > b_val)
+		return 1;
+	if (a_val < b_val)
+		return -1;
+
+	/*
+	 * The "weak" section terminator entries need to always be on the left
+	 * to ensure the lookup code skips them in favor of real entries.
+	 * These terminator entries exist to handle any gaps created by
+	 * whitelisted .o files which didn't get objtool generation.
+	 */
+	orc_a = cur_orc_table + (a - cur_orc_ip_table);
+	return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
+}
+
+static void sort_orc_table(unsigned long map)
+{
+	int num_entries =
+		(__stop_orc_unwind_ip - __start_orc_unwind_ip) / sizeof(int);
+
+	cur_orc_ip_table = (int *)(__start_orc_unwind_ip + map);
+	cur_orc_table = (struct orc_entry *)(__start_orc_unwind + map);
+
+	debug_putstr("\nRe-sorting orc tables...\n");
+	sort(cur_orc_ip_table, num_entries, sizeof(int), orc_sort_cmp,
+	     orc_sort_swap);
+}
+
+void post_relocations_cleanup(unsigned long map)
+{
+	if (!nofgkaslr) {
+		update_ex_table(map);
+		sort_ex_table(map);
+		sort_orc_table(map);
+	}
+
+	/*
+	 * maybe one day free will do something. So, we "free" this memory
+	 * in either case
+	 */
+	free(sections);
+	free(sechdrs);
+}
+
+void pre_relocations_cleanup(unsigned long map)
+{
+	if (nofgkaslr)
+		return;
+
+	sort_kallsyms(map);
+}
+
+static void shuffle_sections(int *list, int size)
+{
+	int i;
+	unsigned long j;
+	int temp;
+
+	for (i = size - 1; i > 0; i--) {
+		j = kaslr_get_random_long(NULL) % (i + 1);
+
+		temp = list[i];
+		list[i] = list[j];
+		list[j] = temp;
+	}
+}
+
+static void move_text(int num_sections, char *secstrings, Elf64_Shdr *text,
+		      void *source, void *dest, Elf64_Phdr *phdr)
+{
+	unsigned long adjusted_addr;
+	int copy_bytes;
+	void *stash;
+	Elf64_Shdr **sorted_sections;
+	int *index_list;
+	int i, j;
+
+	memmove(dest, source + text->sh_offset, text->sh_size);
+	copy_bytes = text->sh_size;
+	dest += text->sh_size;
+	adjusted_addr = text->sh_addr + text->sh_size;
+
+	/*
+	 * we leave the sections sorted in their original order
+	 * by s->sh_addr, but shuffle the indexes in a random
+	 * order for copying.
+	 */
+	index_list = malloc(sizeof(int) * num_sections);
+	if (!index_list)
+		error("Failed to allocate space for index list");
+
+	for (i = 0; i < num_sections; i++)
+		index_list[i] = i;
+
+	shuffle_sections(index_list, num_sections);
+
+	/*
+	 * to avoid overwriting earlier sections before they can get
+	 * copied to dest, stash everything into a buffer first.
+	 * this will cause our source address to be off by
+	 * phdr->p_offset though, so we'll adjust s->sh_offset below.
+	 *
+	 * TBD: ideally we'd simply decompress higher up so that our
+	 * copy wasn't in danger of overwriting anything important.
+	 */
+	stash = malloc(phdr->p_filesz);
+	if (!stash)
+		error("Failed to allocate space for text stash");
+
+	memcpy(stash, source + phdr->p_offset, phdr->p_filesz);
+
+	/* now we'd walk through the sections. */
+	for (j = 0; j < num_sections; j++) {
+		unsigned long aligned_addr;
+		Elf64_Shdr *s;
+		const char *sname;
+		void *src;
+		int pad_bytes;
+
+		s = sections[index_list[j]];
+
+		sname = secstrings + s->sh_name;
+
+		/* align addr for this section */
+		aligned_addr = ALIGN(adjusted_addr, s->sh_addralign);
+
+		/*
+		 * copy out of stash, so adjust offset
+		 */
+		src = stash + s->sh_offset - phdr->p_offset;
+
+		/*
+		 * Fill any space between sections with int3
+		 */
+		pad_bytes = aligned_addr - adjusted_addr;
+		memset(dest, 0xcc, pad_bytes);
+
+		dest = (void *)ALIGN((unsigned long)dest, s->sh_addralign);
+
+		memmove(dest, src, s->sh_size);
+
+		dest += s->sh_size;
+		copy_bytes += s->sh_size + pad_bytes;
+		adjusted_addr = aligned_addr + s->sh_size;
+
+		/* we can blow away sh_offset for our own uses */
+		s->sh_offset = aligned_addr - s->sh_addr;
+	}
+
+	free(index_list);
+
+	/*
+	 * move remainder of text segment. Ok to just use original source
+	 * here since this area is untouched.
+	 */
+	memmove(dest, source + text->sh_offset + copy_bytes,
+		phdr->p_filesz - copy_bytes);
+	free(stash);
+}
+
+#define GET_SYM(name)							\
+	do {								\
+		if (!name) {						\
+			if (strcmp(#name, strtab + sym->st_name) == 0) {\
+				name = sym->st_value;			\
+				continue;				\
+			}						\
+		}							\
+	} while (0)
+
+static void parse_symtab(Elf64_Sym *symtab, char *strtab, long num_syms)
+{
+	Elf64_Sym *sym;
+
+	if (!symtab || !strtab)
+		return;
+
+	debug_putstr("\nLooking for symbols... ");
+
+	/*
+	 * walk through the symbol table looking for the symbols
+	 * that we care about.
+	 */
+	for (sym = symtab; --num_syms >= 0; sym++) {
+		if (!sym->st_name)
+			continue;
+
+		GET_SYM(kallsyms_num_syms);
+		GET_SYM(kallsyms_offsets);
+		GET_SYM(kallsyms_relative_base);
+		GET_SYM(kallsyms_names);
+		GET_SYM(kallsyms_markers);
+		GET_SYM(_stext);
+		GET_SYM(_etext);
+		GET_SYM(_sinittext);
+		GET_SYM(_einittext);
+		GET_SYM(__start_orc_unwind_ip);
+		GET_SYM(__stop_orc_unwind_ip);
+		GET_SYM(__start_orc_unwind);
+
+		/*
+		 * the GET_SYM macro can't be used here because these have
+		 * to be renamed due to the inclusion of lib/extable.c
+		 */
+		if (!__start___ex_table_addr) {
+			if (strcmp("__start___ex_table",
+				   strtab + sym->st_name) == 0) {
+				__start___ex_table_addr = sym->st_value;
+				continue;
+			}
+		}
+
+		if (!__stop___ex_table_addr) {
+			if (strcmp("__stop___ex_table",
+				   strtab + sym->st_name) == 0) {
+				__stop___ex_table_addr = sym->st_value;
+				continue;
+			}
+		}
+	}
+}
+
+void parse_sections_headers(void *output, Elf64_Ehdr *ehdr, Elf64_Phdr *phdrs)
+{
+	Elf64_Phdr *phdr;
+	Elf64_Shdr *s;
+	Elf64_Shdr *text = NULL;
+	Elf64_Shdr *percpu = NULL;
+	char *secstrings;
+	const char *sname;
+	int num_sections = 0;
+	Elf64_Sym *symtab = NULL;
+	char *strtab = NULL;
+	long num_syms = 0;
+	void *dest;
+	int i;
+	char arg[MAX_FGKASLR_ARG_LENGTH];
+	Elf64_Shdr shdr;
+	unsigned long shnum;
+	unsigned int shstrndx;
+
+	debug_putstr("\nParsing ELF section headers... ");
+
+	/*
+	 * Even though fgkaslr may have been disabled, we still
+	 * need to parse through the section headers to get the
+	 * start and end of the percpu section. This is because
+	 * if we were built with CONFIG_FG_KASLR, there are more
+	 * relative relocations present in vmlinux.relocs than
+	 * just the percpu, and only the percpu relocs need to be
+	 * adjusted when using just normal base address kaslr.
+	 */
+	if (cmdline_find_option("fgkaslr", arg, sizeof(arg)) == 3 &&
+	    !strncmp(arg, "off", 3)) {
+		warn("FG_KASLR disabled on cmdline.");
+		nofgkaslr = 1;
+	}
+
+	/* read the first section header */
+	shnum = ehdr->e_shnum;
+	shstrndx = ehdr->e_shstrndx;
+	if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
+		memcpy(&shdr, output + ehdr->e_shoff, sizeof(shdr));
+		if (shnum == SHN_UNDEF)
+			shnum = shdr.sh_size;
+		if (shstrndx == SHN_XINDEX)
+			shstrndx = shdr.sh_link;
+	}
+
+	/* we are going to need to allocate space for the section headers */
+	sechdrs = malloc(sizeof(*sechdrs) * shnum);
+	if (!sechdrs)
+		error("Failed to allocate space for shdrs");
+
+	sections = malloc(sizeof(*sections) * shnum);
+	if (!sections)
+		error("Failed to allocate space for section pointers");
+
+	memcpy(sechdrs, output + ehdr->e_shoff,
+	       sizeof(*sechdrs) * shnum);
+
+	/* we need to allocate space for the section string table */
+	s = &sechdrs[shstrndx];
+
+	secstrings = malloc(s->sh_size);
+	if (!secstrings)
+		error("Failed to allocate space for shstr");
+
+	memcpy(secstrings, output + s->sh_offset, s->sh_size);
+
+	/*
+	 * now we need to walk through the section headers and collect the
+	 * sizes of the .text sections to be randomized.
+	 */
+	for (i = 0; i < shnum; i++) {
+		s = &sechdrs[i];
+		sname = secstrings + s->sh_name;
+
+		if (s->sh_type == SHT_SYMTAB) {
+			/* only one symtab per image */
+			symtab = malloc(s->sh_size);
+			if (!symtab)
+				error("Failed to allocate space for symtab");
+
+			memcpy(symtab, output + s->sh_offset, s->sh_size);
+			num_syms = s->sh_size / sizeof(*symtab);
+			continue;
+		}
+
+		if (s->sh_type == SHT_STRTAB && i != ehdr->e_shstrndx) {
+			strtab = malloc(s->sh_size);
+			if (!strtab)
+				error("Failed to allocate space for strtab");
+
+			memcpy(strtab, output + s->sh_offset, s->sh_size);
+		}
+
+		if (!strcmp(sname, ".text")) {
+			text = s;
+			continue;
+		}
+
+		if (!strcmp(sname, ".data..percpu")) {
+			/* get start addr for later */
+			percpu = s;
+		}
+
+		if (!(s->sh_flags & SHF_ALLOC) ||
+		    !(s->sh_flags & SHF_EXECINSTR) ||
+		    !(strstarts(sname, ".text")))
+			continue;
+
+		sections[num_sections] = s;
+
+		num_sections++;
+	}
+	sections[num_sections] = NULL;
+	sections_size = num_sections;
+
+	parse_symtab(symtab, strtab, num_syms);
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+			if ((phdr->p_align % 0x200000) != 0)
+				error("Alignment of LOAD segment isn't multiple of 2MB");
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+			if (!nofgkaslr &&
+			    (text && phdr->p_offset == text->sh_offset)) {
+				move_text(num_sections, secstrings, text,
+					  output, dest, phdr);
+			} else {
+				if (percpu &&
+				    phdr->p_offset == percpu->sh_offset) {
+					percpu_start = percpu->sh_addr;
+					percpu_end = percpu_start +
+							phdr->p_filesz;
+				}
+				memmove(dest, output + phdr->p_offset,
+					phdr->p_filesz);
+			}
+			break;
+		default: /* Ignore other PT_* */
+			break;
+		}
+	}
+
+	/* we need to keep the section info to redo relocs */
+	free(secstrings);
+
+	free(phdrs);
+}
+
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..6f596bd5b6e5 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -39,10 +39,6 @@
 #include <generated/utsrelease.h>
 #include <asm/efi.h>
 
-/* Macros used by the included decompressor code below. */
-#define STATIC
-#include <linux/decompress/mm.h>
-
 #ifdef CONFIG_X86_5LEVEL
 unsigned int __pgtable_l5_enabled;
 unsigned int pgdir_shift __ro_after_init = 39;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 9652d5c2afda..5f08922fd12a 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -26,9 +26,6 @@
  * it is not safe to place pointers in static structures.
  */
 
-/* Macros used by the included decompressor code below. */
-#define STATIC		static
-
 /*
  * Use normal definitions of mem*() from string.c. There are already
  * included header files which expect a definition of memset() and by
@@ -49,6 +46,10 @@ struct boot_params *boot_params;
 
 memptr free_mem_ptr;
 memptr free_mem_end_ptr;
+#ifdef CONFIG_FG_KASLR
+unsigned long malloc_ptr;
+int malloc_count;
+#endif
 
 static char *vidmem;
 static int vidport;
@@ -203,10 +204,21 @@ static void handle_relocations(void *output, unsigned long output_len,
 	if (IS_ENABLED(CONFIG_X86_64))
 		delta = virt_addr - LOAD_PHYSICAL_ADDR;
 
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
+	/*
+	 * it is possible to have delta be zero and
+	 * still have enabled fg kaslr. We need to perform relocations
+	 * for fgkaslr regardless of whether the base address has moved.
+	 */
+	if (!IS_ENABLED(CONFIG_FG_KASLR) ||
+	    cmdline_find_option_bool("nokaslr")) {
+		if (!delta) {
+			debug_putstr("No relocation needed... ");
+			return;
+		}
 	}
+
+	pre_relocations_cleanup(map);
+
 	debug_putstr("Performing relocations... ");
 
 	/*
@@ -230,35 +242,106 @@ static void handle_relocations(void *output, unsigned long output_len,
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
 		long extended = *reloc;
+		long value;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		(void)adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("32-bit relocation outside of kernel!\n");
 
-		*(uint32_t *)ptr += delta;
+		value = *(int32_t *)ptr;
+
+		/*
+		 * If using fgkaslr, the value of the relocation
+		 * might need to be changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_address(&value);
+
+		value += delta;
+
+		*(uint32_t *)ptr = value;
 	}
 #ifdef CONFIG_X86_64
 	while (*--reloc) {
 		long extended = *reloc;
+		long value;
+		long oldvalue;
+		Elf64_Shdr *s;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		s = adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("inverse 32-bit relocation outside of kernel!\n");
 
-		*(int32_t *)ptr -= delta;
+		value = *(int32_t *)ptr;
+		oldvalue = value;
+
+		/*
+		 * If using fgkaslr, these relocs will contain
+		 * relative offsets which might need to be
+		 * changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_relative_offset(*reloc, &value, s);
+
+		/*
+		 * only percpu symbols need to have their values adjusted for
+		 * base address kaslr since relative offsets within the .text
+		 * and .text.* sections are ok wrt each other.
+		 */
+		if (is_percpu_addr(*reloc, oldvalue))
+			value -= delta;
+
+		*(int32_t *)ptr = value;
 	}
 	for (reloc--; *reloc; reloc--) {
 		long extended = *reloc;
+		long value;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		(void)adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("64-bit relocation outside of kernel!\n");
 
-		*(uint64_t *)ptr += delta;
+		value = *(int64_t *)ptr;
+
+		/*
+		 * If using fgkaslr, the value of the relocation
+		 * might need to be changed because it referred
+		 * to an address that has moved.
+		 */
+		(void)adjust_address(&value);
+
+		value += delta;
+
+		*(uint64_t *)ptr = value;
 	}
+	post_relocations_cleanup(map);
 #endif
 }
 #else
@@ -296,6 +379,14 @@ static void parse_elf(void *output)
 
 	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
 
+	if (IS_ENABLED(CONFIG_FG_KASLR)) {
+		if (!cmdline_find_option_bool("nokaslr")) {
+			parse_sections_headers(output, &ehdr, phdrs);
+			return;
+		}
+		warn("FG_KASLR disabled: 'nokaslr' on cmdline.");
+	}
+
 	for (i = 0; i < ehdr.e_phnum; i++) {
 		phdr = &phdrs[i];
 
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 726e264410ff..e8e45f263eaf 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -39,6 +39,12 @@
 /* misc.c */
 extern memptr free_mem_ptr;
 extern memptr free_mem_end_ptr;
+#define STATIC
+#ifdef CONFIG_FG_KASLR
+#define STATIC_RW_DATA extern
+#endif
+#include <linux/decompress/mm.h>
+
 extern struct boot_params *boot_params;
 void __putstr(const char *s);
 void __puthex(unsigned long value);
@@ -74,6 +80,34 @@ struct mem_vector {
 	unsigned long long size;
 };
 
+#ifdef CONFIG_X86_64
+#define Elf_Ehdr Elf64_Ehdr
+#define Elf_Phdr Elf64_Phdr
+#define Elf_Shdr Elf64_Shdr
+#else
+#define Elf_Ehdr Elf32_Ehdr
+#define Elf_Phdr Elf32_Phdr
+#define Elf_Shdr Elf32_Shdr
+#endif
+
+#if CONFIG_FG_KASLR
+void parse_sections_headers(void *output, Elf_Ehdr *ehdr, Elf_Phdr *phdrs);
+void pre_relocations_cleanup(unsigned long map);
+void post_relocations_cleanup(unsigned long map);
+Elf_Shdr *adjust_address(long *address);
+void adjust_relative_offset(long pc, long *value, Elf_Shdr *section);
+bool is_percpu_addr(long pc, long offset);
+#else
+static inline void parse_sections_headers(void *output, Elf_Ehdr *ehdr,
+					  Elf_Phdr *phdrs) { }
+static inline void pre_relocations_cleanup(unsigned long map) { }
+static inline void post_relocations_cleanup(unsigned long map) { }
+static inline Elf_Shdr *adjust_address(long *address) { return NULL; }
+static inline void adjust_relative_offset(long pc, long *value,
+					  Elf_Shdr *section) { }
+static inline bool is_percpu_addr(long pc, long offset) { return true; }
+#endif /* CONFIG_FG_KASLR */
+
 #if CONFIG_RANDOMIZE_BASE
 /* kaslr.c */
 void choose_random_location(unsigned long input,
diff --git a/arch/x86/boot/compressed/utils.c b/arch/x86/boot/compressed/utils.c
new file mode 100644
index 000000000000..ceefc58d7c71
--- /dev/null
+++ b/arch/x86/boot/compressed/utils.c
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * utils.c
+ *
+ * This contains various libraries that are needed for fgkaslr
+ */
+#define __DISABLE_EXPORTS
+#define _LINUX_KPROBES_H
+#define NOKPROBE_SYMBOL(fname)
+#include "../../../../lib/sort.c"
+#include "../../../../lib/bsearch.c"
+
diff --git a/arch/x86/boot/compressed/vmlinux.symbols b/arch/x86/boot/compressed/vmlinux.symbols
new file mode 100644
index 000000000000..cc86e79a2a3d
--- /dev/null
+++ b/arch/x86/boot/compressed/vmlinux.symbols
@@ -0,0 +1,17 @@
+kallsyms_offsets
+kallsyms_addresses
+kallsyms_num_syms
+kallsyms_relative_base
+kallsyms_names
+kallsyms_token_table
+kallsyms_token_index
+kallsyms_markers
+__start___ex_table
+__stop___ex_table
+_sinittext
+_einittext
+_stext
+_etext
+__start_orc_unwind_ip
+__stop_orc_unwind_ip
+__start_orc_unwind
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 680c320363db..6918d33eb5ef 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -26,8 +26,19 @@
 
 #ifdef CONFIG_KERNEL_BZIP2
 # define BOOT_HEAP_SIZE		0x400000
-#else /* !CONFIG_KERNEL_BZIP2 */
-# define BOOT_HEAP_SIZE		 0x10000
+#elif CONFIG_FG_KASLR
+/*
+ * We need extra boot heap when using fgkaslr because we make a copy
+ * of the original decompressed kernel to avoid issues with writing
+ * over ourselves when shuffling the sections. We also need extra
+ * space for resorting kallsyms after shuffling. This value could
+ * be decreased if free() would release memory properly, or if we
+ * could avoid the kernel copy. It would need to be increased if we
+ * find additional tables that need to be resorted.
+ */
+# define BOOT_HEAP_SIZE		0x4000000
+#else /* !CONFIG_KERNEL_BZIP2 && !CONFIG_FG_KASLR */
+# define BOOT_HEAP_SIZE		0x10000
 #endif
 
 #ifdef CONFIG_X86_64
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index 34c02e4290fe..a85d1792d5a8 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -298,6 +298,7 @@ typedef struct elf64_phdr {
 #define SHN_LIVEPATCH	0xff20
 #define SHN_ABS		0xfff1
 #define SHN_COMMON	0xfff2
+#define SHN_XINDEX	0xffff
 #define SHN_HIRESERVE	0xffff
  
 typedef struct elf32_shdr {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 8/9] kallsyms: Hide layout
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (6 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 7/9] x86: Add support for function granular KASLR Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 21:14   ` Kees Cook
  2020-05-21 16:56 ` [PATCH v2 9/9] module: Reorder functions Kristen Carlson Accardi
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck

This patch makes /proc/kallsyms display alphabetically by symbol
name rather than sorted by address in order to hide the newly
randomized address layout.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 kernel/kallsyms.c | 138 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 137 insertions(+), 1 deletion(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 16c8c605f4b0..558963b275ec 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -25,6 +25,7 @@
 #include <linux/filter.h>
 #include <linux/ftrace.h>
 #include <linux/compiler.h>
+#include <linux/list_sort.h>
 
 /*
  * These will be re-linked against their real values
@@ -446,6 +447,11 @@ struct kallsym_iter {
 	int show_value;
 };
 
+struct kallsyms_iter_list {
+	struct kallsym_iter iter;
+	struct list_head next;
+};
+
 int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
 			    char *type, char *name)
 {
@@ -660,6 +666,121 @@ int kallsyms_show_value(void)
 	}
 }
 
+static int sorted_show(struct seq_file *m, void *p)
+{
+	struct list_head *list = m->private;
+	struct kallsyms_iter_list *iter;
+	int rc;
+
+	if (list_empty(list))
+		return 0;
+
+	iter = list_first_entry(list, struct kallsyms_iter_list, next);
+
+	m->private = iter;
+	rc = s_show(m, p);
+	m->private = list;
+
+	list_del(&iter->next);
+	kfree(iter);
+
+	return rc;
+}
+
+static void *sorted_start(struct seq_file *m, loff_t *pos)
+{
+	return m->private;
+}
+
+static void *sorted_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct list_head *list = m->private;
+
+	(*pos)++;
+
+	if (list_empty(list))
+		return NULL;
+
+	return p;
+}
+
+static const struct seq_operations kallsyms_sorted_op = {
+	.start = sorted_start,
+	.next = sorted_next,
+	.stop = s_stop,
+	.show = sorted_show
+};
+
+static int kallsyms_list_cmp(void *priv, struct list_head *a,
+			     struct list_head *b)
+{
+	struct kallsyms_iter_list *iter_a, *iter_b;
+
+	iter_a = list_entry(a, struct kallsyms_iter_list, next);
+	iter_b = list_entry(b, struct kallsyms_iter_list, next);
+
+	return strcmp(iter_a->iter.name, iter_b->iter.name);
+}
+
+int get_all_symbol_name(void *data, const char *name, struct module *mod,
+			unsigned long addr)
+{
+	unsigned long sym_pos;
+	struct kallsyms_iter_list *node, *last;
+	struct list_head *head = (struct list_head *)data;
+
+	node = kmalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	if (list_empty(head)) {
+		sym_pos = 0;
+		memset(node, 0, sizeof(*node));
+		reset_iter(&node->iter, 0);
+		node->iter.show_value = kallsyms_show_value();
+	} else {
+		last = list_first_entry(head, struct kallsyms_iter_list, next);
+		memcpy(node, last, sizeof(*node));
+		sym_pos = last->iter.pos;
+	}
+
+	INIT_LIST_HEAD(&node->next);
+	list_add(&node->next, head);
+
+	/*
+	 * update_iter returns false when at end of file
+	 * which in this case we don't care about and can
+	 * safely ignore. update_iter() will increment
+	 * the value of iter->pos, for ksymbol_core.
+	 */
+	if (sym_pos >= kallsyms_num_syms)
+		sym_pos++;
+
+	(void)update_iter(&node->iter, sym_pos);
+
+	return 0;
+}
+
+static int kallsyms_sorted_open(struct inode *inode, struct file *file)
+{
+	int ret;
+	struct list_head *list;
+
+	list = __seq_open_private(file, &kallsyms_sorted_op, sizeof(*list));
+	if (!list)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(list);
+
+	ret = kallsyms_on_each_symbol(get_all_symbol_name, list);
+	if (ret != 0)
+		return ret;
+
+	list_sort(NULL, list, kallsyms_list_cmp);
+
+	return 0;
+}
+
 static int kallsyms_open(struct inode *inode, struct file *file)
 {
 	/*
@@ -704,9 +825,24 @@ static const struct proc_ops kallsyms_proc_ops = {
 	.proc_release	= seq_release_private,
 };
 
+static const struct proc_ops kallsyms_sorted_proc_ops = {
+	.proc_open = kallsyms_sorted_open,
+	.proc_read = seq_read,
+	.proc_lseek = seq_lseek,
+	.proc_release = seq_release_private,
+};
+
 static int __init kallsyms_init(void)
 {
-	proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
+	/*
+	 * When fine grained kaslr is enabled, we need to
+	 * print out the symbols sorted by name rather than by
+	 * by address, because this reveals the randomization order.
+	 */
+	if (!IS_ENABLED(CONFIG_FG_KASLR))
+		proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
+	else
+		proc_create("kallsyms", 0444, NULL, &kallsyms_sorted_proc_ops);
 	return 0;
 }
 device_initcall(kallsyms_init);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 9/9] module: Reorder functions
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (7 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 8/9] kallsyms: Hide layout Kristen Carlson Accardi
@ 2020-05-21 16:56 ` Kristen Carlson Accardi
  2020-05-21 21:33   ` Kees Cook
  2020-05-21 21:54 ` [PATCH v2 0/9] Function Granular KASLR Kees Cook
  2020-05-21 22:26 ` Thomas Gleixner
  10 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 16:56 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin, Jessica Yu
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Ard Biesheuvel, Tony Luck

Introduce a new config option to allow modules to be re-ordered
by function. This option can be enabled independently of the
kernel text KASLR or FG_KASLR settings so that it can be used
by architectures that do not support either of these features.
This option will be selected by default if CONFIG_FG_KASLR is
selected.

If a module has functions split out into separate text sections
(i.e. compiled with the -ffunction-sections flag), reorder the
functions to provide some code diversification to modules.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/Kconfig  |  1 +
 arch/x86/Makefile |  3 ++
 init/Kconfig      | 11 +++++++
 kernel/module.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 50e83ea57d70..d0bdd5c8c432 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2187,6 +2187,7 @@ config FG_KASLR
 	bool "Function Granular Kernel Address Space Layout Randomization"
 	depends on $(cc-option, -ffunction-sections)
 	depends on RANDOMIZE_BASE && X86_64
+	select MODULE_FG_KASLR
 	help
 	  This option improves the randomness of the kernel text
 	  over basic Kernel Address Space Layout Randomization (KASLR)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index b65ec63c7db7..8c830c37c74c 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -51,6 +51,9 @@ ifdef CONFIG_X86_NEED_RELOCS
         LDFLAGS_vmlinux := --emit-relocs --discard-none
 endif
 
+ifdef CONFIG_MODULE_FG_KASLR
+	KBUILD_CFLAGS_MODULE += -ffunction-sections
+endif
 #
 # Prevent GCC from generating any FP code by mistake.
 #
diff --git a/init/Kconfig b/init/Kconfig
index 74a5ac65644f..b19920413bcc 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2227,6 +2227,17 @@ config UNUSED_KSYMS_WHITELIST
 	  one per line. The path can be absolute, or relative to the kernel
 	  source tree.
 
+config MODULE_FG_KASLR
+	depends on $(cc-option, -ffunction-sections)
+	bool "Module Function Granular Layout Randomization"
+	help
+	  This option randomizes the module text section by reordering the text
+	  section by function at module load time. In order to use this
+	  feature, the module must have been compiled with the
+	  -ffunction-sections compiler flag.
+
+	  If unsure, say N.
+
 endif # MODULES
 
 config MODULES_TREE_LOOKUP
diff --git a/kernel/module.c b/kernel/module.c
index 646f1e2330d2..e3cd619c60c2 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -53,6 +53,7 @@
 #include <linux/bsearch.h>
 #include <linux/dynamic_debug.h>
 #include <linux/audit.h>
+#include <linux/random.h>
 #include <uapi/linux/module.h>
 #include "module-internal.h"
 
@@ -2370,6 +2371,83 @@ static long get_offset(struct module *mod, unsigned int *size,
 	return ret;
 }
 
+/*
+ * shuffle_text_list()
+ * Use a Fisher Yates algorithm to shuffle a list of text sections.
+ */
+static void shuffle_text_list(Elf_Shdr **list, int size)
+{
+	int i;
+	unsigned int j;
+	Elf_Shdr *temp;
+
+	for (i = size - 1; i > 0; i--) {
+		/*
+		 * pick a random index from 0 to i
+		 */
+		get_random_bytes(&j, sizeof(j));
+		j = j % (i + 1);
+
+		temp = list[i];
+		list[i] = list[j];
+		list[j] = temp;
+	}
+}
+
+/*
+ * randomize_text()
+ * Look through the core section looking for executable code sections.
+ * Store sections in an array and then shuffle the sections
+ * to reorder the functions.
+ */
+static void randomize_text(struct module *mod, struct load_info *info)
+{
+	int i;
+	int num_text_sections = 0;
+	Elf_Shdr **text_list;
+	int size = 0;
+	int max_sections = info->hdr->e_shnum;
+	unsigned int sec = find_sec(info, ".text");
+
+	if (sec == 0)
+		return;
+
+	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+	if (!text_list)
+		return;
+
+	for (i = 0; i < max_sections; i++) {
+		Elf_Shdr *shdr = &info->sechdrs[i];
+		const char *sname = info->secstrings + shdr->sh_name;
+
+		if (!(shdr->sh_flags & SHF_ALLOC) ||
+		    !(shdr->sh_flags & SHF_EXECINSTR) ||
+		    strstarts(sname, ".init"))
+			continue;
+
+		text_list[num_text_sections] = shdr;
+		num_text_sections++;
+	}
+
+	shuffle_text_list(text_list, num_text_sections);
+
+	for (i = 0; i < num_text_sections; i++) {
+		Elf_Shdr *shdr = text_list[i];
+
+		/*
+		 * get_offset has a section index for it's last
+		 * argument, that is only used by arch_mod_section_prepend(),
+		 * which is only defined by parisc. Since this this type
+		 * of randomization isn't supported on parisc, we can
+		 * safely pass in zero as the last argument, as it is
+		 * ignored.
+		 */
+		shdr->sh_entsize = get_offset(mod, &size, shdr, 0);
+	}
+
+	kfree(text_list);
+}
+
 /* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
    might -- code, read-only data, read-write data, small data.  Tally
    sizes, and place the offsets into sh_entsize fields: high bit means it
@@ -2460,6 +2538,9 @@ static void layout_sections(struct module *mod, struct load_info *info)
 			break;
 		}
 	}
+
+	if (IS_ENABLED(CONFIG_MODULE_FG_KASLR))
+		randomize_text(mod, info);
 }
 
 static void set_license(struct module *mod, const char *license)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 5/9] x86: Make sure _etext includes function sections
  2020-05-21 16:56 ` [PATCH v2 5/9] x86: Make sure _etext includes function sections Kristen Carlson Accardi
@ 2020-05-21 18:11   ` Kees Cook
  0 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 18:11 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, x86, H. Peter Anvin, Arnd Bergmann, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Tony Luck,
	linux-arch

On Thu, May 21, 2020 at 09:56:36AM -0700, Kristen Carlson Accardi wrote:
> When using -ffunction-sections to place each function in
> it's own text section so it can be randomized at load time, the
> linker considers these .text.* sections "orphaned sections", and
> will place them after the first similar section (.text). In order
> to accurately represent the end of the text section and the
> orphaned sections, _etext must be moved so that it is after both
> .text and .text.* The text size must also be calculated to
> include .text AND .text.*
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/kernel/vmlinux.lds.S     | 18 +++++++++++++++++-
>  include/asm-generic/vmlinux.lds.h |  2 +-
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 1bf7e312361f..044f7528a2f0 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -147,8 +147,24 @@ SECTIONS
>  #endif
>  	} :text =0xcccc
>  
> -	/* End of text section, which should occupy whole number of pages */
> +#ifdef CONFIG_FG_KASLR
> +	/*
> +	 * -ffunction-sections creates .text.* sections, which are considered
> +	 * "orphan sections" and added after the first similar section (.text).
> +	 * Adding this ALIGN statement causes the address of _etext
> +	 * to be below that of all the .text.* orphaned sections
> +	 */
> +	. = ALIGN(PAGE_SIZE);
> +#endif
>  	_etext = .;
> +
> +	/*
> +	 * the size of the .text section is used to calculate the address
> +	 * range for orc lookups. If we just use SIZEOF(.text), we will
> +	 * miss all the .text.* sections. Calculate the size using _etext
> +	 * and _stext and save the value for later.
> +	 */
> +	text_size = _etext - _stext;
>  	. = ALIGN(PAGE_SIZE);

I don't think there's any need for this #ifdef (nor the trailing ALIGN).
I think leave the comment to explain why the ALIGN is being moved before
the _etext.

A repeated ALIGN won't move the position:

--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -147,9 +147,11 @@ SECTIONS
 #endif
        } :text =0xcccc
 
+       . = ALIGN(PAGE_SIZE);
        /* End of text section, which should occupy whole number of * pages */
        _etext = .;
        . = ALIGN(PAGE_SIZE);
+       _ktext = .;
 
        X86_ALIGN_RODATA_BEGIN
        RO_DATA(PAGE_SIZE)


$ nm vmlinux | grep '_[ek]text'
ffffffff81e02000 R _etext
ffffffff81e02000 R _ktext

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 6/9] x86/tools: Add relative relocs for randomized functions
  2020-05-21 16:56 ` [PATCH v2 6/9] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
@ 2020-05-21 19:54   ` Kees Cook
  0 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 19:54 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, x86, H. Peter Anvin, arjan, linux-kernel,
	kernel-hardening, rick.p.edgecombe, Tony Luck

On Thu, May 21, 2020 at 09:56:37AM -0700, Kristen Carlson Accardi wrote:
> When reordering functions, the relative offsets for relocs that
> are either in the randomized sections, or refer to the randomized
> sections will need to be adjusted. Add code to detect whether a
> reloc satisifies these cases, and if so, add them to the appropriate
> reloc list.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>

I'm just down to bikeshedding here (see below).

> ---
>  arch/x86/boot/compressed/Makefile |  7 +++-
>  arch/x86/tools/relocs.c           | 55 ++++++++++++++++++++++++-------
>  arch/x86/tools/relocs.h           |  4 +--
>  arch/x86/tools/relocs_common.c    | 15 ++++++---
>  4 files changed, 62 insertions(+), 19 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 5f7c262bcc99..3a5a004498de 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -117,6 +117,11 @@ $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
>  	$(call if_changed,check-and-link-vmlinux)
>  
>  OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
> +
> +ifdef CONFIG_FG_KASLR
> +	RELOCS_ARGS += --fg-kaslr
> +endif
> +
>  $(obj)/vmlinux.bin: vmlinux FORCE
>  	$(call if_changed,objcopy)
>  
> @@ -124,7 +129,7 @@ targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relo
>  
>  CMD_RELOCS = arch/x86/tools/relocs
>  quiet_cmd_relocs = RELOCS  $@
> -      cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
> +      cmd_relocs = $(CMD_RELOCS) $(RELOCS_ARGS) $< > $@;$(CMD_RELOCS) $(RELOCS_ARGS) --abs-relocs $<
>  $(obj)/vmlinux.relocs: vmlinux FORCE
>  	$(call if_changed,relocs)
>  
> diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
> index a00dc133f109..bf51ff1854ff 100644
> --- a/arch/x86/tools/relocs.c
> +++ b/arch/x86/tools/relocs.c
> @@ -42,6 +42,8 @@ struct section {
>  };
>  static struct section *secs;
>  
> +static int fg_kaslr;

I find the shifting name for fg_kaslr, fgkaslr, and global fg_kaslr
confusing. I think it would call this one "fgkaslr_mode" to indicate
that it does control the mode of how the later functions behave.

> +
>  static const char * const sym_regex_kernel[S_NSYMTYPES] = {
>  /*
>   * Following symbols have been audited. There values are constant and do
> @@ -351,8 +353,8 @@ static int sym_index(Elf_Sym *sym)
>  		return sym->st_shndx;
>  
>  	/* calculate offset of sym from head of table. */
> -	offset = (unsigned long) sym - (unsigned long) symtab;
> -	index = offset/sizeof(*sym);
> +	offset = (unsigned long)sym - (unsigned long)symtab;
> +	index = offset / sizeof(*sym);
>  
>  	return elf32_to_cpu(xsymtab[index]);
>  }
> @@ -500,22 +502,22 @@ static void read_symtabs(FILE *fp)
>  			sec->xsymtab = malloc(sec->shdr.sh_size);
>  			if (!sec->xsymtab) {
>  				die("malloc of %d bytes for xsymtab failed\n",
> -					sec->shdr.sh_size);
> +				    sec->shdr.sh_size);
>  			}
>  			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
>  				die("Seek to %d failed: %s\n",
> -					sec->shdr.sh_offset, strerror(errno));
> +				    sec->shdr.sh_offset, strerror(errno));
>  			}
>  			if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp)
> -					!= sec->shdr.sh_size) {
> +			    != sec->shdr.sh_size) {
>  				die("Cannot read extended symbol table: %s\n",
> -					strerror(errno));
> +				    strerror(errno));
>  			}
>  			shxsymtabndx = i;
>  			continue;
>  
>  		case SHT_SYMTAB:
> -			num_syms = sec->shdr.sh_size/sizeof(Elf_Sym);
> +			num_syms = sec->shdr.sh_size / sizeof(Elf_Sym);
>  
>  			sec->symtab = malloc(sec->shdr.sh_size);
>  			if (!sec->symtab) {

I would mention these whitespace/readability cleanups in the commit log.

> @@ -818,6 +820,32 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
>  		strncmp(symname, "init_per_cpu_", 13);
>  }
>  
> +static int is_function_section(struct section *sec)
> +{
> +	const char *name;
> +
> +	if (!fg_kaslr)
> +		return 0;
> +
> +	name = sec_name(sec->shdr.sh_info);
> +
> +	return(!strncmp(name, ".text.", 6));
> +}
> +
> +static int is_randomized_sym(ElfW(Sym) *sym)
> +{
> +	const char *name;
> +
> +	if (!fg_kaslr)
> +		return 0;
> +
> +	if (sym->st_shndx > shnum)
> +		return 0;
> +
> +	name = sec_name(sym_index(sym));
> +	return(!strncmp(name, ".text.", 6));
> +}
> +
>  static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
>  		      const char *symname)
>  {
> @@ -842,13 +870,17 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
>  	case R_X86_64_PC32:
>  	case R_X86_64_PLT32:
>  		/*
> -		 * PC relative relocations don't need to be adjusted unless
> -		 * referencing a percpu symbol.
> +		 * we need to keep pc relative relocations for sections which
> +		 * might be randomized, and for the percpu section.
> +		 * We also need to keep relocations for any offset which might
> +		 * reference an address in a section which has been randomized.
>  		 *
>  		 * NB: R_X86_64_PLT32 can be treated as R_X86_64_PC32.
>  		 */
> -		if (is_percpu_sym(sym, symname))
> +		if (is_function_section(sec) || is_randomized_sym(sym) ||
> +		    is_percpu_sym(sym, symname))
>  			add_reloc(&relocs32neg, offset);
> +
>  		break;
>  
>  	case R_X86_64_PC64:
> @@ -1158,8 +1190,9 @@ static void print_reloc_info(void)
>  
>  void process(FILE *fp, int use_real_mode, int as_text,
>  	     int show_absolute_syms, int show_absolute_relocs,
> -	     int show_reloc_info)
> +	     int show_reloc_info, int fgkaslr)
>  {
> +	fg_kaslr = fgkaslr;

Then this becomes a bit more readable:

	fgkaslr_mode = fgkaslr;

>  	regex_init(use_real_mode);
>  	read_ehdr(fp);
>  	read_shdrs(fp);
> diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
> index 43c83c0fd22c..05504052c846 100644
> --- a/arch/x86/tools/relocs.h
> +++ b/arch/x86/tools/relocs.h
> @@ -31,8 +31,8 @@ enum symtype {
>  
>  void process_32(FILE *fp, int use_real_mode, int as_text,
>  		int show_absolute_syms, int show_absolute_relocs,
> -		int show_reloc_info);
> +		int show_reloc_info, int fg_kaslr);
>  void process_64(FILE *fp, int use_real_mode, int as_text,
>  		int show_absolute_syms, int show_absolute_relocs,
> -		int show_reloc_info);
> +		int show_reloc_info, int fg_kaslr);

I think the prototype and declaration should have matching names:
fgkaslr in "process" and fg_kaslr here. I suggest just calling it
fgkaslr in all.

>  #endif /* RELOCS_H */
> diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
> index 6634352a20bc..1407db72367a 100644
> --- a/arch/x86/tools/relocs_common.c
> +++ b/arch/x86/tools/relocs_common.c
> @@ -12,14 +12,14 @@ void die(char *fmt, ...)
>  
>  static void usage(void)
>  {
> -	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
> -	    " vmlinux\n");
> +	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
> +	    "--fg-kaslr] vmlinux\n");
>  }
>  
>  int main(int argc, char **argv)
>  {
>  	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
> -	int as_text, use_real_mode;
> +	int as_text, use_real_mode, fg_kaslr;

And I think I'd call this one fgkaslr_opt to show it comes from the opt
parsing.

>  	const char *fname;
>  	FILE *fp;
>  	int i;
> @@ -30,6 +30,7 @@ int main(int argc, char **argv)
>  	show_reloc_info = 0;
>  	as_text = 0;
>  	use_real_mode = 0;
> +	fg_kaslr = 0;
>  	fname = NULL;
>  	for (i = 1; i < argc; i++) {
>  		char *arg = argv[i];
> @@ -54,6 +55,10 @@ int main(int argc, char **argv)
>  				use_real_mode = 1;
>  				continue;
>  			}
> +			if (strcmp(arg, "--fg-kaslr") == 0) {
> +				fg_kaslr = 1;
> +				continue;
> +			}
>  		}
>  		else if (!fname) {
>  			fname = arg;
> @@ -75,11 +80,11 @@ int main(int argc, char **argv)
>  	if (e_ident[EI_CLASS] == ELFCLASS64)
>  		process_64(fp, use_real_mode, as_text,
>  			   show_absolute_syms, show_absolute_relocs,
> -			   show_reloc_info);
> +			   show_reloc_info, fg_kaslr);
>  	else
>  		process_32(fp, use_real_mode, as_text,
>  			   show_absolute_syms, show_absolute_relocs,
> -			   show_reloc_info);
> +			   show_reloc_info, fg_kaslr);
>  	fclose(fp);
>  	return 0;
>  }
> -- 
> 2.20.1
> 

With these changes, yeah, I think it's good to go.

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 4/9] x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  2020-05-21 16:56 ` [PATCH v2 4/9] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
@ 2020-05-21 20:00   ` Kees Cook
  0 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 20:00 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, Masahiro Yamada, Michal Marek, x86,
	H. Peter Anvin, arjan, linux-kernel, kernel-hardening,
	rick.p.edgecombe, Tony Luck, linux-kbuild

On Thu, May 21, 2020 at 09:56:35AM -0700, Kristen Carlson Accardi wrote:
> Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
> the make file to build with -ffunction-sections if CONFIG_FG_KASLR
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> ---
>  Makefile         |  4 ++++
>  arch/x86/Kconfig | 13 +++++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index 04f5662ae61a..28e515baa824 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -862,6 +862,10 @@ ifdef CONFIG_LIVEPATCH
>  KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
>  endif
>  
> +ifdef CONFIG_FG_KASLR
> +KBUILD_CFLAGS += -ffunction-sections
> +endif
> +
>  # arch Makefile may override CC so keep this after arch Makefile is included
>  NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
>  
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 2d3f963fd6f1..50e83ea57d70 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2183,6 +2183,19 @@ config RANDOMIZE_BASE
>  
>  	  If unsure, say Y.
>  
> +config FG_KASLR
> +	bool "Function Granular Kernel Address Space Layout Randomization"
> +	depends on $(cc-option, -ffunction-sections)
> +	depends on RANDOMIZE_BASE && X86_64
> +	help
> +	  This option improves the randomness of the kernel text
> +	  over basic Kernel Address Space Layout Randomization (KASLR)
> +	  by reordering the kernel text at boot time. This feature
> +	  uses information generated at compile time to re-layout the
> +	  kernel text section at boot time at function level granularity.
> +
> +	  If unsure, say N.
> +
>  # Relocation on x86 needs some additional build support
>  config X86_NEED_RELOCS
>  	def_bool y

Kconfig bikeshedding: how about putting FG_KASLR in arch/Kconfig, add
a "depends on ARCH_HAS_FG_KASLR", and remove the arch-specific depends.

Then in arch/x86 have ARCH_HAS_FG_KASLR as a def_bool y with the
RANDOMIZE_BASE && X86_64 depends.

This will more cleanly split the build elements (compiler flags) from
the arch elements (64-bit x86, arch-specific flags, etc).

With that split out:

Reviewed-by: Kees Cook <keescook@chromium.org>

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 7/9] x86: Add support for function granular KASLR
  2020-05-21 16:56 ` [PATCH v2 7/9] x86: Add support for function granular KASLR Kristen Carlson Accardi
@ 2020-05-21 21:08   ` Kees Cook
  2020-05-21 21:42     ` Kristen Carlson Accardi
  2020-06-04 17:27     ` Kristen Carlson Accardi
  0 siblings, 2 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 21:08 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, Jonathan Corbet, x86, H. Peter Anvin, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Tony Luck,
	linux-doc

On Thu, May 21, 2020 at 09:56:38AM -0700, Kristen Carlson Accardi wrote:
> At boot time, find all the function sections that have separate .text
> sections, shuffle them, and then copy them to new locations. Adjust
> any relocations accordingly.

Commit log length vs "11 files changed, 1159 insertions(+), 15
deletions(-)" implies to me that a lot more detail is needed here. ;)

Perhaps describe what the code pieces are, why the code is being added
are here, etc (I see at least the ELF parsing, the ELF shuffling, the
relocation updates, the symbol list, and the re-sorting of kallsyms,
ORC, and extables. I think the commit log should prepare someone to read
the diff and know what to expect to find. (In the end, I wonder if these
pieces should be split up into logically separate patches, but for now,
let's just go with it -- though I've made some suggestions below about
things that might be worth splitting out.)

More below...

> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> ---
>  Documentation/security/fgkaslr.rst       | 155 +++++
>  Documentation/security/index.rst         |   1 +
>  arch/x86/boot/compressed/Makefile        |   3 +
>  arch/x86/boot/compressed/fgkaslr.c       | 823 +++++++++++++++++++++++
>  arch/x86/boot/compressed/kaslr.c         |   4 -
>  arch/x86/boot/compressed/misc.c          | 109 ++-
>  arch/x86/boot/compressed/misc.h          |  34 +
>  arch/x86/boot/compressed/utils.c         |  12 +
>  arch/x86/boot/compressed/vmlinux.symbols |  17 +
>  arch/x86/include/asm/boot.h              |  15 +-
>  include/uapi/linux/elf.h                 |   1 +
>  11 files changed, 1159 insertions(+), 15 deletions(-)
>  create mode 100644 Documentation/security/fgkaslr.rst
>  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
>  create mode 100644 arch/x86/boot/compressed/utils.c
>  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> 
> diff --git a/Documentation/security/fgkaslr.rst b/Documentation/security/fgkaslr.rst
> new file mode 100644
> index 000000000000..94939c62c50d
> --- /dev/null
> +++ b/Documentation/security/fgkaslr.rst
> @@ -0,0 +1,155 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=====================================================================
> +Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> +=====================================================================
> +
> +:Date: 6 April 2020
> +:Author: Kristen Accardi
> +
> +Kernel Address Space Layout Randomization (KASLR) was merged into the kernel
> +with the objective of increasing the difficulty of code reuse attacks. Code
> +reuse attacks reused existing code snippets to get around existing memory
> +protections. They exploit software bugs which expose addresses of useful code
> +snippets to control the flow of execution for their own nefarious purposes.
> +KASLR as it was originally implemented moves the entire kernel code text as a
> +unit at boot time in order to make addresses less predictable. The order of the
> +code within the segment is unchanged - only the base address is shifted. There
> +are a few shortcomings to this algorithm.
> +
> +1. Low Entropy - there are only so many locations the kernel can fit in. This
> +   means an attacker could guess without too much trouble.
> +2. Knowledge of a single address can reveal the offset of the base address,
> +   exposing all other locations for a published/known kernel image.
> +3. Info leaks abound.
> +
> +Finer grained ASLR has been proposed as a way to make ASLR more resistant
> +to info leaks. It is not a new concept at all, and there are many variations
> +possible. Function reordering is an implementation of finer grained ASLR
> +which randomizes the layout of an address space on a function level
> +granularity. The term "fgkaslr" is used in this document to refer to the
> +technique of function reordering when used with KASLR, as well as finer grained
> +KASLR in general.
> +
> +The objective of this patch set is to improve a technology that is already
> +merged into the kernel (KASLR). This code will not prevent all code reuse
> +attacks, and should be considered as one of several tools that can be used.
> +
> +Implementation Details
> +======================
> +
> +The over-arching objective of the fgkaslr implementation is incremental
> +improvement over the existing KASLR algorithm. It is designed to work with
> +the existing solution, and there are two main area where code changes occur:
> +Build time, and Load time.
> +
> +Build time
> +----------
> +
> +GCC has had an option to place functions into individual .text sections
> +for many years now (-ffunction-sections). This option is used to implement
> +function reordering at load time. The final compiled vmlinux retains all the
> +section headers, which can be used to help find the address ranges of each
> +function. Using this information and an expanded table of relocation addresses,
> +individual text sections can be shuffled immediately after decompression.
> +Some data tables inside the kernel that have assumptions about order
> +require sorting after the update. In order to modify these tables,
> +a few key symbols from the objcopy symbol stripping process are preserved
> +for use after shuffling the text segments.
> +
> +Load time
> +---------
> +
> +The boot kernel was modified to parse the vmlinux elf file after
> +decompression to check for symbols for modifying data tables, and to
> +look for any .text.* sections to randomize. The sections are then shuffled,
> +and tables are updated or resorted. The existing code which updated relocation
> +addresses was modified to account for not just a fixed delta from the load
> +address, but the offset that the function section was moved to. This requires
> +inspection of each address to see if it was impacted by a randomization.
> +
> +In order to hide the new layout, symbols reported through /proc/kallsyms will
> +be sorted by name alphabetically rather than by address.
> +
> +Performance Impact
> +==================
> +
> +There are two areas where function reordering can impact performance: boot
> +time latency, and run time performance.
> +
> +Boot time latency
> +-----------------
> +
> +This implementation of finer grained KASLR impacts the boot time of the kernel
> +in several places. It requires additional parsing of the kernel ELF file to
> +obtain the section headers of the sections to be randomized. It calls the
> +random number generator for each section to be randomized to determine that
> +section's new memory location. It copies the decompressed kernel into a new
> +area of memory to avoid corruption when laying out the newly randomized
> +sections. It increases the number of relocations the kernel has to perform at
> +boot time vs. standard KASLR, and it also requires a lookup on each address
> +that needs to be relocated to see if it was in a randomized section and needs
> +to be adjusted by a new offset. Finally, it re-sorts a few data tables that
> +are required to be sorted by address.
> +
> +Booting a test VM on a modern, well appointed system showed an increase in
> +latency of approximately 1 second.
> +
> +Run time
> +--------
> +
> +The performance impact at run-time of function reordering varies by workload.
> +Randomly reordering the functions will cause an increase in cache misses
> +for some workloads. Some workloads perform significantly worse under FGKASLR,
> +while others stay the same or even improve. In general, it will depend on the
> +code flow whether or not finer grained KASLR will impact a workload, and how
> +the underlying code was designed. Because the layout changes per boot, each
> +time a system is rebooted the performance of a workload may change.
> +
> +Image Size
> +==========
> +
> +fgkaslr increases the size of the kernel binary due to the extra section
> +headers that are included, as well as the extra relocations that need to
> +be added. You can expect fgkaslr to increase the size of the resulting
> +vmlinux by about 3%, and the compressed image (bzImage) by 15%.
> +
> +Memory Usage
> +============
> +
> +fgkaslr increases the amount of heap that is required at boot time,
> +although this extra memory is released when the kernel has finished
> +decompression. As a result, it may not be appropriate to use this feature
> +on systems without much memory.
> +
> +Building
> +========
> +
> +To enable fine grained KASLR, you need to have the following config options
> +set (including all the ones you would use to build normal KASLR)
> +
> +``CONFIG_FG_KASLR=y``
> +
> +fgkaslr for the kernel is only supported for the X86_64 architecture.
> +
> +Modules
> +=======
> +
> +Modules are randomized similarly to the rest of the kernel by shuffling
> +the sections at load time prior to moving them into memory. The module must
> +also have been build with the -ffunction-sections compiler option.
> +
> +Although fgkaslr for the kernel is only supported for the X86_64 architecture,
> +it is possible to use fgkaslr with modules on other architectures. To enable
> +this feature, select the following config option:
> +
> +``CONFIG_MODULE_FG_KASLR``
> +
> +This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> +
> +Disabling
> +=========
> +
> +Disabling normal kaslr using the nokaslr command line option also disables
> +fgkaslr. In addtion, it is possible to disable fgkaslr separately by booting
> +with fgkaslr=off on the commandline.

Yay documentation! Are you able to include the references to the papers
you discuss in the cover letter in here too?

> diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
> index fc503dd689a7..5e370c409ad2 100644
> --- a/Documentation/security/index.rst
> +++ b/Documentation/security/index.rst
> @@ -7,6 +7,7 @@ Security Documentation
>  
>     credentials
>     IMA-templates
> +   fgkaslr
>     keys/index
>     lsm
>     lsm-development
> diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
> index 3a5a004498de..0ad5a31f1f0a 100644
> --- a/arch/x86/boot/compressed/Makefile
> +++ b/arch/x86/boot/compressed/Makefile
> @@ -80,10 +80,12 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o
>  
>  vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
>  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
> +vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/utils.o
>  ifdef CONFIG_X86_64
>  	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
>  	vmlinux-objs-y += $(obj)/mem_encrypt.o
>  	vmlinux-objs-y += $(obj)/pgtable_64.o
> +	vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/fgkaslr.o
>  endif

CONFIG_FG_KASLR is already gated by CONFIG_X86-64, so I think you can
just put these all on the same line before the ifdef:

vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/utils.o $(obj)/fgkaslr.o

>  
>  vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> @@ -120,6 +122,7 @@ OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
>  
>  ifdef CONFIG_FG_KASLR
>  	RELOCS_ARGS += --fg-kaslr
> +	OBJCOPYFLAGS += --keep-symbols=$(srctree)/$(src)/vmlinux.symbols
>  endif
>  
>  $(obj)/vmlinux.bin: vmlinux FORCE
> diff --git a/arch/x86/boot/compressed/fgkaslr.c b/arch/x86/boot/compressed/fgkaslr.c
> new file mode 100644
> index 000000000000..451e807de276
> --- /dev/null
> +++ b/arch/x86/boot/compressed/fgkaslr.c
> @@ -0,0 +1,823 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * fgkaslr.c
> + *
> + * This contains the routines needed to reorder the kernel text section
> + * at boot time.
> + */
> +#include "misc.h"
> +#include "error.h"
> +#include "pgtable.h"
> +#include "../string.h"
> +#include "../voffset.h"
> +#include <linux/sort.h>
> +#include <linux/bsearch.h>
> +#include "../../include/asm/extable.h"
> +#include "../../include/asm/orc_types.h"
> +
> +/*
> + * Longest parameter of 'fgkaslr=' is 'off' right now, plus an extra '\0'
> + * for termination.
> + */
> +#define MAX_FGKASLR_ARG_LENGTH 4
> +int nofgkaslr;

This can be static.

> +
> +/*
> + * Use normal definitions of mem*() from string.c. There are already
> + * included header files which expect a definition of memset() and by
> + * the time we define memset macro, it is too late.
> + */
> +#undef memcpy
> +#undef memset
> +#define memzero(s, n)	memset((s), 0, (n))
> +#define memmove		memmove
> +
> +void *memmove(void *dest, const void *src, size_t n);
> +
> +static unsigned long percpu_start;
> +static unsigned long percpu_end;
> +
> +static long kallsyms_names;
> +static long kallsyms_offsets;
> +static long kallsyms_num_syms;
> +static long kallsyms_relative_base;
> +static long kallsyms_markers;
> +static long __start___ex_table_addr;
> +static long __stop___ex_table_addr;
> +static long _stext;
> +static long _etext;
> +static long _sinittext;
> +static long _einittext;
> +static long __start_orc_unwind_ip;
> +static long __stop_orc_unwind_ip;
> +static long __start_orc_unwind;
> +
> +/* addresses in mapped address space */
> +static int *base;
> +static u8 *names;
> +static unsigned long relative_base;
> +static unsigned int *markers_addr;
> +
> +struct kallsyms_name {
> +	u8 len;
> +	u8 indecis[256];
> +};
> +
> +struct kallsyms_name *names_table;
> +
> +struct orc_entry *cur_orc_table;
> +int *cur_orc_ip_table;

static?

> +
> +/*
> + * This is an array of pointers to sections headers for randomized sections
> + */
> +Elf64_Shdr **sections;

Given the creation of the Elf_Shdr etc macros in the header, should all
of the Elf64 things in here be updated to use the size-agnostic macros?
(Is anyone actually going to run a 32-bit kernel with fgkaslr some day?)

> +
> +/*
> + * This is an array of all section headers, randomized or otherwise.

Comments nitpick while I'm thinking about it: "This is" tends to be
redundant, and I'd pick a single-sentence-comment punctuation style:
i.e. with or without an ending period. I personally prefer including the
period, but mainly I think it should just be consistent. :)

> + */
> +Elf64_Shdr *sechdrs;

Also, these can be static?

> +
> +/*
> + * The number of elements in the randomized section header array (sections)
> + */
> +int sections_size;

static? (Also, it seems this is related to "sections" above, so maybe
more it closer?)

> +
> +static bool is_text(long addr)
> +{
> +	if ((addr >= _stext && addr < _etext) ||
> +	    (addr >= _sinittext && addr < _einittext))
> +		return true;
> +	return false;
> +}
> +
> +bool is_percpu_addr(long pc, long offset)
> +{
> +	unsigned long ptr;
> +	long address;
> +
> +	address = pc + offset + 4;
> +
> +	ptr = (unsigned long)address;
> +
> +	if (ptr >= percpu_start && ptr < percpu_end)
> +		return true;
> +
> +	return false;
> +}
> +
> +static int cmp_section_addr(const void *a, const void *b)
> +{
> +	unsigned long ptr = (unsigned long)a;
> +	Elf64_Shdr *s = *(Elf64_Shdr **)b;
> +	unsigned long end = s->sh_addr + s->sh_size;
> +
> +	if (ptr >= s->sh_addr && ptr < end)
> +		return 0;
> +
> +	if (ptr < s->sh_addr)
> +		return -1;
> +
> +	return 1;
> +}
> +
> +/*
> + * Discover if the address is in a randomized section and if so, adjust
> + * by the saved offset.
> + */
> +Elf64_Shdr *adjust_address(long *address)
> +{
> +	Elf64_Shdr **s;
> +	Elf64_Shdr *shdr;
> +
> +	if (nofgkaslr)
> +		return NULL;
> +
> +	s = bsearch((const void *)*address, sections, sections_size, sizeof(*s),
> +		    cmp_section_addr);
> +	if (s) {
> +		shdr = *s;
> +		*address += shdr->sh_offset;
> +		return shdr;
> +	}
> +
> +	return NULL;
> +}
> +
> +void adjust_relative_offset(long pc, long *value, Elf64_Shdr *section)
> +{
> +	Elf64_Shdr *s;
> +	long address;
> +
> +	if (nofgkaslr)
> +		return;
> +
> +	/*
> +	 * sometimes we are updating a relative offset that would
> +	 * normally be relative to the next instruction (such as a call).
> +	 * In this case to calculate the target, you need to add 32bits to
> +	 * the pc to get the next instruction value. However, sometimes
> +	 * targets are just data that was stored in a table such as ksymtab
> +	 * or cpu alternatives. In this case our target is not relative to
> +	 * the next instruction.
> +	 */

Excellent and scary comment. ;) Was this found by trial and error? That
sounds "fun" to debug. :P

> +
> +	/*
> +	 * Calculate the address that this offset would call.
> +	 */
> +	if (!is_text(pc))
> +		address = pc + *value;
> +	else
> +		address = pc + *value + 4;
> +
> +	/*
> +	 * if the address is in section that was randomized,
> +	 * we need to adjust the offset.
> +	 */
> +	s = adjust_address(&address);
> +	if (s)
> +		*value += s->sh_offset;
> +
> +	/*
> +	 * If the PC that this offset was calculated for was in a section
> +	 * that has been randomized, the value needs to be adjusted by the
> +	 * same amount as the randomized section was adjusted from it's original
> +	 * location.
> +	 */
> +	if (section)
> +		*value -= section->sh_offset;
> +}
> +
> +static void kallsyms_swp(void *a, void *b, int size)
> +{
> +	int idx1, idx2;
> +	int temp;
> +	struct kallsyms_name name_a;
> +
> +	/* determine our index into the array */
> +	idx1 = (int *)a - base;
> +	idx2 = (int *)b - base;
> +	temp = base[idx1];
> +	base[idx1] = base[idx2];
> +	base[idx2] = temp;
> +
> +	/* also swap the names table */
> +	memcpy(&name_a, &names_table[idx1], sizeof(name_a));
> +	memcpy(&names_table[idx1], &names_table[idx2],
> +	       sizeof(struct kallsyms_name));
> +	memcpy(&names_table[idx2], &name_a, sizeof(struct kallsyms_name));
> +}
> +
> +static int kallsyms_cmp(const void *a, const void *b)
> +{
> +	int addr_a, addr_b;
> +	unsigned long uaddr_a, uaddr_b;
> +
> +	addr_a = *(int *)a;
> +	addr_b = *(int *)b;
> +
> +	if (addr_a >= 0)
> +		uaddr_a = addr_a;
> +	if (addr_b >= 0)
> +		uaddr_b = addr_b;
> +
> +	if (addr_a < 0)
> +		uaddr_a = relative_base - 1 - addr_a;
> +	if (addr_b < 0)
> +		uaddr_b = relative_base - 1 - addr_b;
> +
> +	if (uaddr_b > uaddr_a)
> +		return -1;
> +
> +	return 0;
> +}
> +
> +static void deal_with_names(int num_syms)
> +{
> +	int num_bytes;
> +	int i, j;
> +	int offset;
> +
> +	/* we should have num_syms kallsyms_name entries */
> +	num_bytes = num_syms * sizeof(*names_table);
> +	names_table = malloc(num_syms * sizeof(*names_table));
> +	if (!names_table) {
> +		debug_putstr("\nbytes requested: ");
> +		debug_puthex(num_bytes);
> +		error("\nunable to allocate space for names table\n");
> +	}
> +
> +	/* read all the names entries */
> +	offset = 0;
> +	for (i = 0; i < num_syms; i++) {
> +		names_table[i].len = names[offset];
> +		offset++;
> +		for (j = 0; j < names_table[i].len; j++) {
> +			names_table[i].indecis[j] = names[offset];
> +			offset++;
> +		}
> +	}
> +}
> +
> +static void write_sorted_names(int num_syms)
> +{
> +	int i, j;
> +	int offset = 0;
> +	unsigned int *markers;
> +
> +	/*
> +	 * we are going to need to regenerate the markers table, which is a
> +	 * table of offsets into the compressed stream every 256 symbols.
> +	 * this code copied almost directly from scripts/kallsyms.c
> +	 */

Can any of this kallsyms sorting code be reasonably reused instead
of copy/pasting? Or is this mostly novel in that it's taking the output
of that earlier sort, which isn't the input format scripts/kallsyms.c
uses?

> +	markers = malloc(sizeof(unsigned int) * ((num_syms + 255) / 256));
> +	if (!markers) {
> +		debug_putstr("\nfailed to allocate heap space of ");
> +		debug_puthex(((num_syms + 255) / 256));
> +		debug_putstr(" bytes\n");
> +		error("Unable to allocate space for markers table");
> +	}
> +
> +	for (i = 0; i < num_syms; i++) {
> +		if ((i & 0xFF) == 0)
> +			markers[i >> 8] = offset;
> +
> +		names[offset] = (u8)names_table[i].len;
> +		offset++;
> +		for (j = 0; j < names_table[i].len; j++) {
> +			names[offset] = (u8)names_table[i].indecis[j];
> +			offset++;
> +		}
> +	}
> +
> +	/* write new markers table over old one */
> +	for (i = 0; i < ((num_syms + 255) >> 8); i++)
> +		markers_addr[i] = markers[i];
> +
> +	free(markers);
> +	free(names_table);
> +}
> +
> +static void sort_kallsyms(unsigned long map)
> +{
> +	int num_syms;
> +	int i;
> +
> +	debug_putstr("\nRe-sorting kallsyms...\n");
> +
> +	num_syms = *(int *)(kallsyms_num_syms + map);
> +	base = (int *)(kallsyms_offsets + map);
> +	relative_base = *(unsigned long *)(kallsyms_relative_base + map);
> +	markers_addr = (unsigned int *)(kallsyms_markers + map);
> +	names = (u8 *)(kallsyms_names + map);
> +
> +	/*
> +	 * the kallsyms table was generated prior to any randomization.
> +	 * it is a bunch of offsets from "relative base". In order for
> +	 * us to check if a symbol has an address that was in a randomized
> +	 * section, we need to reconstruct the address to it's original
> +	 * value prior to handle_relocations.
> +	 */
> +	for (i = 0; i < num_syms; i++) {
> +		unsigned long addr;
> +		int new_base;
> +
> +		/*
> +		 * according to kernel/kallsyms.c, positive offsets are absolute
> +		 * values and negative offsets are relative to the base.
> +		 */
> +		if (base[i] >= 0)
> +			addr = base[i];
> +		else
> +			addr = relative_base - 1 - base[i];
> +
> +		if (adjust_address(&addr)) {
> +			/* here we need to recalcuate the offset */
> +			new_base = relative_base - 1 - addr;
> +			base[i] = new_base;
> +		}
> +	}
> +
> +	/*
> +	 * here we need to read in all the kallsyms_names info
> +	 * so that we can regenerate it.
> +	 */
> +	deal_with_names(num_syms);
> +
> +	sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);
> +
> +	/* write the newly sorted names table over the old one */
> +	write_sorted_names(num_syms);
> +}
> +
> +#include "../../../../lib/extable.c"

Now that the earlier linking glitches have been sorted out, I wonder if
it might be nicer to add this as a separate compilation unit, similar to
how early_serial_console.c is done? (Or, I guess, more specifically, why
can't this be in utils.c?)

> +
> +static inline unsigned long
> +ex_fixup_handler(const struct exception_table_entry *x)
> +{
> +	return ((unsigned long)&x->handler + x->handler);
> +}
> +
> +static inline unsigned long
> +ex_fixup_addr(const struct exception_table_entry *x)
> +{
> +	return ((unsigned long)&x->fixup + x->fixup);
> +}
> +
> +static void update_ex_table(unsigned long map)
> +{
> +	struct exception_table_entry *start_ex_table =
> +		(struct exception_table_entry *)(__start___ex_table_addr + map);
> +	struct exception_table_entry *stop_ex_table =
> +		(struct exception_table_entry *)(__stop___ex_table_addr + map);
> +	int num_entries =
> +		(__stop___ex_table_addr - __start___ex_table_addr) /
> +		sizeof(struct exception_table_entry);
> +	int i;
> +
> +	debug_putstr("\nUpdating exception table...");
> +	for (i = 0; i < num_entries; i++) {
> +		unsigned long insn = ex_to_insn(&start_ex_table[i]);
> +		unsigned long fixup = ex_fixup_addr(&start_ex_table[i]);
> +		unsigned long handler = ex_fixup_handler(&start_ex_table[i]);
> +		unsigned long addr;
> +		Elf64_Shdr *s;
> +
> +		/* check each address to see if it needs adjusting */
> +		addr = insn - map;
> +		s = adjust_address(&addr);
> +		if (s)
> +			start_ex_table[i].insn += s->sh_offset;
> +
> +		addr = fixup - map;
> +		s = adjust_address(&addr);
> +		if (s)
> +			start_ex_table[i].fixup += s->sh_offset;
> +
> +		addr = handler - map;
> +		s = adjust_address(&addr);
> +		if (s)
> +			start_ex_table[i].handler += s->sh_offset;
> +	}
> +}
> +
> +static void sort_ex_table(unsigned long map)
> +{
> +	struct exception_table_entry *start_ex_table =
> +		(struct exception_table_entry *)(__start___ex_table_addr + map);
> +	struct exception_table_entry *stop_ex_table =
> +		(struct exception_table_entry *)(__stop___ex_table_addr + map);
> +
> +	debug_putstr("\nRe-sorting exception table...");
> +
> +	sort_extable(start_ex_table, stop_ex_table);
> +}
> +
> +static inline unsigned long orc_ip(const int *ip)
> +{
> +	return (unsigned long)ip + *ip;
> +}
> +
> +static void orc_sort_swap(void *_a, void *_b, int size)
> +{
> +	struct orc_entry *orc_a, *orc_b;
> +	struct orc_entry orc_tmp;
> +	int *a = _a, *b = _b, tmp;
> +	int delta = _b - _a;
> +
> +	/* Swap the .orc_unwind_ip entries: */
> +	tmp = *a;
> +	*a = *b + delta;
> +	*b = tmp - delta;
> +
> +	/* Swap the corresponding .orc_unwind entries: */
> +	orc_a = cur_orc_table + (a - cur_orc_ip_table);
> +	orc_b = cur_orc_table + (b - cur_orc_ip_table);
> +	orc_tmp = *orc_a;
> +	*orc_a = *orc_b;
> +	*orc_b = orc_tmp;
> +}
> +
> +static int orc_sort_cmp(const void *_a, const void *_b)
> +{
> +	struct orc_entry *orc_a;
> +	const int *a = _a, *b = _b;
> +	unsigned long a_val = orc_ip(a);
> +	unsigned long b_val = orc_ip(b);
> +
> +	if (a_val > b_val)
> +		return 1;
> +	if (a_val < b_val)
> +		return -1;
> +
> +	/*
> +	 * The "weak" section terminator entries need to always be on the left
> +	 * to ensure the lookup code skips them in favor of real entries.
> +	 * These terminator entries exist to handle any gaps created by
> +	 * whitelisted .o files which didn't get objtool generation.
> +	 */
> +	orc_a = cur_orc_table + (a - cur_orc_ip_table);
> +	return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
> +}
> +
> +static void sort_orc_table(unsigned long map)
> +{
> +	int num_entries =
> +		(__stop_orc_unwind_ip - __start_orc_unwind_ip) / sizeof(int);
> +
> +	cur_orc_ip_table = (int *)(__start_orc_unwind_ip + map);
> +	cur_orc_table = (struct orc_entry *)(__start_orc_unwind + map);
> +
> +	debug_putstr("\nRe-sorting orc tables...\n");
> +	sort(cur_orc_ip_table, num_entries, sizeof(int), orc_sort_cmp,
> +	     orc_sort_swap);
> +}
> +
> +void post_relocations_cleanup(unsigned long map)
> +{
> +	if (!nofgkaslr) {
> +		update_ex_table(map);
> +		sort_ex_table(map);
> +		sort_orc_table(map);
> +	}
> +
> +	/*
> +	 * maybe one day free will do something. So, we "free" this memory
> +	 * in either case
> +	 */
> +	free(sections);
> +	free(sechdrs);
> +}
> +
> +void pre_relocations_cleanup(unsigned long map)
> +{
> +	if (nofgkaslr)
> +		return;
> +
> +	sort_kallsyms(map);
> +}
> +
> +static void shuffle_sections(int *list, int size)
> +{
> +	int i;
> +	unsigned long j;
> +	int temp;
> +
> +	for (i = size - 1; i > 0; i--) {
> +		j = kaslr_get_random_long(NULL) % (i + 1);
> +
> +		temp = list[i];
> +		list[i] = list[j];
> +		list[j] = temp;
> +	}
> +}
> +
> +static void move_text(int num_sections, char *secstrings, Elf64_Shdr *text,
> +		      void *source, void *dest, Elf64_Phdr *phdr)
> +{
> +	unsigned long adjusted_addr;
> +	int copy_bytes;
> +	void *stash;
> +	Elf64_Shdr **sorted_sections;
> +	int *index_list;
> +	int i, j;
> +
> +	memmove(dest, source + text->sh_offset, text->sh_size);
> +	copy_bytes = text->sh_size;
> +	dest += text->sh_size;
> +	adjusted_addr = text->sh_addr + text->sh_size;
> +
> +	/*
> +	 * we leave the sections sorted in their original order
> +	 * by s->sh_addr, but shuffle the indexes in a random
> +	 * order for copying.
> +	 */
> +	index_list = malloc(sizeof(int) * num_sections);
> +	if (!index_list)
> +		error("Failed to allocate space for index list");
> +
> +	for (i = 0; i < num_sections; i++)
> +		index_list[i] = i;
> +
> +	shuffle_sections(index_list, num_sections);
> +
> +	/*
> +	 * to avoid overwriting earlier sections before they can get
> +	 * copied to dest, stash everything into a buffer first.
> +	 * this will cause our source address to be off by
> +	 * phdr->p_offset though, so we'll adjust s->sh_offset below.
> +	 *
> +	 * TBD: ideally we'd simply decompress higher up so that our
> +	 * copy wasn't in danger of overwriting anything important.
> +	 */
> +	stash = malloc(phdr->p_filesz);

Yeah, I'm glad this is captured in a comment. I'm fine leaving this
as it is for now, but it's certainly an area I'd like to look at again
in the future. I think we could gain some performance by removing this
copy. The KASLR code could choose two kernel-image-sized slots, decompress
into one (which would effectively be "stash" above) and the fgkaslr code
could write the resulting image into the second slot. For now, though,
this doesn't really matter: we just get a double-size image area and
copy the kernel around an extra time. :)

> +	if (!stash)
> +		error("Failed to allocate space for text stash");
> +
> +	memcpy(stash, source + phdr->p_offset, phdr->p_filesz);
> +
> +	/* now we'd walk through the sections. */
> +	for (j = 0; j < num_sections; j++) {
> +		unsigned long aligned_addr;
> +		Elf64_Shdr *s;
> +		const char *sname;
> +		void *src;
> +		int pad_bytes;
> +
> +		s = sections[index_list[j]];
> +
> +		sname = secstrings + s->sh_name;
> +
> +		/* align addr for this section */
> +		aligned_addr = ALIGN(adjusted_addr, s->sh_addralign);
> +
> +		/*
> +		 * copy out of stash, so adjust offset
> +		 */
> +		src = stash + s->sh_offset - phdr->p_offset;
> +
> +		/*
> +		 * Fill any space between sections with int3
> +		 */
> +		pad_bytes = aligned_addr - adjusted_addr;
> +		memset(dest, 0xcc, pad_bytes);
> +
> +		dest = (void *)ALIGN((unsigned long)dest, s->sh_addralign);
> +
> +		memmove(dest, src, s->sh_size);
> +
> +		dest += s->sh_size;
> +		copy_bytes += s->sh_size + pad_bytes;
> +		adjusted_addr = aligned_addr + s->sh_size;
> +
> +		/* we can blow away sh_offset for our own uses */
> +		s->sh_offset = aligned_addr - s->sh_addr;
> +	}
> +
> +	free(index_list);
> +
> +	/*
> +	 * move remainder of text segment. Ok to just use original source
> +	 * here since this area is untouched.
> +	 */
> +	memmove(dest, source + text->sh_offset + copy_bytes,
> +		phdr->p_filesz - copy_bytes);
> +	free(stash);
> +}
> +
> +#define GET_SYM(name)							\
> +	do {								\
> +		if (!name) {						\
> +			if (strcmp(#name, strtab + sym->st_name) == 0) {\
> +				name = sym->st_value;			\
> +				continue;				\
> +			}						\
> +		}							\
> +	} while (0)
> +
> +static void parse_symtab(Elf64_Sym *symtab, char *strtab, long num_syms)
> +{
> +	Elf64_Sym *sym;
> +
> +	if (!symtab || !strtab)
> +		return;
> +
> +	debug_putstr("\nLooking for symbols... ");
> +
> +	/*
> +	 * walk through the symbol table looking for the symbols
> +	 * that we care about.
> +	 */
> +	for (sym = symtab; --num_syms >= 0; sym++) {
> +		if (!sym->st_name)
> +			continue;
> +
> +		GET_SYM(kallsyms_num_syms);
> +		GET_SYM(kallsyms_offsets);
> +		GET_SYM(kallsyms_relative_base);
> +		GET_SYM(kallsyms_names);
> +		GET_SYM(kallsyms_markers);
> +		GET_SYM(_stext);
> +		GET_SYM(_etext);
> +		GET_SYM(_sinittext);
> +		GET_SYM(_einittext);
> +		GET_SYM(__start_orc_unwind_ip);
> +		GET_SYM(__stop_orc_unwind_ip);
> +		GET_SYM(__start_orc_unwind);
> +
> +		/*
> +		 * the GET_SYM macro can't be used here because these have
> +		 * to be renamed due to the inclusion of lib/extable.c
> +		 */

Perhaps just globally prefix all the needed image symbols with "loader_"
or "fgkaslr_" or "img_", and then you don't need to open-code this one
with a different target name. Something like this:

#define GET_SYM(name)							\
	do {								\
		if (!img_ ## name) {					\
			if (strcmp(#name, strtab + sym->st_name) == 0) {\
				img_ ## name = sym->st_value;		\
				continue;				\
			}						\
		}							\
	} while (0)

Then there is also a consistent naming convention within the code here
to easily see where addresses came from, and you avoid any future
collisions like are being avoided for the ex_table symbols.

> +		if (!__start___ex_table_addr) {
> +			if (strcmp("__start___ex_table",
> +				   strtab + sym->st_name) == 0) {
> +				__start___ex_table_addr = sym->st_value;
> +				continue;
> +			}
> +		}
> +
> +		if (!__stop___ex_table_addr) {
> +			if (strcmp("__stop___ex_table",
> +				   strtab + sym->st_name) == 0) {
> +				__stop___ex_table_addr = sym->st_value;
> +				continue;
> +			}
> +		}
> +	}
> +}
> +
> +void parse_sections_headers(void *output, Elf64_Ehdr *ehdr, Elf64_Phdr *phdrs)
> +{
> +	Elf64_Phdr *phdr;
> +	Elf64_Shdr *s;
> +	Elf64_Shdr *text = NULL;
> +	Elf64_Shdr *percpu = NULL;
> +	char *secstrings;
> +	const char *sname;
> +	int num_sections = 0;
> +	Elf64_Sym *symtab = NULL;
> +	char *strtab = NULL;
> +	long num_syms = 0;
> +	void *dest;
> +	int i;
> +	char arg[MAX_FGKASLR_ARG_LENGTH];
> +	Elf64_Shdr shdr;
> +	unsigned long shnum;
> +	unsigned int shstrndx;
> +
> +	debug_putstr("\nParsing ELF section headers... ");
> +
> +	/*
> +	 * Even though fgkaslr may have been disabled, we still
> +	 * need to parse through the section headers to get the
> +	 * start and end of the percpu section. This is because
> +	 * if we were built with CONFIG_FG_KASLR, there are more
> +	 * relative relocations present in vmlinux.relocs than
> +	 * just the percpu, and only the percpu relocs need to be
> +	 * adjusted when using just normal base address kaslr.
> +	 */
> +	if (cmdline_find_option("fgkaslr", arg, sizeof(arg)) == 3 &&
> +	    !strncmp(arg, "off", 3)) {
> +		warn("FG_KASLR disabled on cmdline.");
> +		nofgkaslr = 1;
> +	}
> +
> +	/* read the first section header */
> +	shnum = ehdr->e_shnum;
> +	shstrndx = ehdr->e_shstrndx;
> +	if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
> +		memcpy(&shdr, output + ehdr->e_shoff, sizeof(shdr));
> +		if (shnum == SHN_UNDEF)
> +			shnum = shdr.sh_size;
> +		if (shstrndx == SHN_XINDEX)
> +			shstrndx = shdr.sh_link;
> +	}
> +
> +	/* we are going to need to allocate space for the section headers */
> +	sechdrs = malloc(sizeof(*sechdrs) * shnum);
> +	if (!sechdrs)
> +		error("Failed to allocate space for shdrs");
> +
> +	sections = malloc(sizeof(*sections) * shnum);
> +	if (!sections)
> +		error("Failed to allocate space for section pointers");
> +
> +	memcpy(sechdrs, output + ehdr->e_shoff,
> +	       sizeof(*sechdrs) * shnum);
> +
> +	/* we need to allocate space for the section string table */
> +	s = &sechdrs[shstrndx];
> +
> +	secstrings = malloc(s->sh_size);
> +	if (!secstrings)
> +		error("Failed to allocate space for shstr");
> +
> +	memcpy(secstrings, output + s->sh_offset, s->sh_size);
> +
> +	/*
> +	 * now we need to walk through the section headers and collect the
> +	 * sizes of the .text sections to be randomized.
> +	 */
> +	for (i = 0; i < shnum; i++) {
> +		s = &sechdrs[i];
> +		sname = secstrings + s->sh_name;
> +
> +		if (s->sh_type == SHT_SYMTAB) {
> +			/* only one symtab per image */

I would assert this claim in the code as well: check symtab is NULL here?

> +			symtab = malloc(s->sh_size);
> +			if (!symtab)
> +				error("Failed to allocate space for symtab");
> +
> +			memcpy(symtab, output + s->sh_offset, s->sh_size);
> +			num_syms = s->sh_size / sizeof(*symtab);
> +			continue;
> +		}
> +
> +		if (s->sh_type == SHT_STRTAB && i != ehdr->e_shstrndx) {
> +			strtab = malloc(s->sh_size);

Similar robustness check?

> +			if (!strtab)
> +				error("Failed to allocate space for strtab");
> +
> +			memcpy(strtab, output + s->sh_offset, s->sh_size);
> +		}
> +
> +		if (!strcmp(sname, ".text")) {
> +			text = s;
> +			continue;
> +		}

Check text is still NULL here?

Also, why the continue? This means the section isn't included in the
sections[] array? (Obviously things still work, but I don't understand
why.)

> +
> +		if (!strcmp(sname, ".data..percpu")) {
> +			/* get start addr for later */
> +			percpu = s;

Similar, check percpu is still NULL here?

Also, is a "continue" intended here? (This is kind of the reverse of
the "continue" question above.) I think you get the same result during
the next "if", but I was expecting this block to look like the .text
test above.

> +		}
> +
> +		if (!(s->sh_flags & SHF_ALLOC) ||
> +		    !(s->sh_flags & SHF_EXECINSTR) ||
> +		    !(strstarts(sname, ".text")))
> +			continue;
> +
> +		sections[num_sections] = s;
> +
> +		num_sections++;
> +	}
> +	sections[num_sections] = NULL;
> +	sections_size = num_sections;
> +
> +	parse_symtab(symtab, strtab, num_syms);
> +
> +	for (i = 0; i < ehdr->e_phnum; i++) {
> +		phdr = &phdrs[i];
> +
> +		switch (phdr->p_type) {
> +		case PT_LOAD:
> +			if ((phdr->p_align % 0x200000) != 0)
> +				error("Alignment of LOAD segment isn't multiple of 2MB");
> +			dest = output;
> +			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
> +			if (!nofgkaslr &&
> +			    (text && phdr->p_offset == text->sh_offset)) {
> +				move_text(num_sections, secstrings, text,
> +					  output, dest, phdr);
> +			} else {
> +				if (percpu &&
> +				    phdr->p_offset == percpu->sh_offset) {
> +					percpu_start = percpu->sh_addr;
> +					percpu_end = percpu_start +
> +							phdr->p_filesz;
> +				}
> +				memmove(dest, output + phdr->p_offset,
> +					phdr->p_filesz);
> +			}
> +			break;
> +		default: /* Ignore other PT_* */
> +			break;
> +		}
> +	}
> +
> +	/* we need to keep the section info to redo relocs */
> +	free(secstrings);
> +
> +	free(phdrs);
> +}
> +
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index d7408af55738..6f596bd5b6e5 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -39,10 +39,6 @@
>  #include <generated/utsrelease.h>
>  #include <asm/efi.h>
>  
> -/* Macros used by the included decompressor code below. */
> -#define STATIC
> -#include <linux/decompress/mm.h>
> -
>  #ifdef CONFIG_X86_5LEVEL
>  unsigned int __pgtable_l5_enabled;
>  unsigned int pgdir_shift __ro_after_init = 39;
> diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
> index 9652d5c2afda..5f08922fd12a 100644
> --- a/arch/x86/boot/compressed/misc.c
> +++ b/arch/x86/boot/compressed/misc.c
> @@ -26,9 +26,6 @@
>   * it is not safe to place pointers in static structures.
>   */
>  
> -/* Macros used by the included decompressor code below. */
> -#define STATIC		static

Can you split the STATIC macro rearrangement out into a separate patch?
I think it deserves a dedicated commit log to explain why it's changing
the way it is (i.e. we end up with "STATIC" no longer meaning "static"
in misc.c now...)

> -
>  /*
>   * Use normal definitions of mem*() from string.c. There are already
>   * included header files which expect a definition of memset() and by
> @@ -49,6 +46,10 @@ struct boot_params *boot_params;
>  
>  memptr free_mem_ptr;
>  memptr free_mem_end_ptr;
> +#ifdef CONFIG_FG_KASLR
> +unsigned long malloc_ptr;
> +int malloc_count;
> +#endif
>  
>  static char *vidmem;
>  static int vidport;
> @@ -203,10 +204,21 @@ static void handle_relocations(void *output, unsigned long output_len,
>  	if (IS_ENABLED(CONFIG_X86_64))
>  		delta = virt_addr - LOAD_PHYSICAL_ADDR;
>  
> -	if (!delta) {
> -		debug_putstr("No relocation needed... ");
> -		return;
> +	/*
> +	 * it is possible to have delta be zero and
> +	 * still have enabled fg kaslr. We need to perform relocations
> +	 * for fgkaslr regardless of whether the base address has moved.
> +	 */

Please reflow these sentences (long whitespace at the end of line one).

> +	if (!IS_ENABLED(CONFIG_FG_KASLR) ||
> +	    cmdline_find_option_bool("nokaslr")) {
> +		if (!delta) {
> +			debug_putstr("No relocation needed... ");
> +			return;
> +		}
>  	}
> +
> +	pre_relocations_cleanup(map);
> +
>  	debug_putstr("Performing relocations... ");
>  
>  	/*
> @@ -230,35 +242,106 @@ static void handle_relocations(void *output, unsigned long output_len,
>  	 */
>  	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
>  		long extended = *reloc;
> +		long value;
> +
> +		/*
> +		 * if using fgkaslr, we might have moved the address
> +		 * of the relocation. Check it to see if it needs adjusting
> +		 * from the original address.
> +		 */
> +		(void)adjust_address(&extended);

I don't think "(void)" needed here?

> +
>  		extended += map;
>  
>  		ptr = (unsigned long)extended;
>  		if (ptr < min_addr || ptr > max_addr)
>  			error("32-bit relocation outside of kernel!\n");
>  
> -		*(uint32_t *)ptr += delta;
> +		value = *(int32_t *)ptr;
> +
> +		/*
> +		 * If using fgkaslr, the value of the relocation
> +		 * might need to be changed because it referred
> +		 * to an address that has moved.
> +		 */
> +		adjust_address(&value);
> +
> +		value += delta;
> +
> +		*(uint32_t *)ptr = value;
>  	}
>  #ifdef CONFIG_X86_64
>  	while (*--reloc) {
>  		long extended = *reloc;
> +		long value;
> +		long oldvalue;
> +		Elf64_Shdr *s;
> +
> +		/*
> +		 * if using fgkaslr, we might have moved the address
> +		 * of the relocation. Check it to see if it needs adjusting
> +		 * from the original address.
> +		 */
> +		s = adjust_address(&extended);
> +
>  		extended += map;
>  
>  		ptr = (unsigned long)extended;
>  		if (ptr < min_addr || ptr > max_addr)
>  			error("inverse 32-bit relocation outside of kernel!\n");
>  
> -		*(int32_t *)ptr -= delta;
> +		value = *(int32_t *)ptr;
> +		oldvalue = value;
> +
> +		/*
> +		 * If using fgkaslr, these relocs will contain
> +		 * relative offsets which might need to be
> +		 * changed because it referred
> +		 * to an address that has moved.
> +		 */
> +		adjust_relative_offset(*reloc, &value, s);
> +
> +		/*
> +		 * only percpu symbols need to have their values adjusted for
> +		 * base address kaslr since relative offsets within the .text
> +		 * and .text.* sections are ok wrt each other.
> +		 */
> +		if (is_percpu_addr(*reloc, oldvalue))
> +			value -= delta;
> +
> +		*(int32_t *)ptr = value;
>  	}
>  	for (reloc--; *reloc; reloc--) {
>  		long extended = *reloc;
> +		long value;
> +
> +		/*
> +		 * if using fgkaslr, we might have moved the address
> +		 * of the relocation. Check it to see if it needs adjusting
> +		 * from the original address.
> +		 */
> +		(void)adjust_address(&extended);
> +
>  		extended += map;
>  
>  		ptr = (unsigned long)extended;
>  		if (ptr < min_addr || ptr > max_addr)
>  			error("64-bit relocation outside of kernel!\n");
>  
> -		*(uint64_t *)ptr += delta;
> +		value = *(int64_t *)ptr;
> +
> +		/*
> +		 * If using fgkaslr, the value of the relocation
> +		 * might need to be changed because it referred
> +		 * to an address that has moved.
> +		 */
> +		(void)adjust_address(&value);
> +
> +		value += delta;
> +
> +		*(uint64_t *)ptr = value;
>  	}
> +	post_relocations_cleanup(map);
>  #endif
>  }
>  #else
> @@ -296,6 +379,14 @@ static void parse_elf(void *output)
>  
>  	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
>  
> +	if (IS_ENABLED(CONFIG_FG_KASLR)) {
> +		if (!cmdline_find_option_bool("nokaslr")) {
> +			parse_sections_headers(output, &ehdr, phdrs);
> +			return;

Hmmm. There is repeated logic between parse_sections_headers() and the
rest of parse_elf() here. I think if they should stay distinct, I would
split the remaining code in parse_elf() into a separate function, and
then parse_elf() becomes more readable as:

void parse_elf(...)
{
	if (fgkaslr)
		layout_fgkaslr(...)
	else
		layout_normal(...)
}

Does "layout" seem to be a good name here? Maybe "parse_elf" needs to
be "layout_image"? Yay name bikesheds. My main point is that the
"return" above feels abrupt, and begs the question, "why aren't we doing
this other stuff?" But if it's a boolean between two layout functions,
that's easy to understand, IMO.

> +		}
> +		warn("FG_KASLR disabled: 'nokaslr' on cmdline.");
> +	}
> +
>  	for (i = 0; i < ehdr.e_phnum; i++) {
>  		phdr = &phdrs[i];
>  
> diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
> index 726e264410ff..e8e45f263eaf 100644
> --- a/arch/x86/boot/compressed/misc.h
> +++ b/arch/x86/boot/compressed/misc.h
> @@ -39,6 +39,12 @@
>  /* misc.c */
>  extern memptr free_mem_ptr;
>  extern memptr free_mem_end_ptr;
> +#define STATIC
> +#ifdef CONFIG_FG_KASLR
> +#define STATIC_RW_DATA extern
> +#endif
> +#include <linux/decompress/mm.h>
> +
>  extern struct boot_params *boot_params;
>  void __putstr(const char *s);
>  void __puthex(unsigned long value);
> @@ -74,6 +80,34 @@ struct mem_vector {
>  	unsigned long long size;
>  };
>  
> +#ifdef CONFIG_X86_64
> +#define Elf_Ehdr Elf64_Ehdr
> +#define Elf_Phdr Elf64_Phdr
> +#define Elf_Shdr Elf64_Shdr
> +#else
> +#define Elf_Ehdr Elf32_Ehdr
> +#define Elf_Phdr Elf32_Phdr
> +#define Elf_Shdr Elf32_Shdr
> +#endif
> +
> +#if CONFIG_FG_KASLR
> +void parse_sections_headers(void *output, Elf_Ehdr *ehdr, Elf_Phdr *phdrs);
> +void pre_relocations_cleanup(unsigned long map);
> +void post_relocations_cleanup(unsigned long map);
> +Elf_Shdr *adjust_address(long *address);
> +void adjust_relative_offset(long pc, long *value, Elf_Shdr *section);
> +bool is_percpu_addr(long pc, long offset);
> +#else
> +static inline void parse_sections_headers(void *output, Elf_Ehdr *ehdr,
> +					  Elf_Phdr *phdrs) { }
> +static inline void pre_relocations_cleanup(unsigned long map) { }
> +static inline void post_relocations_cleanup(unsigned long map) { }
> +static inline Elf_Shdr *adjust_address(long *address) { return NULL; }
> +static inline void adjust_relative_offset(long pc, long *value,
> +					  Elf_Shdr *section) { }
> +static inline bool is_percpu_addr(long pc, long offset) { return true; }
> +#endif /* CONFIG_FG_KASLR */
> +
>  #if CONFIG_RANDOMIZE_BASE
>  /* kaslr.c */
>  void choose_random_location(unsigned long input,
> diff --git a/arch/x86/boot/compressed/utils.c b/arch/x86/boot/compressed/utils.c
> new file mode 100644
> index 000000000000..ceefc58d7c71
> --- /dev/null
> +++ b/arch/x86/boot/compressed/utils.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * utils.c
> + *
> + * This contains various libraries that are needed for fgkaslr
> + */
> +#define __DISABLE_EXPORTS
> +#define _LINUX_KPROBES_H
> +#define NOKPROBE_SYMBOL(fname)
> +#include "../../../../lib/sort.c"
> +#include "../../../../lib/bsearch.c"
> +
> diff --git a/arch/x86/boot/compressed/vmlinux.symbols b/arch/x86/boot/compressed/vmlinux.symbols
> new file mode 100644
> index 000000000000..cc86e79a2a3d
> --- /dev/null
> +++ b/arch/x86/boot/compressed/vmlinux.symbols
> @@ -0,0 +1,17 @@
> +kallsyms_offsets
> +kallsyms_addresses
> +kallsyms_num_syms
> +kallsyms_relative_base
> +kallsyms_names
> +kallsyms_token_table
> +kallsyms_token_index
> +kallsyms_markers
> +__start___ex_table
> +__stop___ex_table
> +_sinittext
> +_einittext
> +_stext
> +_etext
> +__start_orc_unwind_ip
> +__stop_orc_unwind_ip
> +__start_orc_unwind
> diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
> index 680c320363db..6918d33eb5ef 100644
> --- a/arch/x86/include/asm/boot.h
> +++ b/arch/x86/include/asm/boot.h
> @@ -26,8 +26,19 @@
>  
>  #ifdef CONFIG_KERNEL_BZIP2
>  # define BOOT_HEAP_SIZE		0x400000
> -#else /* !CONFIG_KERNEL_BZIP2 */
> -# define BOOT_HEAP_SIZE		 0x10000
> +#elif CONFIG_FG_KASLR
> +/*
> + * We need extra boot heap when using fgkaslr because we make a copy
> + * of the original decompressed kernel to avoid issues with writing
> + * over ourselves when shuffling the sections. We also need extra
> + * space for resorting kallsyms after shuffling. This value could
> + * be decreased if free() would release memory properly, or if we
> + * could avoid the kernel copy. It would need to be increased if we
> + * find additional tables that need to be resorted.
> + */
> +# define BOOT_HEAP_SIZE		0x4000000
> +#else /* !CONFIG_KERNEL_BZIP2 && !CONFIG_FG_KASLR */
> +# define BOOT_HEAP_SIZE		0x10000
>  #endif
>  
>  #ifdef CONFIG_X86_64
> diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
> index 34c02e4290fe..a85d1792d5a8 100644
> --- a/include/uapi/linux/elf.h
> +++ b/include/uapi/linux/elf.h
> @@ -298,6 +298,7 @@ typedef struct elf64_phdr {
>  #define SHN_LIVEPATCH	0xff20
>  #define SHN_ABS		0xfff1
>  #define SHN_COMMON	0xfff2
> +#define SHN_XINDEX	0xffff
>  #define SHN_HIRESERVE	0xffff
>   
>  typedef struct elf32_shdr {
> -- 
> 2.20.1
> 

Very exciting! :) I'm already looking forward to v3. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 8/9] kallsyms: Hide layout
  2020-05-21 16:56 ` [PATCH v2 8/9] kallsyms: Hide layout Kristen Carlson Accardi
@ 2020-05-21 21:14   ` Kees Cook
  0 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 21:14 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, arjan, x86, linux-kernel, kernel-hardening,
	rick.p.edgecombe, Tony Luck

On Thu, May 21, 2020 at 09:56:39AM -0700, Kristen Carlson Accardi wrote:
> This patch makes /proc/kallsyms display alphabetically by symbol
> name rather than sorted by address in order to hide the newly
> randomized address layout.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> ---
>  kernel/kallsyms.c | 138 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 137 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 16c8c605f4b0..558963b275ec 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -25,6 +25,7 @@
>  #include <linux/filter.h>
>  #include <linux/ftrace.h>
>  #include <linux/compiler.h>
> +#include <linux/list_sort.h>
>  
>  /*
>   * These will be re-linked against their real values
> @@ -446,6 +447,11 @@ struct kallsym_iter {
>  	int show_value;
>  };
>  
> +struct kallsyms_iter_list {
> +	struct kallsym_iter iter;
> +	struct list_head next;
> +};
> +
>  int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
>  			    char *type, char *name)
>  {
> @@ -660,6 +666,121 @@ int kallsyms_show_value(void)
>  	}
>  }
>  
> +static int sorted_show(struct seq_file *m, void *p)
> +{
> +	struct list_head *list = m->private;
> +	struct kallsyms_iter_list *iter;
> +	int rc;
> +
> +	if (list_empty(list))
> +		return 0;
> +
> +	iter = list_first_entry(list, struct kallsyms_iter_list, next);
> +
> +	m->private = iter;
> +	rc = s_show(m, p);
> +	m->private = list;
> +
> +	list_del(&iter->next);
> +	kfree(iter);
> +
> +	return rc;
> +}
> +
> +static void *sorted_start(struct seq_file *m, loff_t *pos)
> +{
> +	return m->private;
> +}
> +
> +static void *sorted_next(struct seq_file *m, void *p, loff_t *pos)
> +{
> +	struct list_head *list = m->private;
> +
> +	(*pos)++;
> +
> +	if (list_empty(list))
> +		return NULL;
> +
> +	return p;
> +}
> +
> +static const struct seq_operations kallsyms_sorted_op = {
> +	.start = sorted_start,
> +	.next = sorted_next,
> +	.stop = s_stop,
> +	.show = sorted_show
> +};
> +
> +static int kallsyms_list_cmp(void *priv, struct list_head *a,
> +			     struct list_head *b)
> +{
> +	struct kallsyms_iter_list *iter_a, *iter_b;
> +
> +	iter_a = list_entry(a, struct kallsyms_iter_list, next);
> +	iter_b = list_entry(b, struct kallsyms_iter_list, next);
> +
> +	return strcmp(iter_a->iter.name, iter_b->iter.name);
> +}
> +
> +int get_all_symbol_name(void *data, const char *name, struct module *mod,
> +			unsigned long addr)
> +{
> +	unsigned long sym_pos;
> +	struct kallsyms_iter_list *node, *last;
> +	struct list_head *head = (struct list_head *)data;
> +
> +	node = kmalloc(sizeof(*node), GFP_KERNEL);
> +	if (!node)
> +		return -ENOMEM;
> +
> +	if (list_empty(head)) {
> +		sym_pos = 0;
> +		memset(node, 0, sizeof(*node));
> +		reset_iter(&node->iter, 0);
> +		node->iter.show_value = kallsyms_show_value();
> +	} else {
> +		last = list_first_entry(head, struct kallsyms_iter_list, next);
> +		memcpy(node, last, sizeof(*node));
> +		sym_pos = last->iter.pos;
> +	}
> +
> +	INIT_LIST_HEAD(&node->next);
> +	list_add(&node->next, head);
> +
> +	/*
> +	 * update_iter returns false when at end of file
> +	 * which in this case we don't care about and can
> +	 * safely ignore. update_iter() will increment
> +	 * the value of iter->pos, for ksymbol_core.
> +	 */
> +	if (sym_pos >= kallsyms_num_syms)
> +		sym_pos++;
> +
> +	(void)update_iter(&node->iter, sym_pos);
> +
> +	return 0;
> +}
> +
> +static int kallsyms_sorted_open(struct inode *inode, struct file *file)
> +{
> +	int ret;
> +	struct list_head *list;
> +
> +	list = __seq_open_private(file, &kallsyms_sorted_op, sizeof(*list));
> +	if (!list)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(list);
> +
> +	ret = kallsyms_on_each_symbol(get_all_symbol_name, list);
> +	if (ret != 0)
> +		return ret;
> +
> +	list_sort(NULL, list, kallsyms_list_cmp);
> +
> +	return 0;
> +}
> +
>  static int kallsyms_open(struct inode *inode, struct file *file)
>  {
>  	/*
> @@ -704,9 +825,24 @@ static const struct proc_ops kallsyms_proc_ops = {
>  	.proc_release	= seq_release_private,
>  };
>  
> +static const struct proc_ops kallsyms_sorted_proc_ops = {
> +	.proc_open = kallsyms_sorted_open,
> +	.proc_read = seq_read,
> +	.proc_lseek = seq_lseek,
> +	.proc_release = seq_release_private,
> +};
> +
>  static int __init kallsyms_init(void)
>  {
> -	proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> +	/*
> +	 * When fine grained kaslr is enabled, we need to
> +	 * print out the symbols sorted by name rather than by
> +	 * by address, because this reveals the randomization order.
> +	 */
> +	if (!IS_ENABLED(CONFIG_FG_KASLR))
> +		proc_create("kallsyms", 0444, NULL, &kallsyms_proc_ops);
> +	else
> +		proc_create("kallsyms", 0444, NULL, &kallsyms_sorted_proc_ops);
>  	return 0;

Since this is compile-time selected, instead of the separate name and
test here, how about just redefine kallsyms_open initializer instead?

#ifdef CONFIG_FG_KASLR
...sorting routines...
static int kallsyms_open(struct inode *inode, struct file *file)
{ ... sorted version ... }
#else
static int kallsyms_open(struct inode *inode, struct file *file)
{ ... normal version ... }
#endif

(And then just move the comment to the sorted open version.)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 9/9] module: Reorder functions
  2020-05-21 16:56 ` [PATCH v2 9/9] module: Reorder functions Kristen Carlson Accardi
@ 2020-05-21 21:33   ` Kees Cook
  2020-06-09 20:14     ` Kristen Carlson Accardi
  0 siblings, 1 reply; 27+ messages in thread
From: Kees Cook @ 2020-05-21 21:33 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, x86, H. Peter Anvin, Jessica Yu, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Ard Biesheuvel,
	Tony Luck

On Thu, May 21, 2020 at 09:56:40AM -0700, Kristen Carlson Accardi wrote:
> Introduce a new config option to allow modules to be re-ordered
> by function. This option can be enabled independently of the
> kernel text KASLR or FG_KASLR settings so that it can be used
> by architectures that do not support either of these features.
> This option will be selected by default if CONFIG_FG_KASLR is
> selected.
> 
> If a module has functions split out into separate text sections
> (i.e. compiled with the -ffunction-sections flag), reorder the
> functions to provide some code diversification to modules.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Kees Cook <keescook@chromium.org>
> Acked-by: Ard Biesheuvel <ardb@kernel.org>
> Tested-by: Ard Biesheuvel <ardb@kernel.org>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/Makefile |  3 ++
>  init/Kconfig      | 11 +++++++
>  kernel/module.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 96 insertions(+)
> 
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 50e83ea57d70..d0bdd5c8c432 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -2187,6 +2187,7 @@ config FG_KASLR
>  	bool "Function Granular Kernel Address Space Layout Randomization"
>  	depends on $(cc-option, -ffunction-sections)
>  	depends on RANDOMIZE_BASE && X86_64
> +	select MODULE_FG_KASLR
>  	help
>  	  This option improves the randomness of the kernel text
>  	  over basic Kernel Address Space Layout Randomization (KASLR)
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index b65ec63c7db7..8c830c37c74c 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -51,6 +51,9 @@ ifdef CONFIG_X86_NEED_RELOCS
>          LDFLAGS_vmlinux := --emit-relocs --discard-none
>  endif
>  
> +ifdef CONFIG_MODULE_FG_KASLR
> +	KBUILD_CFLAGS_MODULE += -ffunction-sections
> +endif

With CONFIG_FG_KASLR=y, will -ffunction-sections appear twice on the
compiler command line?

>  #
>  # Prevent GCC from generating any FP code by mistake.
>  #
> diff --git a/init/Kconfig b/init/Kconfig
> index 74a5ac65644f..b19920413bcc 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2227,6 +2227,17 @@ config UNUSED_KSYMS_WHITELIST
>  	  one per line. The path can be absolute, or relative to the kernel
>  	  source tree.
>  
> +config MODULE_FG_KASLR

Oh, er, yes. I'd suggested moving 'config FG_KASLR' into arch/Kconfig,
but I think init/Kconfig is more correct.

> +	depends on $(cc-option, -ffunction-sections)

Oh! And I am reminded suddenly about CONFIG_FG_KASLR needing to interact
correctly with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION in that we do NOT
want the sections to be collapsed at link time:

#ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*

(I think I had fixed this in some earlier version?)

I think you want this (untested):


diff --git a/Makefile b/Makefile
index 04f5662ae61a..a0d9acd3b900 100644
--- a/Makefile
+++ b/Makefile
@@ -853,8 +853,11 @@ ifdef CONFIG_DEBUG_SECTION_MISMATCH
 KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
 endif
 
+ifneq ($(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION)$(CONFIG_FG_KASLR),)
+KBUILD_CFLAGS_KERNEL += -ffunction-sections
+endif
 ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
-KBUILD_CFLAGS_KERNEL += -ffunction-sections -fdata-sections
+KBUILD_CFLAGS_KERNEL += -fdata-sections
 LDFLAGS_vmlinux += --gc-sections
 endif
 
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 71e387a5fe90..5f5c692751dd 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -93,20 +93,31 @@
  * sections to be brought in with rodata.
  */
 #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
-#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
 #define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX*
 #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
 #define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]*
 #define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
 #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
 #else
-#define TEXT_MAIN .text
 #define DATA_MAIN .data
 #define SDATA_MAIN .sdata
 #define RODATA_MAIN .rodata
 #define BSS_MAIN .bss
 #define SBSS_MAIN .sbss
 #endif
+/*
+ * Both LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_FG_KASLR options enable
+ * -ffunction-sections, which produces separately named .text sections. In
+ * the case of CONFIG_FG_KASLR, they need to stay distinct so they can be
+ * separately randomized. Without CONFIG_FG_KASLR, the separate .text
+ * sections can be collected back into a common section, which makes the
+ * resulting image slightly smaller.
+ */
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) && !defined(CONFIG_FG_KASLR)
+#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
+#else
+#define TEXT_MAIN .text
+#endif
 
 /*
  * Align to a 32 byte boundary equal to the


> +	bool "Module Function Granular Layout Randomization"
> +	help
> +	  This option randomizes the module text section by reordering the text
> +	  section by function at module load time. In order to use this
> +	  feature, the module must have been compiled with the
> +	  -ffunction-sections compiler flag.
> +
> +	  If unsure, say N.
> +
>  endif # MODULES
>  
>  config MODULES_TREE_LOOKUP
> diff --git a/kernel/module.c b/kernel/module.c
> index 646f1e2330d2..e3cd619c60c2 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -53,6 +53,7 @@
>  #include <linux/bsearch.h>
>  #include <linux/dynamic_debug.h>
>  #include <linux/audit.h>
> +#include <linux/random.h>
>  #include <uapi/linux/module.h>
>  #include "module-internal.h"
>  
> @@ -2370,6 +2371,83 @@ static long get_offset(struct module *mod, unsigned int *size,
>  	return ret;
>  }
>  
> +/*
> + * shuffle_text_list()
> + * Use a Fisher Yates algorithm to shuffle a list of text sections.
> + */
> +static void shuffle_text_list(Elf_Shdr **list, int size)
> +{
> +	int i;
> +	unsigned int j;
> +	Elf_Shdr *temp;
> +
> +	for (i = size - 1; i > 0; i--) {
> +		/*
> +		 * pick a random index from 0 to i
> +		 */
> +		get_random_bytes(&j, sizeof(j));
> +		j = j % (i + 1);
> +
> +		temp = list[i];
> +		list[i] = list[j];
> +		list[j] = temp;
> +	}
> +}
> +
> +/*
> + * randomize_text()
> + * Look through the core section looking for executable code sections.
> + * Store sections in an array and then shuffle the sections
> + * to reorder the functions.
> + */
> +static void randomize_text(struct module *mod, struct load_info *info)
> +{
> +	int i;
> +	int num_text_sections = 0;
> +	Elf_Shdr **text_list;
> +	int size = 0;
> +	int max_sections = info->hdr->e_shnum;
> +	unsigned int sec = find_sec(info, ".text");
> +
> +	if (sec == 0)
> +		return;
> +
> +	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
> +	if (!text_list)
> +		return;
> +
> +	for (i = 0; i < max_sections; i++) {
> +		Elf_Shdr *shdr = &info->sechdrs[i];
> +		const char *sname = info->secstrings + shdr->sh_name;
> +
> +		if (!(shdr->sh_flags & SHF_ALLOC) ||
> +		    !(shdr->sh_flags & SHF_EXECINSTR) ||
> +		    strstarts(sname, ".init"))
> +			continue;
> +
> +		text_list[num_text_sections] = shdr;
> +		num_text_sections++;
> +	}
> +
> +	shuffle_text_list(text_list, num_text_sections);
> +
> +	for (i = 0; i < num_text_sections; i++) {
> +		Elf_Shdr *shdr = text_list[i];
> +
> +		/*
> +		 * get_offset has a section index for it's last
> +		 * argument, that is only used by arch_mod_section_prepend(),
> +		 * which is only defined by parisc. Since this this type
> +		 * of randomization isn't supported on parisc, we can
> +		 * safely pass in zero as the last argument, as it is
> +		 * ignored.
> +		 */
> +		shdr->sh_entsize = get_offset(mod, &size, shdr, 0);
> +	}
> +
> +	kfree(text_list);
> +}
> +
>  /* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
>     might -- code, read-only data, read-write data, small data.  Tally
>     sizes, and place the offsets into sh_entsize fields: high bit means it
> @@ -2460,6 +2538,9 @@ static void layout_sections(struct module *mod, struct load_info *info)
>  			break;
>  		}
>  	}
> +
> +	if (IS_ENABLED(CONFIG_MODULE_FG_KASLR))
> +		randomize_text(mod, info);
>  }
>  
>  static void set_license(struct module *mod, const char *license)
> -- 
> 2.20.1
> 

Everything else looks good.

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 7/9] x86: Add support for function granular KASLR
  2020-05-21 21:08   ` Kees Cook
@ 2020-05-21 21:42     ` Kristen Carlson Accardi
  2020-06-04 17:27     ` Kristen Carlson Accardi
  1 sibling, 0 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 21:42 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, Jonathan Corbet, x86, H. Peter Anvin, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Tony Luck,
	linux-doc

Hi Kees,
Thanks for your review - I will incorporate what I can into v3, or
explain why not once I give it a try :).

On Thu, 2020-05-21 at 14:08 -0700, Kees Cook wrote:
> > 
<snip>

> On Thu, May 21, 2020 at 09:56:38AM -0700, Kristen Carlson Accardi
> wrote:
> > +	/*
> > +	 * sometimes we are updating a relative offset that would
> > +	 * normally be relative to the next instruction (such as a
> > call).
> > +	 * In this case to calculate the target, you need to add 32bits
> > to
> > +	 * the pc to get the next instruction value. However, sometimes
> > +	 * targets are just data that was stored in a table such as
> > ksymtab
> > +	 * or cpu alternatives. In this case our target is not relative
> > to
> > +	 * the next instruction.
> > +	 */
> 
> Excellent and scary comment. ;) Was this found by trial and error?
> That
> sounds "fun" to debug. :P

This did suck to debug. Thank goodness for debugging with gdb in a VM.
As you know, I had previously had a patch to use a prand to be able to
retain the same layout across boots, and that came in handy here. While
we decided to not submit this functionality with this initial merge
attempt, I will add it on in the future as it does make debugging much
easier when you can reliably duplicate failure modes.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] Function Granular KASLR
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (8 preceding siblings ...)
  2020-05-21 16:56 ` [PATCH v2 9/9] module: Reorder functions Kristen Carlson Accardi
@ 2020-05-21 21:54 ` Kees Cook
  2020-05-21 22:26 ` Thomas Gleixner
  10 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 21:54 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, arjan, x86, linux-kernel, kernel-hardening,
	rick.p.edgecombe

On Thu, May 21, 2020 at 09:56:31AM -0700, Kristen Carlson Accardi wrote:
> Changes in v2:
> --------------
> * Fix to address i386 build failure
> * Allow module reordering patch to be configured separately so that
>   arm (or other non-x86_64 arches) can take advantage of module function
>   reordering. This support has not be tested by me, but smoke tested by
>   Ard Biesheuvel <ardb@kernel.org> on arm.
> * Fix build issue when building on arm as reported by
>   Ard Biesheuvel <ardb@kernel.org> 
> * minor chages for certain checkpatch warnings and review feedback.

I successfully built and booted this on top of linux-next. For my builds
I include:

CONFIG_LOCK_DEBUGGING_SUPPORT=y
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y
CONFIG_DEBUG_RWSEMS=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_ATOMIC_SLEEP=y

which catches various things. One of those (I assume either CONFIG_LOCKDEP
or CONFIG_DEBUG_MUTEXES) has found an issue with kallsyms:

[   34.112989] ------------[ cut here ]------------
[   34.113560] WARNING: CPU: 1 PID: 1997 at kernel/module.c:260 module_assert_mutex+0x29/0x30
[   34.114479] Modules linked in:
[   34.114831] CPU: 1 PID: 1997 Comm: grep Tainted: G        W 5.7.0-rc6-next-20200519+ #497
...
[   34.128556] Call Trace:
[   34.128867]  module_kallsyms_on_each_symbol+0x1d/0xa0
[   34.130238]  kallsyms_on_each_symbol+0xbd/0xd0
[   34.131642]  kallsyms_sorted_open+0x3f/0x70
[   34.132160]  proc_reg_open+0x99/0x180
[   34.133222]  do_dentry_open+0x176/0x400
[   34.134182]  vfs_open+0x2d/0x30
[   34.134579]  do_open.isra.0+0x2a0/0x410
[   34.135058]  path_openat+0x175/0x620
[   34.135506]  do_filp_open+0x91/0x100
[   34.136912]  do_sys_openat2+0x210/0x2d0
[   34.137388]  do_sys_open+0x46/0x80
[   34.137818]  __x64_sys_openat+0x20/0x30
[   34.138288]  do_syscall_64+0x55/0x1d0
[   34.138720]  entry_SYSCALL_64_after_hwframe+0x49/0xb3

Triggering it is easy, just "cat /proc/kallsyms" (and I'd note that I
don't even have any modules loaded). Tracking this down, it just looks
like kallsyms needs to hold a lock while sorting:


diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 558963b275ec..182b16a6079b 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -772,7 +772,9 @@ static int kallsyms_sorted_open(struct inode *inode, struct file *file)
 
 	INIT_LIST_HEAD(list);
 
+	mutex_lock(&module_mutex);
 	ret = kallsyms_on_each_symbol(get_all_symbol_name, list);
+	mutex_unlock(&module_mutex);
 	if (ret != 0)
 		return ret;
 

This fixes it for me. Everything else seems to be lovely. :) Nice work!

-- 
Kees Cook

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] Function Granular KASLR
  2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
                   ` (9 preceding siblings ...)
  2020-05-21 21:54 ` [PATCH v2 0/9] Function Granular KASLR Kees Cook
@ 2020-05-21 22:26 ` Thomas Gleixner
  2020-05-21 23:30   ` Kees Cook
  10 siblings, 1 reply; 27+ messages in thread
From: Thomas Gleixner @ 2020-05-21 22:26 UTC (permalink / raw)
  To: Kristen Carlson Accardi, keescook, mingo, bp
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

Kristen,

Kristen Carlson Accardi <kristen@linux.intel.com> writes:

sorry for not following this work and a maybe stupid question.

> Proposed Improvement
> --------------------
> This patch set proposes adding function reordering on top of the existing
> KASLR base address randomization. The over-arching objective is incremental
> improvement over what we already have. It is designed to work in combination
> with the existing solution. The implementation is really pretty simple, and
> there are 2 main area where changes occur:
>
> * Build time
>
> GCC has had an option to place functions into individual .text sections for
> many years now. This option can be used to implement function reordering at
> load time. The final compiled vmlinux retains all the section headers, which
> can be used to help find the address ranges of each function. Using this
> information and an expanded table of relocation addresses, individual text
> sections can be suffled immediately after decompression. Some data tables
> inside the kernel that have assumptions about order require re-sorting
> after being updated when applying relocations. In order to modify these tables,
> a few key symbols are excluded from the objcopy symbol stripping process for
> use after shuffling the text segments.

I understand how this is supposed to work, but I fail to find an
explanation how all of this is preserving the text subsections we have,
i.e. .kprobes.text, .entry.text ...?

I assume that the functions in these subsections are reshuffled within
their own randomized address space so that __xxx_text_start and
__xxx_text_end markers still make sense, right?

I'm surely too tired to figure it out from the patches, but you really
want to explain that very detailed for mere mortals who are not deep
into this magic as you are.

Thanks,

        tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] Function Granular KASLR
  2020-05-21 22:26 ` Thomas Gleixner
@ 2020-05-21 23:30   ` Kees Cook
  2020-05-21 23:43     ` Thomas Gleixner
  2020-05-21 23:44     ` Kristen Carlson Accardi
  0 siblings, 2 replies; 27+ messages in thread
From: Kees Cook @ 2020-05-21 23:30 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Kristen Carlson Accardi, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe

On Fri, May 22, 2020 at 12:26:30AM +0200, Thomas Gleixner wrote:
> I understand how this is supposed to work, but I fail to find an
> explanation how all of this is preserving the text subsections we have,
> i.e. .kprobes.text, .entry.text ...?

I had the same question when I first started looking at earlier versions
of this series! :)

> I assume that the functions in these subsections are reshuffled within
> their own randomized address space so that __xxx_text_start and
> __xxx_text_end markers still make sense, right?

No, but perhaps in the future. Right now, they are entirely ignored and
left untouched. The current series only looks at the sections produced
by -ffunction-sections, which is to say only things named ".text.$thing"
(e.g. ".text.func1", ".text.func2"). Since the "special" text sections
in the kernel are named ".$thing.text" (specifically to avoid other
long-standing linker logic that does similar .text.* pattern matches)
they get ignored by FGKASLR right now too.

Even more specifically, they're ignored because all of these special
_input_ sections are actually manually collected by the linker script
into the ".text" _output_ section, which FGKASLR ignores -- it can only
randomize the final output sections (and has no basic block visibility
into the section contents), so everything in .text is untouched. Because
these special sections are collapsed into the single .text output
section is why we've needed the __$thing_start and __$thing_end symbols
manually constructed by the linker scripts: we lose input section
location/size details once the linker collects them into an output
section.

> I'm surely too tired to figure it out from the patches, but you really
> want to explain that very detailed for mere mortals who are not deep
> into this magic as you are.

Yeah, it's worth calling out, especially since it's an area of future
work -- I think if we can move the special sections out of .text into
their own output sections that can get randomized and we'll have section
position/size information available without the manual ..._start/_end
symbols. But this will require work with the compiler and linker to get
what's needed relative to -ffunction-sections, teach the kernel about
the new way of getting _start/_end, etc etc.

So, before any of that, just .text.* is a good first step, and after
that I think next would be getting .text randomized relative to the other
.text.* sections (IIUC, it is entirely untouched currently, so only the
standard KASLR base offset moves it around). Only after that do we start
poking around trying to munge the special section contents (which
requires use solving a few problems simultaneously). :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] Function Granular KASLR
  2020-05-21 23:30   ` Kees Cook
@ 2020-05-21 23:43     ` Thomas Gleixner
  2020-05-21 23:44     ` Kristen Carlson Accardi
  1 sibling, 0 replies; 27+ messages in thread
From: Thomas Gleixner @ 2020-05-21 23:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe

Kees,

Kees Cook <keescook@chromium.org> writes:
> On Fri, May 22, 2020 at 12:26:30AM +0200, Thomas Gleixner wrote:
>> I understand how this is supposed to work, but I fail to find an
>> explanation how all of this is preserving the text subsections we have,
>> i.e. .kprobes.text, .entry.text ...?
>
> I had the same question when I first started looking at earlier versions
> of this series! :)
>
>> I assume that the functions in these subsections are reshuffled within
>> their own randomized address space so that __xxx_text_start and
>> __xxx_text_end markers still make sense, right?
>
> No, but perhaps in the future. Right now, they are entirely ignored and
> left untouched.

I'm fine with that restriction, but for a moment I got worried that this
might screw up explicit subsections.

This really want's to be clearly expressed in the cover letter and the
changelogs so that such questions don't arise again.

<SNIP>

> So, before any of that, just .text.* is a good first step, and after
> that I think next would be getting .text randomized relative to the other
> .text.* sections (IIUC, it is entirely untouched currently, so only the
> standard KASLR base offset moves it around). Only after that do we start
> poking around trying to munge the special section contents (which
> requires use solving a few problems simultaneously). :)

Thanks for the detailed explanation!

       tglx

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 0/9] Function Granular KASLR
  2020-05-21 23:30   ` Kees Cook
  2020-05-21 23:43     ` Thomas Gleixner
@ 2020-05-21 23:44     ` Kristen Carlson Accardi
  1 sibling, 0 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-05-21 23:44 UTC (permalink / raw)
  To: Kees Cook, Thomas Gleixner
  Cc: mingo, bp, arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe

On Thu, 2020-05-21 at 16:30 -0700, Kees Cook wrote:
> On Fri, May 22, 2020 at 12:26:30AM +0200, Thomas Gleixner wrote:
> > I understand how this is supposed to work, but I fail to find an
> > explanation how all of this is preserving the text subsections we
> > have,
> > i.e. .kprobes.text, .entry.text ...?
> 
> I had the same question when I first started looking at earlier
> versions
> of this series! :)

Thanks for responding - clearly I do need to update the cover letter
and documentation.

> 
> > I assume that the functions in these subsections are reshuffled
> > within
> > their own randomized address space so that __xxx_text_start and
> > __xxx_text_end markers still make sense, right?
> 
> No, but perhaps in the future. Right now, they are entirely ignored
> and
> left untouched. The current series only looks at the sections
> produced
> by -ffunction-sections, which is to say only things named
> ".text.$thing"
> (e.g. ".text.func1", ".text.func2"). Since the "special" text
> sections
> in the kernel are named ".$thing.text" (specifically to avoid other
> long-standing linker logic that does similar .text.* pattern matches)
> they get ignored by FGKASLR right now too.
> 
> Even more specifically, they're ignored because all of these special
> _input_ sections are actually manually collected by the linker script
> into the ".text" _output_ section, which FGKASLR ignores -- it can
> only
> randomize the final output sections (and has no basic block
> visibility
> into the section contents), so everything in .text is untouched.
> Because
> these special sections are collapsed into the single .text output
> section is why we've needed the __$thing_start and __$thing_end
> symbols
> manually constructed by the linker scripts: we lose input section
> location/size details once the linker collects them into an output
> section.
> 
> > I'm surely too tired to figure it out from the patches, but you
> > really
> > want to explain that very detailed for mere mortals who are not
> > deep
> > into this magic as you are.
> 
> Yeah, it's worth calling out, especially since it's an area of future
> work -- I think if we can move the special sections out of .text into
> their own output sections that can get randomized and we'll have
> section
> position/size information available without the manual ..._start/_end
> symbols. But this will require work with the compiler and linker to
> get
> what's needed relative to -ffunction-sections, teach the kernel about
> the new way of getting _start/_end, etc etc.
> 
> So, before any of that, just .text.* is a good first step, and after
> that I think next would be getting .text randomized relative to the
> other
> .text.* sections (IIUC, it is entirely untouched currently, so only
> the
> standard KASLR base offset moves it around). Only after that do we
> start
> poking around trying to munge the special section contents (which
> requires use solving a few problems simultaneously). :)
> 

That's right - we keep .text unrandomized, so any special sections that
are collected into .text are still in their original layout. Like you
said, they still get to take advantage of normal KASLR (base address
randomization).



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 7/9] x86: Add support for function granular KASLR
  2020-05-21 21:08   ` Kees Cook
  2020-05-21 21:42     ` Kristen Carlson Accardi
@ 2020-06-04 17:27     ` Kristen Carlson Accardi
  2020-06-04 22:37       ` Kees Cook
  1 sibling, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-06-04 17:27 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, Jonathan Corbet, x86, H. Peter Anvin, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Tony Luck,
	linux-doc

On Thu, 2020-05-21 at 14:08 -0700, Kees Cook wrote:
> On Thu, May 21, 2020 at 09:56:38AM -0700, Kristen Carlson Accardi
> wrote:
> > At boot time, find all the function sections that have separate
> > .text
> > sections, shuffle them, and then copy them to new locations. Adjust
> > any relocations accordingly.
> 
> Commit log length vs "11 files changed, 1159 insertions(+), 15
> deletions(-)" implies to me that a lot more detail is needed here. ;)
> 
> Perhaps describe what the code pieces are, why the code is being
> added
> are here, etc (I see at least the ELF parsing, the ELF shuffling, the
> relocation updates, the symbol list, and the re-sorting of kallsyms,
> ORC, and extables. I think the commit log should prepare someone to
> read
> the diff and know what to expect to find. (In the end, I wonder if
> these
> pieces should be split up into logically separate patches, but for
> now,
> let's just go with it -- though I've made some suggestions below
> about
> things that might be worth splitting out.)
> 
> More below...
> 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Tested-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  Documentation/security/fgkaslr.rst       | 155 +++++
> >  Documentation/security/index.rst         |   1 +
> >  arch/x86/boot/compressed/Makefile        |   3 +
> >  arch/x86/boot/compressed/fgkaslr.c       | 823
> > +++++++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c         |   4 -
> >  arch/x86/boot/compressed/misc.c          | 109 ++-
> >  arch/x86/boot/compressed/misc.h          |  34 +
> >  arch/x86/boot/compressed/utils.c         |  12 +
> >  arch/x86/boot/compressed/vmlinux.symbols |  17 +
> >  arch/x86/include/asm/boot.h              |  15 +-
> >  include/uapi/linux/elf.h                 |   1 +
> >  11 files changed, 1159 insertions(+), 15 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> > 
> > diff --git a/Documentation/security/fgkaslr.rst
> > b/Documentation/security/fgkaslr.rst
> > new file mode 100644
> > index 000000000000..94939c62c50d
> > --- /dev/null
> > +++ b/Documentation/security/fgkaslr.rst
> > @@ -0,0 +1,155 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +==================================================================
> > ===
> > +Function Granular Kernel Address Space Layout Randomization
> > (fgkaslr)
> > +==================================================================
> > ===
> > +
> > +:Date: 6 April 2020
> > +:Author: Kristen Accardi
> > +
> > +Kernel Address Space Layout Randomization (KASLR) was merged into
> > the kernel
> > +with the objective of increasing the difficulty of code reuse
> > attacks. Code
> > +reuse attacks reused existing code snippets to get around existing
> > memory
> > +protections. They exploit software bugs which expose addresses of
> > useful code
> > +snippets to control the flow of execution for their own nefarious
> > purposes.
> > +KASLR as it was originally implemented moves the entire kernel
> > code text as a
> > +unit at boot time in order to make addresses less predictable. The
> > order of the
> > +code within the segment is unchanged - only the base address is
> > shifted. There
> > +are a few shortcomings to this algorithm.
> > +
> > +1. Low Entropy - there are only so many locations the kernel can
> > fit in. This
> > +   means an attacker could guess without too much trouble.
> > +2. Knowledge of a single address can reveal the offset of the base
> > address,
> > +   exposing all other locations for a published/known kernel
> > image.
> > +3. Info leaks abound.
> > +
> > +Finer grained ASLR has been proposed as a way to make ASLR more
> > resistant
> > +to info leaks. It is not a new concept at all, and there are many
> > variations
> > +possible. Function reordering is an implementation of finer
> > grained ASLR
> > +which randomizes the layout of an address space on a function
> > level
> > +granularity. The term "fgkaslr" is used in this document to refer
> > to the
> > +technique of function reordering when used with KASLR, as well as
> > finer grained
> > +KASLR in general.
> > +
> > +The objective of this patch set is to improve a technology that is
> > already
> > +merged into the kernel (KASLR). This code will not prevent all
> > code reuse
> > +attacks, and should be considered as one of several tools that can
> > be used.
> > +
> > +Implementation Details
> > +======================
> > +
> > +The over-arching objective of the fgkaslr implementation is
> > incremental
> > +improvement over the existing KASLR algorithm. It is designed to
> > work with
> > +the existing solution, and there are two main area where code
> > changes occur:
> > +Build time, and Load time.
> > +
> > +Build time
> > +----------
> > +
> > +GCC has had an option to place functions into individual .text
> > sections
> > +for many years now (-ffunction-sections). This option is used to
> > implement
> > +function reordering at load time. The final compiled vmlinux
> > retains all the
> > +section headers, which can be used to help find the address ranges
> > of each
> > +function. Using this information and an expanded table of
> > relocation addresses,
> > +individual text sections can be shuffled immediately after
> > decompression.
> > +Some data tables inside the kernel that have assumptions about
> > order
> > +require sorting after the update. In order to modify these tables,
> > +a few key symbols from the objcopy symbol stripping process are
> > preserved
> > +for use after shuffling the text segments.
> > +
> > +Load time
> > +---------
> > +
> > +The boot kernel was modified to parse the vmlinux elf file after
> > +decompression to check for symbols for modifying data tables, and
> > to
> > +look for any .text.* sections to randomize. The sections are then
> > shuffled,
> > +and tables are updated or resorted. The existing code which
> > updated relocation
> > +addresses was modified to account for not just a fixed delta from
> > the load
> > +address, but the offset that the function section was moved to.
> > This requires
> > +inspection of each address to see if it was impacted by a
> > randomization.
> > +
> > +In order to hide the new layout, symbols reported through
> > /proc/kallsyms will
> > +be sorted by name alphabetically rather than by address.
> > +
> > +Performance Impact
> > +==================
> > +
> > +There are two areas where function reordering can impact
> > performance: boot
> > +time latency, and run time performance.
> > +
> > +Boot time latency
> > +-----------------
> > +
> > +This implementation of finer grained KASLR impacts the boot time
> > of the kernel
> > +in several places. It requires additional parsing of the kernel
> > ELF file to
> > +obtain the section headers of the sections to be randomized. It
> > calls the
> > +random number generator for each section to be randomized to
> > determine that
> > +section's new memory location. It copies the decompressed kernel
> > into a new
> > +area of memory to avoid corruption when laying out the newly
> > randomized
> > +sections. It increases the number of relocations the kernel has to
> > perform at
> > +boot time vs. standard KASLR, and it also requires a lookup on
> > each address
> > +that needs to be relocated to see if it was in a randomized
> > section and needs
> > +to be adjusted by a new offset. Finally, it re-sorts a few data
> > tables that
> > +are required to be sorted by address.
> > +
> > +Booting a test VM on a modern, well appointed system showed an
> > increase in
> > +latency of approximately 1 second.
> > +
> > +Run time
> > +--------
> > +
> > +The performance impact at run-time of function reordering varies
> > by workload.
> > +Randomly reordering the functions will cause an increase in cache
> > misses
> > +for some workloads. Some workloads perform significantly worse
> > under FGKASLR,
> > +while others stay the same or even improve. In general, it will
> > depend on the
> > +code flow whether or not finer grained KASLR will impact a
> > workload, and how
> > +the underlying code was designed. Because the layout changes per
> > boot, each
> > +time a system is rebooted the performance of a workload may
> > change.
> > +
> > +Image Size
> > +==========
> > +
> > +fgkaslr increases the size of the kernel binary due to the extra
> > section
> > +headers that are included, as well as the extra relocations that
> > need to
> > +be added. You can expect fgkaslr to increase the size of the
> > resulting
> > +vmlinux by about 3%, and the compressed image (bzImage) by 15%.
> > +
> > +Memory Usage
> > +============
> > +
> > +fgkaslr increases the amount of heap that is required at boot
> > time,
> > +although this extra memory is released when the kernel has
> > finished
> > +decompression. As a result, it may not be appropriate to use this
> > feature
> > +on systems without much memory.
> > +
> > +Building
> > +========
> > +
> > +To enable fine grained KASLR, you need to have the following
> > config options
> > +set (including all the ones you would use to build normal KASLR)
> > +
> > +``CONFIG_FG_KASLR=y``
> > +
> > +fgkaslr for the kernel is only supported for the X86_64
> > architecture.
> > +
> > +Modules
> > +=======
> > +
> > +Modules are randomized similarly to the rest of the kernel by
> > shuffling
> > +the sections at load time prior to moving them into memory. The
> > module must
> > +also have been build with the -ffunction-sections compiler option.
> > +
> > +Although fgkaslr for the kernel is only supported for the X86_64
> > architecture,
> > +it is possible to use fgkaslr with modules on other architectures.
> > To enable
> > +this feature, select the following config option:
> > +
> > +``CONFIG_MODULE_FG_KASLR``
> > +
> > +This option is selected automatically for X86_64 when
> > CONFIG_FG_KASLR is set.
> > +
> > +Disabling
> > +=========
> > +
> > +Disabling normal kaslr using the nokaslr command line option also
> > disables
> > +fgkaslr. In addtion, it is possible to disable fgkaslr separately
> > by booting
> > +with fgkaslr=off on the commandline.
> 
> Yay documentation! Are you able to include the references to the
> papers
> you discuss in the cover letter in here too?

Yes, I will add the two papers I referenced, although it isn't a very
thorough list of references (ok, not even close to thorough).

> 
> > diff --git a/Documentation/security/index.rst
> > b/Documentation/security/index.rst
> > index fc503dd689a7..5e370c409ad2 100644
> > --- a/Documentation/security/index.rst
> > +++ b/Documentation/security/index.rst
> > @@ -7,6 +7,7 @@ Security Documentation
> >  
> >     credentials
> >     IMA-templates
> > +   fgkaslr
> >     keys/index
> >     lsm
> >     lsm-development
> > diff --git a/arch/x86/boot/compressed/Makefile
> > b/arch/x86/boot/compressed/Makefile
> > index 3a5a004498de..0ad5a31f1f0a 100644
> > --- a/arch/x86/boot/compressed/Makefile
> > +++ b/arch/x86/boot/compressed/Makefile
> > @@ -80,10 +80,12 @@ vmlinux-objs-y := $(obj)/vmlinux.lds
> > $(obj)/kernel_info.o $(obj)/head_$(BITS).o
> >  
> >  vmlinux-objs-$(CONFIG_EARLY_PRINTK) +=
> > $(obj)/early_serial_console.o
> >  vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
> > +vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/utils.o
> >  ifdef CONFIG_X86_64
> >  	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
> >  	vmlinux-objs-y += $(obj)/mem_encrypt.o
> >  	vmlinux-objs-y += $(obj)/pgtable_64.o
> > +	vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/fgkaslr.o
> >  endif
> 
> CONFIG_FG_KASLR is already gated by CONFIG_X86-64, so I think you can
> just put these all on the same line before the ifdef:
> 
> vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/utils.o $(obj)/fgkaslr.o
> 
> >  
> >  vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
> > @@ -120,6 +122,7 @@ OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
> >  
> >  ifdef CONFIG_FG_KASLR
> >  	RELOCS_ARGS += --fg-kaslr
> > +	OBJCOPYFLAGS += --keep-
> > symbols=$(srctree)/$(src)/vmlinux.symbols
> >  endif
> >  
> >  $(obj)/vmlinux.bin: vmlinux FORCE
> > diff --git a/arch/x86/boot/compressed/fgkaslr.c
> > b/arch/x86/boot/compressed/fgkaslr.c
> > new file mode 100644
> > index 000000000000..451e807de276
> > --- /dev/null
> > +++ b/arch/x86/boot/compressed/fgkaslr.c
> > @@ -0,0 +1,823 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * fgkaslr.c
> > + *
> > + * This contains the routines needed to reorder the kernel text
> > section
> > + * at boot time.
> > + */
> > +#include "misc.h"
> > +#include "error.h"
> > +#include "pgtable.h"
> > +#include "../string.h"
> > +#include "../voffset.h"
> > +#include <linux/sort.h>
> > +#include <linux/bsearch.h>
> > +#include "../../include/asm/extable.h"
> > +#include "../../include/asm/orc_types.h"
> > +
> > +/*
> > + * Longest parameter of 'fgkaslr=' is 'off' right now, plus an
> > extra '\0'
> > + * for termination.
> > + */
> > +#define MAX_FGKASLR_ARG_LENGTH 4
> > +int nofgkaslr;
> 
> This can be static.
> 
> > +
> > +/*
> > + * Use normal definitions of mem*() from string.c. There are
> > already
> > + * included header files which expect a definition of memset() and
> > by
> > + * the time we define memset macro, it is too late.
> > + */
> > +#undef memcpy
> > +#undef memset
> > +#define memzero(s, n)	memset((s), 0, (n))
> > +#define memmove		memmove
> > +
> > +void *memmove(void *dest, const void *src, size_t n);
> > +
> > +static unsigned long percpu_start;
> > +static unsigned long percpu_end;
> > +
> > +static long kallsyms_names;
> > +static long kallsyms_offsets;
> > +static long kallsyms_num_syms;
> > +static long kallsyms_relative_base;
> > +static long kallsyms_markers;
> > +static long __start___ex_table_addr;
> > +static long __stop___ex_table_addr;
> > +static long _stext;
> > +static long _etext;
> > +static long _sinittext;
> > +static long _einittext;
> > +static long __start_orc_unwind_ip;
> > +static long __stop_orc_unwind_ip;
> > +static long __start_orc_unwind;
> > +
> > +/* addresses in mapped address space */
> > +static int *base;
> > +static u8 *names;
> > +static unsigned long relative_base;
> > +static unsigned int *markers_addr;
> > +
> > +struct kallsyms_name {
> > +	u8 len;
> > +	u8 indecis[256];
> > +};
> > +
> > +struct kallsyms_name *names_table;
> > +
> > +struct orc_entry *cur_orc_table;
> > +int *cur_orc_ip_table;
> 
> static?
> 
> > +
> > +/*
> > + * This is an array of pointers to sections headers for randomized
> > sections
> > + */
> > +Elf64_Shdr **sections;
> 
> Given the creation of the Elf_Shdr etc macros in the header, should
> all
> of the Elf64 things in here be updated to use the size-agnostic
> macros?
> (Is anyone actually going to run a 32-bit kernel with fgkaslr some
> day?)

I suppose it's not impossible, just ... not useful. I will make the
update.

> 
> > +
> > +/*
> > + * This is an array of all section headers, randomized or
> > otherwise.
> 
> Comments nitpick while I'm thinking about it: "This is" tends to be
> redundant, and I'd pick a single-sentence-comment punctuation style:
> i.e. with or without an ending period. I personally prefer including
> the
> period, but mainly I think it should just be consistent. :)
> 
> > + */
> > +Elf64_Shdr *sechdrs;
> 
> Also, these can be static?
> 
> > +
> > +/*
> > + * The number of elements in the randomized section header array
> > (sections)
> > + */
> > +int sections_size;
> 
> static? (Also, it seems this is related to "sections" above, so maybe
> more it closer?)
> 
> > +
> > +static bool is_text(long addr)
> > +{
> > +	if ((addr >= _stext && addr < _etext) ||
> > +	    (addr >= _sinittext && addr < _einittext))
> > +		return true;
> > +	return false;
> > +}
> > +
> > +bool is_percpu_addr(long pc, long offset)
> > +{
> > +	unsigned long ptr;
> > +	long address;
> > +
> > +	address = pc + offset + 4;
> > +
> > +	ptr = (unsigned long)address;
> > +
> > +	if (ptr >= percpu_start && ptr < percpu_end)
> > +		return true;
> > +
> > +	return false;
> > +}
> > +
> > +static int cmp_section_addr(const void *a, const void *b)
> > +{
> > +	unsigned long ptr = (unsigned long)a;
> > +	Elf64_Shdr *s = *(Elf64_Shdr **)b;
> > +	unsigned long end = s->sh_addr + s->sh_size;
> > +
> > +	if (ptr >= s->sh_addr && ptr < end)
> > +		return 0;
> > +
> > +	if (ptr < s->sh_addr)
> > +		return -1;
> > +
> > +	return 1;
> > +}
> > +
> > +/*
> > + * Discover if the address is in a randomized section and if so,
> > adjust
> > + * by the saved offset.
> > + */
> > +Elf64_Shdr *adjust_address(long *address)
> > +{
> > +	Elf64_Shdr **s;
> > +	Elf64_Shdr *shdr;
> > +
> > +	if (nofgkaslr)
> > +		return NULL;
> > +
> > +	s = bsearch((const void *)*address, sections, sections_size,
> > sizeof(*s),
> > +		    cmp_section_addr);
> > +	if (s) {
> > +		shdr = *s;
> > +		*address += shdr->sh_offset;
> > +		return shdr;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> > +void adjust_relative_offset(long pc, long *value, Elf64_Shdr
> > *section)
> > +{
> > +	Elf64_Shdr *s;
> > +	long address;
> > +
> > +	if (nofgkaslr)
> > +		return;
> > +
> > +	/*
> > +	 * sometimes we are updating a relative offset that would
> > +	 * normally be relative to the next instruction (such as a
> > call).
> > +	 * In this case to calculate the target, you need to add 32bits
> > to
> > +	 * the pc to get the next instruction value. However, sometimes
> > +	 * targets are just data that was stored in a table such as
> > ksymtab
> > +	 * or cpu alternatives. In this case our target is not relative
> > to
> > +	 * the next instruction.
> > +	 */
> 
> Excellent and scary comment. ;) Was this found by trial and error?
> That
> sounds "fun" to debug. :P
> 
> > +
> > +	/*
> > +	 * Calculate the address that this offset would call.
> > +	 */
> > +	if (!is_text(pc))
> > +		address = pc + *value;
> > +	else
> > +		address = pc + *value + 4;
> > +
> > +	/*
> > +	 * if the address is in section that was randomized,
> > +	 * we need to adjust the offset.
> > +	 */
> > +	s = adjust_address(&address);
> > +	if (s)
> > +		*value += s->sh_offset;
> > +
> > +	/*
> > +	 * If the PC that this offset was calculated for was in a
> > section
> > +	 * that has been randomized, the value needs to be adjusted by
> > the
> > +	 * same amount as the randomized section was adjusted from it's
> > original
> > +	 * location.
> > +	 */
> > +	if (section)
> > +		*value -= section->sh_offset;
> > +}
> > +
> > +static void kallsyms_swp(void *a, void *b, int size)
> > +{
> > +	int idx1, idx2;
> > +	int temp;
> > +	struct kallsyms_name name_a;
> > +
> > +	/* determine our index into the array */
> > +	idx1 = (int *)a - base;
> > +	idx2 = (int *)b - base;
> > +	temp = base[idx1];
> > +	base[idx1] = base[idx2];
> > +	base[idx2] = temp;
> > +
> > +	/* also swap the names table */
> > +	memcpy(&name_a, &names_table[idx1], sizeof(name_a));
> > +	memcpy(&names_table[idx1], &names_table[idx2],
> > +	       sizeof(struct kallsyms_name));
> > +	memcpy(&names_table[idx2], &name_a, sizeof(struct
> > kallsyms_name));
> > +}
> > +
> > +static int kallsyms_cmp(const void *a, const void *b)
> > +{
> > +	int addr_a, addr_b;
> > +	unsigned long uaddr_a, uaddr_b;
> > +
> > +	addr_a = *(int *)a;
> > +	addr_b = *(int *)b;
> > +
> > +	if (addr_a >= 0)
> > +		uaddr_a = addr_a;
> > +	if (addr_b >= 0)
> > +		uaddr_b = addr_b;
> > +
> > +	if (addr_a < 0)
> > +		uaddr_a = relative_base - 1 - addr_a;
> > +	if (addr_b < 0)
> > +		uaddr_b = relative_base - 1 - addr_b;
> > +
> > +	if (uaddr_b > uaddr_a)
> > +		return -1;
> > +
> > +	return 0;
> > +}
> > +
> > +static void deal_with_names(int num_syms)
> > +{
> > +	int num_bytes;
> > +	int i, j;
> > +	int offset;
> > +
> > +	/* we should have num_syms kallsyms_name entries */
> > +	num_bytes = num_syms * sizeof(*names_table);
> > +	names_table = malloc(num_syms * sizeof(*names_table));
> > +	if (!names_table) {
> > +		debug_putstr("\nbytes requested: ");
> > +		debug_puthex(num_bytes);
> > +		error("\nunable to allocate space for names table\n");
> > +	}
> > +
> > +	/* read all the names entries */
> > +	offset = 0;
> > +	for (i = 0; i < num_syms; i++) {
> > +		names_table[i].len = names[offset];
> > +		offset++;
> > +		for (j = 0; j < names_table[i].len; j++) {
> > +			names_table[i].indecis[j] = names[offset];
> > +			offset++;
> > +		}
> > +	}
> > +}
> > +
> > +static void write_sorted_names(int num_syms)
> > +{
> > +	int i, j;
> > +	int offset = 0;
> > +	unsigned int *markers;
> > +
> > +	/*
> > +	 * we are going to need to regenerate the markers table, which
> > is a
> > +	 * table of offsets into the compressed stream every 256
> > symbols.
> > +	 * this code copied almost directly from scripts/kallsyms.c
> > +	 */
> 
> Can any of this kallsyms sorting code be reasonably reused instead
> of copy/pasting? Or is this mostly novel in that it's taking the
> output
> of that earlier sort, which isn't the input format scripts/kallsyms.c
> uses?

No - I cut out only code blocks from scripts/kallsyms.c, but there was
no neat way to reuse entire functions without majorly refactoring a lot
of stuff in scripts/kallsyms.c

> 
> > +	markers = malloc(sizeof(unsigned int) * ((num_syms + 255) /
> > 256));
> > +	if (!markers) {
> > +		debug_putstr("\nfailed to allocate heap space of ");
> > +		debug_puthex(((num_syms + 255) / 256));
> > +		debug_putstr(" bytes\n");
> > +		error("Unable to allocate space for markers table");
> > +	}
> > +
> > +	for (i = 0; i < num_syms; i++) {
> > +		if ((i & 0xFF) == 0)
> > +			markers[i >> 8] = offset;
> > +
> > +		names[offset] = (u8)names_table[i].len;
> > +		offset++;
> > +		for (j = 0; j < names_table[i].len; j++) {
> > +			names[offset] = (u8)names_table[i].indecis[j];
> > +			offset++;
> > +		}
> > +	}
> > +
> > +	/* write new markers table over old one */
> > +	for (i = 0; i < ((num_syms + 255) >> 8); i++)
> > +		markers_addr[i] = markers[i];
> > +
> > +	free(markers);
> > +	free(names_table);
> > +}
> > +
> > +static void sort_kallsyms(unsigned long map)
> > +{
> > +	int num_syms;
> > +	int i;
> > +
> > +	debug_putstr("\nRe-sorting kallsyms...\n");
> > +
> > +	num_syms = *(int *)(kallsyms_num_syms + map);
> > +	base = (int *)(kallsyms_offsets + map);
> > +	relative_base = *(unsigned long *)(kallsyms_relative_base +
> > map);
> > +	markers_addr = (unsigned int *)(kallsyms_markers + map);
> > +	names = (u8 *)(kallsyms_names + map);
> > +
> > +	/*
> > +	 * the kallsyms table was generated prior to any randomization.
> > +	 * it is a bunch of offsets from "relative base". In order for
> > +	 * us to check if a symbol has an address that was in a
> > randomized
> > +	 * section, we need to reconstruct the address to it's original
> > +	 * value prior to handle_relocations.
> > +	 */
> > +	for (i = 0; i < num_syms; i++) {
> > +		unsigned long addr;
> > +		int new_base;
> > +
> > +		/*
> > +		 * according to kernel/kallsyms.c, positive offsets are
> > absolute
> > +		 * values and negative offsets are relative to the
> > base.
> > +		 */
> > +		if (base[i] >= 0)
> > +			addr = base[i];
> > +		else
> > +			addr = relative_base - 1 - base[i];
> > +
> > +		if (adjust_address(&addr)) {
> > +			/* here we need to recalcuate the offset */
> > +			new_base = relative_base - 1 - addr;
> > +			base[i] = new_base;
> > +		}
> > +	}
> > +
> > +	/*
> > +	 * here we need to read in all the kallsyms_names info
> > +	 * so that we can regenerate it.
> > +	 */
> > +	deal_with_names(num_syms);
> > +
> > +	sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);
> > +
> > +	/* write the newly sorted names table over the old one */
> > +	write_sorted_names(num_syms);
> > +}
> > +
> > +#include "../../../../lib/extable.c"
> 
> Now that the earlier linking glitches have been sorted out, I wonder
> if
> it might be nicer to add this as a separate compilation unit, similar
> to
> how early_serial_console.c is done? (Or, I guess, more specifically,
> why
> can't this be in utils.c?)

The problem with putting this in utils.c was because there was an
inline function (static) that I use that is defined in extable.c
(ex_to_insn). If I move this to utils.c I'm not sure how to keep re-
using this inline function without modifying it with a define like
STATIC. I thought it was cleaner to just leave it alone and do it this
way.

> 
> > +
> > +static inline unsigned long
> > +ex_fixup_handler(const struct exception_table_entry *x)
> > +{
> > +	return ((unsigned long)&x->handler + x->handler);
> > +}
> > +
> > +static inline unsigned long
> > +ex_fixup_addr(const struct exception_table_entry *x)
> > +{
> > +	return ((unsigned long)&x->fixup + x->fixup);
> > +}
> > +
> > +static void update_ex_table(unsigned long map)
> > +{
> > +	struct exception_table_entry *start_ex_table =
> > +		(struct exception_table_entry
> > *)(__start___ex_table_addr + map);
> > +	struct exception_table_entry *stop_ex_table =
> > +		(struct exception_table_entry *)(__stop___ex_table_addr
> > + map);
> > +	int num_entries =
> > +		(__stop___ex_table_addr - __start___ex_table_addr) /
> > +		sizeof(struct exception_table_entry);
> > +	int i;
> > +
> > +	debug_putstr("\nUpdating exception table...");
> > +	for (i = 0; i < num_entries; i++) {
> > +		unsigned long insn = ex_to_insn(&start_ex_table[i]);
> > +		unsigned long fixup =
> > ex_fixup_addr(&start_ex_table[i]);
> > +		unsigned long handler =
> > ex_fixup_handler(&start_ex_table[i]);
> > +		unsigned long addr;
> > +		Elf64_Shdr *s;
> > +
> > +		/* check each address to see if it needs adjusting */
> > +		addr = insn - map;
> > +		s = adjust_address(&addr);
> > +		if (s)
> > +			start_ex_table[i].insn += s->sh_offset;
> > +
> > +		addr = fixup - map;
> > +		s = adjust_address(&addr);
> > +		if (s)
> > +			start_ex_table[i].fixup += s->sh_offset;
> > +
> > +		addr = handler - map;
> > +		s = adjust_address(&addr);
> > +		if (s)
> > +			start_ex_table[i].handler += s->sh_offset;
> > +	}
> > +}
> > +
> > +static void sort_ex_table(unsigned long map)
> > +{
> > +	struct exception_table_entry *start_ex_table =
> > +		(struct exception_table_entry
> > *)(__start___ex_table_addr + map);
> > +	struct exception_table_entry *stop_ex_table =
> > +		(struct exception_table_entry *)(__stop___ex_table_addr
> > + map);
> > +
> > +	debug_putstr("\nRe-sorting exception table...");
> > +
> > +	sort_extable(start_ex_table, stop_ex_table);
> > +}
> > +

<snip>

> > +void parse_sections_headers(void *output, Elf64_Ehdr *ehdr,
> > Elf64_Phdr *phdrs)
> > +{
> > +	Elf64_Phdr *phdr;
> > +	Elf64_Shdr *s;
> > +	Elf64_Shdr *text = NULL;
> > +	Elf64_Shdr *percpu = NULL;
> > +	char *secstrings;
> > +	const char *sname;
> > +	int num_sections = 0;
> > +	Elf64_Sym *symtab = NULL;
> > +	char *strtab = NULL;
> > +	long num_syms = 0;
> > +	void *dest;
> > +	int i;
> > +	char arg[MAX_FGKASLR_ARG_LENGTH];
> > +	Elf64_Shdr shdr;
> > +	unsigned long shnum;
> > +	unsigned int shstrndx;
> > +
> > +	debug_putstr("\nParsing ELF section headers... ");
> > +
> > +	/*
> > +	 * Even though fgkaslr may have been disabled, we still
> > +	 * need to parse through the section headers to get the
> > +	 * start and end of the percpu section. This is because
> > +	 * if we were built with CONFIG_FG_KASLR, there are more
> > +	 * relative relocations present in vmlinux.relocs than
> > +	 * just the percpu, and only the percpu relocs need to be
> > +	 * adjusted when using just normal base address kaslr.
> > +	 */
> > +	if (cmdline_find_option("fgkaslr", arg, sizeof(arg)) == 3 &&
> > +	    !strncmp(arg, "off", 3)) {
> > +		warn("FG_KASLR disabled on cmdline.");
> > +		nofgkaslr = 1;
> > +	}
> > +
> > +	/* read the first section header */
> > +	shnum = ehdr->e_shnum;
> > +	shstrndx = ehdr->e_shstrndx;
> > +	if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
> > +		memcpy(&shdr, output + ehdr->e_shoff, sizeof(shdr));
> > +		if (shnum == SHN_UNDEF)
> > +			shnum = shdr.sh_size;
> > +		if (shstrndx == SHN_XINDEX)
> > +			shstrndx = shdr.sh_link;
> > +	}
> > +
> > +	/* we are going to need to allocate space for the section
> > headers */
> > +	sechdrs = malloc(sizeof(*sechdrs) * shnum);
> > +	if (!sechdrs)
> > +		error("Failed to allocate space for shdrs");
> > +
> > +	sections = malloc(sizeof(*sections) * shnum);
> > +	if (!sections)
> > +		error("Failed to allocate space for section pointers");
> > +
> > +	memcpy(sechdrs, output + ehdr->e_shoff,
> > +	       sizeof(*sechdrs) * shnum);
> > +
> > +	/* we need to allocate space for the section string table */
> > +	s = &sechdrs[shstrndx];
> > +
> > +	secstrings = malloc(s->sh_size);
> > +	if (!secstrings)
> > +		error("Failed to allocate space for shstr");
> > +
> > +	memcpy(secstrings, output + s->sh_offset, s->sh_size);
> > +
> > +	/*
> > +	 * now we need to walk through the section headers and collect
> > the
> > +	 * sizes of the .text sections to be randomized.
> > +	 */
> > +	for (i = 0; i < shnum; i++) {
> > +		s = &sechdrs[i];
> > +		sname = secstrings + s->sh_name;
> > +
> > +		if (s->sh_type == SHT_SYMTAB) {
> > +			/* only one symtab per image */
> 
> I would assert this claim in the code as well: check symtab is NULL
> here?
> 
> > +			symtab = malloc(s->sh_size);
> > +			if (!symtab)
> > +				error("Failed to allocate space for
> > symtab");
> > +
> > +			memcpy(symtab, output + s->sh_offset, s-
> > >sh_size);
> > +			num_syms = s->sh_size / sizeof(*symtab);
> > +			continue;
> > +		}
> > +
> > +		if (s->sh_type == SHT_STRTAB && i != ehdr->e_shstrndx)
> > {
> > +			strtab = malloc(s->sh_size);
> 
> Similar robustness check?
> 
> > +			if (!strtab)
> > +				error("Failed to allocate space for
> > strtab");
> > +
> > +			memcpy(strtab, output + s->sh_offset, s-
> > >sh_size);
> > +		}
> > +
> > +		if (!strcmp(sname, ".text")) {
> > +			text = s;
> > +			continue;
> > +		}
> 
> Check text is still NULL here?
> 
> Also, why the continue? This means the section isn't included in the
> sections[] array? (Obviously things still work, but I don't
> understand
> why.)

I don't include .text in the sections[] array because sections[] is
only for sections to be randomized, and we don't randomize .text.

> 
> > +
> > +		if (!strcmp(sname, ".data..percpu")) {
> > +			/* get start addr for later */
> > +			percpu = s;
> 
> Similar, check percpu is still NULL here?
> 
> Also, is a "continue" intended here? (This is kind of the reverse of
> the "continue" question above.) I think you get the same result
> during
> the next "if", but I was expecting this block to look like the .text
> test above.

You are right, I could have put a continue here and saved the next
compare.

> <snip>

> > diff --git a/arch/x86/boot/compressed/kaslr.c
> > b/arch/x86/boot/compressed/kaslr.c
> > index d7408af55738..6f596bd5b6e5 100644
> > --- a/arch/x86/boot/compressed/kaslr.c
> > +++ b/arch/x86/boot/compressed/kaslr.c
> > @@ -39,10 +39,6 @@
> >  #include <generated/utsrelease.h>
> >  #include <asm/efi.h>
> >  
> > -/* Macros used by the included decompressor code below. */
> > -#define STATIC
> > -#include <linux/decompress/mm.h>
> > -
> >  #ifdef CONFIG_X86_5LEVEL
> >  unsigned int __pgtable_l5_enabled;
> >  unsigned int pgdir_shift __ro_after_init = 39;
> > diff --git a/arch/x86/boot/compressed/misc.c
> > b/arch/x86/boot/compressed/misc.c
> > index 9652d5c2afda..5f08922fd12a 100644
> > --- a/arch/x86/boot/compressed/misc.c
> > +++ b/arch/x86/boot/compressed/misc.c
> > @@ -26,9 +26,6 @@
> >   * it is not safe to place pointers in static structures.
> >   */
> >  
> > -/* Macros used by the included decompressor code below. */
> > -#define STATIC		static
> 
> Can you split the STATIC macro rearrangement out into a separate
> patch?
> I think it deserves a dedicated commit log to explain why it's
> changing
> the way it is (i.e. we end up with "STATIC" no longer meaning
> "static"
> in misc.c now

This change was made to fix the issue of having malloc_ptr be declared
locally rather than globally - (that weird problem I was having that
made it so I had to #include all the .c files until I figured out what
the issue was. If I separate out the change, then I feel like the
commit doesn't make sense out of this context. What if I put a big
comment in misc.h?

> > 

<snip>

> 
> > +	if (!IS_ENABLED(CONFIG_FG_KASLR) ||
> > +	    cmdline_find_option_bool("nokaslr")) {
> > +		if (!delta) {
> > +			debug_putstr("No relocation needed... ");
> > +			return;
> > +		}
> >  	}
> > +
> > +	pre_relocations_cleanup(map);
> > +
> >  	debug_putstr("Performing relocations... ");
> >  
> >  	/*
> > @@ -230,35 +242,106 @@ static void handle_relocations(void *output,
> > unsigned long output_len,
> >  	 */
> >  	for (reloc = output + output_len - sizeof(*reloc); *reloc;
> > reloc--) {
> >  		long extended = *reloc;
> > +		long value;
> > +
> > +		/*
> > +		 * if using fgkaslr, we might have moved the address
> > +		 * of the relocation. Check it to see if it needs
> > adjusting
> > +		 * from the original address.
> > +		 */
> > +		(void)adjust_address(&extended);
> 
> I don't think "(void)" needed here?

I can delete it - it's not "needed", but was done to show an
intentional discard of the return value from adjust_address.


The rest of your feedback incorporated into v3.

Thanks!
Kristen





^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 7/9] x86: Add support for function granular KASLR
  2020-06-04 17:27     ` Kristen Carlson Accardi
@ 2020-06-04 22:37       ` Kees Cook
  0 siblings, 0 replies; 27+ messages in thread
From: Kees Cook @ 2020-06-04 22:37 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, Jonathan Corbet, x86, H. Peter Anvin, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Tony Luck,
	linux-doc

On Thu, Jun 04, 2020 at 10:27:24AM -0700, Kristen Carlson Accardi wrote:
> On Thu, 2020-05-21 at 14:08 -0700, Kees Cook wrote:
> > On Thu, May 21, 2020 at 09:56:38AM -0700, Kristen Carlson Accardi
> > wrote:
> > > [...]
> > > +/*
> > > + * This is an array of pointers to sections headers for randomized
> > > sections
> > > + */
> > > +Elf64_Shdr **sections;
> > 
> > Given the creation of the Elf_Shdr etc macros in the header, should
> > all
> > of the Elf64 things in here be updated to use the size-agnostic
> > macros?
> > (Is anyone actually going to run a 32-bit kernel with fgkaslr some
> > day?)
> 
> I suppose it's not impossible, just ... not useful. I will make the
> update.

Yeah. I figure it might just look cleaner? Or, I guess, drop the other
macros? I guess I wasn't sure why those macros got touched and then
didn't get used.

> > > [...]
> > > +	/*
> > > +	 * we are going to need to regenerate the markers table, which is a
> > > +	 * table of offsets into the compressed stream every 256 symbols.
> > > +	 * this code copied almost directly from scripts/kallsyms.c
> > > +	 */
> > 
> > Can any of this kallsyms sorting code be reasonably reused instead
> > of copy/pasting? Or is this mostly novel in that it's taking the
> > output
> > of that earlier sort, which isn't the input format scripts/kallsyms.c
> > uses?
> 
> No - I cut out only code blocks from scripts/kallsyms.c, but there was
> no neat way to reuse entire functions without majorly refactoring a lot
> of stuff in scripts/kallsyms.c

Hmm. I wonder if there might be a way to carve out what you need into
lib/ and then #include it as a separate compilation unit? I'm always
worried about cut/pasted code getting out of sync.

> > > [...]
> > > +#include "../../../../lib/extable.c"
> > 
> > Now that the earlier linking glitches have been sorted out, I wonder if
> > it might be nicer to add this as a separate compilation unit, similar to
> > how early_serial_console.c is done? (Or, I guess, more specifically, why
> > can't this be in utils.c?)
> 
> The problem with putting this in utils.c was because there was an
> inline function (static) that I use that is defined in extable.c
> (ex_to_insn). If I move this to utils.c I'm not sure how to keep re-
> using this inline function without modifying it with a define like
> STATIC. I thought it was cleaner to just leave it alone and do it this
> way.

Hm, I see. I guess if this becomes fragile in the future, the special
inlines can be moved to lib/extable.h and then things can be carved out
into a separate compilation unit.

> > > [...]
> > > +		if (!strcmp(sname, ".text")) {
> > > +			text = s;
> > > +			continue;
> > > +		}
> > 
> > Check text is still NULL here?
> > 
> > Also, why the continue? This means the section isn't included in the
> > sections[] array? (Obviously things still work, but I don't
> > understand
> > why.)
> 
> I don't include .text in the sections[] array because sections[] is
> only for sections to be randomized, and we don't randomize .text.

Yeah, I got myself confused there originally. I actuallyed realized my
mistake on this later when I was explaining how FGKASLR worked in the
thread with tglx. :)

> > > +
> > > +		if (!strcmp(sname, ".data..percpu")) {
> > > +			/* get start addr for later */
> > > +			percpu = s;
> > 
> > Similar, check percpu is still NULL here?
> > 
> > Also, is a "continue" intended here? (This is kind of the reverse of
> > the "continue" question above.) I think you get the same result
> > during
> > the next "if", but I was expecting this block to look like the .text
> > test above.
> 
> You are right, I could have put a continue here and saved the next
> compare.

Cool; yeah, it was just a "wait, is that right?" when looking at it.

> > > diff --git a/arch/x86/boot/compressed/misc.c
> > > b/arch/x86/boot/compressed/misc.c
> > > index 9652d5c2afda..5f08922fd12a 100644
> > > --- a/arch/x86/boot/compressed/misc.c
> > > +++ b/arch/x86/boot/compressed/misc.c
> > > @@ -26,9 +26,6 @@
> > >   * it is not safe to place pointers in static structures.
> > >   */
> > >  
> > > -/* Macros used by the included decompressor code below. */
> > > -#define STATIC		static
> > 
> > Can you split the STATIC macro rearrangement out into a separate
> > patch?
> > I think it deserves a dedicated commit log to explain why it's
> > changing
> > the way it is (i.e. we end up with "STATIC" no longer meaning
> > "static"
> > in misc.c now
> 
> This change was made to fix the issue of having malloc_ptr be declared
> locally rather than globally - (that weird problem I was having that
> made it so I had to #include all the .c files until I figured out what
> the issue was. If I separate out the change, then I feel like the
> commit doesn't make sense out of this context. What if I put a big
> comment in misc.h?

I think it's fine to split it out with a commit log describing what it's
_about_ to fix, "without this, any other code using misc.h ..." and/or
"in the next patches we'll need this because ..."

> > > [...]
> > > +		(void)adjust_address(&extended);
> > 
> > I don't think "(void)" needed here?
> 
> I can delete it - it's not "needed", but was done to show an
> intentional discard of the return value from adjust_address.

The code style in the kernel seems to be to not include these (since
it's a call-site override) and instead use __must_check on the function
when the results MUST be checked. This way, all call-sites suddenly
light up when the attribute gets added and we don't end up with places
that might get masked, etc.

> The rest of your feedback incorporated into v3.

Awesome!

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 9/9] module: Reorder functions
  2020-05-21 21:33   ` Kees Cook
@ 2020-06-09 20:14     ` Kristen Carlson Accardi
  2020-06-09 20:42       ` Kees Cook
  0 siblings, 1 reply; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-06-09 20:14 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, x86, H. Peter Anvin, Jessica Yu, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Ard Biesheuvel,
	Tony Luck

On Thu, 2020-05-21 at 14:33 -0700, Kees Cook wrote:
> Oh! And I am reminded suddenly about CONFIG_FG_KASLR needing to
> interact
> correctly with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION in that we do NOT
> want the sections to be collapsed at link time:

sorry - I'm a little confused and was wondering if you could clarify
something. Does this mean you expect CONFIG_FG_KASLR=y and
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y to be a valid config? I am not
familiar with the option, but it seems like you are saying that it
requires sections to be collapsed, in which case both of these options
as yes would not be allowed? Should I actively prevent this in the
Kconfig?

Thanks.

Kristen

> 
> #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
> 
> (I think I had fixed this in some earlier version?)
> 
> I think you want this (untested):
> 
> 
> diff --git a/Makefile b/Makefile
> index 04f5662ae61a..a0d9acd3b900 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -853,8 +853,11 @@ ifdef CONFIG_DEBUG_SECTION_MISMATCH
>  KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-
> once)
>  endif
>  
> +ifneq ($(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION)$(CONFIG_FG_KASLR),)
> +KBUILD_CFLAGS_KERNEL += -ffunction-sections
> +endif
>  ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> -KBUILD_CFLAGS_KERNEL += -ffunction-sections -fdata-sections
> +KBUILD_CFLAGS_KERNEL += -fdata-sections
>  LDFLAGS_vmlinux += --gc-sections
>  endif
>  
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-
> generic/vmlinux.lds.h
> index 71e387a5fe90..5f5c692751dd 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -93,20 +93,31 @@
>   * sections to be brought in with rodata.
>   */
>  #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> -#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
>  #define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX*
>  #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
>  #define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]*
>  #define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
>  #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
>  #else
> -#define TEXT_MAIN .text
>  #define DATA_MAIN .data
>  #define SDATA_MAIN .sdata
>  #define RODATA_MAIN .rodata
>  #define BSS_MAIN .bss
>  #define SBSS_MAIN .sbss
>  #endif
> +/*
> + * Both LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_FG_KASLR options
> enable
> + * -ffunction-sections, which produces separately named .text
> sections. In
> + * the case of CONFIG_FG_KASLR, they need to stay distinct so they
> can be
> + * separately randomized. Without CONFIG_FG_KASLR, the separate
> .text
> + * sections can be collected back into a common section, which makes
> the
> + * resulting image slightly smaller.
> + */
> +#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) &&
> !defined(CONFIG_FG_KASLR)
> +#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
> +#else
> +#define TEXT_MAIN .text
> +#endif


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 9/9] module: Reorder functions
  2020-06-09 20:14     ` Kristen Carlson Accardi
@ 2020-06-09 20:42       ` Kees Cook
  2020-06-09 20:59         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 27+ messages in thread
From: Kees Cook @ 2020-06-09 20:42 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, x86, H. Peter Anvin, Jessica Yu, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Ard Biesheuvel,
	Tony Luck

On Tue, Jun 09, 2020 at 01:14:04PM -0700, Kristen Carlson Accardi wrote:
> On Thu, 2020-05-21 at 14:33 -0700, Kees Cook wrote:
> > Oh! And I am reminded suddenly about CONFIG_FG_KASLR needing to
> > interact
> > correctly with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION in that we do NOT
> > want the sections to be collapsed at link time:
> 
> sorry - I'm a little confused and was wondering if you could clarify
> something. Does this mean you expect CONFIG_FG_KASLR=y and
> CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y to be a valid config? I am not

Yes, I don't see a reason they can't be used together.

> familiar with the option, but it seems like you are saying that it
> requires sections to be collapsed, in which case both of these options
> as yes would not be allowed? Should I actively prevent this in the
> Kconfig?

No, I'm saying that CONFIG_LD_DEAD_CODE_DATA_ELIMINATION does _not_
actually require that the sections be collapsed, but the Makefile
currently does this just to keep the resulting ELF "tidy". We want
that disabled (for the .text parts) in the case of CONFIG_FG_KASLR. The
dead code elimination step, is, IIUC, done at link time before the
output sections are written.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 9/9] module: Reorder functions
  2020-06-09 20:42       ` Kees Cook
@ 2020-06-09 20:59         ` Kristen Carlson Accardi
  0 siblings, 0 replies; 27+ messages in thread
From: Kristen Carlson Accardi @ 2020-06-09 20:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, x86, H. Peter Anvin, Jessica Yu, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Ard Biesheuvel,
	Tony Luck

On Tue, 2020-06-09 at 13:42 -0700, Kees Cook wrote:
> On Tue, Jun 09, 2020 at 01:14:04PM -0700, Kristen Carlson Accardi
> wrote:
> > On Thu, 2020-05-21 at 14:33 -0700, Kees Cook wrote:
> > > Oh! And I am reminded suddenly about CONFIG_FG_KASLR needing to
> > > interact
> > > correctly with CONFIG_LD_DEAD_CODE_DATA_ELIMINATION in that we do
> > > NOT
> > > want the sections to be collapsed at link time:
> > 
> > sorry - I'm a little confused and was wondering if you could
> > clarify
> > something. Does this mean you expect CONFIG_FG_KASLR=y and
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y to be a valid config? I am
> > not
> 
> Yes, I don't see a reason they can't be used together.
> 
> > familiar with the option, but it seems like you are saying that it
> > requires sections to be collapsed, in which case both of these
> > options
> > as yes would not be allowed? Should I actively prevent this in the
> > Kconfig?
> 
> No, I'm saying that CONFIG_LD_DEAD_CODE_DATA_ELIMINATION does _not_
> actually require that the sections be collapsed, but the Makefile
> currently does this just to keep the resulting ELF "tidy". We want
> that disabled (for the .text parts) in the case of CONFIG_FG_KASLR.
> The
> dead code elimination step, is, IIUC, done at link time before the
> output sections are written.
> 

Ah ok, that makes sense. Thanks.



^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-06-09 20:59 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-21 16:56 [PATCH v2 0/9] Function Granular KASLR Kristen Carlson Accardi
2020-05-21 16:56 ` [PATCH v2 1/9] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
2020-05-21 16:56 ` [PATCH v2 2/9] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
2020-05-21 16:56 ` [PATCH v2 3/9] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
2020-05-21 16:56 ` [PATCH v2 4/9] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
2020-05-21 20:00   ` Kees Cook
2020-05-21 16:56 ` [PATCH v2 5/9] x86: Make sure _etext includes function sections Kristen Carlson Accardi
2020-05-21 18:11   ` Kees Cook
2020-05-21 16:56 ` [PATCH v2 6/9] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
2020-05-21 19:54   ` Kees Cook
2020-05-21 16:56 ` [PATCH v2 7/9] x86: Add support for function granular KASLR Kristen Carlson Accardi
2020-05-21 21:08   ` Kees Cook
2020-05-21 21:42     ` Kristen Carlson Accardi
2020-06-04 17:27     ` Kristen Carlson Accardi
2020-06-04 22:37       ` Kees Cook
2020-05-21 16:56 ` [PATCH v2 8/9] kallsyms: Hide layout Kristen Carlson Accardi
2020-05-21 21:14   ` Kees Cook
2020-05-21 16:56 ` [PATCH v2 9/9] module: Reorder functions Kristen Carlson Accardi
2020-05-21 21:33   ` Kees Cook
2020-06-09 20:14     ` Kristen Carlson Accardi
2020-06-09 20:42       ` Kees Cook
2020-06-09 20:59         ` Kristen Carlson Accardi
2020-05-21 21:54 ` [PATCH v2 0/9] Function Granular KASLR Kees Cook
2020-05-21 22:26 ` Thomas Gleixner
2020-05-21 23:30   ` Kees Cook
2020-05-21 23:43     ` Thomas Gleixner
2020-05-21 23:44     ` Kristen Carlson Accardi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).