kernel-hardening.lists.openwall.com archive mirror
 help / color / mirror / Atom feed
* [PATCH v4 00/10] Function Granular KASLR
@ 2020-07-17 16:59 Kristen Carlson Accardi
  2020-07-17 16:59 ` [PATCH v4 01/10] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
                   ` (12 more replies)
  0 siblings, 13 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 16:59 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

Function Granular Kernel Address Space Layout Randomization (fgkaslr)
---------------------------------------------------------------------

This patch set is an implementation of finer grained kernel address space
randomization. It rearranges your kernel code at load time 
on a per-function level granularity, with only around a second added to
boot time.

Changes in v4:
-------------
* dropped the patch to split out change to STATIC definition in
  x86/boot/compressed/misc.c and replaced with a patch authored
  by Kees Cook to avoid the duplicate malloc definitions
* Added a section to Documentation/admin-guide/kernel-parameters.txt
  to document the fgkaslr boot option.
* redesigned the patch to hide the new layout when reading
  /proc/kallsyms. The previous implementation utilized a dynamically
  allocated linked list to display the kernel and module symbols
  in alphabetical order. The new implementation uses a randomly
  shuffled index array to display the kernel and module symbols
  in a random order.

Changes in v3:
-------------
* Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
* removal of extraneous ALIGN_PAGE from _etext changes
* changed variable names in x86/tools/relocs to be less confusing
* split out change to STATIC definition in x86/boot/compressed/misc.c
* Updates to Documentation to make it more clear what is preserved in .text
* much more detailed commit message for function granular KASLR patch
* minor tweaks and changes that make for more readable code
* this cover letter updated slightly to add additional details

Changes in v2:
--------------
* Fix to address i386 build failure
* Allow module reordering patch to be configured separately so that
  arm (or other non-x86_64 arches) can take advantage of module function
  reordering. This support has not be tested by me, but smoke tested by
  Ard Biesheuvel <ardb@kernel.org> on arm.
* Fix build issue when building on arm as reported by
  Ard Biesheuvel <ardb@kernel.org> 

Patches to objtool are included because they are dependencies for this
patchset, however they have been submitted by their maintainer separately.

Background
----------
KASLR was merged into the kernel with the objective of increasing the
difficulty of code reuse attacks. Code reuse attacks reused existing code
snippets to get around existing memory protections. They exploit software bugs
which expose addresses of useful code snippets to control the flow of
execution for their own nefarious purposes. KASLR moves the entire kernel
code text as a unit at boot time in order to make addresses less predictable.
The order of the code within the segment is unchanged - only the base address
is shifted. There are a few shortcomings to this algorithm.

1. Low Entropy - there are only so many locations the kernel can fit in. This
   means an attacker could guess without too much trouble.
2. Knowledge of a single address can reveal the offset of the base address,
   exposing all other locations for a published/known kernel image.
3. Info leaks abound.

Finer grained ASLR has been proposed as a way to make ASLR more resistant
to info leaks. It is not a new concept at all, and there are many variations
possible. Function reordering is an implementation of finer grained ASLR
which randomizes the layout of an address space on a function level
granularity. We use the term "fgkaslr" in this document to refer to the
technique of function reordering when used with KASLR, as well as finer grained
KASLR in general.

Proposed Improvement
--------------------
This patch set proposes adding function reordering on top of the existing
KASLR base address randomization. The over-arching objective is incremental
improvement over what we already have. It is designed to work in combination
with the existing solution. The implementation is really pretty simple, and
there are 2 main area where changes occur:

* Build time

GCC has had an option to place functions into individual .text sections for
many years now. This option can be used to implement function reordering at
load time. The final compiled vmlinux retains all the section headers, which
can be used to help find the address ranges of each function. Using this
information and an expanded table of relocation addresses, individual text
sections can be suffled immediately after decompression. Some data tables
inside the kernel that have assumptions about order require re-sorting
after being updated when applying relocations. In order to modify these tables,
a few key symbols are excluded from the objcopy symbol stripping process for
use after shuffling the text segments.

Some highlights from the build time changes to look for:

The top level kernel Makefile was modified to add the gcc flag if it
is supported. Currently, I am applying this flag to everything it is
possible to randomize. Anything that is written in C and not present in a
special input section is randomized. The final binary segment 0 retains a
consolidated .text section, as well as all the individual .text.* sections.
Future work could turn off this flags for selected files or even entire
subsystems, although obviously at the cost of security.

The relocs tool is updated to add relative relocations. This information
previously wasn't included because it wasn't necessary when moving the
entire .text segment as a unit. 

A new file was created to contain a list of symbols that objcopy should
keep. We use those symbols at load time as described below.

* Load time

The boot kernel was modified to parse the vmlinux elf file after
decompression to check for our interesting symbols that we kept, and to
look for any .text.* sections to randomize. The consolidated .text section
is skipped and not moved. The sections are shuffled randomly, and copied
into memory following the .text section in a new random order. The existing
code which updated relocation addresses was modified to account for
not just a fixed delta from the load address, but the offset that the function
section was moved to. This requires inspection of each address to see if
it was impacted by a randomization. We use a bsearch to make this less
horrible on performance. Any tables that need to be modified with new
addresses or resorted are updated using the symbol addresses parsed from the
elf symbol table.

In order to hide our new layout, symbols reported through /proc/kallsyms
will be displayed in a random order.

Security Considerations
-----------------------
The objective of this patch set is to improve a technology that is already
merged into the kernel (KASLR). This code will not prevent all attacks,
but should instead be considered as one of several tools that can be used.
In particular, this code is meant to make KASLR more effective in the presence
of info leaks.

How much entropy we are adding to the existing entropy of standard KASLR will
depend on a few variables. Firstly and most obviously, the number of functions
that are randomized matters. This implementation keeps the existing .text
section for code that cannot be randomized - for example, because it was
assembly code. The less sections to randomize, the less entropy. In addition,
due to alignment (16 bytes for x86_64), the number of bits in a address that
the attacker needs to guess is reduced, as the lower bits are identical.

Performance Impact
------------------
There are two areas where function reordering can impact performance: boot
time latency, and run time performance.

* Boot time latency
This implementation of finer grained KASLR impacts the boot time of the kernel
in several places. It requires additional parsing of the kernel ELF file to
obtain the section headers of the sections to be randomized. It calls the
random number generator for each section to be randomized to determine that
section's new memory location. It copies the decompressed kernel into a new
area of memory to avoid corruption when laying out the newly randomized
sections. It increases the number of relocations the kernel has to perform at
boot time vs. standard KASLR, and it also requires a lookup on each address
that needs to be relocated to see if it was in a randomized section and needs
to be adjusted by a new offset. Finally, it re-sorts a few data tables that
are required to be sorted by address.

Booting a test VM on a modern, well appointed system showed an increase in
latency of approximately 1 second.

* Run time
The performance impact at run-time of function reordering varies by workload.
Using kcbench, a kernel compilation benchmark, the performance of a kernel
build with finer grained KASLR was about 1% slower than a kernel with standard
KASLR. Analysis with perf showed a slightly higher percentage of 
L1-icache-load-misses. Other workloads were examined as well, with varied
results. Some workloads performed significantly worse under FGKASLR, while
others stayed the same or were mysteriously better. In general, it will
depend on the code flow whether or not finer grained KASLR will impact
your workload, and how the underlying code was designed. Because the layout
changes per boot, each time a system is rebooted the performance of a workload
may change.

Future work could identify hot areas that may not be randomized and either
leave them in the .text section or group them together into a single section
that may be randomized. If grouping things together helps, one other thing to
consider is that if we could identify text blobs that should be grouped together
to benefit a particular code flow, it could be interesting to explore
whether this security feature could be also be used as a performance
feature if you are interested in optimizing your kernel layout for a
particular workload at boot time. Optimizing function layout for a particular
workload has been researched and proven effective - for more information
read the Facebook paper "Optimizing Function Placement for Large-Scale
Data-Center Applications" (see references section below).

Image Size
----------
Adding additional section headers as a result of compiling with
-ffunction-sections will increase the size of the vmlinux ELF file.
With a standard distro config, the resulting vmlinux was increased by
about 3%. The compressed image is also increased due to the header files,
as well as the extra relocations that must be added. You can expect fgkaslr
to increase the size of the compressed image by about 15%.

Memory Usage
------------
fgkaslr increases the amount of heap that is required at boot time,
although this extra memory is released when the kernel has finished
decompression. As a result, it may not be appropriate to use this feature on
systems without much memory.

Building
--------
To enable fine grained KASLR, you need to have the following config options
set (including all the ones you would use to build normal KASLR)

CONFIG_FG_KASLR=y

In addition, fgkaslr is only supported for the X86_64 architecture.

Modules
-------
Modules are randomized similarly to the rest of the kernel by shuffling
the sections at load time prior to moving them into memory. The module must
also have been build with the -ffunction-sections compiler option.

Although fgkaslr for the kernel is only supported for the X86_64 architecture,
it is possible to use fgkaslr with modules on other architectures. To enable
this feature, select

CONFIG_MODULE_FG_KASLR=y

This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.

Disabling
---------
Disabling normal KASLR using the nokaslr command line option also disables
fgkaslr. It is also possible to disable fgkaslr separately by booting with
fgkaslr=off on the commandline.

References
----------
There are a lot of academic papers which explore finer grained ASLR.
This paper in particular contributed the most to my implementation design
as well as my overall understanding of the problem space:

Selfrando: Securing the Tor Browser against De-anonymization Exploits,
M. Conti, S. Crane, T. Frassetto, et al.

For more information on how function layout impacts performance, see:

Optimizing Function Placement for Large-Scale Data-Center Applications,
G. Ottoni, B. Maher

Kees Cook (2):
  x86/boot: Allow a "silent" kaslr random byte fetch
  x86/boot/compressed: Avoid duplicate malloc() implementations

Kristen Carlson Accardi (8):
  objtool: Do not assume order of parent/child functions
  x86: tools/relocs: Support >64K section headers
  x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  x86: Make sure _etext includes function sections
  x86/tools: Add relative relocs for randomized functions
  x86: Add support for function granular KASLR
  kallsyms: Hide layout
  module: Reorder functions

 .../admin-guide/kernel-parameters.txt         |   7 +
 Documentation/security/fgkaslr.rst            | 172 ++++
 Documentation/security/index.rst              |   1 +
 Makefile                                      |   6 +-
 arch/x86/Kconfig                              |   4 +
 arch/x86/Makefile                             |   5 +
 arch/x86/boot/compressed/Makefile             |   9 +-
 arch/x86/boot/compressed/fgkaslr.c            | 811 ++++++++++++++++++
 arch/x86/boot/compressed/kaslr.c              |   4 -
 arch/x86/boot/compressed/misc.c               | 157 +++-
 arch/x86/boot/compressed/misc.h               |  30 +
 arch/x86/boot/compressed/utils.c              |  11 +
 arch/x86/boot/compressed/vmlinux.symbols      |  17 +
 arch/x86/include/asm/boot.h                   |  15 +-
 arch/x86/kernel/vmlinux.lds.S                 |  17 +-
 arch/x86/lib/kaslr.c                          |  18 +-
 arch/x86/tools/relocs.c                       | 143 ++-
 arch/x86/tools/relocs.h                       |   4 +-
 arch/x86/tools/relocs_common.c                |  15 +-
 include/asm-generic/vmlinux.lds.h             |  18 +-
 include/linux/decompress/mm.h                 |  12 +-
 include/uapi/linux/elf.h                      |   1 +
 init/Kconfig                                  |  26 +
 kernel/kallsyms.c                             | 163 +++-
 kernel/module.c                               |  81 ++
 tools/objtool/elf.c                           |   8 +-
 26 files changed, 1670 insertions(+), 85 deletions(-)
 create mode 100644 Documentation/security/fgkaslr.rst
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/utils.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols


base-commit: 11ba468877bb23f28956a35e896356252d63c983
-- 
2.20.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v4 01/10] objtool: Do not assume order of parent/child functions
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
@ 2020-07-17 16:59 ` Kristen Carlson Accardi
  2020-07-17 16:59 ` [PATCH v4 02/10] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 16:59 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, Josh Poimboeuf, Peter Zijlstra
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

If a .cold function is examined prior to it's parent, the link
to the parent/child function can be overwritten when the parent
is examined. Only update pfunc and cfunc if they were previously
nil to prevent this from happening.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 tools/objtool/elf.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index 26d11d821941..c18f0f216740 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -434,7 +434,13 @@ static int read_symbols(struct elf *elf)
 			size_t pnamelen;
 			if (sym->type != STT_FUNC)
 				continue;
-			sym->pfunc = sym->cfunc = sym;
+
+			if (sym->pfunc == NULL)
+				sym->pfunc = sym;
+
+			if (sym->cfunc == NULL)
+				sym->cfunc = sym;
+
 			coldstr = strstr(sym->name, ".cold");
 			if (!coldstr)
 				continue;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 02/10] x86: tools/relocs: Support >64K section headers
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
  2020-07-17 16:59 ` [PATCH v4 01/10] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
@ 2020-07-17 16:59 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 03/10] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 16:59 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck

While the relocs tool already supports finding the total number
of section headers if vmlinux exceeds 64K sections, it fails to
read the extended symbol table to get section header indexes for symbols,
causing incorrect symbol table indexes to be used when there are > 64K
symbols.

Parse the elf file to read the extended symbol table info, and then
replace all direct references to st_shndx with calls to sym_index(),
which will determine whether the value can be read directly or
whether the value should be pulled out of the extended table.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/tools/relocs.c | 104 ++++++++++++++++++++++++++++++----------
 1 file changed, 78 insertions(+), 26 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..31b2d151aa63 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -14,6 +14,10 @@
 static Elf_Ehdr		ehdr;
 static unsigned long	shnum;
 static unsigned int	shstrndx;
+static unsigned int	shsymtabndx;
+static unsigned int	shxsymtabndx;
+
+static int sym_index(Elf_Sym *sym);
 
 struct relocs {
 	uint32_t	*offset;
@@ -32,6 +36,7 @@ struct section {
 	Elf_Shdr       shdr;
 	struct section *link;
 	Elf_Sym        *symtab;
+	Elf32_Word     *xsymtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
 };
@@ -265,7 +270,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
 		name = sym_strtab + sym->st_name;
 	}
 	else {
-		name = sec_name(sym->st_shndx);
+		name = sec_name(sym_index(sym));
 	}
 	return name;
 }
@@ -335,6 +340,23 @@ static uint64_t elf64_to_cpu(uint64_t val)
 #define elf_xword_to_cpu(x)	elf32_to_cpu(x)
 #endif
 
+static int sym_index(Elf_Sym *sym)
+{
+	Elf_Sym *symtab = secs[shsymtabndx].symtab;
+	Elf32_Word *xsymtab = secs[shxsymtabndx].xsymtab;
+	unsigned long offset;
+	int index;
+
+	if (sym->st_shndx != SHN_XINDEX)
+		return sym->st_shndx;
+
+	/* calculate offset of sym from head of table. */
+	offset = (unsigned long)sym - (unsigned long)symtab;
+	index = offset / sizeof(*sym);
+
+	return elf32_to_cpu(xsymtab[index]);
+}
+
 static void read_ehdr(FILE *fp)
 {
 	if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
@@ -468,31 +490,60 @@ static void read_strtabs(FILE *fp)
 static void read_symtabs(FILE *fp)
 {
 	int i,j;
+
 	for (i = 0; i < shnum; i++) {
 		struct section *sec = &secs[i];
-		if (sec->shdr.sh_type != SHT_SYMTAB) {
+		int num_syms;
+
+		switch (sec->shdr.sh_type) {
+		case SHT_SYMTAB_SHNDX:
+			sec->xsymtab = malloc(sec->shdr.sh_size);
+			if (!sec->xsymtab) {
+				die("malloc of %d bytes for xsymtab failed\n",
+				    sec->shdr.sh_size);
+			}
+			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+				die("Seek to %d failed: %s\n",
+				    sec->shdr.sh_offset, strerror(errno));
+			}
+			if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp)
+			    != sec->shdr.sh_size) {
+				die("Cannot read extended symbol table: %s\n",
+				    strerror(errno));
+			}
+			shxsymtabndx = i;
+			continue;
+
+		case SHT_SYMTAB:
+			num_syms = sec->shdr.sh_size / sizeof(Elf_Sym);
+
+			sec->symtab = malloc(sec->shdr.sh_size);
+			if (!sec->symtab) {
+				die("malloc of %d bytes for symtab failed\n",
+				    sec->shdr.sh_size);
+			}
+			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+				die("Seek to %d failed: %s\n",
+				    sec->shdr.sh_offset, strerror(errno));
+			}
+			if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
+			    != sec->shdr.sh_size) {
+				die("Cannot read symbol table: %s\n",
+				    strerror(errno));
+			}
+			for (j = 0; j < num_syms; j++) {
+				Elf_Sym *sym = &sec->symtab[j];
+
+				sym->st_name  = elf_word_to_cpu(sym->st_name);
+				sym->st_value = elf_addr_to_cpu(sym->st_value);
+				sym->st_size  = elf_xword_to_cpu(sym->st_size);
+				sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
+			}
+			shsymtabndx = i;
+			continue;
+
+		default:
 			continue;
-		}
-		sec->symtab = malloc(sec->shdr.sh_size);
-		if (!sec->symtab) {
-			die("malloc of %d bytes for symtab failed\n",
-				sec->shdr.sh_size);
-		}
-		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
-			die("Seek to %d failed: %s\n",
-				sec->shdr.sh_offset, strerror(errno));
-		}
-		if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
-		    != sec->shdr.sh_size) {
-			die("Cannot read symbol table: %s\n",
-				strerror(errno));
-		}
-		for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Sym); j++) {
-			Elf_Sym *sym = &sec->symtab[j];
-			sym->st_name  = elf_word_to_cpu(sym->st_name);
-			sym->st_value = elf_addr_to_cpu(sym->st_value);
-			sym->st_size  = elf_xword_to_cpu(sym->st_size);
-			sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
 		}
 	}
 }
@@ -759,13 +810,14 @@ static void percpu_init(void)
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 {
-	return (sym->st_shndx == per_cpu_shndx) &&
+	int shndx = sym_index(sym);
+
+	return (shndx == per_cpu_shndx) &&
 		strcmp(symname, "__init_begin") &&
 		strcmp(symname, "__per_cpu_load") &&
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
-
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
@@ -1088,7 +1140,7 @@ static int do_reloc_info(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		sec_name(sec->shdr.sh_info),
 		rel_type(ELF_R_TYPE(rel->r_info)),
 		symname,
-		sec_name(sym->st_shndx));
+		sec_name(sym_index(sym)));
 	return 0;
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 03/10] x86/boot: Allow a "silent" kaslr random byte fetch
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
  2020-07-17 16:59 ` [PATCH v4 01/10] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
  2020-07-17 16:59 ` [PATCH v4 02/10] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 04/10] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

From: Kees Cook <keescook@chromium.org>

Under earlyprintk, each RNG call produces a debug report line. When
shuffling hundreds of functions, this is not useful information (each
line is identical and tells us nothing new). Instead, allow for a NULL
"purpose" to suppress the debug reporting.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/lib/kaslr.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index a53665116458..2b3eb8c948a3 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -56,11 +56,14 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	unsigned long raw, random = get_boot_seed();
 	bool use_i8254 = true;
 
-	debug_putstr(purpose);
-	debug_putstr(" KASLR using");
+	if (purpose) {
+		debug_putstr(purpose);
+		debug_putstr(" KASLR using");
+	}
 
 	if (has_cpuflag(X86_FEATURE_RDRAND)) {
-		debug_putstr(" RDRAND");
+		if (purpose)
+			debug_putstr(" RDRAND");
 		if (rdrand_long(&raw)) {
 			random ^= raw;
 			use_i8254 = false;
@@ -68,7 +71,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	}
 
 	if (has_cpuflag(X86_FEATURE_TSC)) {
-		debug_putstr(" RDTSC");
+		if (purpose)
+			debug_putstr(" RDTSC");
 		raw = rdtsc();
 
 		random ^= raw;
@@ -76,7 +80,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	}
 
 	if (use_i8254) {
-		debug_putstr(" i8254");
+		if (purpose)
+			debug_putstr(" i8254");
 		random ^= i8254();
 	}
 
@@ -86,7 +91,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	    : "a" (random), "rm" (mix_const));
 	random += raw;
 
-	debug_putstr("...\n");
+	if (purpose)
+		debug_putstr("...\n");
 
 	return random;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 04/10] x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (2 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 03/10] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 05/10] x86: Make sure _etext includes function sections Kristen Carlson Accardi
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, Masahiro Yamada, Michal Marek, x86,
	H. Peter Anvin, Arnd Bergmann
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck, linux-kbuild, linux-arch

Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
the make file to build with -ffunction-sections if CONFIG_FG_KASLR.

While the only architecture that supports CONFIG_FG_KASLR does not
currently enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION, make sure these
2 features play nicely together for the future by ensuring that if
CONFIG_LD_DEAD_CODE_DATA_ELIMINATION is selected when used with
CONFIG_FG_KASLR the function sections will not be consolidated back
into .text. Thanks to Kees Cook for the dead code elimination changes.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 Makefile                          |  6 +++++-
 arch/x86/Kconfig                  |  4 ++++
 include/asm-generic/vmlinux.lds.h | 16 ++++++++++++++--
 init/Kconfig                      | 14 ++++++++++++++
 4 files changed, 37 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index 0b5f8538bde5..66427b12de53 100644
--- a/Makefile
+++ b/Makefile
@@ -872,7 +872,7 @@ KBUILD_CFLAGS += $(call cc-option, -fno-inline-functions-called-once)
 endif
 
 ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
-KBUILD_CFLAGS_KERNEL += -ffunction-sections -fdata-sections
+KBUILD_CFLAGS_KERNEL += -fdata-sections
 LDFLAGS_vmlinux += --gc-sections
 endif
 
@@ -880,6 +880,10 @@ ifdef CONFIG_LIVEPATCH
 KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
 endif
 
+ifneq ($(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION)$(CONFIG_FG_KASLR),)
+KBUILD_CFLAGS += -ffunction-sections
+endif
+
 ifdef CONFIG_SHADOW_CALL_STACK
 CC_FLAGS_SCS	:= -fsanitize=shadow-call-stack
 KBUILD_CFLAGS	+= $(CC_FLAGS_SCS)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 883da0abf779..e7a2db3e270d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -372,6 +372,10 @@ config CC_HAS_SANE_STACKPROTECTOR
 	   We have to make sure stack protector is unconditionally disabled if
 	   the compiler produces broken code.
 
+config ARCH_HAS_FG_KASLR
+	def_bool y
+	depends on RANDOMIZE_BASE && X86_64
+
 menu "Processor type and features"
 
 config ZONE_DMA
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index db600ef218d7..a5552cf28d5d 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -93,14 +93,12 @@
  * sections to be brought in with rodata.
  */
 #ifdef CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
-#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
 #define DATA_MAIN .data .data.[0-9a-zA-Z_]* .data..LPBX*
 #define SDATA_MAIN .sdata .sdata.[0-9a-zA-Z_]*
 #define RODATA_MAIN .rodata .rodata.[0-9a-zA-Z_]*
 #define BSS_MAIN .bss .bss.[0-9a-zA-Z_]*
 #define SBSS_MAIN .sbss .sbss.[0-9a-zA-Z_]*
 #else
-#define TEXT_MAIN .text
 #define DATA_MAIN .data
 #define SDATA_MAIN .sdata
 #define RODATA_MAIN .rodata
@@ -108,6 +106,20 @@
 #define SBSS_MAIN .sbss
 #endif
 
+/*
+ * Both LD_DEAD_CODE_DATA_ELIMINATION and CONFIG_FG_KASLR options enable
+ * -ffunction-sections, which produces separately named .text sections. In
+ * the case of CONFIG_FG_KASLR, they need to stay distict so they can be
+ * separately randomized. Without CONFIG_FG_KASLR, the separate .text
+ * sections can be collected back into a common section, which makes the
+ * resulting image slightly smaller
+ */
+#if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) && !defined(CONFIG_FG_KASLR)
+#define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
+#else
+#define TEXT_MAIN .text
+#endif
+
 /*
  * Align to a 32 byte boundary equal to the
  * alignment gcc 4.5 uses for a struct
diff --git a/init/Kconfig b/init/Kconfig
index 0498af567f70..82f042a1062f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1990,6 +1990,20 @@ config PROFILING
 config TRACEPOINTS
 	bool
 
+config FG_KASLR
+	bool "Function Granular Kernel Address Space Layout Randomization"
+	depends on $(cc-option, -ffunction-sections)
+	depends on ARCH_HAS_FG_KASLR
+	default n
+	help
+	  This option improves the randomness of the kernel text
+	  over basic Kernel Address Space Layout Randomization (KASLR)
+	  by reordering the kernel text at boot time. This feature
+	  uses information generated at compile time to re-layout the
+	  kernel text section at boot time at function level granularity.
+
+	  If unsure, say N.
+
 endmenu		# General setup
 
 source "arch/Kconfig"
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 05/10] x86: Make sure _etext includes function sections
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (3 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 04/10] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 06/10] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin, Arnd Bergmann
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck, linux-arch

When using -ffunction-sections to place each function in
it's own text section so it can be randomized at load time, the
linker considers these .text.* sections "orphaned sections", and
will place them after the first similar section (.text). In order
to accurately represent the end of the text section and the
orphaned sections, _etext must be moved so that it is after both
.text and .text.* The text size must also be calculated to
include .text AND .text.*

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/kernel/vmlinux.lds.S     | 17 +++++++++++++++--
 include/asm-generic/vmlinux.lds.h |  2 +-
 2 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3bfc8dd8a43d..e8da7eeb4d8d 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -146,9 +146,22 @@ SECTIONS
 #endif
 	} :text =0xcccc
 
-	/* End of text section, which should occupy whole number of pages */
-	_etext = .;
+	/*
+	 * -ffunction-sections creates .text.* sections, which are considered
+	 * "orphan sections" and added after the first similar section (.text).
+	 * Placing this ALIGN statement before _etext causes the address of
+	 * _etext to be below that of all the .text.* orphaned sections
+	 */
 	. = ALIGN(PAGE_SIZE);
+	_etext = .;
+
+	/*
+	 * the size of the .text section is used to calculate the address
+	 * range for orc lookups. If we just use SIZEOF(.text), we will
+	 * miss all the .text.* sections. Calculate the size using _etext
+	 * and _stext and save the value for later.
+	 */
+	text_size = _etext - _stext;
 
 	X86_ALIGN_RODATA_BEGIN
 	RO_DATA(PAGE_SIZE)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index a5552cf28d5d..34eab6513fdc 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -835,7 +835,7 @@
 	. = ALIGN(4);							\
 	.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) {		\
 		orc_lookup = .;						\
-		. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) /	\
+		. += (((text_size + LOOKUP_BLOCK_SIZE - 1) /	\
 			LOOKUP_BLOCK_SIZE) + 1) * 4;			\
 		orc_lookup_end = .;					\
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 06/10] x86/tools: Add relative relocs for randomized functions
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (4 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 05/10] x86: Make sure _etext includes function sections Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 07/10] x86/boot/compressed: Avoid duplicate malloc() implementations Kristen Carlson Accardi
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck

When reordering functions, the relative offsets for relocs that
are either in the randomized sections, or refer to the randomized
sections will need to be adjusted. Add code to detect whether a
reloc satisfies these cases, and if so, add them to the appropriate
reloc list.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 arch/x86/boot/compressed/Makefile |  7 +++++-
 arch/x86/tools/relocs.c           | 41 ++++++++++++++++++++++++++++---
 arch/x86/tools/relocs.h           |  4 +--
 arch/x86/tools/relocs_common.c    | 15 +++++++----
 4 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 7619742f91c9..c17b1c8ec82c 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -119,6 +119,11 @@ $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
 	$(call if_changed,check-and-link-vmlinux)
 
 OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
+
+ifdef CONFIG_FG_KASLR
+	RELOCS_ARGS += --fg-kaslr
+endif
+
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)
 
@@ -126,7 +131,7 @@ targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relo
 
 CMD_RELOCS = arch/x86/tools/relocs
 quiet_cmd_relocs = RELOCS  $@
-      cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
+      cmd_relocs = $(CMD_RELOCS) $(RELOCS_ARGS) $< > $@;$(CMD_RELOCS) $(RELOCS_ARGS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
 	$(call if_changed,relocs)
 
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index 31b2d151aa63..e0665038742e 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -42,6 +42,8 @@ struct section {
 };
 static struct section *secs;
 
+static int fgkaslr_mode;
+
 static const char * const sym_regex_kernel[S_NSYMTYPES] = {
 /*
  * Following symbols have been audited. There values are constant and do
@@ -818,6 +820,32 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+static int is_function_section(struct section *sec)
+{
+	const char *name;
+
+	if (!fgkaslr_mode)
+		return 0;
+
+	name = sec_name(sec->shdr.sh_info);
+
+	return(!strncmp(name, ".text.", 6));
+}
+
+static int is_randomized_sym(ElfW(Sym) *sym)
+{
+	const char *name;
+
+	if (!fgkaslr_mode)
+		return 0;
+
+	if (sym->st_shndx > shnum)
+		return 0;
+
+	name = sec_name(sym_index(sym));
+	return(!strncmp(name, ".text.", 6));
+}
+
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
@@ -842,13 +870,17 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 	case R_X86_64_PC32:
 	case R_X86_64_PLT32:
 		/*
-		 * PC relative relocations don't need to be adjusted unless
-		 * referencing a percpu symbol.
+		 * we need to keep pc relative relocations for sections which
+		 * might be randomized, and for the percpu section.
+		 * We also need to keep relocations for any offset which might
+		 * reference an address in a section which has been randomized.
 		 *
 		 * NB: R_X86_64_PLT32 can be treated as R_X86_64_PC32.
 		 */
-		if (is_percpu_sym(sym, symname))
+		if (is_function_section(sec) || is_randomized_sym(sym) ||
+		    is_percpu_sym(sym, symname))
 			add_reloc(&relocs32neg, offset);
+
 		break;
 
 	case R_X86_64_PC64:
@@ -1158,8 +1190,9 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int fgkaslr)
 {
+	fgkaslr_mode = fgkaslr;
 	regex_init(use_real_mode);
 	read_ehdr(fp);
 	read_shdrs(fp);
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 43c83c0fd22c..f582895c04dd 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -31,8 +31,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int fgkaslr);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int fgkaslr);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index 6634352a20bc..a80efa2f53ff 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -12,14 +12,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode|" \
+	    "--fg-kaslr] vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, fgkaslr_opt;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -30,6 +30,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	fgkaslr_opt = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -54,6 +55,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--fg-kaslr") == 0) {
+				fgkaslr_opt = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -75,11 +80,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, fgkaslr_opt);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, fgkaslr_opt);
 	fclose(fp);
 	return 0;
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 07/10] x86/boot/compressed: Avoid duplicate malloc() implementations
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (5 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 06/10] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 08/10] x86: Add support for function granular KASLR Kristen Carlson Accardi
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi

From: Kees Cook <keescook@chromium.org>

The preboot malloc() (and free()) implementation in
include/linux/decompress/mm.h (which is also included by the
static decompressors) is static. This is fine when the only thing
interested in using malloc() is the decompression code, but the
x86 preboot environment uses malloc() in a couple places, leading to a
potential collision when the static copies of the available memory
region ("malloc_ptr") gets reset to the global "free_mem_ptr" value.
As it happens, the existing usage pattern happened to be safe because each
user did 1 malloc() and 1 free() before returning and were not nested:

extract_kernel() (misc.c)
	choose_random_location() (kaslr.c)
		mem_avoid_init()
			handle_mem_options()
				malloc()
				...
				free()
	...
	parse_elf() (misc.c)
		malloc()
		...
		free()

Adding FGKASLR, however, will insert additional malloc() calls local to
fgkaslr.c in the middle of parse_elf()'s malloc()/free() pair:

	parse_elf() (misc.c)
		malloc()
		if (...) {
			layout_randomized_image(output, &ehdr, phdrs);
				malloc() <- boom
				...
		else
			layout_image(output, &ehdr, phdrs);
		free()

To avoid collisions, there must be a single implementation of malloc().
Adjust include/linux/decompress/mm.h so that visibility can be
controlled, provide prototypes in misc.h, and implement the functions in
misc.c. This also results in a small size savings:

$ size vmlinux.before vmlinux.after
   text    data     bss     dec     hex filename
8842314     468  178320 9021102  89a6ae vmlinux.before
8842240     468  178320 9021028  89a664 vmlinux.after

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/boot/compressed/kaslr.c |  4 ----
 arch/x86/boot/compressed/misc.c  |  3 +++
 arch/x86/boot/compressed/misc.h  |  2 ++
 include/linux/decompress/mm.h    | 12 ++++++++++--
 4 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..6f596bd5b6e5 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -39,10 +39,6 @@
 #include <generated/utsrelease.h>
 #include <asm/efi.h>
 
-/* Macros used by the included decompressor code below. */
-#define STATIC
-#include <linux/decompress/mm.h>
-
 #ifdef CONFIG_X86_5LEVEL
 unsigned int __pgtable_l5_enabled;
 unsigned int pgdir_shift __ro_after_init = 39;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 9652d5c2afda..90a4b64b3037 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -28,6 +28,9 @@
 
 /* Macros used by the included decompressor code below. */
 #define STATIC		static
+/* Define an externally visible malloc()/free(). */
+#define MALLOC_VISIBLE
+#include <linux/decompress/mm.h>
 
 /*
  * Use normal definitions of mem*() from string.c. There are already
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 726e264410ff..81fbc8d686fa 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -39,6 +39,8 @@
 /* misc.c */
 extern memptr free_mem_ptr;
 extern memptr free_mem_end_ptr;
+extern void *malloc(int size);
+extern void free(void *where);
 extern struct boot_params *boot_params;
 void __putstr(const char *s);
 void __puthex(unsigned long value);
diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h
index 868e9eacd69e..9192986b1a73 100644
--- a/include/linux/decompress/mm.h
+++ b/include/linux/decompress/mm.h
@@ -25,13 +25,21 @@
 #define STATIC_RW_DATA static
 #endif
 
+/*
+ * When an architecture needs to share the malloc()/free() implementation
+ * between compilation units, it needs to have non-local visibility.
+ */
+#ifndef MALLOC_VISIBLE
+#define MALLOC_VISIBLE static
+#endif
+
 /* A trivial malloc implementation, adapted from
  *  malloc by Hannu Savolainen 1993 and Matthias Urlichs 1994
  */
 STATIC_RW_DATA unsigned long malloc_ptr;
 STATIC_RW_DATA int malloc_count;
 
-static void *malloc(int size)
+MALLOC_VISIBLE void *malloc(int size)
 {
 	void *p;
 
@@ -52,7 +60,7 @@ static void *malloc(int size)
 	return p;
 }
 
-static void free(void *where)
+MALLOC_VISIBLE void free(void *where)
 {
 	malloc_count--;
 	if (!malloc_count)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 08/10] x86: Add support for function granular KASLR
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (6 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 07/10] x86/boot/compressed: Avoid duplicate malloc() implementations Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-17 17:00 ` [PATCH v4 09/10] kallsyms: Hide layout Kristen Carlson Accardi
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, Jonathan Corbet, x86, H. Peter Anvin
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck, linux-doc

This commit contains the changes required to re-layout the kernel text
sections generated by -ffunction-sections shortly after decompression.
Documentation of the feature is also added.

After decompression, the decompressed image's elf headers are parsed.
In order to manually update certain data structures that are built with
relative offsets during the kernel build process, certain symbols are
not stripped by objdump and their location is retained in the elf symbol
tables. These addresses are saved.

If the image was built with -ffunction-sections, there will be ELF section
headers present which contain information about the address range of each
section. Anything that is not broken out into function sections (i.e. is
consolidated into .text) is left in it's original location, but any other
executable section which begins with ".text." is located and shuffled
randomly within the remaining text segment address range.

After the sections have been copied to their new locations, but before
relocations have been applied, the kallsyms tables must be updated to
reflect the new symbol locations. Because it is expected that these tables
will be sorted by address, the kallsyms tables will need to be sorted
after the update.

When applying relocations, the address of the relocation needs to be
adjusted by the offset from the original location of the section that was
randomized to it's new location. In addition, if a value at that relocation
was a location in the text segment that was randomized, it's value will be
adjusted to a new location.

After relocations have been applied, the exception table must be updated
with with new symbol locations, and then re-sorted by the new address. The
orc table will have been updated as part of applying relocations, but since
it is expected to be sorted by address, it will need to be resorted.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
---
 .../admin-guide/kernel-parameters.txt         |   7 +
 Documentation/security/fgkaslr.rst            | 172 ++++
 Documentation/security/index.rst              |   1 +
 arch/x86/boot/compressed/Makefile             |   2 +
 arch/x86/boot/compressed/fgkaslr.c            | 811 ++++++++++++++++++
 arch/x86/boot/compressed/misc.c               | 154 +++-
 arch/x86/boot/compressed/misc.h               |  28 +
 arch/x86/boot/compressed/utils.c              |  11 +
 arch/x86/boot/compressed/vmlinux.symbols      |  17 +
 arch/x86/include/asm/boot.h                   |  15 +-
 include/uapi/linux/elf.h                      |   1 +
 11 files changed, 1191 insertions(+), 28 deletions(-)
 create mode 100644 Documentation/security/fgkaslr.rst
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/utils.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index fb95fad81c79..0affa1458017 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2082,6 +2082,13 @@
 			kernel and module base offset ASLR (Address Space
 			Layout Randomization).
 
+	fgkaslr		[KNL]
+			Format: { "off" }
+			When CONFIG_FG_KASLR is set, setting this parameter
+			to "off" disables kernel function granular ASLR
+			(Address Space Layout Randomization).
+			See Documentation/security/fgkaslr.rst.
+
 	kasan_multi_shot
 			[KNL] Enforce KASAN (Kernel Address Sanitizer) to print
 			report on every invalid memory access. Without this
diff --git a/Documentation/security/fgkaslr.rst b/Documentation/security/fgkaslr.rst
new file mode 100644
index 000000000000..d52af50d6715
--- /dev/null
+++ b/Documentation/security/fgkaslr.rst
@@ -0,0 +1,172 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================================================
+Function Granular Kernel Address Space Layout Randomization (fgkaslr)
+=====================================================================
+
+:Date: 6 April 2020
+:Author: Kristen Accardi
+
+Kernel Address Space Layout Randomization (KASLR) was merged into the kernel
+with the objective of increasing the difficulty of code reuse attacks. Code
+reuse attacks reused existing code snippets to get around existing memory
+protections. They exploit software bugs which expose addresses of useful code
+snippets to control the flow of execution for their own nefarious purposes.
+KASLR as it was originally implemented moves the entire kernel code text as a
+unit at boot time in order to make addresses less predictable. The order of the
+code within the segment is unchanged - only the base address is shifted. There
+are a few shortcomings to this algorithm.
+
+1. Low Entropy - there are only so many locations the kernel can fit in. This
+   means an attacker could guess without too much trouble.
+2. Knowledge of a single address can reveal the offset of the base address,
+   exposing all other locations for a published/known kernel image.
+3. Info leaks abound.
+
+Finer grained ASLR has been proposed as a way to make ASLR more resistant
+to info leaks. It is not a new concept at all, and there are many variations
+possible. Function reordering is an implementation of finer grained ASLR
+which randomizes the layout of an address space on a function level
+granularity. The term "fgkaslr" is used in this document to refer to the
+technique of function reordering when used with KASLR, as well as finer grained
+KASLR in general.
+
+The objective of this patch set is to improve a technology that is already
+merged into the kernel (KASLR). This code will not prevent all code reuse
+attacks, and should be considered as one of several tools that can be used.
+
+Implementation Details
+======================
+
+The over-arching objective of the fgkaslr implementation is incremental
+improvement over the existing KASLR algorithm. It is designed to work with
+the existing solution, and there are two main area where code changes occur:
+Build time, and Load time.
+
+Build time
+----------
+
+GCC has had an option to place functions into individual .text sections
+for many years now (-ffunction-sections). This option is used to implement
+function reordering at load time. The final compiled vmlinux retains all the
+section headers, which can be used to help find the address ranges of each
+function. Using this information and an expanded table of relocation addresses,
+individual text sections can be shuffled immediately after decompression.
+Some data tables inside the kernel that have assumptions about order
+require sorting after the update. In order to modify these tables,
+a few key symbols from the objcopy symbol stripping process are preserved
+for use after shuffling the text segments. Any special input sections which are
+defined by the kernel build process and collected into the .text output
+segment are left unmodified and will still be present inside the .text segment,
+unrandomized other than normal base address randomization.
+
+Load time
+---------
+
+The boot kernel was modified to parse the vmlinux elf file after
+decompression to check for symbols for modifying data tables, and to
+look for any .text.* sections to randomize. The sections are then shuffled,
+and tables are updated or resorted. The existing code which updated relocation
+addresses was modified to account for not just a fixed delta from the load
+address, but the offset that the function section was moved to. This requires
+inspection of each address to see if it was impacted by a randomization.
+
+In order to hide the new layout, symbols reported through /proc/kallsyms will
+be sorted by name alphabetically rather than by address.
+
+Performance Impact
+==================
+
+There are two areas where function reordering can impact performance: boot
+time latency, and run time performance.
+
+Boot time latency
+-----------------
+
+This implementation of finer grained KASLR impacts the boot time of the kernel
+in several places. It requires additional parsing of the kernel ELF file to
+obtain the section headers of the sections to be randomized. It calls the
+random number generator for each section to be randomized to determine that
+section's new memory location. It copies the decompressed kernel into a new
+area of memory to avoid corruption when laying out the newly randomized
+sections. It increases the number of relocations the kernel has to perform at
+boot time vs. standard KASLR, and it also requires a lookup on each address
+that needs to be relocated to see if it was in a randomized section and needs
+to be adjusted by a new offset. Finally, it re-sorts a few data tables that
+are required to be sorted by address.
+
+Booting a test VM on a modern, well appointed system showed an increase in
+latency of approximately 1 second.
+
+Run time
+--------
+
+The performance impact at run-time of function reordering varies by workload.
+Randomly reordering the functions will cause an increase in cache misses
+for some workloads. Some workloads perform significantly worse under FGKASLR,
+while others stay the same or even improve. In general, it will depend on the
+code flow whether or not finer grained KASLR will impact a workload, and how
+the underlying code was designed. Because the layout changes per boot, each
+time a system is rebooted the performance of a workload may change.
+
+Image Size
+==========
+
+fgkaslr increases the size of the kernel binary due to the extra section
+headers that are included, as well as the extra relocations that need to
+be added. You can expect fgkaslr to increase the size of the resulting
+vmlinux by about 3%, and the compressed image (bzImage) by 15%.
+
+Memory Usage
+============
+
+fgkaslr increases the amount of heap that is required at boot time,
+although this extra memory is released when the kernel has finished
+decompression. As a result, it may not be appropriate to use this feature
+on systems without much memory.
+
+Building
+========
+
+To enable fine grained KASLR, you need to have the following config options
+set (including all the ones you would use to build normal KASLR)
+
+``CONFIG_FG_KASLR=y``
+
+fgkaslr for the kernel is only supported for the X86_64 architecture.
+
+Modules
+=======
+
+Modules are randomized similarly to the rest of the kernel by shuffling
+the sections at load time prior to moving them into memory. The module must
+also have been build with the -ffunction-sections compiler option.
+
+Although fgkaslr for the kernel is only supported for the X86_64 architecture,
+it is possible to use fgkaslr with modules on other architectures. To enable
+this feature, select the following config option:
+
+``CONFIG_MODULE_FG_KASLR``
+
+This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
+
+Disabling
+=========
+
+Disabling normal kaslr using the nokaslr command line option also disables
+fgkaslr. In addition, it is possible to disable fgkaslr separately by booting
+with fgkaslr=off on the commandline.
+
+Further Information
+===================
+
+There are a lot of academic papers which explore finer grained ASLR.
+This paper in particular contributed significantly to the implementation design.
+
+Selfrando: Securing the Tor Browser against De-anonymization Exploits,
+M. Conti, S. Crane, T. Frassetto, et al.
+
+For more information on how function layout impacts performance, see:
+
+Optimizing Function Placement for Large-Scale Data-Center Applications,
+G. Ottoni, B. Maher
diff --git a/Documentation/security/index.rst b/Documentation/security/index.rst
index 8129405eb2cc..19677beb33d4 100644
--- a/Documentation/security/index.rst
+++ b/Documentation/security/index.rst
@@ -7,6 +7,7 @@ Security Documentation
 
    credentials
    IMA-templates
+   fgkaslr
    keys/index
    lsm
    lsm-development
diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index c17b1c8ec82c..1508995dc2ed 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -82,6 +82,7 @@ vmlinux-objs-y := $(obj)/vmlinux.lds $(obj)/kernel_info.o $(obj)/head_$(BITS).o
 
 vmlinux-objs-$(CONFIG_EARLY_PRINTK) += $(obj)/early_serial_console.o
 vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr.o
+vmlinux-objs-$(CONFIG_FG_KASLR) += $(obj)/utils.o $(obj)/fgkaslr.o
 ifdef CONFIG_X86_64
 	vmlinux-objs-$(CONFIG_RANDOMIZE_BASE) += $(obj)/kaslr_64.o
 	vmlinux-objs-y += $(obj)/mem_encrypt.o
@@ -122,6 +123,7 @@ OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 
 ifdef CONFIG_FG_KASLR
 	RELOCS_ARGS += --fg-kaslr
+	OBJCOPYFLAGS += --keep-symbols=$(srctree)/$(src)/vmlinux.symbols
 endif
 
 $(obj)/vmlinux.bin: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/fgkaslr.c b/arch/x86/boot/compressed/fgkaslr.c
new file mode 100644
index 000000000000..d08f0cc0217f
--- /dev/null
+++ b/arch/x86/boot/compressed/fgkaslr.c
@@ -0,0 +1,811 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fgkaslr.c
+ *
+ * This contains the routines needed to reorder the kernel text section
+ * at boot time.
+ */
+#include "misc.h"
+#include "error.h"
+#include "pgtable.h"
+#include "../string.h"
+#include "../voffset.h"
+#include <linux/sort.h>
+#include <linux/bsearch.h>
+#include "../../include/asm/extable.h"
+#include "../../include/asm/orc_types.h"
+
+/*
+ * Longest parameter of 'fgkaslr=' is 'off' right now, plus an extra '\0'
+ * for termination.
+ */
+#define MAX_FGKASLR_ARG_LENGTH 4
+static int nofgkaslr;
+
+/*
+ * Use normal definitions of mem*() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memcpy
+#undef memset
+#define memzero(s, n)	memset((s), 0, (n))
+#define memmove		memmove
+
+void *memmove(void *dest, const void *src, size_t n);
+
+static unsigned long percpu_start;
+static unsigned long percpu_end;
+
+static long addr_kallsyms_names;
+static long addr_kallsyms_offsets;
+static long addr_kallsyms_num_syms;
+static long addr_kallsyms_relative_base;
+static long addr_kallsyms_markers;
+static long addr___start___ex_table;
+static long addr___stop___ex_table;
+static long addr__stext;
+static long addr__etext;
+static long addr__sinittext;
+static long addr__einittext;
+static long addr___start_orc_unwind_ip;
+static long addr___stop_orc_unwind_ip;
+static long addr___start_orc_unwind;
+
+/* addresses in mapped address space */
+static int *base;
+static u8 *names;
+static unsigned long relative_base;
+static unsigned int *markers_addr;
+
+struct kallsyms_name {
+	u8 len;
+	u8 indecis[256];
+};
+
+static struct kallsyms_name *names_table;
+
+static struct orc_entry *cur_orc_table;
+static int *cur_orc_ip_table;
+
+/* Array of pointers to sections headers for randomized sections */
+Elf_Shdr **sections;
+
+/* Number of elements in the randomized section header array (sections) */
+static int sections_size;
+
+/* Array of all section headers, randomized or otherwise */
+static Elf_Shdr *sechdrs;
+
+static bool is_text(long addr)
+{
+	if ((addr >= addr__stext && addr < addr__etext) ||
+	    (addr >= addr__sinittext && addr < addr__einittext))
+		return true;
+	return false;
+}
+
+bool is_percpu_addr(long pc, long offset)
+{
+	unsigned long ptr;
+	long address;
+
+	address = pc + offset + 4;
+
+	ptr = (unsigned long)address;
+
+	if (ptr >= percpu_start && ptr < percpu_end)
+		return true;
+
+	return false;
+}
+
+static int cmp_section_addr(const void *a, const void *b)
+{
+	unsigned long ptr = (unsigned long)a;
+	Elf_Shdr *s = *(Elf_Shdr **)b;
+	unsigned long end = s->sh_addr + s->sh_size;
+
+	if (ptr >= s->sh_addr && ptr < end)
+		return 0;
+
+	if (ptr < s->sh_addr)
+		return -1;
+
+	return 1;
+}
+
+/*
+ * Discover if the address is in a randomized section and if so, adjust
+ * by the saved offset.
+ */
+Elf_Shdr *adjust_address(long *address)
+{
+	Elf_Shdr **s;
+	Elf_Shdr *shdr;
+
+	if (nofgkaslr)
+		return NULL;
+
+	s = bsearch((const void *)*address, sections, sections_size, sizeof(*s),
+		    cmp_section_addr);
+	if (s) {
+		shdr = *s;
+		*address += shdr->sh_offset;
+		return shdr;
+	}
+
+	return NULL;
+}
+
+void adjust_relative_offset(long pc, long *value, Elf_Shdr *section)
+{
+	Elf_Shdr *s;
+	long address;
+
+	if (nofgkaslr)
+		return;
+
+	/*
+	 * sometimes we are updating a relative offset that would
+	 * normally be relative to the next instruction (such as a call).
+	 * In this case to calculate the target, you need to add 32bits to
+	 * the pc to get the next instruction value. However, sometimes
+	 * targets are just data that was stored in a table such as ksymtab
+	 * or cpu alternatives. In this case our target is not relative to
+	 * the next instruction.
+	 */
+
+	/* Calculate the address that this offset would call. */
+	if (!is_text(pc))
+		address = pc + *value;
+	else
+		address = pc + *value + 4;
+
+	/*
+	 * if the address is in section that was randomized,
+	 * we need to adjust the offset.
+	 */
+	s = adjust_address(&address);
+	if (s)
+		*value += s->sh_offset;
+
+	/*
+	 * If the PC that this offset was calculated for was in a section
+	 * that has been randomized, the value needs to be adjusted by the
+	 * same amount as the randomized section was adjusted from it's original
+	 * location.
+	 */
+	if (section)
+		*value -= section->sh_offset;
+}
+
+static void kallsyms_swp(void *a, void *b, int size)
+{
+	int idx1, idx2;
+	int temp;
+	struct kallsyms_name name_a;
+
+	/* Determine our index into the array. */
+	idx1 = (int *)a - base;
+	idx2 = (int *)b - base;
+	temp = base[idx1];
+	base[idx1] = base[idx2];
+	base[idx2] = temp;
+
+	/* Swap the names table. */
+	memcpy(&name_a, &names_table[idx1], sizeof(name_a));
+	memcpy(&names_table[idx1], &names_table[idx2],
+	       sizeof(struct kallsyms_name));
+	memcpy(&names_table[idx2], &name_a, sizeof(struct kallsyms_name));
+}
+
+static int kallsyms_cmp(const void *a, const void *b)
+{
+	int addr_a, addr_b;
+	unsigned long uaddr_a, uaddr_b;
+
+	addr_a = *(int *)a;
+	addr_b = *(int *)b;
+
+	if (addr_a >= 0)
+		uaddr_a = addr_a;
+	if (addr_b >= 0)
+		uaddr_b = addr_b;
+
+	if (addr_a < 0)
+		uaddr_a = relative_base - 1 - addr_a;
+	if (addr_b < 0)
+		uaddr_b = relative_base - 1 - addr_b;
+
+	if (uaddr_b > uaddr_a)
+		return -1;
+
+	return 0;
+}
+
+static void deal_with_names(int num_syms)
+{
+	int num_bytes;
+	int i, j;
+	int offset;
+
+	/* we should have num_syms kallsyms_name entries */
+	num_bytes = num_syms * sizeof(*names_table);
+	names_table = malloc(num_syms * sizeof(*names_table));
+	if (!names_table) {
+		debug_putstr("\nbytes requested: ");
+		debug_puthex(num_bytes);
+		error("\nunable to allocate space for names table\n");
+	}
+
+	/* read all the names entries */
+	offset = 0;
+	for (i = 0; i < num_syms; i++) {
+		names_table[i].len = names[offset];
+		offset++;
+		for (j = 0; j < names_table[i].len; j++) {
+			names_table[i].indecis[j] = names[offset];
+			offset++;
+		}
+	}
+}
+
+static void write_sorted_names(int num_syms)
+{
+	int i, j;
+	int offset = 0;
+	unsigned int *markers;
+
+	/*
+	 * we are going to need to regenerate the markers table, which is a
+	 * table of offsets into the compressed stream every 256 symbols.
+	 * this code copied almost directly from scripts/kallsyms.c
+	 */
+	markers = malloc(sizeof(unsigned int) * ((num_syms + 255) / 256));
+	if (!markers) {
+		debug_putstr("\nfailed to allocate heap space of ");
+		debug_puthex(((num_syms + 255) / 256));
+		debug_putstr(" bytes\n");
+		error("Unable to allocate space for markers table");
+	}
+
+	for (i = 0; i < num_syms; i++) {
+		if ((i & 0xFF) == 0)
+			markers[i >> 8] = offset;
+
+		names[offset] = (u8)names_table[i].len;
+		offset++;
+		for (j = 0; j < names_table[i].len; j++) {
+			names[offset] = (u8)names_table[i].indecis[j];
+			offset++;
+		}
+	}
+
+	/* write new markers table over old one */
+	for (i = 0; i < ((num_syms + 255) >> 8); i++)
+		markers_addr[i] = markers[i];
+
+	free(markers);
+	free(names_table);
+}
+
+static void sort_kallsyms(unsigned long map)
+{
+	int num_syms;
+	int i;
+
+	debug_putstr("\nRe-sorting kallsyms...\n");
+
+	num_syms = *(int *)(addr_kallsyms_num_syms + map);
+	base = (int *)(addr_kallsyms_offsets + map);
+	relative_base = *(unsigned long *)(addr_kallsyms_relative_base + map);
+	markers_addr = (unsigned int *)(addr_kallsyms_markers + map);
+	names = (u8 *)(addr_kallsyms_names + map);
+
+	/*
+	 * the kallsyms table was generated prior to any randomization.
+	 * it is a bunch of offsets from "relative base". In order for
+	 * us to check if a symbol has an address that was in a randomized
+	 * section, we need to reconstruct the address to it's original
+	 * value prior to handle_relocations.
+	 */
+	for (i = 0; i < num_syms; i++) {
+		unsigned long addr;
+		int new_base;
+
+		/*
+		 * according to kernel/kallsyms.c, positive offsets are absolute
+		 * values and negative offsets are relative to the base.
+		 */
+		if (base[i] >= 0)
+			addr = base[i];
+		else
+			addr = relative_base - 1 - base[i];
+
+		if (adjust_address(&addr)) {
+			/* here we need to recalcuate the offset */
+			new_base = relative_base - 1 - addr;
+			base[i] = new_base;
+		}
+	}
+
+	/*
+	 * here we need to read in all the kallsyms_names info
+	 * so that we can regenerate it.
+	 */
+	deal_with_names(num_syms);
+
+	sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);
+
+	/* write the newly sorted names table over the old one */
+	write_sorted_names(num_syms);
+}
+
+/*
+ * We need to include this file here rather than in utils.c because
+ * some of the helper functions in extable.c are used to update
+ * the extable below and are defined as "static" in extable.c
+ */
+#include "../../../../lib/extable.c"
+
+static inline unsigned long
+ex_fixup_handler(const struct exception_table_entry *x)
+{
+	return ((unsigned long)&x->handler + x->handler);
+}
+
+static inline unsigned long
+ex_fixup_addr(const struct exception_table_entry *x)
+{
+	return ((unsigned long)&x->fixup + x->fixup);
+}
+
+static void update_ex_table(unsigned long map)
+{
+	struct exception_table_entry *start_ex_table =
+		(struct exception_table_entry *)(addr___start___ex_table + map);
+	struct exception_table_entry *stop_ex_table =
+		(struct exception_table_entry *)(addr___stop___ex_table + map);
+	int num_entries =
+		(addr___stop___ex_table - addr___start___ex_table) /
+		sizeof(struct exception_table_entry);
+	int i;
+
+	debug_putstr("\nUpdating exception table...");
+	for (i = 0; i < num_entries; i++) {
+		unsigned long insn = ex_to_insn(&start_ex_table[i]);
+		unsigned long fixup = ex_fixup_addr(&start_ex_table[i]);
+		unsigned long handler = ex_fixup_handler(&start_ex_table[i]);
+		unsigned long addr;
+		Elf_Shdr *s;
+
+		/* check each address to see if it needs adjusting */
+		addr = insn - map;
+		s = adjust_address(&addr);
+		if (s)
+			start_ex_table[i].insn += s->sh_offset;
+
+		addr = fixup - map;
+		s = adjust_address(&addr);
+		if (s)
+			start_ex_table[i].fixup += s->sh_offset;
+
+		addr = handler - map;
+		s = adjust_address(&addr);
+		if (s)
+			start_ex_table[i].handler += s->sh_offset;
+	}
+}
+
+static void sort_ex_table(unsigned long map)
+{
+	struct exception_table_entry *start_ex_table =
+		(struct exception_table_entry *)(addr___start___ex_table + map);
+	struct exception_table_entry *stop_ex_table =
+		(struct exception_table_entry *)(addr___stop___ex_table + map);
+
+	debug_putstr("\nRe-sorting exception table...");
+
+	sort_extable(start_ex_table, stop_ex_table);
+}
+
+static inline unsigned long orc_ip(const int *ip)
+{
+	return (unsigned long)ip + *ip;
+}
+
+static void orc_sort_swap(void *_a, void *_b, int size)
+{
+	struct orc_entry *orc_a, *orc_b;
+	struct orc_entry orc_tmp;
+	int *a = _a, *b = _b, tmp;
+	int delta = _b - _a;
+
+	/* Swap the .orc_unwind_ip entries: */
+	tmp = *a;
+	*a = *b + delta;
+	*b = tmp - delta;
+
+	/* Swap the corresponding .orc_unwind entries: */
+	orc_a = cur_orc_table + (a - cur_orc_ip_table);
+	orc_b = cur_orc_table + (b - cur_orc_ip_table);
+	orc_tmp = *orc_a;
+	*orc_a = *orc_b;
+	*orc_b = orc_tmp;
+}
+
+static int orc_sort_cmp(const void *_a, const void *_b)
+{
+	struct orc_entry *orc_a;
+	const int *a = _a, *b = _b;
+	unsigned long a_val = orc_ip(a);
+	unsigned long b_val = orc_ip(b);
+
+	if (a_val > b_val)
+		return 1;
+	if (a_val < b_val)
+		return -1;
+
+	/*
+	 * The "weak" section terminator entries need to always be on the left
+	 * to ensure the lookup code skips them in favor of real entries.
+	 * These terminator entries exist to handle any gaps created by
+	 * whitelisted .o files which didn't get objtool generation.
+	 */
+	orc_a = cur_orc_table + (a - cur_orc_ip_table);
+	return orc_a->sp_reg == ORC_REG_UNDEFINED && !orc_a->end ? -1 : 1;
+}
+
+static void sort_orc_table(unsigned long map)
+{
+	int num_entries =
+		(addr___stop_orc_unwind_ip - addr___start_orc_unwind_ip) / sizeof(int);
+
+	cur_orc_ip_table = (int *)(addr___start_orc_unwind_ip + map);
+	cur_orc_table = (struct orc_entry *)(addr___start_orc_unwind + map);
+
+	debug_putstr("\nRe-sorting orc tables...\n");
+	sort(cur_orc_ip_table, num_entries, sizeof(int), orc_sort_cmp,
+	     orc_sort_swap);
+}
+
+void post_relocations_cleanup(unsigned long map)
+{
+	if (!nofgkaslr) {
+		update_ex_table(map);
+		sort_ex_table(map);
+		sort_orc_table(map);
+	}
+
+	/*
+	 * maybe one day free will do something. So, we "free" this memory
+	 * in either case
+	 */
+	free(sections);
+	free(sechdrs);
+}
+
+void pre_relocations_cleanup(unsigned long map)
+{
+	if (nofgkaslr)
+		return;
+
+	sort_kallsyms(map);
+}
+
+static void shuffle_sections(int *list, int size)
+{
+	int i;
+	unsigned long j;
+	int temp;
+
+	for (i = size - 1; i > 0; i--) {
+		j = kaslr_get_random_long(NULL) % (i + 1);
+
+		temp = list[i];
+		list[i] = list[j];
+		list[j] = temp;
+	}
+}
+
+static void move_text(int num_sections, char *secstrings, Elf_Shdr *text,
+		      void *source, void *dest, Elf64_Phdr *phdr)
+{
+	unsigned long adjusted_addr;
+	int copy_bytes;
+	void *stash;
+	Elf_Shdr **sorted_sections;
+	int *index_list;
+	int i, j;
+
+	memmove(dest, source + text->sh_offset, text->sh_size);
+	copy_bytes = text->sh_size;
+	dest += text->sh_size;
+	adjusted_addr = text->sh_addr + text->sh_size;
+
+	/*
+	 * we leave the sections sorted in their original order
+	 * by s->sh_addr, but shuffle the indexes in a random
+	 * order for copying.
+	 */
+	index_list = malloc(sizeof(int) * num_sections);
+	if (!index_list)
+		error("Failed to allocate space for index list");
+
+	for (i = 0; i < num_sections; i++)
+		index_list[i] = i;
+
+	shuffle_sections(index_list, num_sections);
+
+	/*
+	 * to avoid overwriting earlier sections before they can get
+	 * copied to dest, stash everything into a buffer first.
+	 * this will cause our source address to be off by
+	 * phdr->p_offset though, so we'll adjust s->sh_offset below.
+	 *
+	 * TBD: ideally we'd simply decompress higher up so that our
+	 * copy wasn't in danger of overwriting anything important.
+	 */
+	stash = malloc(phdr->p_filesz);
+	if (!stash)
+		error("Failed to allocate space for text stash");
+
+	memcpy(stash, source + phdr->p_offset, phdr->p_filesz);
+
+	/* now we'd walk through the sections. */
+	for (j = 0; j < num_sections; j++) {
+		unsigned long aligned_addr;
+		Elf_Shdr *s;
+		const char *sname;
+		void *src;
+		int pad_bytes;
+
+		s = sections[index_list[j]];
+
+		sname = secstrings + s->sh_name;
+
+		/* align addr for this section */
+		aligned_addr = ALIGN(adjusted_addr, s->sh_addralign);
+
+		/*
+		 * copy out of stash, so adjust offset
+		 */
+		src = stash + s->sh_offset - phdr->p_offset;
+
+		/*
+		 * Fill any space between sections with int3
+		 */
+		pad_bytes = aligned_addr - adjusted_addr;
+		memset(dest, 0xcc, pad_bytes);
+
+		dest = (void *)ALIGN((unsigned long)dest, s->sh_addralign);
+
+		memmove(dest, src, s->sh_size);
+
+		dest += s->sh_size;
+		copy_bytes += s->sh_size + pad_bytes;
+		adjusted_addr = aligned_addr + s->sh_size;
+
+		/* we can blow away sh_offset for our own uses */
+		s->sh_offset = aligned_addr - s->sh_addr;
+	}
+
+	free(index_list);
+
+	/*
+	 * move remainder of text segment. Ok to just use original source
+	 * here since this area is untouched.
+	 */
+	memmove(dest, source + text->sh_offset + copy_bytes,
+		phdr->p_filesz - copy_bytes);
+	free(stash);
+}
+
+#define GET_SYM(name)							\
+	do {								\
+		if (!addr_ ## name) {					\
+			if (strcmp(#name, strtab + sym->st_name) == 0) {\
+				addr_ ## name = sym->st_value;		\
+				continue;				\
+			}						\
+		}							\
+	} while (0)
+
+static void parse_symtab(Elf64_Sym *symtab, char *strtab, long num_syms)
+{
+	Elf64_Sym *sym;
+
+	if (!symtab || !strtab)
+		return;
+
+	debug_putstr("\nLooking for symbols... ");
+
+	/*
+	 * walk through the symbol table looking for the symbols
+	 * that we care about.
+	 */
+	for (sym = symtab; --num_syms >= 0; sym++) {
+		if (!sym->st_name)
+			continue;
+
+		GET_SYM(kallsyms_num_syms);
+		GET_SYM(kallsyms_offsets);
+		GET_SYM(kallsyms_relative_base);
+		GET_SYM(kallsyms_names);
+		GET_SYM(kallsyms_markers);
+		GET_SYM(_stext);
+		GET_SYM(_etext);
+		GET_SYM(_sinittext);
+		GET_SYM(_einittext);
+		GET_SYM(__start_orc_unwind_ip);
+		GET_SYM(__stop_orc_unwind_ip);
+		GET_SYM(__start_orc_unwind);
+		GET_SYM(__start___ex_table);
+		GET_SYM(__stop___ex_table);
+	}
+}
+
+void layout_randomized_image(void *output, Elf64_Ehdr *ehdr, Elf64_Phdr *phdrs)
+{
+	Elf64_Phdr *phdr;
+	Elf_Shdr *s;
+	Elf_Shdr *text = NULL;
+	Elf_Shdr *percpu = NULL;
+	char *secstrings;
+	const char *sname;
+	int num_sections = 0;
+	Elf64_Sym *symtab = NULL;
+	char *strtab = NULL;
+	long num_syms = 0;
+	void *dest;
+	int i;
+	char arg[MAX_FGKASLR_ARG_LENGTH];
+	Elf_Shdr shdr;
+	unsigned long shnum;
+	unsigned int shstrndx;
+
+	debug_putstr("\nParsing ELF section headers... ");
+
+	/*
+	 * Even though fgkaslr may have been disabled, we still
+	 * need to parse through the section headers to get the
+	 * start and end of the percpu section. This is because
+	 * if we were built with CONFIG_FG_KASLR, there are more
+	 * relative relocations present in vmlinux.relocs than
+	 * just the percpu, and only the percpu relocs need to be
+	 * adjusted when using just normal base address kaslr.
+	 */
+	if (cmdline_find_option("fgkaslr", arg, sizeof(arg)) == 3 &&
+	    !strncmp(arg, "off", 3)) {
+		warn("FG_KASLR disabled on cmdline.");
+		nofgkaslr = 1;
+	}
+
+	/* read the first section header */
+	shnum = ehdr->e_shnum;
+	shstrndx = ehdr->e_shstrndx;
+	if (shnum == SHN_UNDEF || shstrndx == SHN_XINDEX) {
+		memcpy(&shdr, output + ehdr->e_shoff, sizeof(shdr));
+		if (shnum == SHN_UNDEF)
+			shnum = shdr.sh_size;
+		if (shstrndx == SHN_XINDEX)
+			shstrndx = shdr.sh_link;
+	}
+
+	/* we are going to need to allocate space for the section headers */
+	sechdrs = malloc(sizeof(*sechdrs) * shnum);
+	if (!sechdrs)
+		error("Failed to allocate space for shdrs");
+
+	sections = malloc(sizeof(*sections) * shnum);
+	if (!sections)
+		error("Failed to allocate space for section pointers");
+
+	memcpy(sechdrs, output + ehdr->e_shoff,
+	       sizeof(*sechdrs) * shnum);
+
+	/* we need to allocate space for the section string table */
+	s = &sechdrs[shstrndx];
+
+	secstrings = malloc(s->sh_size);
+	if (!secstrings)
+		error("Failed to allocate space for shstr");
+
+	memcpy(secstrings, output + s->sh_offset, s->sh_size);
+
+	/*
+	 * now we need to walk through the section headers and collect the
+	 * sizes of the .text sections to be randomized.
+	 */
+	for (i = 0; i < shnum; i++) {
+		s = &sechdrs[i];
+		sname = secstrings + s->sh_name;
+
+		if (s->sh_type == SHT_SYMTAB) {
+			/* only one symtab per image */
+			if (symtab)
+				error("Unexpected duplicate symtab");
+
+			symtab = malloc(s->sh_size);
+			if (!symtab)
+				error("Failed to allocate space for symtab");
+
+			memcpy(symtab, output + s->sh_offset, s->sh_size);
+			num_syms = s->sh_size / sizeof(*symtab);
+			continue;
+		}
+
+		if (s->sh_type == SHT_STRTAB && i != ehdr->e_shstrndx) {
+			if (strtab)
+				error("Unexpected duplicate strtab");
+
+			strtab = malloc(s->sh_size);
+			if (!strtab)
+				error("Failed to allocate space for strtab");
+
+			memcpy(strtab, output + s->sh_offset, s->sh_size);
+		}
+
+		if (!strcmp(sname, ".text")) {
+			if (text)
+				error("Unexpected duplicate .text section");
+
+			text = s;
+			continue;
+		}
+
+		if (!strcmp(sname, ".data..percpu")) {
+			/* get start addr for later */
+			percpu = s;
+			continue;
+		}
+
+		if (!(s->sh_flags & SHF_ALLOC) ||
+		    !(s->sh_flags & SHF_EXECINSTR) ||
+		    !(strstarts(sname, ".text")))
+			continue;
+
+		sections[num_sections] = s;
+
+		num_sections++;
+	}
+	sections[num_sections] = NULL;
+	sections_size = num_sections;
+
+	parse_symtab(symtab, strtab, num_syms);
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+			if ((phdr->p_align % 0x200000) != 0)
+				error("Alignment of LOAD segment isn't multiple of 2MB");
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+			if (!nofgkaslr &&
+			    (text && phdr->p_offset == text->sh_offset)) {
+				move_text(num_sections, secstrings, text,
+					  output, dest, phdr);
+			} else {
+				if (percpu &&
+				    phdr->p_offset == percpu->sh_offset) {
+					percpu_start = percpu->sh_addr;
+					percpu_end = percpu_start +
+							phdr->p_filesz;
+				}
+				memmove(dest, output + phdr->p_offset,
+					phdr->p_filesz);
+			}
+			break;
+		default: /* Ignore other PT_* */
+			break;
+		}
+	}
+
+	/* we need to keep the section info to redo relocs */
+	free(secstrings);
+
+	free(phdrs);
+}
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 90a4b64b3037..f52150ec3ee7 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -206,10 +206,21 @@ static void handle_relocations(void *output, unsigned long output_len,
 	if (IS_ENABLED(CONFIG_X86_64))
 		delta = virt_addr - LOAD_PHYSICAL_ADDR;
 
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
+	/*
+	 * it is possible to have delta be zero and still have enabled
+	 * fg kaslr. We need to perform relocations for fgkaslr regardless
+	 * of whether the base address has moved.
+	 */
+	if (!IS_ENABLED(CONFIG_FG_KASLR) ||
+	    cmdline_find_option_bool("nokaslr")) {
+		if (!delta) {
+			debug_putstr("No relocation needed... ");
+			return;
+		}
 	}
+
+	pre_relocations_cleanup(map);
+
 	debug_putstr("Performing relocations... ");
 
 	/*
@@ -233,35 +244,106 @@ static void handle_relocations(void *output, unsigned long output_len,
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
 		long extended = *reloc;
+		long value;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("32-bit relocation outside of kernel!\n");
 
-		*(uint32_t *)ptr += delta;
+		value = *(int32_t *)ptr;
+
+		/*
+		 * If using fgkaslr, the value of the relocation
+		 * might need to be changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_address(&value);
+
+		value += delta;
+
+		*(uint32_t *)ptr = value;
 	}
 #ifdef CONFIG_X86_64
 	while (*--reloc) {
 		long extended = *reloc;
+		long value;
+		long oldvalue;
+		Elf64_Shdr *s;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		s = adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("inverse 32-bit relocation outside of kernel!\n");
 
-		*(int32_t *)ptr -= delta;
+		value = *(int32_t *)ptr;
+		oldvalue = value;
+
+		/*
+		 * If using fgkaslr, these relocs will contain
+		 * relative offsets which might need to be
+		 * changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_relative_offset(*reloc, &value, s);
+
+		/*
+		 * only percpu symbols need to have their values adjusted for
+		 * base address kaslr since relative offsets within the .text
+		 * and .text.* sections are ok wrt each other.
+		 */
+		if (is_percpu_addr(*reloc, oldvalue))
+			value -= delta;
+
+		*(int32_t *)ptr = value;
 	}
 	for (reloc--; *reloc; reloc--) {
 		long extended = *reloc;
+		long value;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("64-bit relocation outside of kernel!\n");
 
-		*(uint64_t *)ptr += delta;
+		value = *(int64_t *)ptr;
+
+		/*
+		 * If using fgkaslr, the value of the relocation
+		 * might need to be changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_address(&value);
+
+		value += delta;
+
+		*(uint64_t *)ptr = value;
 	}
+	post_relocations_cleanup(map);
 #endif
 }
 #else
@@ -270,6 +352,35 @@ static inline void handle_relocations(void *output, unsigned long output_len,
 { }
 #endif
 
+static void layout_image(void *output, Elf_Ehdr *ehdr, Elf_Phdr *phdrs)
+{
+	int i;
+	void *dest;
+	Elf_Phdr *phdr;
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+#ifdef CONFIG_X86_64
+			if ((phdr->p_align % 0x200000) != 0)
+				error("Alignment of LOAD segment isn't multiple of 2MB");
+#endif
+#ifdef CONFIG_RELOCATABLE
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+#else
+			dest = (void *)(phdr->p_paddr);
+#endif
+			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
+			break;
+		default: /* Ignore other PT_* */
+			break;
+		}
+	}
+}
+
 static void parse_elf(void *output)
 {
 #ifdef CONFIG_X86_64
@@ -281,6 +392,7 @@ static void parse_elf(void *output)
 #endif
 	void *dest;
 	int i;
+	int nokaslr;
 
 	memcpy(&ehdr, output, sizeof(ehdr));
 	if (ehdr.e_ident[EI_MAG0] != ELFMAG0 ||
@@ -291,6 +403,12 @@ static void parse_elf(void *output)
 		return;
 	}
 
+	if (IS_ENABLED(CONFIG_FG_KASLR)) {
+		nokaslr = cmdline_find_option_bool("nokaslr");
+		if (nokaslr)
+			warn("FG_KASLR disabled: 'nokaslr' on cmdline.");
+	}
+
 	debug_putstr("Parsing ELF... ");
 
 	phdrs = malloc(sizeof(*phdrs) * ehdr.e_phnum);
@@ -299,26 +417,10 @@ static void parse_elf(void *output)
 
 	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
 
-	for (i = 0; i < ehdr.e_phnum; i++) {
-		phdr = &phdrs[i];
-
-		switch (phdr->p_type) {
-		case PT_LOAD:
-#ifdef CONFIG_X86_64
-			if ((phdr->p_align % 0x200000) != 0)
-				error("Alignment of LOAD segment isn't multiple of 2MB");
-#endif
-#ifdef CONFIG_RELOCATABLE
-			dest = output;
-			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
-#else
-			dest = (void *)(phdr->p_paddr);
-#endif
-			memmove(dest, output + phdr->p_offset, phdr->p_filesz);
-			break;
-		default: /* Ignore other PT_* */ break;
-		}
-	}
+	if (IS_ENABLED(CONFIG_FG_KASLR) && !nokaslr)
+		layout_randomized_image(output, &ehdr, phdrs);
+	else
+		layout_image(output, &ehdr, phdrs);
 
 	free(phdrs);
 }
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 81fbc8d686fa..fd0c63cfaa4a 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -76,6 +76,34 @@ struct mem_vector {
 	unsigned long long size;
 };
 
+#ifdef CONFIG_X86_64
+#define Elf_Ehdr Elf64_Ehdr
+#define Elf_Phdr Elf64_Phdr
+#define Elf_Shdr Elf64_Shdr
+#else
+#define Elf_Ehdr Elf32_Ehdr
+#define Elf_Phdr Elf32_Phdr
+#define Elf_Shdr Elf32_Shdr
+#endif
+
+#if CONFIG_FG_KASLR
+void layout_randomized_image(void *output, Elf_Ehdr *ehdr, Elf_Phdr *phdrs);
+void pre_relocations_cleanup(unsigned long map);
+void post_relocations_cleanup(unsigned long map);
+Elf_Shdr *adjust_address(long *address);
+void adjust_relative_offset(long pc, long *value, Elf_Shdr *section);
+bool is_percpu_addr(long pc, long offset);
+#else
+static inline void layout_randomized_image(void *output, Elf_Ehdr *ehdr,
+					   Elf_Phdr *phdrs) { }
+static inline void pre_relocations_cleanup(unsigned long map) { }
+static inline void post_relocations_cleanup(unsigned long map) { }
+static inline Elf_Shdr *adjust_address(long *address) { return NULL; }
+static inline void adjust_relative_offset(long pc, long *value,
+					  Elf_Shdr *section) { }
+static inline bool is_percpu_addr(long pc, long offset) { return true; }
+#endif /* CONFIG_FG_KASLR */
+
 #if CONFIG_RANDOMIZE_BASE
 /* kaslr.c */
 void choose_random_location(unsigned long input,
diff --git a/arch/x86/boot/compressed/utils.c b/arch/x86/boot/compressed/utils.c
new file mode 100644
index 000000000000..726f5b9092dc
--- /dev/null
+++ b/arch/x86/boot/compressed/utils.c
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * utils.c
+ *
+ * This contains various libraries that are needed for fgkaslr
+ */
+#define __DISABLE_EXPORTS
+#define _LINUX_KPROBES_H
+#define NOKPROBE_SYMBOL(fname)
+#include "../../../../lib/sort.c"
+#include "../../../../lib/bsearch.c"
diff --git a/arch/x86/boot/compressed/vmlinux.symbols b/arch/x86/boot/compressed/vmlinux.symbols
new file mode 100644
index 000000000000..cc86e79a2a3d
--- /dev/null
+++ b/arch/x86/boot/compressed/vmlinux.symbols
@@ -0,0 +1,17 @@
+kallsyms_offsets
+kallsyms_addresses
+kallsyms_num_syms
+kallsyms_relative_base
+kallsyms_names
+kallsyms_token_table
+kallsyms_token_index
+kallsyms_markers
+__start___ex_table
+__stop___ex_table
+_sinittext
+_einittext
+_stext
+_etext
+__start_orc_unwind_ip
+__stop_orc_unwind_ip
+__start_orc_unwind
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 680c320363db..6918d33eb5ef 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -26,8 +26,19 @@
 
 #ifdef CONFIG_KERNEL_BZIP2
 # define BOOT_HEAP_SIZE		0x400000
-#else /* !CONFIG_KERNEL_BZIP2 */
-# define BOOT_HEAP_SIZE		 0x10000
+#elif CONFIG_FG_KASLR
+/*
+ * We need extra boot heap when using fgkaslr because we make a copy
+ * of the original decompressed kernel to avoid issues with writing
+ * over ourselves when shuffling the sections. We also need extra
+ * space for resorting kallsyms after shuffling. This value could
+ * be decreased if free() would release memory properly, or if we
+ * could avoid the kernel copy. It would need to be increased if we
+ * find additional tables that need to be resorted.
+ */
+# define BOOT_HEAP_SIZE		0x4000000
+#else /* !CONFIG_KERNEL_BZIP2 && !CONFIG_FG_KASLR */
+# define BOOT_HEAP_SIZE		0x10000
 #endif
 
 #ifdef CONFIG_X86_64
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index c6dd0215482e..4ef6360bd18c 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -299,6 +299,7 @@ typedef struct elf64_phdr {
 #define SHN_LIVEPATCH	0xff20
 #define SHN_ABS		0xfff1
 #define SHN_COMMON	0xfff2
+#define SHN_XINDEX	0xffff
 #define SHN_HIRESERVE	0xffff
  
 typedef struct elf32_shdr {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 09/10] kallsyms: Hide layout
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (7 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 08/10] x86: Add support for function granular KASLR Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-20  1:25   ` Kees Cook
  2020-07-17 17:00 ` [PATCH v4 10/10] module: Reorder functions Kristen Carlson Accardi
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp
  Cc: arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Tony Luck

This patch makes /proc/kallsyms display in a random order, rather
than sorted by address in order to hide the newly randomized address
layout.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 kernel/kallsyms.c | 163 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 162 insertions(+), 1 deletion(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index bb14e64f62a4..45d147f7f10e 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -446,6 +446,12 @@ struct kallsym_iter {
 	int show_value;
 };
 
+struct kallsyms_shuffled_iter {
+	struct kallsym_iter iter;
+	loff_t total_syms;
+	loff_t shuffled_index[];
+};
+
 int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
 			    char *type, char *name)
 {
@@ -661,7 +667,7 @@ bool kallsyms_show_value(const struct cred *cred)
 	}
 }
 
-static int kallsyms_open(struct inode *inode, struct file *file)
+static int __kallsyms_open(struct inode *inode, struct file *file)
 {
 	/*
 	 * We keep iterator in m->private, since normal case is to
@@ -682,6 +688,161 @@ static int kallsyms_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+/*
+ * When function granular kaslr is enabled, we need to print out the symbols
+ * at random so we don't reveal the new layout.
+ */
+#if defined(CONFIG_FG_KASLR)
+static int update_random_pos(struct kallsyms_shuffled_iter *s_iter,
+			     loff_t pos, loff_t *new_pos)
+{
+	loff_t new;
+
+	if (pos >= s_iter->total_syms)
+		return 0;
+
+	new = s_iter->shuffled_index[pos];
+
+	/*
+	 * normally this would be done as part of update_iter, however,
+	 * we want to avoid triggering this in the event that new is
+	 * zero since we don't want to blow away our pos end indicators.
+	 */
+	if (new == 0) {
+		s_iter->iter.name[0] = '\0';
+		s_iter->iter.nameoff = get_symbol_offset(new);
+		s_iter->iter.pos = new;
+	}
+
+	*new_pos = new;
+	return 1;
+}
+
+static void *shuffled_start(struct seq_file *m, loff_t *pos)
+{
+	struct kallsyms_shuffled_iter *s_iter = m->private;
+	loff_t new_pos;
+
+	if (!update_random_pos(s_iter, *pos, &new_pos))
+		return NULL;
+
+	return s_start(m, &new_pos);
+}
+
+static void *shuffled_next(struct seq_file *m, void *p, loff_t *pos)
+{
+	struct kallsyms_shuffled_iter *s_iter = m->private;
+	loff_t new_pos;
+
+	(*pos)++;
+
+	if (!update_random_pos(s_iter, *pos, &new_pos))
+		return NULL;
+
+	if (!update_iter(m->private, new_pos))
+		return NULL;
+
+	return p;
+}
+
+/*
+ * shuffle_index_list()
+ * Use a Fisher Yates algorithm to shuffle a list of text sections.
+ */
+static void shuffle_index_list(loff_t *indexes, loff_t size)
+{
+	int i;
+	unsigned int j;
+	loff_t temp;
+
+	for (i = size - 1; i > 0; i--) {
+		/* pick a random index from 0 to i */
+		get_random_bytes(&j, sizeof(j));
+		j = j % (i + 1);
+
+		temp = indexes[i];
+		indexes[i] = indexes[j];
+		indexes[j] = temp;
+	}
+}
+
+static const struct seq_operations kallsyms_shuffled_op = {
+	.start = shuffled_start,
+	.next = shuffled_next,
+	.stop = s_stop,
+	.show = s_show
+};
+
+static int kallsyms_random_open(struct inode *inode, struct file *file)
+{
+	loff_t pos;
+	struct kallsyms_shuffled_iter *shuffled_iter;
+	struct kallsym_iter iter;
+	bool show_value;
+
+	/*
+	 * If privileged, go ahead and use the normal algorithm for
+	 * displaying symbols
+	 */
+	show_value = kallsyms_show_value(file->f_cred);
+	if (show_value)
+		return __kallsyms_open(inode, file);
+
+	/*
+	 * we need to figure out how many extra symbols there are
+	 * to print out past kallsyms_num_syms
+	 */
+	pos = kallsyms_num_syms;
+	reset_iter(&iter, 0);
+	do {
+		if (!update_iter(&iter, pos))
+			break;
+		pos++;
+	} while (1);
+
+	/*
+	 * add storage space for an array of loff_t equal to the size
+	 * of the total number of symbols we need to print
+	 */
+	shuffled_iter = __seq_open_private(file, &kallsyms_shuffled_op,
+					   sizeof(*shuffled_iter) +
+					   (sizeof(pos) * pos));
+	if (!shuffled_iter)
+		return -ENOMEM;
+
+	reset_iter(&shuffled_iter->iter, 0);
+	shuffled_iter->iter.show_value = show_value;
+	shuffled_iter->total_syms = pos;
+
+	/*
+	 * the existing update_iter algorithm requires that we
+	 * are either moving along increasing pos sequentially,
+	 * or that these values are correct. Since these values
+	 * were discovered above, initialize our new iter so we
+	 * can use update_iter non-sequentially.
+	 */
+	shuffled_iter->iter.pos_arch_end = iter.pos_arch_end;
+	shuffled_iter->iter.pos_mod_end = iter.pos_mod_end;
+	shuffled_iter->iter.pos_ftrace_mod_end = iter.pos_ftrace_mod_end;
+
+	/*
+	 * initialize the array with all possible pos values, then
+	 * shuffle the array so that the values will display in a random
+	 * order.
+	 */
+	for (pos = 0; pos < shuffled_iter->total_syms; pos++)
+		shuffled_iter->shuffled_index[pos] = pos;
+
+	shuffle_index_list(shuffled_iter->shuffled_index, shuffled_iter->total_syms);
+
+	return 0;
+}
+
+#define kallsyms_open kallsyms_random_open
+#else
+#define kallsyms_open __kallsyms_open
+#endif /* CONFIG_FG_KASLR */
+
 #ifdef	CONFIG_KGDB_KDB
 const char *kdb_walk_kallsyms(loff_t *pos)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v4 10/10] module: Reorder functions
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (8 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 09/10] kallsyms: Hide layout Kristen Carlson Accardi
@ 2020-07-17 17:00 ` Kristen Carlson Accardi
  2020-07-28 17:29   ` Jessica Yu
  2020-07-22  9:27 ` [PATCH v4 00/10] Function Granular KASLR Miroslav Benes
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-17 17:00 UTC (permalink / raw)
  To: keescook, tglx, mingo, bp, x86, H. Peter Anvin, Jessica Yu
  Cc: arjan, linux-kernel, kernel-hardening, rick.p.edgecombe,
	Kristen Carlson Accardi, Ard Biesheuvel, Tony Luck

Introduce a new config option to allow modules to be re-ordered
by function. This option can be enabled independently of the
kernel text KASLR or FG_KASLR settings so that it can be used
by architectures that do not support either of these features.
This option will be selected by default if CONFIG_FG_KASLR is
selected.

If a module has functions split out into separate text sections
(i.e. compiled with the -ffunction-sections flag), reorder the
functions to provide some code diversification to modules.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/Makefile |  5 +++
 init/Kconfig      | 12 +++++++
 kernel/kallsyms.c |  2 +-
 kernel/module.c   | 81 +++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 00e378de8bc0..0f2dbc46eb5c 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -51,6 +51,11 @@ ifdef CONFIG_X86_NEED_RELOCS
         LDFLAGS_vmlinux := --emit-relocs --discard-none
 endif
 
+ifndef CONFIG_FG_KASLR
+	ifdef CONFIG_MODULE_FG_KASLR
+		KBUILD_CFLAGS_MODULE += -ffunction-sections
+	endif
+endif
 #
 # Prevent GCC from generating any FP code by mistake.
 #
diff --git a/init/Kconfig b/init/Kconfig
index 82f042a1062f..b4741838da40 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1994,6 +1994,7 @@ config FG_KASLR
 	bool "Function Granular Kernel Address Space Layout Randomization"
 	depends on $(cc-option, -ffunction-sections)
 	depends on ARCH_HAS_FG_KASLR
+	select MODULE_FG_KASLR
 	default n
 	help
 	  This option improves the randomness of the kernel text
@@ -2278,6 +2279,17 @@ config UNUSED_KSYMS_WHITELIST
 	  one per line. The path can be absolute, or relative to the kernel
 	  source tree.
 
+config MODULE_FG_KASLR
+	depends on $(cc-option, -ffunction-sections)
+	bool "Module Function Granular Layout Randomization"
+	help
+	  This option randomizes the module text section by reordering the text
+	  section by function at module load time. In order to use this
+	  feature, the module must have been compiled with the
+	  -ffunction-sections compiler flag.
+
+	  If unsure, say N.
+
 endif # MODULES
 
 config MODULES_TREE_LOOKUP
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 45d147f7f10e..e3f7d0fd3270 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -692,7 +692,7 @@ static int __kallsyms_open(struct inode *inode, struct file *file)
  * When function granular kaslr is enabled, we need to print out the symbols
  * at random so we don't reveal the new layout.
  */
-#if defined(CONFIG_FG_KASLR)
+#if defined(CONFIG_FG_KASLR) || defined(CONFIG_MODULE_FG_KASLR)
 static int update_random_pos(struct kallsyms_shuffled_iter *s_iter,
 			     loff_t pos, loff_t *new_pos)
 {
diff --git a/kernel/module.c b/kernel/module.c
index aa183c9ac0a2..0f4f4e567a42 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -56,6 +56,7 @@
 #include <linux/bsearch.h>
 #include <linux/dynamic_debug.h>
 #include <linux/audit.h>
+#include <linux/random.h>
 #include <uapi/linux/module.h>
 #include "module-internal.h"
 
@@ -2391,6 +2392,83 @@ static long get_offset(struct module *mod, unsigned int *size,
 	return ret;
 }
 
+/*
+ * shuffle_text_list()
+ * Use a Fisher Yates algorithm to shuffle a list of text sections.
+ */
+static void shuffle_text_list(Elf_Shdr **list, int size)
+{
+	int i;
+	unsigned int j;
+	Elf_Shdr *temp;
+
+	for (i = size - 1; i > 0; i--) {
+		/*
+		 * pick a random index from 0 to i
+		 */
+		get_random_bytes(&j, sizeof(j));
+		j = j % (i + 1);
+
+		temp = list[i];
+		list[i] = list[j];
+		list[j] = temp;
+	}
+}
+
+/*
+ * randomize_text()
+ * Look through the core section looking for executable code sections.
+ * Store sections in an array and then shuffle the sections
+ * to reorder the functions.
+ */
+static void randomize_text(struct module *mod, struct load_info *info)
+{
+	int i;
+	int num_text_sections = 0;
+	Elf_Shdr **text_list;
+	int size = 0;
+	int max_sections = info->hdr->e_shnum;
+	unsigned int sec = find_sec(info, ".text");
+
+	if (sec == 0)
+		return;
+
+	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+	if (!text_list)
+		return;
+
+	for (i = 0; i < max_sections; i++) {
+		Elf_Shdr *shdr = &info->sechdrs[i];
+		const char *sname = info->secstrings + shdr->sh_name;
+
+		if (!(shdr->sh_flags & SHF_ALLOC) ||
+		    !(shdr->sh_flags & SHF_EXECINSTR) ||
+		    strstarts(sname, ".init"))
+			continue;
+
+		text_list[num_text_sections] = shdr;
+		num_text_sections++;
+	}
+
+	shuffle_text_list(text_list, num_text_sections);
+
+	for (i = 0; i < num_text_sections; i++) {
+		Elf_Shdr *shdr = text_list[i];
+
+		/*
+		 * get_offset has a section index for it's last
+		 * argument, that is only used by arch_mod_section_prepend(),
+		 * which is only defined by parisc. Since this this type
+		 * of randomization isn't supported on parisc, we can
+		 * safely pass in zero as the last argument, as it is
+		 * ignored.
+		 */
+		shdr->sh_entsize = get_offset(mod, &size, shdr, 0);
+	}
+
+	kfree(text_list);
+}
+
 /* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld
    might -- code, read-only data, read-write data, small data.  Tally
    sizes, and place the offsets into sh_entsize fields: high bit means it
@@ -2481,6 +2559,9 @@ static void layout_sections(struct module *mod, struct load_info *info)
 			break;
 		}
 	}
+
+	if (IS_ENABLED(CONFIG_MODULE_FG_KASLR))
+		randomize_text(mod, info);
 }
 
 static void set_license(struct module *mod, const char *license)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 09/10] kallsyms: Hide layout
  2020-07-17 17:00 ` [PATCH v4 09/10] kallsyms: Hide layout Kristen Carlson Accardi
@ 2020-07-20  1:25   ` Kees Cook
  2020-07-20 16:59     ` Kristen Carlson Accardi
  0 siblings, 1 reply; 46+ messages in thread
From: Kees Cook @ 2020-07-20  1:25 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, arjan, x86, linux-kernel, kernel-hardening,
	rick.p.edgecombe, Tony Luck

On Fri, Jul 17, 2020 at 10:00:06AM -0700, Kristen Carlson Accardi wrote:
> This patch makes /proc/kallsyms display in a random order, rather
> than sorted by address in order to hide the newly randomized address
> layout.

Ah! Much nicer. Is there any reason not to just do this unconditionally,
regardless of FGKASLR? It's a smallish dynamic allocation, and
displaying kallsyms is hardly fast-path...

> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Tested-by: Tony Luck <tony.luck@intel.com>
> ---
>  kernel/kallsyms.c | 163 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 162 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index bb14e64f62a4..45d147f7f10e 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -446,6 +446,12 @@ struct kallsym_iter {
>  	int show_value;
>  };
>  
> +struct kallsyms_shuffled_iter {
> +	struct kallsym_iter iter;
> +	loff_t total_syms;
> +	loff_t shuffled_index[];
> +};
> +
>  int __weak arch_get_kallsym(unsigned int symnum, unsigned long *value,
>  			    char *type, char *name)
>  {
> @@ -661,7 +667,7 @@ bool kallsyms_show_value(const struct cred *cred)
>  	}
>  }
>  
> -static int kallsyms_open(struct inode *inode, struct file *file)
> +static int __kallsyms_open(struct inode *inode, struct file *file)
>  {
>  	/*
>  	 * We keep iterator in m->private, since normal case is to
> @@ -682,6 +688,161 @@ static int kallsyms_open(struct inode *inode, struct file *file)
>  	return 0;
>  }
>  
> +/*
> + * When function granular kaslr is enabled, we need to print out the symbols
> + * at random so we don't reveal the new layout.
> + */
> +#if defined(CONFIG_FG_KASLR)
> +static int update_random_pos(struct kallsyms_shuffled_iter *s_iter,
> +			     loff_t pos, loff_t *new_pos)
> +{
> +	loff_t new;
> +
> +	if (pos >= s_iter->total_syms)
> +		return 0;
> +
> +	new = s_iter->shuffled_index[pos];
> +
> +	/*
> +	 * normally this would be done as part of update_iter, however,
> +	 * we want to avoid triggering this in the event that new is
> +	 * zero since we don't want to blow away our pos end indicators.
> +	 */
> +	if (new == 0) {
> +		s_iter->iter.name[0] = '\0';
> +		s_iter->iter.nameoff = get_symbol_offset(new);
> +		s_iter->iter.pos = new;
> +	}
> +
> +	*new_pos = new;
> +	return 1;
> +}
> +
> +static void *shuffled_start(struct seq_file *m, loff_t *pos)
> +{
> +	struct kallsyms_shuffled_iter *s_iter = m->private;
> +	loff_t new_pos;
> +
> +	if (!update_random_pos(s_iter, *pos, &new_pos))
> +		return NULL;
> +
> +	return s_start(m, &new_pos);
> +}
> +
> +static void *shuffled_next(struct seq_file *m, void *p, loff_t *pos)
> +{
> +	struct kallsyms_shuffled_iter *s_iter = m->private;
> +	loff_t new_pos;
> +
> +	(*pos)++;
> +
> +	if (!update_random_pos(s_iter, *pos, &new_pos))
> +		return NULL;
> +
> +	if (!update_iter(m->private, new_pos))
> +		return NULL;
> +
> +	return p;
> +}
> +
> +/*
> + * shuffle_index_list()
> + * Use a Fisher Yates algorithm to shuffle a list of text sections.
> + */
> +static void shuffle_index_list(loff_t *indexes, loff_t size)
> +{
> +	int i;
> +	unsigned int j;
> +	loff_t temp;
> +
> +	for (i = size - 1; i > 0; i--) {
> +		/* pick a random index from 0 to i */
> +		get_random_bytes(&j, sizeof(j));
> +		j = j % (i + 1);
> +
> +		temp = indexes[i];
> +		indexes[i] = indexes[j];
> +		indexes[j] = temp;
> +	}
> +}
> +
> +static const struct seq_operations kallsyms_shuffled_op = {
> +	.start = shuffled_start,
> +	.next = shuffled_next,
> +	.stop = s_stop,
> +	.show = s_show
> +};
> +
> +static int kallsyms_random_open(struct inode *inode, struct file *file)
> +{
> +	loff_t pos;
> +	struct kallsyms_shuffled_iter *shuffled_iter;
> +	struct kallsym_iter iter;
> +	bool show_value;
> +
> +	/*
> +	 * If privileged, go ahead and use the normal algorithm for
> +	 * displaying symbols
> +	 */
> +	show_value = kallsyms_show_value(file->f_cred);
> +	if (show_value)
> +		return __kallsyms_open(inode, file);
> +
> +	/*
> +	 * we need to figure out how many extra symbols there are
> +	 * to print out past kallsyms_num_syms
> +	 */
> +	pos = kallsyms_num_syms;
> +	reset_iter(&iter, 0);
> +	do {
> +		if (!update_iter(&iter, pos))
> +			break;
> +		pos++;
> +	} while (1);

Can this be tracked separately instead of needing to search for it every
time? (Looks like it's modules and ftrace? Could they each have a
*_num_sysms?)

(I need to go read how kallsyms doesn't miscount in general when the
symbol table changes out from under it...)


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 09/10] kallsyms: Hide layout
  2020-07-20  1:25   ` Kees Cook
@ 2020-07-20 16:59     ` Kristen Carlson Accardi
  0 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-20 16:59 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, arjan, x86, linux-kernel, kernel-hardening,
	rick.p.edgecombe, Tony Luck

On Sun, 2020-07-19 at 18:25 -0700, Kees Cook wrote:
> On Fri, Jul 17, 2020 at 10:00:06AM -0700, Kristen Carlson Accardi
> wrote:
> > This patch makes /proc/kallsyms display in a random order, rather
> > than sorted by address in order to hide the newly randomized
> > address
> > layout.
> 
> Ah! Much nicer. Is there any reason not to just do this
> unconditionally,
> regardless of FGKASLR? It's a smallish dynamic allocation, and
> displaying kallsyms is hardly fast-path...

My only concern would be whether or not there are scripts out there
which assume the list would be ordered. If someone chooses to enable
CONFIG_FG_KASLR, I think that it is reasonable to break those scripts.
On the flip side, I don't know why it needs to come out of
/proc/kallsyms in order, you can always just sort it after the fact if
you need it sorted. It would make it more maintainable to not special
case this. Hopefully a maintainer will comment on their preference.
Another thing I do in this patch is continue to use the existing sorted
by address functions if you are root. I didn't know if this was
neccessary - it'd be nice if we could just do it the same way all the
time. But I need some guidance here.

> 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > Reviewed-by: Tony Luck <tony.luck@intel.com>
> > Tested-by: Tony Luck <tony.luck@intel.com>
> > ---
> >  kernel/kallsyms.c | 163
> > +++++++++++++++++++++++++++++++++++++++++++++-
> >  1 file changed, 162 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> > index bb14e64f62a4..45d147f7f10e 100644
> > --- a/kernel/kallsyms.c
> > +++ b/kernel/kallsyms.c
> > @@ -446,6 +446,12 @@ struct kallsym_iter {
> >  	int show_value;
> >  };
> >  
> > +struct kallsyms_shuffled_iter {
> > +	struct kallsym_iter iter;
> > +	loff_t total_syms;
> > > 
> > > (I need to go read how kallsyms doesn't miscount in general when
> > the
> > > symbol table changes out from under it...)
> > > 
> > > 
> > +	loff_t shuffled_index[];
> > +};
> > +
> >  int __weak arch_get_kallsym(unsigned int symnum, unsigned long
> > *value,
> >  			    char *type, char *name)
> >  {
> > @@ -661,7 +667,7 @@ bool kallsyms_show_value(const struct cred
> > *cred)
> >  	}
> >  }
> >  
> > -static int kallsyms_open(struct inode *inode, struct file *file)
> > +static int __kallsyms_open(struct inode *inode, struct file *file)
> >  {
> >  	/*
> >  	 * We keep iterator in m->private, since normal case is to
> > @@ -682,6 +688,161 @@ static int kallsyms_open(struct inode *inode,
> > struct file *file)
> >  	return 0;
> >  }
> >  
> > +/*
> > + * When function granular kaslr is enabled, we need to print out
> > the symbols
> > + * at random so we don't reveal the new layout.
> > + */
> > +#if defined(CONFIG_FG_KASLR)
> > +static int update_random_pos(struct kallsyms_shuffled_iter
> > *s_iter,
> > +			     loff_t pos, loff_t *new_pos)
> > +{
> > +	loff_t new;
> > +
> > +	if (pos >= s_iter->total_syms)
> > +		return 0;
> > +
> > +	new = s_iter->shuffled_index[pos];
> > +
> > +	/*
> > +	 * normally this would be done as part of update_iter, however,
> > +	 * we want to avoid triggering this in the event that new is
> > +	 * zero since we don't want to blow away our pos end
> > indicators.
> > +	 */
> > +	if (new == 0) {
> > +		s_iter->iter.name[0] = '\0';
> > +		s_iter->iter.nameoff = get_symbol_offset(new);
> > +		s_iter->iter.pos = new;
> > +	}
> > +
> > +	*new_pos = new;
> > +	return 1;
> > +}
> > +
> > +static void *shuffled_start(struct seq_file *m, loff_t *pos)
> > +{
> > +	struct kallsyms_shuffled_iter *s_iter = m->private;
> > +	loff_t new_pos;
> > +
> > +	if (!update_random_pos(s_iter, *pos, &new_pos))
> > +		return NULL;
> > +
> > +	return s_start(m, &new_pos);
> > +}
> > +
> > +static void *shuffled_next(struct seq_file *m, void *p, loff_t
> > *pos)
> > +{
> > +	struct kallsyms_shuffled_iter *s_iter = m->private;
> > +	loff_t new_pos;
> > +
> > +	(*pos)++;
> > +
> > +	if (!update_random_pos(s_iter, *pos, &new_pos))
> > +		return NULL;
> > +
> > +	if (!update_iter(m->private, new_pos))
> > +		return NULL;
> > +
> > +	return p;
> > +}
> > +
> > +/*
> > + * shuffle_index_list()
> > + * Use a Fisher Yates algorithm to shuffle a list of text
> > sections.
> > + */
> > +static void shuffle_index_list(loff_t *indexes, loff_t size)
> > +{
> > +	int i;
> > +	unsigned int j;
> > +	loff_t temp;
> > +
> > +	for (i = size - 1; i > 0; i--) {
> > +		/* pick a random index from 0 to i */
> > +		get_random_bytes(&j, sizeof(j));
> > +		j = j % (i + 1);
> > +
> > +		temp = indexes[i];
> > +		indexes[i] = indexes[j];
> > +		indexes[j] = temp;
> > +	}
> > +}
> > +
> > +static const struct seq_operations kallsyms_shuffled_op = {
> > +	.start = shuffled_start,
> > +	.next = shuffled_next,
> > +	.stop = s_stop,
> > +	.show = s_show
> > +};
> > +
> > +static int kallsyms_random_open(struct inode *inode, struct file
> > *file)
> > +{
> > +	loff_t pos;
> > +	struct kallsyms_shuffled_iter *shuffled_iter;
> > +	struct kallsym_iter iter;
> > +	bool show_value;
> > +
> > +	/*
> > +	 * If privileged, go ahead and use the normal algorithm for
> > +	 * displaying symbols
> > +	 */
> > +	show_value = kallsyms_show_value(file->f_cred);
> > +	if (show_value)
> > +		return __kallsyms_open(inode, file);
> > +
> > +	/*
> > +	 * we need to figure out how many extra symbols there are
> > +	 * to print out past kallsyms_num_syms
> > +	 */
> > +	pos = kallsyms_num_syms;
> > +	reset_iter(&iter, 0);
> > +	do {
> > +		if (!update_iter(&iter, pos))
> > +			break;
> > +		pos++;
> > +	} while (1);
> 
> Can this be tracked separately instead of needing to search for it
> every
> time? (Looks like it's modules and ftrace? Could they each have a
> *_num_sysms?)

It could, but I'd probably have to make more changes to those
subsystems to keep this information there than I would to just do this
search here. Because we start search at the end of the kernel core
symbols, I don't think this search takes too many cycles to complete.
On my system running a standard distro config the number of module and
bpf symbols is not all that many, especially compared to the kernel
core symbols.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (9 preceding siblings ...)
  2020-07-17 17:00 ` [PATCH v4 10/10] module: Reorder functions Kristen Carlson Accardi
@ 2020-07-22  9:27 ` Miroslav Benes
  2020-07-22 14:39   ` Kees Cook
  2020-08-03 11:39   ` Evgenii Shatokhin
  2020-08-04 18:23 ` Joe Lawrence
  2020-08-06 15:32 ` Ingo Molnar
  12 siblings, 2 replies; 46+ messages in thread
From: Miroslav Benes @ 2020-07-22  9:27 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: keescook, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

Let me CC live-patching ML, because from a quick glance this is something 
which could impact live patching code. At least it invalidates assumptions 
which "sympos" is based on.

Miroslav

On Fri, 17 Jul 2020, Kristen Carlson Accardi wrote:

> Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> ---------------------------------------------------------------------
> 
> This patch set is an implementation of finer grained kernel address space
> randomization. It rearranges your kernel code at load time 
> on a per-function level granularity, with only around a second added to
> boot time.

[...]

> Background
> ----------
> KASLR was merged into the kernel with the objective of increasing the
> difficulty of code reuse attacks. Code reuse attacks reused existing code
> snippets to get around existing memory protections. They exploit software bugs
> which expose addresses of useful code snippets to control the flow of
> execution for their own nefarious purposes. KASLR moves the entire kernel
> code text as a unit at boot time in order to make addresses less predictable.
> The order of the code within the segment is unchanged - only the base address
> is shifted. There are a few shortcomings to this algorithm.
> 
> 1. Low Entropy - there are only so many locations the kernel can fit in. This
>    means an attacker could guess without too much trouble.
> 2. Knowledge of a single address can reveal the offset of the base address,
>    exposing all other locations for a published/known kernel image.
> 3. Info leaks abound.
> 
> Finer grained ASLR has been proposed as a way to make ASLR more resistant
> to info leaks. It is not a new concept at all, and there are many variations
> possible. Function reordering is an implementation of finer grained ASLR
> which randomizes the layout of an address space on a function level
> granularity. We use the term "fgkaslr" in this document to refer to the
> technique of function reordering when used with KASLR, as well as finer grained
> KASLR in general.
> 
> Proposed Improvement
> --------------------
> This patch set proposes adding function reordering on top of the existing
> KASLR base address randomization. The over-arching objective is incremental
> improvement over what we already have. It is designed to work in combination
> with the existing solution. The implementation is really pretty simple, and
> there are 2 main area where changes occur:
> 
> * Build time
> 
> GCC has had an option to place functions into individual .text sections for
> many years now. This option can be used to implement function reordering at
> load time. The final compiled vmlinux retains all the section headers, which
> can be used to help find the address ranges of each function. Using this
> information and an expanded table of relocation addresses, individual text
> sections can be suffled immediately after decompression. Some data tables
> inside the kernel that have assumptions about order require re-sorting
> after being updated when applying relocations. In order to modify these tables,
> a few key symbols are excluded from the objcopy symbol stripping process for
> use after shuffling the text segments.
> 
> Some highlights from the build time changes to look for:
> 
> The top level kernel Makefile was modified to add the gcc flag if it
> is supported. Currently, I am applying this flag to everything it is
> possible to randomize. Anything that is written in C and not present in a
> special input section is randomized. The final binary segment 0 retains a
> consolidated .text section, as well as all the individual .text.* sections.
> Future work could turn off this flags for selected files or even entire
> subsystems, although obviously at the cost of security.
> 
> The relocs tool is updated to add relative relocations. This information
> previously wasn't included because it wasn't necessary when moving the
> entire .text segment as a unit. 
> 
> A new file was created to contain a list of symbols that objcopy should
> keep. We use those symbols at load time as described below.
> 
> * Load time
> 
> The boot kernel was modified to parse the vmlinux elf file after
> decompression to check for our interesting symbols that we kept, and to
> look for any .text.* sections to randomize. The consolidated .text section
> is skipped and not moved. The sections are shuffled randomly, and copied
> into memory following the .text section in a new random order. The existing
> code which updated relocation addresses was modified to account for
> not just a fixed delta from the load address, but the offset that the function
> section was moved to. This requires inspection of each address to see if
> it was impacted by a randomization. We use a bsearch to make this less
> horrible on performance. Any tables that need to be modified with new
> addresses or resorted are updated using the symbol addresses parsed from the
> elf symbol table.
> 
> In order to hide our new layout, symbols reported through /proc/kallsyms
> will be displayed in a random order.
> 
> Security Considerations
> -----------------------
> The objective of this patch set is to improve a technology that is already
> merged into the kernel (KASLR). This code will not prevent all attacks,
> but should instead be considered as one of several tools that can be used.
> In particular, this code is meant to make KASLR more effective in the presence
> of info leaks.
> 
> How much entropy we are adding to the existing entropy of standard KASLR will
> depend on a few variables. Firstly and most obviously, the number of functions
> that are randomized matters. This implementation keeps the existing .text
> section for code that cannot be randomized - for example, because it was
> assembly code. The less sections to randomize, the less entropy. In addition,
> due to alignment (16 bytes for x86_64), the number of bits in a address that
> the attacker needs to guess is reduced, as the lower bits are identical.

[...]

> Modules
> -------
> Modules are randomized similarly to the rest of the kernel by shuffling
> the sections at load time prior to moving them into memory. The module must
> also have been build with the -ffunction-sections compiler option.
> 
> Although fgkaslr for the kernel is only supported for the X86_64 architecture,
> it is possible to use fgkaslr with modules on other architectures. To enable
> this feature, select
> 
> CONFIG_MODULE_FG_KASLR=y
> 
> This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> 
> Disabling
> ---------
> Disabling normal KASLR using the nokaslr command line option also disables
> fgkaslr. It is also possible to disable fgkaslr separately by booting with
> fgkaslr=off on the commandline.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22  9:27 ` [PATCH v4 00/10] Function Granular KASLR Miroslav Benes
@ 2020-07-22 14:39   ` Kees Cook
  2020-07-22 14:51     ` Joe Lawrence
  2020-07-22 16:07     ` Josh Poimboeuf
  2020-08-03 11:39   ` Evgenii Shatokhin
  1 sibling, 2 replies; 46+ messages in thread
From: Kees Cook @ 2020-07-22 14:39 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching

On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> Let me CC live-patching ML, because from a quick glance this is something 
> which could impact live patching code. At least it invalidates assumptions 
> which "sympos" is based on.

In a quick skim, it looks like the symbol resolution is using
kallsyms_on_each_symbol(), so I think this is safe? What's a good
selftest for live-patching?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 14:39   ` Kees Cook
@ 2020-07-22 14:51     ` Joe Lawrence
  2020-07-22 14:56       ` Joe Lawrence
  2020-07-22 16:07     ` Josh Poimboeuf
  1 sibling, 1 reply; 46+ messages in thread
From: Joe Lawrence @ 2020-07-22 14:51 UTC (permalink / raw)
  To: Kees Cook, Miroslav Benes
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching

On 7/22/20 10:39 AM, Kees Cook wrote:
> On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
>> Let me CC live-patching ML, because from a quick glance this is something
>> which could impact live patching code. At least it invalidates assumptions
>> which "sympos" is based on.
> 
> In a quick skim, it looks like the symbol resolution is using
> kallsyms_on_each_symbol(), so I think this is safe? What's a good
> selftest for live-patching?
> 

Hi Kees,

I don't think any of the in-tree tests currently exercise the 
kallsyms/sympos end of livepatching.

I do have a local branch that does facilitate creating klp-relocations 
that do rely upon this feature -- I'll try to see if I can get those 
working with this patchset and report back later this week.

-- Joe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 14:51     ` Joe Lawrence
@ 2020-07-22 14:56       ` Joe Lawrence
  2020-07-22 18:24         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 46+ messages in thread
From: Joe Lawrence @ 2020-07-22 14:56 UTC (permalink / raw)
  To: Kees Cook, Miroslav Benes
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching

On 7/22/20 10:51 AM, Joe Lawrence wrote:
> On 7/22/20 10:39 AM, Kees Cook wrote:
>> On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
>>> Let me CC live-patching ML, because from a quick glance this is something
>>> which could impact live patching code. At least it invalidates assumptions
>>> which "sympos" is based on.
>>
>> In a quick skim, it looks like the symbol resolution is using
>> kallsyms_on_each_symbol(), so I think this is safe? What's a good
>> selftest for live-patching?
>>
> 
> Hi Kees,
> 
> I don't think any of the in-tree tests currently exercise the
> kallsyms/sympos end of livepatching.
> 

On second thought, I mispoke.. The general livepatch code does use it:

klp_init_object
   klp_init_object_loaded
     klp_find_object_symbol

in which case any of the current kselftests should exercise that.

   % make -C tools/testing/selftests/livepatch run_tests

-- Joe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 14:39   ` Kees Cook
  2020-07-22 14:51     ` Joe Lawrence
@ 2020-07-22 16:07     ` Josh Poimboeuf
  2020-07-22 19:42       ` Kees Cook
  1 sibling, 1 reply; 46+ messages in thread
From: Josh Poimboeuf @ 2020-07-22 16:07 UTC (permalink / raw)
  To: Kees Cook
  Cc: Miroslav Benes, Kristen Carlson Accardi, tglx, mingo, bp, arjan,
	x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	live-patching

On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
> On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> > Let me CC live-patching ML, because from a quick glance this is something 
> > which could impact live patching code. At least it invalidates assumptions 
> > which "sympos" is based on.
> 
> In a quick skim, it looks like the symbol resolution is using
> kallsyms_on_each_symbol(), so I think this is safe? What's a good
> selftest for live-patching?

The problem is duplicate symbols.  If there are two static functions
named 'foo' then livepatch needs a way to distinguish them.

Our current approach to that problem is "sympos".  We rely on the fact
that the second foo() always comes after the first one in the symbol
list and kallsyms.  So they're referred to as foo,1 and foo,2.

-- 
Josh


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 14:56       ` Joe Lawrence
@ 2020-07-22 18:24         ` Kristen Carlson Accardi
  0 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-22 18:24 UTC (permalink / raw)
  To: Joe Lawrence, Kees Cook, Miroslav Benes
  Cc: tglx, mingo, bp, arjan, x86, linux-kernel, kernel-hardening,
	rick.p.edgecombe, live-patching

On Wed, 2020-07-22 at 10:56 -0400, Joe Lawrence wrote:
> On 7/22/20 10:51 AM, Joe Lawrence wrote:
> > On 7/22/20 10:39 AM, Kees Cook wrote:
> > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> > > > Let me CC live-patching ML, because from a quick glance this is
> > > > something
> > > > which could impact live patching code. At least it invalidates
> > > > assumptions
> > > > which "sympos" is based on.
> > > 
> > > In a quick skim, it looks like the symbol resolution is using
> > > kallsyms_on_each_symbol(), so I think this is safe? What's a good
> > > selftest for live-patching?
> > > 
> > 
> > Hi Kees,
> > 
> > I don't think any of the in-tree tests currently exercise the
> > kallsyms/sympos end of livepatching.
> > 
> 
> On second thought, I mispoke.. The general livepatch code does use
> it:
> 
> klp_init_object
>    klp_init_object_loaded
>      klp_find_object_symbol
> 
> in which case any of the current kselftests should exercise that.
> 
>    % make -C tools/testing/selftests/livepatch run_tests
> 
> -- Joe
> 

Thanks, it looks like this should work for helping me exercise the live
patch code paths. I will take a look and get back to you all.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 16:07     ` Josh Poimboeuf
@ 2020-07-22 19:42       ` Kees Cook
  2020-07-22 19:56         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 46+ messages in thread
From: Kees Cook @ 2020-07-22 19:42 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Miroslav Benes, Kristen Carlson Accardi, tglx, mingo, bp, arjan,
	x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	live-patching

On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
> On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
> > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> > > Let me CC live-patching ML, because from a quick glance this is something 
> > > which could impact live patching code. At least it invalidates assumptions 
> > > which "sympos" is based on.
> > 
> > In a quick skim, it looks like the symbol resolution is using
> > kallsyms_on_each_symbol(), so I think this is safe? What's a good
> > selftest for live-patching?
> 
> The problem is duplicate symbols.  If there are two static functions
> named 'foo' then livepatch needs a way to distinguish them.
> 
> Our current approach to that problem is "sympos".  We rely on the fact
> that the second foo() always comes after the first one in the symbol
> list and kallsyms.  So they're referred to as foo,1 and foo,2.

Ah. Fun. In that case, perhaps the LTO series has some solutions. I
think builds with LTO end up renaming duplicate symbols like that, so
it'll be back to being unique.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 19:42       ` Kees Cook
@ 2020-07-22 19:56         ` Kristen Carlson Accardi
  2020-07-22 21:33           ` Josh Poimboeuf
  0 siblings, 1 reply; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-07-22 19:56 UTC (permalink / raw)
  To: Kees Cook, Josh Poimboeuf
  Cc: Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote:
> On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
> > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
> > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> > > > Let me CC live-patching ML, because from a quick glance this is
> > > > something 
> > > > which could impact live patching code. At least it invalidates
> > > > assumptions 
> > > > which "sympos" is based on.
> > > 
> > > In a quick skim, it looks like the symbol resolution is using
> > > kallsyms_on_each_symbol(), so I think this is safe? What's a good
> > > selftest for live-patching?
> > 
> > The problem is duplicate symbols.  If there are two static
> > functions
> > named 'foo' then livepatch needs a way to distinguish them.
> > 
> > Our current approach to that problem is "sympos".  We rely on the
> > fact
> > that the second foo() always comes after the first one in the
> > symbol
> > list and kallsyms.  So they're referred to as foo,1 and foo,2.
> 
> Ah. Fun. In that case, perhaps the LTO series has some solutions. I
> think builds with LTO end up renaming duplicate symbols like that, so
> it'll be back to being unique.
> 

Well, glad to hear there might be some precendence for how to solve
this, as I wasn't able to think of something reasonable off the top of
my head. Are you speaking of the Clang LTO series? 
https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 19:56         ` Kristen Carlson Accardi
@ 2020-07-22 21:33           ` Josh Poimboeuf
  2020-08-21 23:02             ` Kristen Carlson Accardi
  0 siblings, 1 reply; 46+ messages in thread
From: Josh Poimboeuf @ 2020-07-22 21:33 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Kees Cook, Miroslav Benes, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching

On Wed, Jul 22, 2020 at 12:56:10PM -0700, Kristen Carlson Accardi wrote:
> On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote:
> > On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
> > > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
> > > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes wrote:
> > > > > Let me CC live-patching ML, because from a quick glance this is
> > > > > something 
> > > > > which could impact live patching code. At least it invalidates
> > > > > assumptions 
> > > > > which "sympos" is based on.
> > > > 
> > > > In a quick skim, it looks like the symbol resolution is using
> > > > kallsyms_on_each_symbol(), so I think this is safe? What's a good
> > > > selftest for live-patching?
> > > 
> > > The problem is duplicate symbols.  If there are two static
> > > functions
> > > named 'foo' then livepatch needs a way to distinguish them.
> > > 
> > > Our current approach to that problem is "sympos".  We rely on the
> > > fact
> > > that the second foo() always comes after the first one in the
> > > symbol
> > > list and kallsyms.  So they're referred to as foo,1 and foo,2.
> > 
> > Ah. Fun. In that case, perhaps the LTO series has some solutions. I
> > think builds with LTO end up renaming duplicate symbols like that, so
> > it'll be back to being unique.
> > 
> 
> Well, glad to hear there might be some precendence for how to solve
> this, as I wasn't able to think of something reasonable off the top of
> my head. Are you speaking of the Clang LTO series? 
> https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/

I'm not sure how LTO does it, but a few more (half-brained) ideas that
could work:

1) Add a field in kallsyms to keep track of a symbol's original offset
   before randomization/re-sorting.  Livepatch could use that field to
   determine the original sympos.

2) In fgkaslr code, go through all the sections and mark the ones which
   have duplicates (i.e. same name).  Then when shuffling the sections,
   skip a shuffle if it involves a duplicate section.  That way all the
   duplicates would retain their original sympos.

3) Livepatch could uniquely identify symbols by some feature other than
   sympos.  For example:

   Symbol/function size - obviously this would only work if duplicately
   named symbols have different sizes.

   Checksum - as part of a separate feature we're also looking at giving
   each function its own checksum, calculated based on its instruction
   opcodes.  Though calculating checksums at runtime could be
   complicated by IP-relative addressing.

I'm thinking #1 or #2 wouldn't be too bad.  #3 might be harder.

-- 
Josh


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 10/10] module: Reorder functions
  2020-07-17 17:00 ` [PATCH v4 10/10] module: Reorder functions Kristen Carlson Accardi
@ 2020-07-28 17:29   ` Jessica Yu
  0 siblings, 0 replies; 46+ messages in thread
From: Jessica Yu @ 2020-07-28 17:29 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: keescook, tglx, mingo, bp, x86, H. Peter Anvin, arjan,
	linux-kernel, kernel-hardening, rick.p.edgecombe, Ard Biesheuvel,
	Tony Luck

+++ Kristen Carlson Accardi [17/07/20 10:00 -0700]:
>Introduce a new config option to allow modules to be re-ordered
>by function. This option can be enabled independently of the
>kernel text KASLR or FG_KASLR settings so that it can be used
>by architectures that do not support either of these features.
>This option will be selected by default if CONFIG_FG_KASLR is
>selected.
>
>If a module has functions split out into separate text sections
>(i.e. compiled with the -ffunction-sections flag), reorder the
>functions to provide some code diversification to modules.
>
>Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
>Reviewed-by: Kees Cook <keescook@chromium.org>
>Acked-by: Ard Biesheuvel <ardb@kernel.org>
>Tested-by: Ard Biesheuvel <ardb@kernel.org>
>Reviewed-by: Tony Luck <tony.luck@intel.com>
>Tested-by: Tony Luck <tony.luck@intel.com>

Hi Kristen!

I've boot tested this on x86, (un)loaded some modules, and checked
their resulting section addresses as a quick sanity test. Feel free
to add my:

Acked-by: Jessica Yu <jeyu@kernel.org>
Tested-by: Jessica Yu <jeyu@kernel.org>

Thank you!

Jessica

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22  9:27 ` [PATCH v4 00/10] Function Granular KASLR Miroslav Benes
  2020-07-22 14:39   ` Kees Cook
@ 2020-08-03 11:39   ` Evgenii Shatokhin
  2020-08-03 17:45     ` Kees Cook
  1 sibling, 1 reply; 46+ messages in thread
From: Evgenii Shatokhin @ 2020-08-03 11:39 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Miroslav Benes, keescook, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching,
	Joe Lawrence, Josh Poimboeuf

Hi,

> On Fri, 17 Jul 2020, Kristen Carlson Accardi wrote:
> 
>> Function Granular Kernel Address Space Layout Randomization (fgkaslr)
>> ---------------------------------------------------------------------
>>
>> This patch set is an implementation of finer grained kernel address space
>> randomization. It rearranges your kernel code at load time
>> on a per-function level granularity, with only around a second added to
>> boot time.
> 
> [...]

>> Modules
>> -------
>> Modules are randomized similarly to the rest of the kernel by shuffling
>> the sections at load time prior to moving them into memory. The module must
>> also have been build with the -ffunction-sections compiler option.

It seems, a couple more adjustments are needed in the module loader code.

With function granular KASLR, modules will have lots of ELF sections due 
to -ffunction-sections.
On my x86_64 system with kernel 5.8-rc7 with FG KASLR patches, for 
example, xfs.ko has 4849 ELF sections total, 2428 of these are loaded 
and shown in /sys/module/xfs/sections/.

There are at least 2 places where high-order memory allocations might 
happen during module loading. Such allocations may fail if memory is 
fragmented, while physically contiguous memory areas are not really 
needed there. I suggest to switch to kvmalloc/kvfree there.

1. kernel/module.c, randomize_text():
	Elf_Shdr **text_list;
	...
	int max_sections = info->hdr->e_shnum;
	...
	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);

The size of the allocated memory area is (8 * total_number_of_sections), 
if I understand it right, which is 38792 for xfs.ko, a 4th order allocation.

2. kernel/module.c, mod_sysfs_setup() => add_sect_attrs().

This allocation can be larger than the first one.

We found this issue with livepatch modules some time ago (these modules 
are already built with -ffunction-sections) [1], but, with FG KASLR, it 
affects all kernel modules. Large ones like xfs.ko, btrfs.ko, etc., 
could suffer the most from it.

When a module is loaded sysfs attributes are created for its ELF 
sections (visible as /sys/module/<module_name>/sections/*). and contain 
the start addresses of these ELF sections. A single memory chunk is 
allocated
for all these:

         size[0] = ALIGN(sizeof(*sect_attrs)
                         + nloaded * sizeof(sect_attrs->attrs[0]),
                         sizeof(sect_attrs->grp.attrs[0]));
         size[1] = (nloaded + 1) * sizeof(sect_attrs->grp.attrs[0]);
         sect_attrs = kzalloc(size[0] + size[1], GFP_KERNEL);

'nloaded' is the number of loaded ELF section in the module.

For the kernel 5.8-rc7 on my system, the total size is 56 + 72 * 
nloaded, which is 174872 for xfs.ko, 43 pages, 6th order allocation.

I enabled 'mm_page_alloc' tracepoint with filter 'order > 3' to confirm 
the issue and, indeed, got these two allocations when modprobe'ing xfs:
----------------------------
/sys/kernel/debug/tracing/trace:
         modprobe-1509  <...>: mm_page_alloc: <...> order=4 
migratetype=0 gfp_flags=GFP_KERNEL|__GFP_COMP
         modprobe-1509  <stack trace>
		=> __alloc_pages_nodemask
		=> alloc_pages_current
		=> kmalloc_order
		=> kmalloc_order_trace
		=> __kmalloc
		=> load_module

         modprobe-1509  <...>: mm_page_alloc: <...> order=6 
migratetype=0 gfp_flags=GFP_KERNEL|__GFP_COMP|__GFP_ZERO
         modprobe-1509  <stack trace>
		=> __alloc_pages_nodemask
		=> alloc_pages_current
		=> kmalloc_order
		=> kmalloc_order_trace
		=> __kmalloc
		=> mod_sysfs_setup
		=> load_module
----------------------------

I suppose, something like this can be used as workaround:

* for randomize_text():
-----------
diff --git a/kernel/module.c b/kernel/module.c
index 0f4f4e567a42..a2473db1d0a3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2433,7 +2433,7 @@ static void randomize_text(struct module *mod, 
struct load_info *info)
  	if (sec == 0)
  		return;

-	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+	text_list = kvmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
  	if (!text_list)
  		return;

@@ -2466,7 +2466,7 @@ static void randomize_text(struct module *mod, 
struct load_info *info)
  		shdr->sh_entsize = get_offset(mod, &size, shdr, 0);
  	}

-	kfree(text_list);
+	kvfree(text_list);
  }

  /* Lay out the SHF_ALLOC sections in a way not dissimilar to how ld

-----------

* for add_sect_attrs():
-----------
diff --git a/kernel/module.c b/kernel/module.c
index 0f4f4e567a42..a2473db1d0a3 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1541,7 +1541,7 @@ static void free_sect_attrs(struct 
module_sect_attrs *sect_attrs)

  	for (section = 0; section < sect_attrs->nsections; section++)
  		kfree(sect_attrs->attrs[section].battr.attr.name);
-	kfree(sect_attrs);
+	kvfree(sect_attrs);
  }

  static void add_sect_attrs(struct module *mod, const struct load_info 
*info)
@@ -1558,7 +1558,7 @@ static void add_sect_attrs(struct module *mod, 
const struct load_info *info)
  	size[0] = ALIGN(struct_size(sect_attrs, attrs, nloaded),
  			sizeof(sect_attrs->grp.bin_attrs[0]));
  	size[1] = (nloaded + 1) * sizeof(sect_attrs->grp.bin_attrs[0]);
-	sect_attrs = kzalloc(size[0] + size[1], GFP_KERNEL);
+	sect_attrs = kvzalloc(size[0] + size[1], GFP_KERNEL);
  	if (sect_attrs == NULL)
  		return;

-----------

[1] https://github.com/dynup/kpatch/pull/1131

Regards,
Evgenii

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 11:39   ` Evgenii Shatokhin
@ 2020-08-03 17:45     ` Kees Cook
  2020-08-03 18:17       ` Joe Lawrence
  0 siblings, 1 reply; 46+ messages in thread
From: Kees Cook @ 2020-08-03 17:45 UTC (permalink / raw)
  To: Evgenii Shatokhin
  Cc: Kristen Carlson Accardi, Miroslav Benes, tglx, mingo, bp, arjan,
	x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	live-patching, Joe Lawrence, Josh Poimboeuf, Jessica Yu

On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote:
> There are at least 2 places where high-order memory allocations might happen
> during module loading. Such allocations may fail if memory is fragmented,
> while physically contiguous memory areas are not really needed there. I
> suggest to switch to kvmalloc/kvfree there.

While this does seem to be the right solution for the extant problem, I
do want to take a moment and ask if the function sections need to be
exposed at all? What tools use this information, and do they just want
to see the bounds of the code region? (i.e. the start/end of all the
.text* sections) Perhaps .text.* could be excluded from the sysfs
section list?

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 17:45     ` Kees Cook
@ 2020-08-03 18:17       ` Joe Lawrence
  2020-08-03 19:38         ` Frank Ch. Eigler
  2020-08-04 17:04         ` Jessica Yu
  0 siblings, 2 replies; 46+ messages in thread
From: Joe Lawrence @ 2020-08-03 18:17 UTC (permalink / raw)
  To: Kees Cook, Evgenii Shatokhin
  Cc: Kristen Carlson Accardi, Miroslav Benes, tglx, mingo, bp, arjan,
	x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	live-patching, Josh Poimboeuf, Jessica Yu, Frank Ch. Eigler

On 8/3/20 1:45 PM, Kees Cook wrote:
> On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote:
>> There are at least 2 places where high-order memory allocations might happen
>> during module loading. Such allocations may fail if memory is fragmented,
>> while physically contiguous memory areas are not really needed there. I
>> suggest to switch to kvmalloc/kvfree there.
> 
> While this does seem to be the right solution for the extant problem, I
> do want to take a moment and ask if the function sections need to be
> exposed at all? What tools use this information, and do they just want
> to see the bounds of the code region? (i.e. the start/end of all the
> .text* sections) Perhaps .text.* could be excluded from the sysfs
> section list?
> 

[[cc += FChE, see [0] for Evgenii's full mail ]]

It looks like debugging tools like systemtap [1], gdb [2] and its 
add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info.

But yeah, it would be preferable if we didn't export a long sysfs 
representation if nobody actually needs it.


[0] 
https://lore.kernel.org/lkml/e9c4d88b-86db-47e9-4299-3fac45a7e3fd@virtuozzo.com/
[1] https://fossies.org/linux/systemtap/staprun/staprun.c
[2] 
https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch04.html#linuxdrive3-CHP-4-SECT-6.1

-- Joe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 18:17       ` Joe Lawrence
@ 2020-08-03 19:38         ` Frank Ch. Eigler
  2020-08-03 20:11           ` Kees Cook
  2020-08-04 17:04         ` Jessica Yu
  1 sibling, 1 reply; 46+ messages in thread
From: Frank Ch. Eigler @ 2020-08-03 19:38 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: Kees Cook, Evgenii Shatokhin, Kristen Carlson Accardi,
	Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching,
	Josh Poimboeuf, Jessica Yu

Hi -

> > While this does seem to be the right solution for the extant problem, I
> > do want to take a moment and ask if the function sections need to be
> > exposed at all? What tools use this information, and do they just want
> > to see the bounds of the code region? (i.e. the start/end of all the
> > .text* sections) Perhaps .text.* could be excluded from the sysfs
> > section list?

> [[cc += FChE, see [0] for Evgenii's full mail ]]

Thanks!

> It looks like debugging tools like systemtap [1], gdb [2] and its
> add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info.
> But yeah, it would be preferable if we didn't export a long sysfs
> representation if nobody actually needs it.

Systemtap needs to know base addresses of loaded text & data sections,
in order to perform relocation of probe point PCs and context data
addresses.  It uses /sys/module/...., kind of under protest, because
there seems to exist no MODULE_EXPORT'd API to get at that information
some other way.

- FChE


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 19:38         ` Frank Ch. Eigler
@ 2020-08-03 20:11           ` Kees Cook
  2020-08-03 21:12             ` Frank Ch. Eigler
  0 siblings, 1 reply; 46+ messages in thread
From: Kees Cook @ 2020-08-03 20:11 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Joe Lawrence, Evgenii Shatokhin, Kristen Carlson Accardi,
	Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching,
	Josh Poimboeuf, Jessica Yu

On Mon, Aug 03, 2020 at 03:38:37PM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> > > While this does seem to be the right solution for the extant problem, I
> > > do want to take a moment and ask if the function sections need to be
> > > exposed at all? What tools use this information, and do they just want
> > > to see the bounds of the code region? (i.e. the start/end of all the
> > > .text* sections) Perhaps .text.* could be excluded from the sysfs
> > > section list?
> 
> > [[cc += FChE, see [0] for Evgenii's full mail ]]
> 
> Thanks!
> 
> > It looks like debugging tools like systemtap [1], gdb [2] and its
> > add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info.
> > But yeah, it would be preferable if we didn't export a long sysfs
> > representation if nobody actually needs it.
> 
> Systemtap needs to know base addresses of loaded text & data sections,
> in order to perform relocation of probe point PCs and context data
> addresses.  It uses /sys/module/...., kind of under protest, because
> there seems to exist no MODULE_EXPORT'd API to get at that information
> some other way.

Wouldn't /proc/kallsysms entries cover this? I must be missing
something...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 20:11           ` Kees Cook
@ 2020-08-03 21:12             ` Frank Ch. Eigler
  2020-08-03 21:41               ` Kees Cook
  0 siblings, 1 reply; 46+ messages in thread
From: Frank Ch. Eigler @ 2020-08-03 21:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Joe Lawrence, Evgenii Shatokhin, Kristen Carlson Accardi,
	Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching,
	Josh Poimboeuf, Jessica Yu

Hi -

On Mon, Aug 03, 2020 at 01:11:27PM -0700, Kees Cook wrote:
> [...]
> > Systemtap needs to know base addresses of loaded text & data sections,
> > in order to perform relocation of probe point PCs and context data
> > addresses.  It uses /sys/module/...., kind of under protest, because
> > there seems to exist no MODULE_EXPORT'd API to get at that information
> > some other way.
> 
> Wouldn't /proc/kallsysms entries cover this? I must be missing
> something...

We have relocated based on sections, not some subset of function
symbols accessible that way, partly because DWARF line- and DIE- based
probes can map to addresses some way away from function symbols, into
function interiors, or cloned/moved bits of optimized code.  It would
take some work to prove that function-symbol based heuristic
arithmetic would have just as much reach.

- FChE


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 21:12             ` Frank Ch. Eigler
@ 2020-08-03 21:41               ` Kees Cook
  2020-08-04  0:48                 ` Frank Ch. Eigler
  0 siblings, 1 reply; 46+ messages in thread
From: Kees Cook @ 2020-08-03 21:41 UTC (permalink / raw)
  To: Frank Ch. Eigler
  Cc: Joe Lawrence, Evgenii Shatokhin, Kristen Carlson Accardi,
	Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching,
	Josh Poimboeuf, Jessica Yu

On Mon, Aug 03, 2020 at 05:12:28PM -0400, Frank Ch. Eigler wrote:
> Hi -
> 
> On Mon, Aug 03, 2020 at 01:11:27PM -0700, Kees Cook wrote:
> > [...]
> > > Systemtap needs to know base addresses of loaded text & data sections,
> > > in order to perform relocation of probe point PCs and context data
> > > addresses.  It uses /sys/module/...., kind of under protest, because
> > > there seems to exist no MODULE_EXPORT'd API to get at that information
> > > some other way.
> > 
> > Wouldn't /proc/kallsysms entries cover this? I must be missing
> > something...
> 
> We have relocated based on sections, not some subset of function
> symbols accessible that way, partly because DWARF line- and DIE- based
> probes can map to addresses some way away from function symbols, into
> function interiors, or cloned/moved bits of optimized code.  It would
> take some work to prove that function-symbol based heuristic
> arithmetic would have just as much reach.

Interesting. Do you have an example handy? It seems like something like
that would reference the enclosing section, which means we can't just
leave them out of the sysfs list... (but if such things never happen in
the function-sections, then we *can* remove them...)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 21:41               ` Kees Cook
@ 2020-08-04  0:48                 ` Frank Ch. Eigler
  0 siblings, 0 replies; 46+ messages in thread
From: Frank Ch. Eigler @ 2020-08-04  0:48 UTC (permalink / raw)
  To: Kees Cook
  Cc: Joe Lawrence, Evgenii Shatokhin, Kristen Carlson Accardi,
	Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching,
	Josh Poimboeuf, Jessica Yu

Hi -

> > We have relocated based on sections, not some subset of function
> > symbols accessible that way, partly because DWARF line- and DIE- based
> > probes can map to addresses some way away from function symbols, into
> > function interiors, or cloned/moved bits of optimized code.  It would
> > take some work to prove that function-symbol based heuristic
> > arithmetic would have just as much reach.
> 
> Interesting. Do you have an example handy? 

No, I'm afraid I don't have one that I know cannot possibly be
expressed by reference to a function symbol only.  I'd look at
systemtap (4.3) probe point lists like:

% stap -vL 'kernel.statement("*@kernel/*verif*.c:*")'
% stap -vL 'module("amdgpu").statement("*@*execution*.c:*")'

which give an impression of computed PC addresses.

> It seems like something like that would reference the enclosing
> section, which means we can't just leave them out of the sysfs
> list... (but if such things never happen in the function-sections,
> then we *can* remove them...)

I'm not sure we can easily prove they can never happen there.

- FChE


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-03 18:17       ` Joe Lawrence
  2020-08-03 19:38         ` Frank Ch. Eigler
@ 2020-08-04 17:04         ` Jessica Yu
  1 sibling, 0 replies; 46+ messages in thread
From: Jessica Yu @ 2020-08-04 17:04 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: Kees Cook, Evgenii Shatokhin, Kristen Carlson Accardi,
	Miroslav Benes, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching,
	Josh Poimboeuf, Frank Ch. Eigler

+++ Joe Lawrence [03/08/20 14:17 -0400]:
>On 8/3/20 1:45 PM, Kees Cook wrote:
>>On Mon, Aug 03, 2020 at 02:39:32PM +0300, Evgenii Shatokhin wrote:
>>>There are at least 2 places where high-order memory allocations might happen
>>>during module loading. Such allocations may fail if memory is fragmented,
>>>while physically contiguous memory areas are not really needed there. I
>>>suggest to switch to kvmalloc/kvfree there.

Thanks Evgenii for pointing out the potential memory allocation issues
that may arise with very large modules when memory is fragmented. I was curious
as to which modules on my machine would be considered large, and there seems to be
quite a handful...(x86_64 with v5.8-rc6 with a relatively standard distro config and
FG KASLR patches on top):

./amdgpu/sections 7277
./i915/sections 4267
./nouveau/sections 3772
./xfs/sections 2395
./btrfs/sections 1966
./mac80211/sections 1588
./kvm/sections 1468
./cfg80211/sections 1194
./drm/sections 1012
./bluetooth/sections 843
./iwlmvm/sections 664
./usbcore/sections 524
./videodev/sections 436

So, I agree with the suggestion that we could switch to kvmalloc() to
try to mitigate potential allocation problems when memory is fragmented.

>>While this does seem to be the right solution for the extant problem, I
>>do want to take a moment and ask if the function sections need to be
>>exposed at all? What tools use this information, and do they just want
>>to see the bounds of the code region? (i.e. the start/end of all the
>>.text* sections) Perhaps .text.* could be excluded from the sysfs
>>section list?
>
>[[cc += FChE, see [0] for Evgenii's full mail ]]
>
>It looks like debugging tools like systemtap [1], gdb [2] and its 
>add-symbol-file cmd, etc. peek at the /sys/module/<MOD>/section/ info.
>
>But yeah, it would be preferable if we didn't export a long sysfs 
>representation if nobody actually needs it.

Thanks Joe for looking into this. Hmm, AFAICT for gdb it's not a hard
dependency per se - for add-symbol-file I was under the impression
that we are responsible for obtaining the relevant section addresses
ourselves through /sys/module/ (the most oft cited method) and then
feeding those to add-symbol-file. It would definitely be more
difficult to find out the section addresses without the /sys/module/
section entries. In any case, it sounds like systemtap has a hard
dependency on /sys/module/*/sections anyway.

Regarding /proc/kallsyms, I think it is probably possible to expose
section symbols and their addresses via /proc/kallsyms rather than
through sysfs (it would then live in the module's vmalloc'ed memory)
but I'm not sure how helpful that would actually be, especially since
existing tools depend on the sysfs interface being there.

>[0] https://lore.kernel.org/lkml/e9c4d88b-86db-47e9-4299-3fac45a7e3fd@virtuozzo.com/
>[1] https://fossies.org/linux/systemtap/staprun/staprun.c
>[2] https://www.oreilly.com/library/view/linux-device-drivers/0596005903/ch04.html#linuxdrive3-CHP-4-SECT-6.1
>
>-- Joe

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (10 preceding siblings ...)
  2020-07-22  9:27 ` [PATCH v4 00/10] Function Granular KASLR Miroslav Benes
@ 2020-08-04 18:23 ` Joe Lawrence
  2020-08-07 16:38   ` Kristen Carlson Accardi
  2020-08-12 17:18   ` Kristen Carlson Accardi
  2020-08-06 15:32 ` Ingo Molnar
  12 siblings, 2 replies; 46+ messages in thread
From: Joe Lawrence @ 2020-08-04 18:23 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: keescook, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi wrote:
> Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> ---------------------------------------------------------------------
> 
> This patch set is an implementation of finer grained kernel address space
> randomization. It rearranges your kernel code at load time 
> on a per-function level granularity, with only around a second added to
> boot time.
> 
> Changes in v4:
> -------------
> * dropped the patch to split out change to STATIC definition in
>   x86/boot/compressed/misc.c and replaced with a patch authored
>   by Kees Cook to avoid the duplicate malloc definitions
> * Added a section to Documentation/admin-guide/kernel-parameters.txt
>   to document the fgkaslr boot option.
> * redesigned the patch to hide the new layout when reading
>   /proc/kallsyms. The previous implementation utilized a dynamically
>   allocated linked list to display the kernel and module symbols
>   in alphabetical order. The new implementation uses a randomly
>   shuffled index array to display the kernel and module symbols
>   in a random order.
> 
> Changes in v3:
> -------------
> * Makefile changes to accommodate CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> * removal of extraneous ALIGN_PAGE from _etext changes
> * changed variable names in x86/tools/relocs to be less confusing
> * split out change to STATIC definition in x86/boot/compressed/misc.c
> * Updates to Documentation to make it more clear what is preserved in .text
> * much more detailed commit message for function granular KASLR patch
> * minor tweaks and changes that make for more readable code
> * this cover letter updated slightly to add additional details
> 
> Changes in v2:
> --------------
> * Fix to address i386 build failure
> * Allow module reordering patch to be configured separately so that
>   arm (or other non-x86_64 arches) can take advantage of module function
>   reordering. This support has not be tested by me, but smoke tested by
>   Ard Biesheuvel <ardb@kernel.org> on arm.
> * Fix build issue when building on arm as reported by
>   Ard Biesheuvel <ardb@kernel.org> 
> 
> Patches to objtool are included because they are dependencies for this
> patchset, however they have been submitted by their maintainer separately.
> 
> Background
> ----------
> KASLR was merged into the kernel with the objective of increasing the
> difficulty of code reuse attacks. Code reuse attacks reused existing code
> snippets to get around existing memory protections. They exploit software bugs
> which expose addresses of useful code snippets to control the flow of
> execution for their own nefarious purposes. KASLR moves the entire kernel
> code text as a unit at boot time in order to make addresses less predictable.
> The order of the code within the segment is unchanged - only the base address
> is shifted. There are a few shortcomings to this algorithm.
> 
> 1. Low Entropy - there are only so many locations the kernel can fit in. This
>    means an attacker could guess without too much trouble.
> 2. Knowledge of a single address can reveal the offset of the base address,
>    exposing all other locations for a published/known kernel image.
> 3. Info leaks abound.
> 
> Finer grained ASLR has been proposed as a way to make ASLR more resistant
> to info leaks. It is not a new concept at all, and there are many variations
> possible. Function reordering is an implementation of finer grained ASLR
> which randomizes the layout of an address space on a function level
> granularity. We use the term "fgkaslr" in this document to refer to the
> technique of function reordering when used with KASLR, as well as finer grained
> KASLR in general.
> 
> Proposed Improvement
> --------------------
> This patch set proposes adding function reordering on top of the existing
> KASLR base address randomization. The over-arching objective is incremental
> improvement over what we already have. It is designed to work in combination
> with the existing solution. The implementation is really pretty simple, and
> there are 2 main area where changes occur:
> 
> * Build time
> 
> GCC has had an option to place functions into individual .text sections for
> many years now. This option can be used to implement function reordering at
> load time. The final compiled vmlinux retains all the section headers, which
> can be used to help find the address ranges of each function. Using this
> information and an expanded table of relocation addresses, individual text
> sections can be suffled immediately after decompression. Some data tables
> inside the kernel that have assumptions about order require re-sorting
> after being updated when applying relocations. In order to modify these tables,
> a few key symbols are excluded from the objcopy symbol stripping process for
> use after shuffling the text segments.
> 
> Some highlights from the build time changes to look for:
> 
> The top level kernel Makefile was modified to add the gcc flag if it
> is supported. Currently, I am applying this flag to everything it is
> possible to randomize. Anything that is written in C and not present in a
> special input section is randomized. The final binary segment 0 retains a
> consolidated .text section, as well as all the individual .text.* sections.
> Future work could turn off this flags for selected files or even entire
> subsystems, although obviously at the cost of security.
> 
> The relocs tool is updated to add relative relocations. This information
> previously wasn't included because it wasn't necessary when moving the
> entire .text segment as a unit. 
> 
> A new file was created to contain a list of symbols that objcopy should
> keep. We use those symbols at load time as described below.
> 
> * Load time
> 
> The boot kernel was modified to parse the vmlinux elf file after
> decompression to check for our interesting symbols that we kept, and to
> look for any .text.* sections to randomize. The consolidated .text section
> is skipped and not moved. The sections are shuffled randomly, and copied
> into memory following the .text section in a new random order. The existing
> code which updated relocation addresses was modified to account for
> not just a fixed delta from the load address, but the offset that the function
> section was moved to. This requires inspection of each address to see if
> it was impacted by a randomization. We use a bsearch to make this less
> horrible on performance. Any tables that need to be modified with new
> addresses or resorted are updated using the symbol addresses parsed from the
> elf symbol table.
> 
> In order to hide our new layout, symbols reported through /proc/kallsyms
> will be displayed in a random order.
> 
> Security Considerations
> -----------------------
> The objective of this patch set is to improve a technology that is already
> merged into the kernel (KASLR). This code will not prevent all attacks,
> but should instead be considered as one of several tools that can be used.
> In particular, this code is meant to make KASLR more effective in the presence
> of info leaks.
> 
> How much entropy we are adding to the existing entropy of standard KASLR will
> depend on a few variables. Firstly and most obviously, the number of functions
> that are randomized matters. This implementation keeps the existing .text
> section for code that cannot be randomized - for example, because it was
> assembly code. The less sections to randomize, the less entropy. In addition,
> due to alignment (16 bytes for x86_64), the number of bits in a address that
> the attacker needs to guess is reduced, as the lower bits are identical.
> 
> Performance Impact
> ------------------
> There are two areas where function reordering can impact performance: boot
> time latency, and run time performance.
> 
> * Boot time latency
> This implementation of finer grained KASLR impacts the boot time of the kernel
> in several places. It requires additional parsing of the kernel ELF file to
> obtain the section headers of the sections to be randomized. It calls the
> random number generator for each section to be randomized to determine that
> section's new memory location. It copies the decompressed kernel into a new
> area of memory to avoid corruption when laying out the newly randomized
> sections. It increases the number of relocations the kernel has to perform at
> boot time vs. standard KASLR, and it also requires a lookup on each address
> that needs to be relocated to see if it was in a randomized section and needs
> to be adjusted by a new offset. Finally, it re-sorts a few data tables that
> are required to be sorted by address.
> 
> Booting a test VM on a modern, well appointed system showed an increase in
> latency of approximately 1 second.
> 
> * Run time
> The performance impact at run-time of function reordering varies by workload.
> Using kcbench, a kernel compilation benchmark, the performance of a kernel
> build with finer grained KASLR was about 1% slower than a kernel with standard
> KASLR. Analysis with perf showed a slightly higher percentage of 
> L1-icache-load-misses. Other workloads were examined as well, with varied
> results. Some workloads performed significantly worse under FGKASLR, while
> others stayed the same or were mysteriously better. In general, it will
> depend on the code flow whether or not finer grained KASLR will impact
> your workload, and how the underlying code was designed. Because the layout
> changes per boot, each time a system is rebooted the performance of a workload
> may change.
> 
> Future work could identify hot areas that may not be randomized and either
> leave them in the .text section or group them together into a single section
> that may be randomized. If grouping things together helps, one other thing to
> consider is that if we could identify text blobs that should be grouped together
> to benefit a particular code flow, it could be interesting to explore
> whether this security feature could be also be used as a performance
> feature if you are interested in optimizing your kernel layout for a
> particular workload at boot time. Optimizing function layout for a particular
> workload has been researched and proven effective - for more information
> read the Facebook paper "Optimizing Function Placement for Large-Scale
> Data-Center Applications" (see references section below).
> 
> Image Size
> ----------
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file.
> With a standard distro config, the resulting vmlinux was increased by
> about 3%. The compressed image is also increased due to the header files,
> as well as the extra relocations that must be added. You can expect fgkaslr
> to increase the size of the compressed image by about 15%.
> 
> Memory Usage
> ------------
> fgkaslr increases the amount of heap that is required at boot time,
> although this extra memory is released when the kernel has finished
> decompression. As a result, it may not be appropriate to use this feature on
> systems without much memory.
> 
> Building
> --------
> To enable fine grained KASLR, you need to have the following config options
> set (including all the ones you would use to build normal KASLR)
> 
> CONFIG_FG_KASLR=y
> 
> In addition, fgkaslr is only supported for the X86_64 architecture.
> 
> Modules
> -------
> Modules are randomized similarly to the rest of the kernel by shuffling
> the sections at load time prior to moving them into memory. The module must
> also have been build with the -ffunction-sections compiler option.
> 
> Although fgkaslr for the kernel is only supported for the X86_64 architecture,
> it is possible to use fgkaslr with modules on other architectures. To enable
> this feature, select
> 
> CONFIG_MODULE_FG_KASLR=y
> 
> This option is selected automatically for X86_64 when CONFIG_FG_KASLR is set.
> 
> Disabling
> ---------
> Disabling normal KASLR using the nokaslr command line option also disables
> fgkaslr. It is also possible to disable fgkaslr separately by booting with
> fgkaslr=off on the commandline.
> 
> References
> ----------
> There are a lot of academic papers which explore finer grained ASLR.
> This paper in particular contributed the most to my implementation design
> as well as my overall understanding of the problem space:
> 
> Selfrando: Securing the Tor Browser against De-anonymization Exploits,
> M. Conti, S. Crane, T. Frassetto, et al.
> 
> For more information on how function layout impacts performance, see:
> 
> Optimizing Function Placement for Large-Scale Data-Center Applications,
> G. Ottoni, B. Maher
> 
> Kees Cook (2):
>   x86/boot: Allow a "silent" kaslr random byte fetch
>   x86/boot/compressed: Avoid duplicate malloc() implementations
> 
> Kristen Carlson Accardi (8):
>   objtool: Do not assume order of parent/child functions
>   x86: tools/relocs: Support >64K section headers
>   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
>   x86: Make sure _etext includes function sections
>   x86/tools: Add relative relocs for randomized functions
>   x86: Add support for function granular KASLR
>   kallsyms: Hide layout
>   module: Reorder functions
> 
>  .../admin-guide/kernel-parameters.txt         |   7 +
>  Documentation/security/fgkaslr.rst            | 172 ++++
>  Documentation/security/index.rst              |   1 +
>  Makefile                                      |   6 +-
>  arch/x86/Kconfig                              |   4 +
>  arch/x86/Makefile                             |   5 +
>  arch/x86/boot/compressed/Makefile             |   9 +-
>  arch/x86/boot/compressed/fgkaslr.c            | 811 ++++++++++++++++++
>  arch/x86/boot/compressed/kaslr.c              |   4 -
>  arch/x86/boot/compressed/misc.c               | 157 +++-
>  arch/x86/boot/compressed/misc.h               |  30 +
>  arch/x86/boot/compressed/utils.c              |  11 +
>  arch/x86/boot/compressed/vmlinux.symbols      |  17 +
>  arch/x86/include/asm/boot.h                   |  15 +-
>  arch/x86/kernel/vmlinux.lds.S                 |  17 +-
>  arch/x86/lib/kaslr.c                          |  18 +-
>  arch/x86/tools/relocs.c                       | 143 ++-
>  arch/x86/tools/relocs.h                       |   4 +-
>  arch/x86/tools/relocs_common.c                |  15 +-
>  include/asm-generic/vmlinux.lds.h             |  18 +-
>  include/linux/decompress/mm.h                 |  12 +-
>  include/uapi/linux/elf.h                      |   1 +
>  init/Kconfig                                  |  26 +
>  kernel/kallsyms.c                             | 163 +++-
>  kernel/module.c                               |  81 ++
>  tools/objtool/elf.c                           |   8 +-
>  26 files changed, 1670 insertions(+), 85 deletions(-)
>  create mode 100644 Documentation/security/fgkaslr.rst
>  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
>  create mode 100644 arch/x86/boot/compressed/utils.c
>  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> 
> 
> base-commit: 11ba468877bb23f28956a35e896356252d63c983
> -- 
> 2.20.1
> 

Apologies in advance if this has already been discussed elsewhere, but I
did finally get around to testing the patchset against the livepatching
kselftests.

The livepatching kselftests fail as all livepatches stall their
transitions.  It appears that reliable (ORC) stack unwinding is broken
when fgkaslr is enabled.

Relevant config options:

  CONFIG_ARCH_HAS_FG_KASLR=y
  CONFIG_ARCH_STACKWALK=y
  CONFIG_FG_KASLR=y
  CONFIG_HAVE_LIVEPATCH=y
  CONFIG_HAVE_RELIABLE_STACKTRACE=y
  CONFIG_LIVEPATCH=y
  CONFIG_MODULE_FG_KASLR=y
  CONFIG_TEST_LIVEPATCH=m
  CONFIG_UNWINDER_ORC=y

The livepatch transitions are stuck along this call path:

  klp_check_stack
    stack_trace_save_tsk_reliable
      arch_stack_walk_reliable
  
          /* Check for stack corruption */
          if (unwind_error(&state))
                  return -EINVAL;

where the unwinder error is set by unwind_next_frame():

  arch/x86/kernel/unwind_orc.c
  bool unwind_next_frame(struct unwind_state *state)
 
sometimes here:
 
  	/* End-of-stack check for kernel threads: */
  	if (orc->sp_reg == ORC_REG_UNDEFINED) {
  		if (!orc->end)
  			goto err;
  
  		goto the_end;
  	}

or here:

  	/* Prevent a recursive loop due to bad ORC data: */                                                                                
  	if (state->stack_info.type == prev_type &&                                                                                         
  	    on_stack(&state->stack_info, (void *)state->sp, sizeof(long)) &&                                                               
  	    state->sp <= prev_sp) {                                                                                                        
  		orc_warn_current("stack going in the wrong direction? at %pB\n",                                                           
  				 (void *)orig_ip);                                                                                         
  		goto err;                                                                                                                  
  	}

(and probably other places the ORC unwinder gets confused.)


It also manifests itself in other, more visible ways.  For example, a
kernel module that calls dump_stack() in its init function or even
/proc/<pid>/stack:

(fgkaslr on)
------------

Call Trace:
 ? dump_stack+0x57/0x73
 ? 0xffffffffc0850000
 ? mymodule_init+0xa/0x1000 [dumpstack]
 ? do_one_initcall+0x46/0x1f0
 ? free_unref_page_commit+0x91/0x100
 ? _cond_resched+0x15/0x30
 ? kmem_cache_alloc_trace+0x14b/0x210
 ? do_init_module+0x5a/0x220
 ? load_module+0x1912/0x1b20
 ? __do_sys_finit_module+0xa8/0x110
 ? __do_sys_finit_module+0xa8/0x110
 ? do_syscall_64+0x47/0x80
 ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

% sudo cat /proc/$$/stack
[<0>] do_wait+0x1c3/0x230
[<0>] kernel_wait4+0xa6/0x140


fgkaslr=off
-----------

Call Trace:
 dump_stack+0x57/0x73
 ? 0xffffffffc04f2000
 mymodule_init+0xa/0x1000 [readonly]
 do_one_initcall+0x46/0x1f0
 ? free_unref_page_commit+0x91/0x100
 ? _cond_resched+0x15/0x30
 ? kmem_cache_alloc_trace+0x14b/0x210
 do_init_module+0x5a/0x220
 load_module+0x1912/0x1b20
 ? __do_sys_finit_module+0xa8/0x110
 __do_sys_finit_module+0xa8/0x110
 do_syscall_64+0x47/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

% sudo cat /proc/$$/stack
[<0>] do_wait+0x1c3/0x230
[<0>] kernel_wait4+0xa6/0x140
[<0>] __do_sys_wait4+0x83/0x90
[<0>] do_syscall_64+0x47/0x80
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9


I would think fixing and verifying these latter cases would be easier than
chasing livepatch transitions (but would still probably fix klp case, too).
Perhaps Josh or someone has other ORC unwinder tests that could be used?

-- Joe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
                   ` (11 preceding siblings ...)
  2020-08-04 18:23 ` Joe Lawrence
@ 2020-08-06 15:32 ` Ingo Molnar
  2020-08-06 19:24   ` Kristen Carlson Accardi
  2020-08-06 19:27   ` Kees Cook
  12 siblings, 2 replies; 46+ messages in thread
From: Ingo Molnar @ 2020-08-06 15:32 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: keescook, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe


* Kristen Carlson Accardi <kristen@linux.intel.com> wrote:

> Function Granular Kernel Address Space Layout Randomization (fgkaslr)
> ---------------------------------------------------------------------
> 
> This patch set is an implementation of finer grained kernel address space
> randomization. It rearranges your kernel code at load time 
> on a per-function level granularity, with only around a second added to
> boot time.

This is a very nice feature IMO, and it should be far more effective 
at randomizing the kernel, due to the sheer number of randomization 
bits that kernel function granular randomization presents.

If this is a good approximation of fg-kaslr randomization depth:

  thule:~/tip> grep ' [tT] ' /proc/kallsyms  | wc -l
  88488

... then that's 80K bits of randomization instead of the mere handful 
of kaslr bits we have today. Very nice!

> In order to hide our new layout, symbols reported through 
> /proc/kallsyms will be displayed in a random order.

Neat. :-)

> Performance Impact
> ------------------

> * Run time
> The performance impact at run-time of function reordering varies by workload.
> Using kcbench, a kernel compilation benchmark, the performance of a kernel
> build with finer grained KASLR was about 1% slower than a kernel with standard
> KASLR. Analysis with perf showed a slightly higher percentage of 
> L1-icache-load-misses. Other workloads were examined as well, with varied
> results. Some workloads performed significantly worse under FGKASLR, while
> others stayed the same or were mysteriously better. In general, it will
> depend on the code flow whether or not finer grained KASLR will impact
> your workload, and how the underlying code was designed. Because the layout
> changes per boot, each time a system is rebooted the performance of a workload
> may change.

I'd guess that the biggest performance impact comes from tearing apart 
'groups' of functions that particular workloads are using.

In that sense it might be worthwile to add a '__kaslr_group' function 
tag to key functions, which would keep certain performance critical 
functions next to each other.

This shouldn't really be a problem, as even with generous amount of 
grouping the number of randomization bits is incredibly large.

> Future work could identify hot areas that may not be randomized and either
> leave them in the .text section or group them together into a single section
> that may be randomized. If grouping things together helps, one other thing to
> consider is that if we could identify text blobs that should be grouped together
> to benefit a particular code flow, it could be interesting to explore
> whether this security feature could be also be used as a performance
> feature if you are interested in optimizing your kernel layout for a
> particular workload at boot time. Optimizing function layout for a particular
> workload has been researched and proven effective - for more information
> read the Facebook paper "Optimizing Function Placement for Large-Scale
> Data-Center Applications" (see references section below).

I'm pretty sure the 'grouping' solution would address any real 
slowdowns.

I'd also suggest allowing the passing in of a boot-time pseudo-random 
generator seed number, which would allow the creation of a 
pseudo-randomized but repeatable layout across reboots.

> Image Size
> ----------
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file.
> With a standard distro config, the resulting vmlinux was increased by
> about 3%. The compressed image is also increased due to the header files,
> as well as the extra relocations that must be added. You can expect fgkaslr
> to increase the size of the compressed image by about 15%.

What is the increase of the resulting raw kernel image? Additional 
relocations might increase its size (unless I'm missing something) - 
it would be nice to measure this effect. I'd expect this to be really 
low.

vmlinux or compressed kernel size doesn't really matter on x86-64, 
it's a boot time only expense well within typical system resource 
limits.

> Disabling
> ---------
> Disabling normal KASLR using the nokaslr command line option also disables
> fgkaslr. It is also possible to disable fgkaslr separately by booting with
> fgkaslr=off on the commandline.

I'd suggest to also add a 'nofgkaslr' boot option if it doesn't yet 
exist, to keep usage symmetric with kaslr.

Likewise, there should probably be a 'kaslr=off' option as well.

The less random our user interfaces are, the better ...

>  arch/x86/boot/compressed/Makefile             |   9 +-
>  arch/x86/boot/compressed/fgkaslr.c            | 811 ++++++++++++++++++
>  arch/x86/boot/compressed/kaslr.c              |   4 -
>  arch/x86/boot/compressed/misc.c               | 157 +++-
>  arch/x86/boot/compressed/misc.h               |  30 +
>  arch/x86/boot/compressed/utils.c              |  11 +
>  arch/x86/boot/compressed/vmlinux.symbols      |  17 +
>  arch/x86/include/asm/boot.h                   |  15 +-
>  arch/x86/kernel/vmlinux.lds.S                 |  17 +-
>  arch/x86/lib/kaslr.c                          |  18 +-
>  arch/x86/tools/relocs.c                       | 143 ++-
>  arch/x86/tools/relocs.h                       |   4 +-
>  arch/x86/tools/relocs_common.c                |  15 +-
>  include/asm-generic/vmlinux.lds.h             |  18 +-
>  include/linux/decompress/mm.h                 |  12 +-
>  include/uapi/linux/elf.h                      |   1 +
>  init/Kconfig                                  |  26 +
>  kernel/kallsyms.c                             | 163 +++-
>  kernel/module.c                               |  81 ++
>  tools/objtool/elf.c                           |   8 +-
>  26 files changed, 1670 insertions(+), 85 deletions(-)
>  create mode 100644 Documentation/security/fgkaslr.rst
>  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
>  create mode 100644 arch/x86/boot/compressed/utils.c
>  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols

This looks surprisingly lean overall.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-06 15:32 ` Ingo Molnar
@ 2020-08-06 19:24   ` Kristen Carlson Accardi
  2020-08-06 19:27   ` Kees Cook
  1 sibling, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-08-06 19:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: keescook, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe

Hi Mingo, thanks for taking a look, I am glad you like the idea. Some
replies below:

On Thu, 2020-08-06 at 17:32 +0200, Ingo Molnar wrote:
> * Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> 
> > Function Granular Kernel Address Space Layout Randomization
> > (fgkaslr)
> > -----------------------------------------------------------------
> > ----
> > 
> > This patch set is an implementation of finer grained kernel address
> > space
> > randomization. It rearranges your kernel code at load time 
> > on a per-function level granularity, with only around a second
> > added to
> > boot time.
> 
> This is a very nice feature IMO, and it should be far more effective 
> at randomizing the kernel, due to the sheer number of randomization 
> bits that kernel function granular randomization presents.
> 
> If this is a good approximation of fg-kaslr randomization depth:
> 
>   thule:~/tip> grep ' [tT] ' /proc/kallsyms  | wc -l
>   88488
> 
> ... then that's 80K bits of randomization instead of the mere
> handful 
> of kaslr bits we have today. Very nice!
> 
> > In order to hide our new layout, symbols reported through 
> > /proc/kallsyms will be displayed in a random order.
> 
> Neat. :-)
> 
> > Performance Impact
> > ------------------
> > * Run time
> > The performance impact at run-time of function reordering varies by
> > workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a
> > kernel
> > build with finer grained KASLR was about 1% slower than a kernel
> > with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of 
> > L1-icache-load-misses. Other workloads were examined as well, with
> > varied
> > results. Some workloads performed significantly worse under
> > FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it
> > will
> > depend on the code flow whether or not finer grained KASLR will
> > impact
> > your workload, and how the underlying code was designed. Because
> > the layout
> > changes per boot, each time a system is rebooted the performance of
> > a workload
> > may change.
> 
> I'd guess that the biggest performance impact comes from tearing
> apart 
> 'groups' of functions that particular workloads are using.
> 
> In that sense it might be worthwile to add a '__kaslr_group'
> function 
> tag to key functions, which would keep certain performance critical 
> functions next to each other.
> 
> This shouldn't really be a problem, as even with generous amount of 
> grouping the number of randomization bits is incredibly large.

So my strategy so far was to try to get a very basic non-performance
optimized fgkaslr mode merged first, then add performance optimized
options as a next step. For example, a user might pass in
fgkaslr="group" to the fgkaslr kernel parameter to select a layout
which groups some things by whatever criteria we want to mitigate some
of the performance impact of full randomization, or they might chose
fgkaslr="full", which just randomizes everything (the current
implementation). If people think it's worth adding the performance
optimizations for the initial merge, I can certainly work on those, but
i thought it might be better to keep it super simple at first.

> 
> > Future work could identify hot areas that may not be randomized and
> > either
> > leave them in the .text section or group them together into a
> > single section
> > that may be randomized. If grouping things together helps, one
> > other thing to
> > consider is that if we could identify text blobs that should be
> > grouped together
> > to benefit a particular code flow, it could be interesting to
> > explore
> > whether this security feature could be also be used as a
> > performance
> > feature if you are interested in optimizing your kernel layout for
> > a
> > particular workload at boot time. Optimizing function layout for a
> > particular
> > workload has been researched and proven effective - for more
> > information
> > read the Facebook paper "Optimizing Function Placement for Large-
> > Scale
> > Data-Center Applications" (see references section below).
> 
> I'm pretty sure the 'grouping' solution would address any real 
> slowdowns.
> 
> I'd also suggest allowing the passing in of a boot-time pseudo-
> random 
> generator seed number, which would allow the creation of a 
> pseudo-randomized but repeatable layout across reboots.

We talked during the RFC stage of porting the chacha20 code to this
early boot stage to use as a prand generator. Ultimately, this means
you now have a secret you have to protect (the seed), and so I've
dropped this for now. I could see maybe having this as a debug option?
I certainly use a prand myself even now when I'm still debugging
functional issues (although the one I use for my own debugging isn't
suitable for merging).

> 
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased
> > by
> > about 3%. The compressed image is also increased due to the header
> > files,
> > as well as the extra relocations that must be added. You can expect
> > fgkaslr
> > to increase the size of the compressed image by about 15%.
> 
> What is the increase of the resulting raw kernel image? Additional 
> relocations might increase its size (unless I'm missing something) - 
> it would be nice to measure this effect. I'd expect this to be
> really 
> low.

By raw kernel image, do you mean just what eventually gets copied into
memory after decompression minus the relocation table? If so, this is
almost no difference - the only difference is that there is a little
bit of change in the padding between sections vs what the non-
randomized kernel is because of alignment differences with the new
layout. so you wind up with a few extra bytes give or take.

> 
> vmlinux or compressed kernel size doesn't really matter on x86-64, 
> it's a boot time only expense well within typical system resource 
> limits.
> 
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also
> > disables
> > fgkaslr. It is also possible to disable fgkaslr separately by
> > booting with
> > fgkaslr=off on the commandline.
> 
> I'd suggest to also add a 'nofgkaslr' boot option if it doesn't yet 
> exist, to keep usage symmetric with kaslr.
> 
> Likewise, there should probably be a 'kaslr=off' option as well.
> 
> The less random our user interfaces are, the better ...
> 
> >  arch/x86/boot/compressed/Makefile             |   9 +-
> >  arch/x86/boot/compressed/fgkaslr.c            | 811
> > ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c              |   4 -
> >  arch/x86/boot/compressed/misc.c               | 157 +++-
> >  arch/x86/boot/compressed/misc.h               |  30 +
> >  arch/x86/boot/compressed/utils.c              |  11 +
> >  arch/x86/boot/compressed/vmlinux.symbols      |  17 +
> >  arch/x86/include/asm/boot.h                   |  15 +-
> >  arch/x86/kernel/vmlinux.lds.S                 |  17 +-
> >  arch/x86/lib/kaslr.c                          |  18 +-
> >  arch/x86/tools/relocs.c                       | 143 ++-
> >  arch/x86/tools/relocs.h                       |   4 +-
> >  arch/x86/tools/relocs_common.c                |  15 +-
> >  include/asm-generic/vmlinux.lds.h             |  18 +-
> >  include/linux/decompress/mm.h                 |  12 +-
> >  include/uapi/linux/elf.h                      |   1 +
> >  init/Kconfig                                  |  26 +
> >  kernel/kallsyms.c                             | 163 +++-
> >  kernel/module.c                               |  81 ++
> >  tools/objtool/elf.c                           |   8 +-
> >  26 files changed, 1670 insertions(+), 85 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> 
> This looks surprisingly lean overall.

Most of the changes outside of fgkaslr.c, module.c, and kallsyms.c were
little tweaks here and there to accommodate using -ffunction-sections
and handling >64K elf sections, otherwise yes, I tried to keep it very
self contained and non-invasive.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-06 15:32 ` Ingo Molnar
  2020-08-06 19:24   ` Kristen Carlson Accardi
@ 2020-08-06 19:27   ` Kees Cook
  1 sibling, 0 replies; 46+ messages in thread
From: Kees Cook @ 2020-08-06 19:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe

On Thu, Aug 06, 2020 at 05:32:58PM +0200, Ingo Molnar wrote:
> * Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> > [...]
> > Performance Impact
> > ------------------
> 
> > * Run time
> > The performance impact at run-time of function reordering varies by workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a kernel
> > build with finer grained KASLR was about 1% slower than a kernel with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of 
> > L1-icache-load-misses. Other workloads were examined as well, with varied
> > results. Some workloads performed significantly worse under FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it will
> > depend on the code flow whether or not finer grained KASLR will impact
> > your workload, and how the underlying code was designed. Because the layout
> > changes per boot, each time a system is rebooted the performance of a workload
> > may change.
> 
> I'd guess that the biggest performance impact comes from tearing apart 
> 'groups' of functions that particular workloads are using.
> 
> In that sense it might be worthwile to add a '__kaslr_group' function 
> tag to key functions, which would keep certain performance critical 
> functions next to each other.

We kind of already do this manually for things like the scheduler, etc,
using macros like ".whatever.text", so we might be able to create a more
generalized approach for those. Right now they require a "section" macro
usage and a linker script __start* and __end* marker, etc:

#define SCHED_TEXT						\
                ALIGN_FUNCTION();				\
                __sched_text_start = .;				\
                *(.sched.text)					\
                __sched_text_end = .;

Manually collected each whatever_TEXT define and building out each
__whatever_start, etc is annoying. It'd be really cool to have linker
script have wildcard replacements for build a syntax like this, based
on the presences of matching input sections:

	.%.text : {
		__%_start = .;
		*(.%.text.hot)
		*(.%.text)
		*(.%.text.*)
		*(.%.text.unlikely)
		__%_end = .;
		}

> I'd also suggest allowing the passing in of a boot-time pseudo-random 
> generator seed number, which would allow the creation of a 
> pseudo-randomized but repeatable layout across reboots.

This was present in earlier versions of the series.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-04 18:23 ` Joe Lawrence
@ 2020-08-07 16:38   ` Kristen Carlson Accardi
  2020-08-07 17:20     ` Kees Cook
  2020-08-12 17:18   ` Kristen Carlson Accardi
  1 sibling, 1 reply; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-08-07 16:38 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: keescook, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

On Tue, 2020-08-04 at 14:23 -0400, Joe Lawrence wrote:
> On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi
> wrote:
> > Function Granular Kernel Address Space Layout Randomization
> > (fgkaslr)
> > -----------------------------------------------------------------
> > ----
> > 
> > This patch set is an implementation of finer grained kernel address
> > space
> > randomization. It rearranges your kernel code at load time 
> > on a per-function level granularity, with only around a second
> > added to
> > boot time.
> > 
> > Changes in v4:
> > -------------
> > * dropped the patch to split out change to STATIC definition in
> >   x86/boot/compressed/misc.c and replaced with a patch authored
> >   by Kees Cook to avoid the duplicate malloc definitions
> > * Added a section to Documentation/admin-guide/kernel-
> > parameters.txt
> >   to document the fgkaslr boot option.
> > * redesigned the patch to hide the new layout when reading
> >   /proc/kallsyms. The previous implementation utilized a
> > dynamically
> >   allocated linked list to display the kernel and module symbols
> >   in alphabetical order. The new implementation uses a randomly
> >   shuffled index array to display the kernel and module symbols
> >   in a random order.
> > 
> > Changes in v3:
> > -------------
> > * Makefile changes to accommodate
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> > * removal of extraneous ALIGN_PAGE from _etext changes
> > * changed variable names in x86/tools/relocs to be less confusing
> > * split out change to STATIC definition in
> > x86/boot/compressed/misc.c
> > * Updates to Documentation to make it more clear what is preserved
> > in .text
> > * much more detailed commit message for function granular KASLR
> > patch
> > * minor tweaks and changes that make for more readable code
> > * this cover letter updated slightly to add additional details
> > 
> > Changes in v2:
> > --------------
> > * Fix to address i386 build failure
> > * Allow module reordering patch to be configured separately so that
> >   arm (or other non-x86_64 arches) can take advantage of module
> > function
> >   reordering. This support has not be tested by me, but smoke
> > tested by
> >   Ard Biesheuvel <ardb@kernel.org> on arm.
> > * Fix build issue when building on arm as reported by
> >   Ard Biesheuvel <ardb@kernel.org> 
> > 
> > Patches to objtool are included because they are dependencies for
> > this
> > patchset, however they have been submitted by their maintainer
> > separately.
> > 
> > Background
> > ----------
> > KASLR was merged into the kernel with the objective of increasing
> > the
> > difficulty of code reuse attacks. Code reuse attacks reused
> > existing code
> > snippets to get around existing memory protections. They exploit
> > software bugs
> > which expose addresses of useful code snippets to control the flow
> > of
> > execution for their own nefarious purposes. KASLR moves the entire
> > kernel
> > code text as a unit at boot time in order to make addresses less
> > predictable.
> > The order of the code within the segment is unchanged - only the
> > base address
> > is shifted. There are a few shortcomings to this algorithm.
> > 
> > 1. Low Entropy - there are only so many locations the kernel can
> > fit in. This
> >    means an attacker could guess without too much trouble.
> > 2. Knowledge of a single address can reveal the offset of the base
> > address,
> >    exposing all other locations for a published/known kernel image.
> > 3. Info leaks abound.
> > 
> > Finer grained ASLR has been proposed as a way to make ASLR more
> > resistant
> > to info leaks. It is not a new concept at all, and there are many
> > variations
> > possible. Function reordering is an implementation of finer grained
> > ASLR
> > which randomizes the layout of an address space on a function level
> > granularity. We use the term "fgkaslr" in this document to refer to
> > the
> > technique of function reordering when used with KASLR, as well as
> > finer grained
> > KASLR in general.
> > 
> > Proposed Improvement
> > --------------------
> > This patch set proposes adding function reordering on top of the
> > existing
> > KASLR base address randomization. The over-arching objective is
> > incremental
> > improvement over what we already have. It is designed to work in
> > combination
> > with the existing solution. The implementation is really pretty
> > simple, and
> > there are 2 main area where changes occur:
> > 
> > * Build time
> > 
> > GCC has had an option to place functions into individual .text
> > sections for
> > many years now. This option can be used to implement function
> > reordering at
> > load time. The final compiled vmlinux retains all the section
> > headers, which
> > can be used to help find the address ranges of each function. Using
> > this
> > information and an expanded table of relocation addresses,
> > individual text
> > sections can be suffled immediately after decompression. Some data
> > tables
> > inside the kernel that have assumptions about order require re-
> > sorting
> > after being updated when applying relocations. In order to modify
> > these tables,
> > a few key symbols are excluded from the objcopy symbol stripping
> > process for
> > use after shuffling the text segments.
> > 
> > Some highlights from the build time changes to look for:
> > 
> > The top level kernel Makefile was modified to add the gcc flag if
> > it
> > is supported. Currently, I am applying this flag to everything it
> > is
> > possible to randomize. Anything that is written in C and not
> > present in a
> > special input section is randomized. The final binary segment 0
> > retains a
> > consolidated .text section, as well as all the individual .text.*
> > sections.
> > Future work could turn off this flags for selected files or even
> > entire
> > subsystems, although obviously at the cost of security.
> > 
> > The relocs tool is updated to add relative relocations. This
> > information
> > previously wasn't included because it wasn't necessary when moving
> > the
> > entire .text segment as a unit. 
> > 
> > A new file was created to contain a list of symbols that objcopy
> > should
> > keep. We use those symbols at load time as described below.
> > 
> > * Load time
> > 
> > The boot kernel was modified to parse the vmlinux elf file after
> > decompression to check for our interesting symbols that we kept,
> > and to
> > look for any .text.* sections to randomize. The consolidated .text
> > section
> > is skipped and not moved. The sections are shuffled randomly, and
> > copied
> > into memory following the .text section in a new random order. The
> > existing
> > code which updated relocation addresses was modified to account for
> > not just a fixed delta from the load address, but the offset that
> > the function
> > section was moved to. This requires inspection of each address to
> > see if
> > it was impacted by a randomization. We use a bsearch to make this
> > less
> > horrible on performance. Any tables that need to be modified with
> > new
> > addresses or resorted are updated using the symbol addresses parsed
> > from the
> > elf symbol table.
> > 
> > In order to hide our new layout, symbols reported through
> > /proc/kallsyms
> > will be displayed in a random order.
> > 
> > Security Considerations
> > -----------------------
> > The objective of this patch set is to improve a technology that is
> > already
> > merged into the kernel (KASLR). This code will not prevent all
> > attacks,
> > but should instead be considered as one of several tools that can
> > be used.
> > In particular, this code is meant to make KASLR more effective in
> > the presence
> > of info leaks.
> > 
> > How much entropy we are adding to the existing entropy of standard
> > KASLR will
> > depend on a few variables. Firstly and most obviously, the number
> > of functions
> > that are randomized matters. This implementation keeps the existing
> > .text
> > section for code that cannot be randomized - for example, because
> > it was
> > assembly code. The less sections to randomize, the less entropy. In
> > addition,
> > due to alignment (16 bytes for x86_64), the number of bits in a
> > address that
> > the attacker needs to guess is reduced, as the lower bits are
> > identical.
> > 
> > Performance Impact
> > ------------------
> > There are two areas where function reordering can impact
> > performance: boot
> > time latency, and run time performance.
> > 
> > * Boot time latency
> > This implementation of finer grained KASLR impacts the boot time of
> > the kernel
> > in several places. It requires additional parsing of the kernel ELF
> > file to
> > obtain the section headers of the sections to be randomized. It
> > calls the
> > random number generator for each section to be randomized to
> > determine that
> > section's new memory location. It copies the decompressed kernel
> > into a new
> > area of memory to avoid corruption when laying out the newly
> > randomized
> > sections. It increases the number of relocations the kernel has to
> > perform at
> > boot time vs. standard KASLR, and it also requires a lookup on each
> > address
> > that needs to be relocated to see if it was in a randomized section
> > and needs
> > to be adjusted by a new offset. Finally, it re-sorts a few data
> > tables that
> > are required to be sorted by address.
> > 
> > Booting a test VM on a modern, well appointed system showed an
> > increase in
> > latency of approximately 1 second.
> > 
> > * Run time
> > The performance impact at run-time of function reordering varies by
> > workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a
> > kernel
> > build with finer grained KASLR was about 1% slower than a kernel
> > with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of 
> > L1-icache-load-misses. Other workloads were examined as well, with
> > varied
> > results. Some workloads performed significantly worse under
> > FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it
> > will
> > depend on the code flow whether or not finer grained KASLR will
> > impact
> > your workload, and how the underlying code was designed. Because
> > the layout
> > changes per boot, each time a system is rebooted the performance of
> > a workload
> > may change.
> > 
> > Future work could identify hot areas that may not be randomized and
> > either
> > leave them in the .text section or group them together into a
> > single section
> > that may be randomized. If grouping things together helps, one
> > other thing to
> > consider is that if we could identify text blobs that should be
> > grouped together
> > to benefit a particular code flow, it could be interesting to
> > explore
> > whether this security feature could be also be used as a
> > performance
> > feature if you are interested in optimizing your kernel layout for
> > a
> > particular workload at boot time. Optimizing function layout for a
> > particular
> > workload has been researched and proven effective - for more
> > information
> > read the Facebook paper "Optimizing Function Placement for Large-
> > Scale
> > Data-Center Applications" (see references section below).
> > 
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased
> > by
> > about 3%. The compressed image is also increased due to the header
> > files,
> > as well as the extra relocations that must be added. You can expect
> > fgkaslr
> > to increase the size of the compressed image by about 15%.
> > 
> > Memory Usage
> > ------------
> > fgkaslr increases the amount of heap that is required at boot time,
> > although this extra memory is released when the kernel has finished
> > decompression. As a result, it may not be appropriate to use this
> > feature on
> > systems without much memory.
> > 
> > Building
> > --------
> > To enable fine grained KASLR, you need to have the following config
> > options
> > set (including all the ones you would use to build normal KASLR)
> > 
> > CONFIG_FG_KASLR=y
> > 
> > In addition, fgkaslr is only supported for the X86_64 architecture.
> > 
> > Modules
> > -------
> > Modules are randomized similarly to the rest of the kernel by
> > shuffling
> > the sections at load time prior to moving them into memory. The
> > module must
> > also have been build with the -ffunction-sections compiler option.
> > 
> > Although fgkaslr for the kernel is only supported for the X86_64
> > architecture,
> > it is possible to use fgkaslr with modules on other architectures.
> > To enable
> > this feature, select
> > 
> > CONFIG_MODULE_FG_KASLR=y
> > 
> > This option is selected automatically for X86_64 when
> > CONFIG_FG_KASLR is set.
> > 
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also
> > disables
> > fgkaslr. It is also possible to disable fgkaslr separately by
> > booting with
> > fgkaslr=off on the commandline.
> > 
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained
> > ASLR.
> > This paper in particular contributed the most to my implementation
> > design
> > as well as my overall understanding of the problem space:
> > 
> > Selfrando: Securing the Tor Browser against De-anonymization
> > Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> > 
> > For more information on how function layout impacts performance,
> > see:
> > 
> > Optimizing Function Placement for Large-Scale Data-Center
> > Applications,
> > G. Ottoni, B. Maher
> > 
> > Kees Cook (2):
> >   x86/boot: Allow a "silent" kaslr random byte fetch
> >   x86/boot/compressed: Avoid duplicate malloc() implementations
> > 
> > Kristen Carlson Accardi (8):
> >   objtool: Do not assume order of parent/child functions
> >   x86: tools/relocs: Support >64K section headers
> >   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> >   x86: Make sure _etext includes function sections
> >   x86/tools: Add relative relocs for randomized functions
> >   x86: Add support for function granular KASLR
> >   kallsyms: Hide layout
> >   module: Reorder functions
> > 
> >  .../admin-guide/kernel-parameters.txt         |   7 +
> >  Documentation/security/fgkaslr.rst            | 172 ++++
> >  Documentation/security/index.rst              |   1 +
> >  Makefile                                      |   6 +-
> >  arch/x86/Kconfig                              |   4 +
> >  arch/x86/Makefile                             |   5 +
> >  arch/x86/boot/compressed/Makefile             |   9 +-
> >  arch/x86/boot/compressed/fgkaslr.c            | 811
> > ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c              |   4 -
> >  arch/x86/boot/compressed/misc.c               | 157 +++-
> >  arch/x86/boot/compressed/misc.h               |  30 +
> >  arch/x86/boot/compressed/utils.c              |  11 +
> >  arch/x86/boot/compressed/vmlinux.symbols      |  17 +
> >  arch/x86/include/asm/boot.h                   |  15 +-
> >  arch/x86/kernel/vmlinux.lds.S                 |  17 +-
> >  arch/x86/lib/kaslr.c                          |  18 +-
> >  arch/x86/tools/relocs.c                       | 143 ++-
> >  arch/x86/tools/relocs.h                       |   4 +-
> >  arch/x86/tools/relocs_common.c                |  15 +-
> >  include/asm-generic/vmlinux.lds.h             |  18 +-
> >  include/linux/decompress/mm.h                 |  12 +-
> >  include/uapi/linux/elf.h                      |   1 +
> >  init/Kconfig                                  |  26 +
> >  kernel/kallsyms.c                             | 163 +++-
> >  kernel/module.c                               |  81 ++
> >  tools/objtool/elf.c                           |   8 +-
> >  26 files changed, 1670 insertions(+), 85 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> > 
> > 
> > base-commit: 11ba468877bb23f28956a35e896356252d63c983
> > -- 
> > 2.20.1
> > 
> 
> Apologies in advance if this has already been discussed elsewhere,
> but I
> did finally get around to testing the patchset against the
> livepatching
> kselftests.
> 
> The livepatching kselftests fail as all livepatches stall their
> transitions.  It appears that reliable (ORC) stack unwinding is
> broken
> when fgkaslr is enabled.
> 
> Relevant config options:
> 
>   CONFIG_ARCH_HAS_FG_KASLR=y
>   CONFIG_ARCH_STACKWALK=y
>   CONFIG_FG_KASLR=y
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_HAVE_RELIABLE_STACKTRACE=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_MODULE_FG_KASLR=y
>   CONFIG_TEST_LIVEPATCH=m
>   CONFIG_UNWINDER_ORC=y
> 
> The livepatch transitions are stuck along this call path:
> 
>   klp_check_stack
>     stack_trace_save_tsk_reliable
>       arch_stack_walk_reliable
>   
>           /* Check for stack corruption */
>           if (unwind_error(&state))
>                   return -EINVAL;
> 
> where the unwinder error is set by unwind_next_frame():
> 
>   arch/x86/kernel/unwind_orc.c
>   bool unwind_next_frame(struct unwind_state *state)
>  
> sometimes here:
>  
>   	/* End-of-stack check for kernel threads: */
>   	if (orc->sp_reg == ORC_REG_UNDEFINED) {
>   		if (!orc->end)
>   			goto err;
>   
>   		goto the_end;
>   	}
> 
> or here:
> 
>   	/* Prevent a recursive loop due to bad ORC data:
> */                                                                   
>              
>   	if (state->stack_info.type == prev_type
> &&                                                                   
>                       
>   	    on_stack(&state->stack_info, (void *)state->sp,
> sizeof(long))
> &&                                                               
>   	    state->sp <= prev_sp)
> {                                                                    
>                                     
>   		orc_warn_current("stack going in the wrong direction?
> at %pB\n",                                                           
>   				 (void
> *)orig_ip);                                                          
>                                
>   		goto
> err;                                                                 
>                                                  
>   	}
> 
> (and probably other places the ORC unwinder gets confused.)
> 
> 
> It also manifests itself in other, more visible ways.  For example, a
> kernel module that calls dump_stack() in its init function or even
> /proc/<pid>/stack:
> 
> (fgkaslr on)
> ------------
> 
> Call Trace:
>  ? dump_stack+0x57/0x73
>  ? 0xffffffffc0850000
>  ? mymodule_init+0xa/0x1000 [dumpstack]
>  ? do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  ? do_init_module+0x5a/0x220
>  ? load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  ? __do_sys_finit_module+0xa8/0x110
>  ? do_syscall_64+0x47/0x80
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> 
> 
> fgkaslr=off
> -----------
> 
> Call Trace:
>  dump_stack+0x57/0x73
>  ? 0xffffffffc04f2000
>  mymodule_init+0xa/0x1000 [readonly]
>  do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  do_init_module+0x5a/0x220
>  load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  __do_sys_finit_module+0xa8/0x110
>  do_syscall_64+0x47/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> [<0>] __do_sys_wait4+0x83/0x90
> [<0>] do_syscall_64+0x47/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 
> I would think fixing and verifying these latter cases would be easier
> than
> chasing livepatch transitions (but would still probably fix klp case,
> too).
> Perhaps Josh or someone has other ORC unwinder tests that could be
> used?
> 
> -- Joe
> 

Hi Joe,
Thanks for testing. Yes, Josh and I have been discussing the orc_unwind
issues. I've root caused one issue already, in that objtool places an
orc_unwind_ip address just outside the section, so my algorithm fails
to relocate this address. There are other issues as well that I still
haven't root caused. I'll be addressing this in v5 and plan to have
something that passes livepatch testing with that version.

Kristen


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-07 16:38   ` Kristen Carlson Accardi
@ 2020-08-07 17:20     ` Kees Cook
  2020-08-10 16:10       ` Kristen Carlson Accardi
  0 siblings, 1 reply; 46+ messages in thread
From: Kees Cook @ 2020-08-07 17:20 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Joe Lawrence, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

On Fri, Aug 07, 2020 at 09:38:11AM -0700, Kristen Carlson Accardi wrote:
> Thanks for testing. Yes, Josh and I have been discussing the orc_unwind
> issues. I've root caused one issue already, in that objtool places an
> orc_unwind_ip address just outside the section, so my algorithm fails
> to relocate this address. There are other issues as well that I still
> haven't root caused. I'll be addressing this in v5 and plan to have
> something that passes livepatch testing with that version.

FWIW, I'm okay with seeing fgkaslr be developed progressively. Getting
it working with !livepatching would be fine as a first step. There's
value in getting the general behavior landed, and then continuing to
improve it.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-07 17:20     ` Kees Cook
@ 2020-08-10 16:10       ` Kristen Carlson Accardi
  0 siblings, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-08-10 16:10 UTC (permalink / raw)
  To: Kees Cook
  Cc: Joe Lawrence, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

On Fri, 2020-08-07 at 10:20 -0700, Kees Cook wrote:
> On Fri, Aug 07, 2020 at 09:38:11AM -0700, Kristen Carlson Accardi
> wrote:
> > Thanks for testing. Yes, Josh and I have been discussing the
> > orc_unwind
> > issues. I've root caused one issue already, in that objtool places
> > an
> > orc_unwind_ip address just outside the section, so my algorithm
> > fails
> > to relocate this address. There are other issues as well that I
> > still
> > haven't root caused. I'll be addressing this in v5 and plan to have
> > something that passes livepatch testing with that version.
> 
> FWIW, I'm okay with seeing fgkaslr be developed progressively.
> Getting
> it working with !livepatching would be fine as a first step. There's
> value in getting the general behavior landed, and then continuing to
> improve it.
> 

In this case, part of the issue with livepatching appears to be a more
general issue with objtool and how it creates the orc unwind entries
when you have >64K sections. So livepatching is a good test case for
making sure that the orc tables are actually correct. However, the
other issue with livepatching (the duplicate symbols), might be worth
deferring if the solution is complex - I will keep that in mind as I
look at it more closely.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-04 18:23 ` Joe Lawrence
  2020-08-07 16:38   ` Kristen Carlson Accardi
@ 2020-08-12 17:18   ` Kristen Carlson Accardi
  1 sibling, 0 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-08-12 17:18 UTC (permalink / raw)
  To: Joe Lawrence
  Cc: keescook, tglx, mingo, bp, arjan, x86, linux-kernel,
	kernel-hardening, rick.p.edgecombe, live-patching

On Tue, 2020-08-04 at 14:23 -0400, Joe Lawrence wrote:
> On Fri, Jul 17, 2020 at 09:59:57AM -0700, Kristen Carlson Accardi
> wrote:
> > Function Granular Kernel Address Space Layout Randomization
> > (fgkaslr)
> > -----------------------------------------------------------------
> > ----
> > 
> > This patch set is an implementation of finer grained kernel address
> > space
> > randomization. It rearranges your kernel code at load time 
> > on a per-function level granularity, with only around a second
> > added to
> > boot time.
> > 
> > Changes in v4:
> > -------------
> > * dropped the patch to split out change to STATIC definition in
> >   x86/boot/compressed/misc.c and replaced with a patch authored
> >   by Kees Cook to avoid the duplicate malloc definitions
> > * Added a section to Documentation/admin-guide/kernel-
> > parameters.txt
> >   to document the fgkaslr boot option.
> > * redesigned the patch to hide the new layout when reading
> >   /proc/kallsyms. The previous implementation utilized a
> > dynamically
> >   allocated linked list to display the kernel and module symbols
> >   in alphabetical order. The new implementation uses a randomly
> >   shuffled index array to display the kernel and module symbols
> >   in a random order.
> > 
> > Changes in v3:
> > -------------
> > * Makefile changes to accommodate
> > CONFIG_LD_DEAD_CODE_DATA_ELIMINATION
> > * removal of extraneous ALIGN_PAGE from _etext changes
> > * changed variable names in x86/tools/relocs to be less confusing
> > * split out change to STATIC definition in
> > x86/boot/compressed/misc.c
> > * Updates to Documentation to make it more clear what is preserved
> > in .text
> > * much more detailed commit message for function granular KASLR
> > patch
> > * minor tweaks and changes that make for more readable code
> > * this cover letter updated slightly to add additional details
> > 
> > Changes in v2:
> > --------------
> > * Fix to address i386 build failure
> > * Allow module reordering patch to be configured separately so that
> >   arm (or other non-x86_64 arches) can take advantage of module
> > function
> >   reordering. This support has not be tested by me, but smoke
> > tested by
> >   Ard Biesheuvel <ardb@kernel.org> on arm.
> > * Fix build issue when building on arm as reported by
> >   Ard Biesheuvel <ardb@kernel.org> 
> > 
> > Patches to objtool are included because they are dependencies for
> > this
> > patchset, however they have been submitted by their maintainer
> > separately.
> > 
> > Background
> > ----------
> > KASLR was merged into the kernel with the objective of increasing
> > the
> > difficulty of code reuse attacks. Code reuse attacks reused
> > existing code
> > snippets to get around existing memory protections. They exploit
> > software bugs
> > which expose addresses of useful code snippets to control the flow
> > of
> > execution for their own nefarious purposes. KASLR moves the entire
> > kernel
> > code text as a unit at boot time in order to make addresses less
> > predictable.
> > The order of the code within the segment is unchanged - only the
> > base address
> > is shifted. There are a few shortcomings to this algorithm.
> > 
> > 1. Low Entropy - there are only so many locations the kernel can
> > fit in. This
> >    means an attacker could guess without too much trouble.
> > 2. Knowledge of a single address can reveal the offset of the base
> > address,
> >    exposing all other locations for a published/known kernel image.
> > 3. Info leaks abound.
> > 
> > Finer grained ASLR has been proposed as a way to make ASLR more
> > resistant
> > to info leaks. It is not a new concept at all, and there are many
> > variations
> > possible. Function reordering is an implementation of finer grained
> > ASLR
> > which randomizes the layout of an address space on a function level
> > granularity. We use the term "fgkaslr" in this document to refer to
> > the
> > technique of function reordering when used with KASLR, as well as
> > finer grained
> > KASLR in general.
> > 
> > Proposed Improvement
> > --------------------
> > This patch set proposes adding function reordering on top of the
> > existing
> > KASLR base address randomization. The over-arching objective is
> > incremental
> > improvement over what we already have. It is designed to work in
> > combination
> > with the existing solution. The implementation is really pretty
> > simple, and
> > there are 2 main area where changes occur:
> > 
> > * Build time
> > 
> > GCC has had an option to place functions into individual .text
> > sections for
> > many years now. This option can be used to implement function
> > reordering at
> > load time. The final compiled vmlinux retains all the section
> > headers, which
> > can be used to help find the address ranges of each function. Using
> > this
> > information and an expanded table of relocation addresses,
> > individual text
> > sections can be suffled immediately after decompression. Some data
> > tables
> > inside the kernel that have assumptions about order require re-
> > sorting
> > after being updated when applying relocations. In order to modify
> > these tables,
> > a few key symbols are excluded from the objcopy symbol stripping
> > process for
> > use after shuffling the text segments.
> > 
> > Some highlights from the build time changes to look for:
> > 
> > The top level kernel Makefile was modified to add the gcc flag if
> > it
> > is supported. Currently, I am applying this flag to everything it
> > is
> > possible to randomize. Anything that is written in C and not
> > present in a
> > special input section is randomized. The final binary segment 0
> > retains a
> > consolidated .text section, as well as all the individual .text.*
> > sections.
> > Future work could turn off this flags for selected files or even
> > entire
> > subsystems, although obviously at the cost of security.
> > 
> > The relocs tool is updated to add relative relocations. This
> > information
> > previously wasn't included because it wasn't necessary when moving
> > the
> > entire .text segment as a unit. 
> > 
> > A new file was created to contain a list of symbols that objcopy
> > should
> > keep. We use those symbols at load time as described below.
> > 
> > * Load time
> > 
> > The boot kernel was modified to parse the vmlinux elf file after
> > decompression to check for our interesting symbols that we kept,
> > and to
> > look for any .text.* sections to randomize. The consolidated .text
> > section
> > is skipped and not moved. The sections are shuffled randomly, and
> > copied
> > into memory following the .text section in a new random order. The
> > existing
> > code which updated relocation addresses was modified to account for
> > not just a fixed delta from the load address, but the offset that
> > the function
> > section was moved to. This requires inspection of each address to
> > see if
> > it was impacted by a randomization. We use a bsearch to make this
> > less
> > horrible on performance. Any tables that need to be modified with
> > new
> > addresses or resorted are updated using the symbol addresses parsed
> > from the
> > elf symbol table.
> > 
> > In order to hide our new layout, symbols reported through
> > /proc/kallsyms
> > will be displayed in a random order.
> > 
> > Security Considerations
> > -----------------------
> > The objective of this patch set is to improve a technology that is
> > already
> > merged into the kernel (KASLR). This code will not prevent all
> > attacks,
> > but should instead be considered as one of several tools that can
> > be used.
> > In particular, this code is meant to make KASLR more effective in
> > the presence
> > of info leaks.
> > 
> > How much entropy we are adding to the existing entropy of standard
> > KASLR will
> > depend on a few variables. Firstly and most obviously, the number
> > of functions
> > that are randomized matters. This implementation keeps the existing
> > .text
> > section for code that cannot be randomized - for example, because
> > it was
> > assembly code. The less sections to randomize, the less entropy. In
> > addition,
> > due to alignment (16 bytes for x86_64), the number of bits in a
> > address that
> > the attacker needs to guess is reduced, as the lower bits are
> > identical.
> > 
> > Performance Impact
> > ------------------
> > There are two areas where function reordering can impact
> > performance: boot
> > time latency, and run time performance.
> > 
> > * Boot time latency
> > This implementation of finer grained KASLR impacts the boot time of
> > the kernel
> > in several places. It requires additional parsing of the kernel ELF
> > file to
> > obtain the section headers of the sections to be randomized. It
> > calls the
> > random number generator for each section to be randomized to
> > determine that
> > section's new memory location. It copies the decompressed kernel
> > into a new
> > area of memory to avoid corruption when laying out the newly
> > randomized
> > sections. It increases the number of relocations the kernel has to
> > perform at
> > boot time vs. standard KASLR, and it also requires a lookup on each
> > address
> > that needs to be relocated to see if it was in a randomized section
> > and needs
> > to be adjusted by a new offset. Finally, it re-sorts a few data
> > tables that
> > are required to be sorted by address.
> > 
> > Booting a test VM on a modern, well appointed system showed an
> > increase in
> > latency of approximately 1 second.
> > 
> > * Run time
> > The performance impact at run-time of function reordering varies by
> > workload.
> > Using kcbench, a kernel compilation benchmark, the performance of a
> > kernel
> > build with finer grained KASLR was about 1% slower than a kernel
> > with standard
> > KASLR. Analysis with perf showed a slightly higher percentage of 
> > L1-icache-load-misses. Other workloads were examined as well, with
> > varied
> > results. Some workloads performed significantly worse under
> > FGKASLR, while
> > others stayed the same or were mysteriously better. In general, it
> > will
> > depend on the code flow whether or not finer grained KASLR will
> > impact
> > your workload, and how the underlying code was designed. Because
> > the layout
> > changes per boot, each time a system is rebooted the performance of
> > a workload
> > may change.
> > 
> > Future work could identify hot areas that may not be randomized and
> > either
> > leave them in the .text section or group them together into a
> > single section
> > that may be randomized. If grouping things together helps, one
> > other thing to
> > consider is that if we could identify text blobs that should be
> > grouped together
> > to benefit a particular code flow, it could be interesting to
> > explore
> > whether this security feature could be also be used as a
> > performance
> > feature if you are interested in optimizing your kernel layout for
> > a
> > particular workload at boot time. Optimizing function layout for a
> > particular
> > workload has been researched and proven effective - for more
> > information
> > read the Facebook paper "Optimizing Function Placement for Large-
> > Scale
> > Data-Center Applications" (see references section below).
> > 
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > With a standard distro config, the resulting vmlinux was increased
> > by
> > about 3%. The compressed image is also increased due to the header
> > files,
> > as well as the extra relocations that must be added. You can expect
> > fgkaslr
> > to increase the size of the compressed image by about 15%.
> > 
> > Memory Usage
> > ------------
> > fgkaslr increases the amount of heap that is required at boot time,
> > although this extra memory is released when the kernel has finished
> > decompression. As a result, it may not be appropriate to use this
> > feature on
> > systems without much memory.
> > 
> > Building
> > --------
> > To enable fine grained KASLR, you need to have the following config
> > options
> > set (including all the ones you would use to build normal KASLR)
> > 
> > CONFIG_FG_KASLR=y
> > 
> > In addition, fgkaslr is only supported for the X86_64 architecture.
> > 
> > Modules
> > -------
> > Modules are randomized similarly to the rest of the kernel by
> > shuffling
> > the sections at load time prior to moving them into memory. The
> > module must
> > also have been build with the -ffunction-sections compiler option.
> > 
> > Although fgkaslr for the kernel is only supported for the X86_64
> > architecture,
> > it is possible to use fgkaslr with modules on other architectures.
> > To enable
> > this feature, select
> > 
> > CONFIG_MODULE_FG_KASLR=y
> > 
> > This option is selected automatically for X86_64 when
> > CONFIG_FG_KASLR is set.
> > 
> > Disabling
> > ---------
> > Disabling normal KASLR using the nokaslr command line option also
> > disables
> > fgkaslr. It is also possible to disable fgkaslr separately by
> > booting with
> > fgkaslr=off on the commandline.
> > 
> > References
> > ----------
> > There are a lot of academic papers which explore finer grained
> > ASLR.
> > This paper in particular contributed the most to my implementation
> > design
> > as well as my overall understanding of the problem space:
> > 
> > Selfrando: Securing the Tor Browser against De-anonymization
> > Exploits,
> > M. Conti, S. Crane, T. Frassetto, et al.
> > 
> > For more information on how function layout impacts performance,
> > see:
> > 
> > Optimizing Function Placement for Large-Scale Data-Center
> > Applications,
> > G. Ottoni, B. Maher
> > 
> > Kees Cook (2):
> >   x86/boot: Allow a "silent" kaslr random byte fetch
> >   x86/boot/compressed: Avoid duplicate malloc() implementations
> > 
> > Kristen Carlson Accardi (8):
> >   objtool: Do not assume order of parent/child functions
> >   x86: tools/relocs: Support >64K section headers
> >   x86: Makefile: Add build and config option for CONFIG_FG_KASLR
> >   x86: Make sure _etext includes function sections
> >   x86/tools: Add relative relocs for randomized functions
> >   x86: Add support for function granular KASLR
> >   kallsyms: Hide layout
> >   module: Reorder functions
> > 
> >  .../admin-guide/kernel-parameters.txt         |   7 +
> >  Documentation/security/fgkaslr.rst            | 172 ++++
> >  Documentation/security/index.rst              |   1 +
> >  Makefile                                      |   6 +-
> >  arch/x86/Kconfig                              |   4 +
> >  arch/x86/Makefile                             |   5 +
> >  arch/x86/boot/compressed/Makefile             |   9 +-
> >  arch/x86/boot/compressed/fgkaslr.c            | 811
> > ++++++++++++++++++
> >  arch/x86/boot/compressed/kaslr.c              |   4 -
> >  arch/x86/boot/compressed/misc.c               | 157 +++-
> >  arch/x86/boot/compressed/misc.h               |  30 +
> >  arch/x86/boot/compressed/utils.c              |  11 +
> >  arch/x86/boot/compressed/vmlinux.symbols      |  17 +
> >  arch/x86/include/asm/boot.h                   |  15 +-
> >  arch/x86/kernel/vmlinux.lds.S                 |  17 +-
> >  arch/x86/lib/kaslr.c                          |  18 +-
> >  arch/x86/tools/relocs.c                       | 143 ++-
> >  arch/x86/tools/relocs.h                       |   4 +-
> >  arch/x86/tools/relocs_common.c                |  15 +-
> >  include/asm-generic/vmlinux.lds.h             |  18 +-
> >  include/linux/decompress/mm.h                 |  12 +-
> >  include/uapi/linux/elf.h                      |   1 +
> >  init/Kconfig                                  |  26 +
> >  kernel/kallsyms.c                             | 163 +++-
> >  kernel/module.c                               |  81 ++
> >  tools/objtool/elf.c                           |   8 +-
> >  26 files changed, 1670 insertions(+), 85 deletions(-)
> >  create mode 100644 Documentation/security/fgkaslr.rst
> >  create mode 100644 arch/x86/boot/compressed/fgkaslr.c
> >  create mode 100644 arch/x86/boot/compressed/utils.c
> >  create mode 100644 arch/x86/boot/compressed/vmlinux.symbols
> > 
> > 
> > base-commit: 11ba468877bb23f28956a35e896356252d63c983
> > -- 
> > 2.20.1
> > 
> 
> Apologies in advance if this has already been discussed elsewhere,
> but I
> did finally get around to testing the patchset against the
> livepatching
> kselftests.
> 
> The livepatching kselftests fail as all livepatches stall their
> transitions.  It appears that reliable (ORC) stack unwinding is
> broken
> when fgkaslr is enabled.
> 
> Relevant config options:
> 
>   CONFIG_ARCH_HAS_FG_KASLR=y
>   CONFIG_ARCH_STACKWALK=y
>   CONFIG_FG_KASLR=y
>   CONFIG_HAVE_LIVEPATCH=y
>   CONFIG_HAVE_RELIABLE_STACKTRACE=y
>   CONFIG_LIVEPATCH=y
>   CONFIG_MODULE_FG_KASLR=y
>   CONFIG_TEST_LIVEPATCH=m
>   CONFIG_UNWINDER_ORC=y
> 
> The livepatch transitions are stuck along this call path:
> 
>   klp_check_stack
>     stack_trace_save_tsk_reliable
>       arch_stack_walk_reliable
>   
>           /* Check for stack corruption */
>           if (unwind_error(&state))
>                   return -EINVAL;
> 
> where the unwinder error is set by unwind_next_frame():
> 
>   arch/x86/kernel/unwind_orc.c
>   bool unwind_next_frame(struct unwind_state *state)
>  
> sometimes here:
>  
>   	/* End-of-stack check for kernel threads: */
>   	if (orc->sp_reg == ORC_REG_UNDEFINED) {
>   		if (!orc->end)
>   			goto err;
>   
>   		goto the_end;
>   	}
> 
> or here:
> 
>   	/* Prevent a recursive loop due to bad ORC data:
> */                                                                   
>              
>   	if (state->stack_info.type == prev_type
> &&                                                                   
>                       
>   	    on_stack(&state->stack_info, (void *)state->sp,
> sizeof(long))
> &&                                                               
>   	    state->sp <= prev_sp)
> {                                                                    
>                                     
>   		orc_warn_current("stack going in the wrong direction?
> at %pB\n",                                                           
>   				 (void
> *)orig_ip);                                                          
>                                
>   		goto
> err;                                                                 
>                                                  
>   	}
> 
> (and probably other places the ORC unwinder gets confused.)
> 
> 
> It also manifests itself in other, more visible ways.  For example, a
> kernel module that calls dump_stack() in its init function or even
> /proc/<pid>/stack:
> 
> (fgkaslr on)
> ------------
> 
> Call Trace:
>  ? dump_stack+0x57/0x73
>  ? 0xffffffffc0850000
>  ? mymodule_init+0xa/0x1000 [dumpstack]
>  ? do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  ? do_init_module+0x5a/0x220
>  ? load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  ? __do_sys_finit_module+0xa8/0x110
>  ? do_syscall_64+0x47/0x80
>  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> 
> 
> fgkaslr=off
> -----------
> 
> Call Trace:
>  dump_stack+0x57/0x73
>  ? 0xffffffffc04f2000
>  mymodule_init+0xa/0x1000 [readonly]
>  do_one_initcall+0x46/0x1f0
>  ? free_unref_page_commit+0x91/0x100
>  ? _cond_resched+0x15/0x30
>  ? kmem_cache_alloc_trace+0x14b/0x210
>  do_init_module+0x5a/0x220
>  load_module+0x1912/0x1b20
>  ? __do_sys_finit_module+0xa8/0x110
>  __do_sys_finit_module+0xa8/0x110
>  do_syscall_64+0x47/0x80
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> % sudo cat /proc/$$/stack
> [<0>] do_wait+0x1c3/0x230
> [<0>] kernel_wait4+0xa6/0x140
> [<0>] __do_sys_wait4+0x83/0x90
> [<0>] do_syscall_64+0x47/0x80
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> 
> I would think fixing and verifying these latter cases would be easier
> than
> chasing livepatch transitions (but would still probably fix klp case,
> too).
> Perhaps Josh or someone has other ORC unwinder tests that could be
> used?
> 
> -- Joe
> 

OK, I have root caused these failures to the fact that the relocs for
the orc_unwind_ip table that I use to adjust the values of the
orc_unwind_ip table after randomization were incorrect because the
table is sorted at build time now, thus making the relocs invalid. I
can fix this either by turning off BUILDTIME_TABLE_SORT (and now the
relocs are fine), or by ignoring any relocs in the orc_unwind_ip table
and adjusting without relocs. I think it makes sense to just unset
BUILDTIME_TABLE_SORT if CONFIG_FG_KASLR and continue to rely on relocs
to work since it is a useless step anyway.



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-07-22 21:33           ` Josh Poimboeuf
@ 2020-08-21 23:02             ` Kristen Carlson Accardi
  2020-08-25 16:16               ` Joe Lawrence
  2020-08-28 10:21               ` Miroslav Benes
  0 siblings, 2 replies; 46+ messages in thread
From: Kristen Carlson Accardi @ 2020-08-21 23:02 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Kees Cook, Miroslav Benes, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching,
	Hongjiu Lu, joe.lawrence

On Wed, 2020-07-22 at 16:33 -0500, Josh Poimboeuf wrote:
> On Wed, Jul 22, 2020 at 12:56:10PM -0700, Kristen Carlson Accardi
> wrote:
> > On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote:
> > > On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
> > > > On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
> > > > > On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes
> > > > > wrote:
> > > > > > Let me CC live-patching ML, because from a quick glance
> > > > > > this is
> > > > > > something 
> > > > > > which could impact live patching code. At least it
> > > > > > invalidates
> > > > > > assumptions 
> > > > > > which "sympos" is based on.
> > > > > 
> > > > > In a quick skim, it looks like the symbol resolution is using
> > > > > kallsyms_on_each_symbol(), so I think this is safe? What's a
> > > > > good
> > > > > selftest for live-patching?
> > > > 
> > > > The problem is duplicate symbols.  If there are two static
> > > > functions
> > > > named 'foo' then livepatch needs a way to distinguish them.
> > > > 
> > > > Our current approach to that problem is "sympos".  We rely on
> > > > the
> > > > fact
> > > > that the second foo() always comes after the first one in the
> > > > symbol
> > > > list and kallsyms.  So they're referred to as foo,1 and foo,2.
> > > 
> > > Ah. Fun. In that case, perhaps the LTO series has some solutions.
> > > I
> > > think builds with LTO end up renaming duplicate symbols like
> > > that, so
> > > it'll be back to being unique.
> > > 
> > 
> > Well, glad to hear there might be some precendence for how to solve
> > this, as I wasn't able to think of something reasonable off the top
> > of
> > my head. Are you speaking of the Clang LTO series? 
> > https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/
> 
> I'm not sure how LTO does it, but a few more (half-brained) ideas
> that
> could work:
> 
> 1) Add a field in kallsyms to keep track of a symbol's original
> offset
>    before randomization/re-sorting.  Livepatch could use that field
> to
>    determine the original sympos.
> 
> 2) In fgkaslr code, go through all the sections and mark the ones
> which
>    have duplicates (i.e. same name).  Then when shuffling the
> sections,
>    skip a shuffle if it involves a duplicate section.  That way all
> the
>    duplicates would retain their original sympos.
> 
> 3) Livepatch could uniquely identify symbols by some feature other
> than
>    sympos.  For example:
> 
>    Symbol/function size - obviously this would only work if
> duplicately
>    named symbols have different sizes.
> 
>    Checksum - as part of a separate feature we're also looking at
> giving
>    each function its own checksum, calculated based on its
> instruction
>    opcodes.  Though calculating checksums at runtime could be
>    complicated by IP-relative addressing.
> 
> I'm thinking #1 or #2 wouldn't be too bad.  #3 might be harder.
> 

Hi there! I was trying to find a super easy way to address this, so I
thought the best thing would be if there were a compiler or linker
switch to just eliminate any duplicate symbols at compile time for
vmlinux. I filed this question on the binutils bugzilla looking to see
if there were existing flags that might do this, but H.J. Lu went ahead
and created a new one "-z unique", that seems to do what we would need
it to do. 

https://sourceware.org/bugzilla/show_bug.cgi?id=26391

When I use this option, it renames any duplicate symbols with an
extension - for example duplicatefunc.1 or duplicatefunc.2. You could
either match on the full unique name of the specific binary you are
trying to patch, or you match the base name and use the extension to
determine original position. Do you think this solution would work? If
so, I can modify livepatch to refuse to patch on duplicated symbols if
CONFIG_FG_KASLR and when this option is merged into the tool chain I
can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
should work in all cases. 


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-21 23:02             ` Kristen Carlson Accardi
@ 2020-08-25 16:16               ` Joe Lawrence
  2020-08-28 10:21               ` Miroslav Benes
  1 sibling, 0 replies; 46+ messages in thread
From: Joe Lawrence @ 2020-08-25 16:16 UTC (permalink / raw)
  To: Kristen Carlson Accardi, Josh Poimboeuf
  Cc: Kees Cook, Miroslav Benes, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching,
	Hongjiu Lu

On 8/21/20 7:02 PM, Kristen Carlson Accardi wrote:
> On Wed, 2020-07-22 at 16:33 -0500, Josh Poimboeuf wrote:
>> On Wed, Jul 22, 2020 at 12:56:10PM -0700, Kristen Carlson Accardi
>> wrote:
>>> On Wed, 2020-07-22 at 12:42 -0700, Kees Cook wrote:
>>>> On Wed, Jul 22, 2020 at 11:07:30AM -0500, Josh Poimboeuf wrote:
>>>>> On Wed, Jul 22, 2020 at 07:39:55AM -0700, Kees Cook wrote:
>>>>>> On Wed, Jul 22, 2020 at 11:27:30AM +0200, Miroslav Benes
>>>>>> wrote:
>>>>>>> Let me CC live-patching ML, because from a quick glance
>>>>>>> this is
>>>>>>> something
>>>>>>> which could impact live patching code. At least it
>>>>>>> invalidates
>>>>>>> assumptions
>>>>>>> which "sympos" is based on.
>>>>>>
>>>>>> In a quick skim, it looks like the symbol resolution is using
>>>>>> kallsyms_on_each_symbol(), so I think this is safe? What's a
>>>>>> good
>>>>>> selftest for live-patching?
>>>>>
>>>>> The problem is duplicate symbols.  If there are two static
>>>>> functions
>>>>> named 'foo' then livepatch needs a way to distinguish them.
>>>>>
>>>>> Our current approach to that problem is "sympos".  We rely on
>>>>> the
>>>>> fact
>>>>> that the second foo() always comes after the first one in the
>>>>> symbol
>>>>> list and kallsyms.  So they're referred to as foo,1 and foo,2.
>>>>
>>>> Ah. Fun. In that case, perhaps the LTO series has some solutions.
>>>> I
>>>> think builds with LTO end up renaming duplicate symbols like
>>>> that, so
>>>> it'll be back to being unique.
>>>>
>>>
>>> Well, glad to hear there might be some precendence for how to solve
>>> this, as I wasn't able to think of something reasonable off the top
>>> of
>>> my head. Are you speaking of the Clang LTO series?
>>> https://lore.kernel.org/lkml/20200624203200.78870-1-samitolvanen@google.com/
>>
>> I'm not sure how LTO does it, but a few more (half-brained) ideas
>> that
>> could work:
>>
>> 1) Add a field in kallsyms to keep track of a symbol's original
>> offset
>>     before randomization/re-sorting.  Livepatch could use that field
>> to
>>     determine the original sympos.
>>
>> 2) In fgkaslr code, go through all the sections and mark the ones
>> which
>>     have duplicates (i.e. same name).  Then when shuffling the
>> sections,
>>     skip a shuffle if it involves a duplicate section.  That way all
>> the
>>     duplicates would retain their original sympos.
>>
>> 3) Livepatch could uniquely identify symbols by some feature other
>> than
>>     sympos.  For example:
>>
>>     Symbol/function size - obviously this would only work if
>> duplicately
>>     named symbols have different sizes.
>>
>>     Checksum - as part of a separate feature we're also looking at
>> giving
>>     each function its own checksum, calculated based on its
>> instruction
>>     opcodes.  Though calculating checksums at runtime could be
>>     complicated by IP-relative addressing.
>>
>> I'm thinking #1 or #2 wouldn't be too bad.  #3 might be harder.
>>
> 
> Hi there! I was trying to find a super easy way to address this, so I
> thought the best thing would be if there were a compiler or linker
> switch to just eliminate any duplicate symbols at compile time for
> vmlinux. I filed this question on the binutils bugzilla looking to see
> if there were existing flags that might do this, but H.J. Lu went ahead
> and created a new one "-z unique", that seems to do what we would need
> it to do.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=26391
> 
> When I use this option, it renames any duplicate symbols with an
> extension - for example duplicatefunc.1 or duplicatefunc.2.

I tried out H.J. Lu's branch and built some of the livepatch selftests 
with -z unique-symbol and indeed observe the following pattern:

  foo, foo.1, foo.2, etc.

for homonym symbol names.

 > You could
> either match on the full unique name of the specific binary you are
> trying to patch, or you match the base name and use the extension to
> determine original position. Do you think this solution would work? 

I think it could work for klp-relocations.

As a quick test, I was able to hack the WIP klp-convert branch [1] to 
generate klp-relocations with the following hack:

   const char *foo(void) __asm__("foo.1");

when building foo's target with -z unique-symbol.  (The target contained 
two static foo() functions.)  The asm rename trick exercised the 
klp-convert implicit conversion feature, as the symbol was now uniquely 
named and included a non-valid C symbol character.  User-defined 
klp-convert annotation support will require some refactoring, but 
shouldn't be too difficult to support as well.

> If
> so, I can modify livepatch to refuse to patch on duplicated symbols if
> CONFIG_FG_KASLR and when this option is merged into the tool chain I
> can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
> should work in all cases.
> 

I don't have a grasp on how complicated the alternatives might be, so 
I'll let others comment on best paths forward.  I just wanted to note 
that -z unique-symbol looks like it could reasonable work well for this 
niche case.

[1] 
https://github.com/joe-lawrence/linux/tree/klp-convert-v5-expanded-v5.8 
(not modified for -z unique-symbol, but noted for reference)

-- Joe


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-21 23:02             ` Kristen Carlson Accardi
  2020-08-25 16:16               ` Joe Lawrence
@ 2020-08-28 10:21               ` Miroslav Benes
  2020-08-28 19:24                 ` Josh Poimboeuf
  1 sibling, 1 reply; 46+ messages in thread
From: Miroslav Benes @ 2020-08-28 10:21 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Josh Poimboeuf, Kees Cook, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching,
	Hongjiu Lu, joe.lawrence

Leaving Josh's proposals here for reference...

> > I'm not sure how LTO does it, but a few more (half-brained) ideas
> > that
> > could work:
> > 
> > 1) Add a field in kallsyms to keep track of a symbol's original
> > offset
> >    before randomization/re-sorting.  Livepatch could use that field
> > to
> >    determine the original sympos.
> > 
> > 2) In fgkaslr code, go through all the sections and mark the ones
> > which
> >    have duplicates (i.e. same name).  Then when shuffling the
> > sections,
> >    skip a shuffle if it involves a duplicate section.  That way all
> > the
> >    duplicates would retain their original sympos.
> > 
> > 3) Livepatch could uniquely identify symbols by some feature other
> > than
> >    sympos.  For example:
> > 
> >    Symbol/function size - obviously this would only work if
> > duplicately
> >    named symbols have different sizes.
> > 
> >    Checksum - as part of a separate feature we're also looking at
> > giving
> >    each function its own checksum, calculated based on its
> > instruction
> >    opcodes.  Though calculating checksums at runtime could be
> >    complicated by IP-relative addressing.
> > 
> > I'm thinking #1 or #2 wouldn't be too bad.  #3 might be harder.
> > 
> 
> Hi there! I was trying to find a super easy way to address this, so I
> thought the best thing would be if there were a compiler or linker
> switch to just eliminate any duplicate symbols at compile time for
> vmlinux. I filed this question on the binutils bugzilla looking to see
> if there were existing flags that might do this, but H.J. Lu went ahead
> and created a new one "-z unique", that seems to do what we would need
> it to do. 
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=26391
> 
> When I use this option, it renames any duplicate symbols with an
> extension - for example duplicatefunc.1 or duplicatefunc.2. You could
> either match on the full unique name of the specific binary you are
> trying to patch, or you match the base name and use the extension to
> determine original position. Do you think this solution would work?

Yes, I think so (thanks, Joe, for testing!).

It looks cleaner to me than the options above, but it may just be a matter 
of taste. Anyway, I'd go with full name matching, because -z unique-symbol 
would allow us to remove sympos altogether, which is appealing.

> If
> so, I can modify livepatch to refuse to patch on duplicated symbols if
> CONFIG_FG_KASLR and when this option is merged into the tool chain I
> can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
> should work in all cases. 

Ok.

Josh, Petr, would this work for you too?

Thanks
Miroslav


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-28 10:21               ` Miroslav Benes
@ 2020-08-28 19:24                 ` Josh Poimboeuf
  2021-01-23 22:59                   ` Fangrui Song
  0 siblings, 1 reply; 46+ messages in thread
From: Josh Poimboeuf @ 2020-08-28 19:24 UTC (permalink / raw)
  To: Miroslav Benes
  Cc: Kristen Carlson Accardi, Kees Cook, tglx, mingo, bp, arjan, x86,
	linux-kernel, kernel-hardening, rick.p.edgecombe, live-patching,
	Hongjiu Lu, joe.lawrence

On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:
> > Hi there! I was trying to find a super easy way to address this, so I
> > thought the best thing would be if there were a compiler or linker
> > switch to just eliminate any duplicate symbols at compile time for
> > vmlinux. I filed this question on the binutils bugzilla looking to see
> > if there were existing flags that might do this, but H.J. Lu went ahead
> > and created a new one "-z unique", that seems to do what we would need
> > it to do. 
> > 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26391
> > 
> > When I use this option, it renames any duplicate symbols with an
> > extension - for example duplicatefunc.1 or duplicatefunc.2. You could
> > either match on the full unique name of the specific binary you are
> > trying to patch, or you match the base name and use the extension to
> > determine original position. Do you think this solution would work?
> 
> Yes, I think so (thanks, Joe, for testing!).
> 
> It looks cleaner to me than the options above, but it may just be a matter 
> of taste. Anyway, I'd go with full name matching, because -z unique-symbol 
> would allow us to remove sympos altogether, which is appealing.
> 
> > If
> > so, I can modify livepatch to refuse to patch on duplicated symbols if
> > CONFIG_FG_KASLR and when this option is merged into the tool chain I
> > can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
> > should work in all cases. 
> 
> Ok.
> 
> Josh, Petr, would this work for you too?

Sounds good to me.  Kristen, thanks for finding a solution!

-- 
Josh


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2020-08-28 19:24                 ` Josh Poimboeuf
@ 2021-01-23 22:59                   ` Fangrui Song
  2021-01-25 17:21                     ` Josh Poimboeuf
  0 siblings, 1 reply; 46+ messages in thread
From: Fangrui Song @ 2021-01-23 22:59 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Miroslav Benes, Josh Poimboeuf, Kees Cook, tglx, mingo, bp,
	arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	live-patching, Hongjiu Lu, joe.lawrence

On 2020-08-28, Josh Poimboeuf wrote:
>On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:
>> > Hi there! I was trying to find a super easy way to address this, so I
>> > thought the best thing would be if there were a compiler or linker
>> > switch to just eliminate any duplicate symbols at compile time for
>> > vmlinux. I filed this question on the binutils bugzilla looking to see
>> > if there were existing flags that might do this, but H.J. Lu went ahead
>> > and created a new one "-z unique", that seems to do what we would need
>> > it to do.
>> >
>> > https://sourceware.org/bugzilla/show_bug.cgi?id=26391
>> >
>> > When I use this option, it renames any duplicate symbols with an
>> > extension - for example duplicatefunc.1 or duplicatefunc.2. You could
>> > either match on the full unique name of the specific binary you are
>> > trying to patch, or you match the base name and use the extension to
>> > determine original position. Do you think this solution would work?
>>
>> Yes, I think so (thanks, Joe, for testing!).
>>
>> It looks cleaner to me than the options above, but it may just be a matter
>> of taste. Anyway, I'd go with full name matching, because -z unique-symbol
>> would allow us to remove sympos altogether, which is appealing.
>>
>> > If
>> > so, I can modify livepatch to refuse to patch on duplicated symbols if
>> > CONFIG_FG_KASLR and when this option is merged into the tool chain I
>> > can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
>> > should work in all cases.
>>
>> Ok.
>>
>> Josh, Petr, would this work for you too?
>
>Sounds good to me.  Kristen, thanks for finding a solution!

(I am not subscribed. I came here via https://sourceware.org/bugzilla/show_bug.cgi?id=26391 (ld -z unique-symbol))

> This works great after randomization because it always receives the
> current address at runtime rather than relying on any kind of
> buildtime address. The issue with with the live-patching code's
> algorithm for resolving duplicate symbol names. If they request a
> symbol by name from the kernel and there are 3 symbols with the same
> name, they use the symbol's position in the built binary image to
> select the correct symbol.

If a.o, b.o and c.o define local symbol 'foo'.
By position, do you mean that

* the live-patching code uses something like (findall("foo")[0], findall("foo")[1], findall("foo")[2]) ?
* shuffling a.o/b.o/c.o will make the returned triple different

Local symbols are not required to be unique. Instead of patching the toolchain,
have you thought about making the live-patching code smarter?
(Depend on the duplicates, such a linker option can increase the link time/binary size considerably
AND I don't know in what other cases such an option will be useful)

For the following example, 

https://sourceware.org/bugzilla/show_bug.cgi?id=26822

   # RUN: split-file %s %t
   # RUN: gcc -c %t/a.s -o %t/a.o
   # RUN: gcc -c %t/b.s -o %t/b.o
   # RUN: gcc -c %t/c.s -o %t/c.o
   # RUN: ld-new %t/a.o %t/b.o %t/c.o -z unique-symbol -o %t.exe
   
   #--- a.s
   a: a.1: a.2: nop
   #--- b.s
   a: nop
   #--- c.s
   a: nop

readelf -Ws output:

Symbol table '.symtab' contains 13 entries:
    Num:    Value          Size Type    Bind   Vis      Ndx Name
      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
      1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.o
      2: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a
      3: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.1
      4: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.2
      5: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS b.o
      6: 0000000000401001     0 NOTYPE  LOCAL  DEFAULT    1 a.1
      7: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS c.o
      8: 0000000000401002     0 NOTYPE  LOCAL  DEFAULT    1 a.2
      9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
     10: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
     11: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
     12: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _end

Note that you have STT_FILE SHN_ABS symbols.
If the compiler does not produce them, they will be synthesized by GNU ld.

   https://sourceware.org/bugzilla/show_bug.cgi?id=26822
   ld.bfd copies non-STT_SECTION local symbols from input object files.  If an
   object file does not have STT_FILE symbols (no .file directive) but has
   non-STT_SECTION local symbols, ld.bfd synthesizes a STT_FILE symbol

The filenames are usually base names, so "a.o" and "a.o" in two directories will
be indistinguishable.  The live-patching code can possibly work around this by
not changing the relative order of the two "a.o".

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v4 00/10] Function Granular KASLR
  2021-01-23 22:59                   ` Fangrui Song
@ 2021-01-25 17:21                     ` Josh Poimboeuf
  0 siblings, 0 replies; 46+ messages in thread
From: Josh Poimboeuf @ 2021-01-25 17:21 UTC (permalink / raw)
  To: Fangrui Song
  Cc: Kristen Carlson Accardi, Miroslav Benes, Kees Cook, tglx, mingo,
	bp, arjan, x86, linux-kernel, kernel-hardening, rick.p.edgecombe,
	live-patching, Hongjiu Lu, joe.lawrence

On Sat, Jan 23, 2021 at 02:59:28PM -0800, Fangrui Song wrote:
> On 2020-08-28, Josh Poimboeuf wrote:
> > On Fri, Aug 28, 2020 at 12:21:13PM +0200, Miroslav Benes wrote:
> > > > Hi there! I was trying to find a super easy way to address this, so I
> > > > thought the best thing would be if there were a compiler or linker
> > > > switch to just eliminate any duplicate symbols at compile time for
> > > > vmlinux. I filed this question on the binutils bugzilla looking to see
> > > > if there were existing flags that might do this, but H.J. Lu went ahead
> > > > and created a new one "-z unique", that seems to do what we would need
> > > > it to do.
> > > >
> > > > https://sourceware.org/bugzilla/show_bug.cgi?id=26391
> > > >
> > > > When I use this option, it renames any duplicate symbols with an
> > > > extension - for example duplicatefunc.1 or duplicatefunc.2. You could
> > > > either match on the full unique name of the specific binary you are
> > > > trying to patch, or you match the base name and use the extension to
> > > > determine original position. Do you think this solution would work?
> > > 
> > > Yes, I think so (thanks, Joe, for testing!).
> > > 
> > > It looks cleaner to me than the options above, but it may just be a matter
> > > of taste. Anyway, I'd go with full name matching, because -z unique-symbol
> > > would allow us to remove sympos altogether, which is appealing.
> > > 
> > > > If
> > > > so, I can modify livepatch to refuse to patch on duplicated symbols if
> > > > CONFIG_FG_KASLR and when this option is merged into the tool chain I
> > > > can add it to KBUILD_LDFLAGS when CONFIG_FG_KASLR and livepatching
> > > > should work in all cases.
> > > 
> > > Ok.
> > > 
> > > Josh, Petr, would this work for you too?
> > 
> > Sounds good to me.  Kristen, thanks for finding a solution!
> 
> (I am not subscribed. I came here via https://sourceware.org/bugzilla/show_bug.cgi?id=26391 (ld -z unique-symbol))
> 
> > This works great after randomization because it always receives the
> > current address at runtime rather than relying on any kind of
> > buildtime address. The issue with with the live-patching code's
> > algorithm for resolving duplicate symbol names. If they request a
> > symbol by name from the kernel and there are 3 symbols with the same
> > name, they use the symbol's position in the built binary image to
> > select the correct symbol.
> 
> If a.o, b.o and c.o define local symbol 'foo'.
> By position, do you mean that
> 
> * the live-patching code uses something like (findall("foo")[0], findall("foo")[1], findall("foo")[2]) ?

Yes, it depends on their order in the symbol table of the linked binary
(vmlinux).

> * shuffling a.o/b.o/c.o will make the returned triple different

Yes, though it's actually functions that get shuffled.

> Local symbols are not required to be unique. Instead of patching the toolchain,
> have you thought about making the live-patching code smarter?

It's a possibility (more on that below).

> (Depend on the duplicates, such a linker option can increase the link time/binary size considerably

Have you tried it on vmlinux?  Just wondering what the time/size impact
would be in real-world numbers.

Duplicate symbols make up a very small percentage of all symbols in the
kernel, so I would think the binary size change (to the strtab?) would
be insignificant?

> AND I don't know in what other cases such an option will be useful)

I believe some other kernel components (tracing, kprobes, bpf) have the
same problem as livepatch with respect to disambiguating duplicate
symbols, for the purposes of tracing/debugging.  So I'm thinking it
would be a nice overall improvement to the kernel.

> For the following example,
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=26822
> 
>   # RUN: split-file %s %t
>   # RUN: gcc -c %t/a.s -o %t/a.o
>   # RUN: gcc -c %t/b.s -o %t/b.o
>   # RUN: gcc -c %t/c.s -o %t/c.o
>   # RUN: ld-new %t/a.o %t/b.o %t/c.o -z unique-symbol -o %t.exe
>   #--- a.s
>   a: a.1: a.2: nop
>   #--- b.s
>   a: nop
>   #--- c.s
>   a: nop
> 
> readelf -Ws output:
> 
> Symbol table '.symtab' contains 13 entries:
>    Num:    Value          Size Type    Bind   Vis      Ndx Name
>      0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND      1:
> 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS a.o
>      2: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a
>      3: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.1
>      4: 0000000000401000     0 NOTYPE  LOCAL  DEFAULT    1 a.2
>      5: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS b.o
>      6: 0000000000401001     0 NOTYPE  LOCAL  DEFAULT    1 a.1
>      7: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS c.o
>      8: 0000000000401002     0 NOTYPE  LOCAL  DEFAULT    1 a.2
>      9: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND _start
>     10: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 __bss_start
>     11: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _edata
>     12: 0000000000402000     0 NOTYPE  GLOBAL DEFAULT    1 _end
> 
> Note that you have STT_FILE SHN_ABS symbols.
> If the compiler does not produce them, they will be synthesized by GNU ld.
> 
>   https://sourceware.org/bugzilla/show_bug.cgi?id=26822
>   ld.bfd copies non-STT_SECTION local symbols from input object files.  If an
>   object file does not have STT_FILE symbols (no .file directive) but has
>   non-STT_SECTION local symbols, ld.bfd synthesizes a STT_FILE symbol

Right, I see what you're getting at.  As far as I can tell, there are
potentially two ways for fgkaslr to handle this:

a) shuffle files, not functions.  i.e. keep the functions' order intact
   within the STT_FILE group, shuffling the file groups themselves.

   (NOTE: this may have an additional benefit of improving i-cache
   performance, compared to the current fgkaslr implementation.)

   or

b) shuffle functions, keeping track of what file they belonged to.

Maybe Kristen could comment on the feasibility of either of these
options.  I believe the STT_FILE symbols are not currently available to
the kernel at runtime.  They would need to be made available to both
fgkaslr and livepatch code.

Overall "ld -z unique-symbol" would be much easier from a kernel
standpoint, and would benefit multiple components as I mentioned above.

> The filenames are usually base names, so "a.o" and "a.o" in two directories will
> be indistinguishable.  The live-patching code can possibly work around this by
> not changing the relative order of the two "a.o".

Right, there are some file:func duplicates so this case would indeed
need to be handled somehow.

$ readelf -s --wide vmlinux |awk '$4 == "FILE" {file=$8; next} $4 == "FUNC" {printf "%s:%s\n", file, $8}' |sort |uniq -d
bus.c:new_id_store
core.c:cmask_show
core.c:edge_show
core.c:event_show
core.c:inv_show
core.c:paravirt_read_msr
core.c:paravirt_read_msr_safe
core.c:type_show
core.c:umask_show
hid-core.c:hid_exit
hid-core.c:hid_init
inode.c:init_once
inode.c:remove_one
msr.c:msr_init
proc.c:c_next
proc.c:c_start
proc.c:c_stop
raw.c:dst_output
raw.c:raw_ioctl
route.c:dst_discard
super.c:init_once
udp.c:udp_lib_close
udp.c:udp_lib_hash
udp.c:udplite_getfrag
udplite.c:udp_lib_close
udplite.c:udp_lib_hash
udplite.c:udplite_sk_init

-- 
Josh


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2021-01-25 17:22 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-17 16:59 [PATCH v4 00/10] Function Granular KASLR Kristen Carlson Accardi
2020-07-17 16:59 ` [PATCH v4 01/10] objtool: Do not assume order of parent/child functions Kristen Carlson Accardi
2020-07-17 16:59 ` [PATCH v4 02/10] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 03/10] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 04/10] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 05/10] x86: Make sure _etext includes function sections Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 06/10] x86/tools: Add relative relocs for randomized functions Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 07/10] x86/boot/compressed: Avoid duplicate malloc() implementations Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 08/10] x86: Add support for function granular KASLR Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 09/10] kallsyms: Hide layout Kristen Carlson Accardi
2020-07-20  1:25   ` Kees Cook
2020-07-20 16:59     ` Kristen Carlson Accardi
2020-07-17 17:00 ` [PATCH v4 10/10] module: Reorder functions Kristen Carlson Accardi
2020-07-28 17:29   ` Jessica Yu
2020-07-22  9:27 ` [PATCH v4 00/10] Function Granular KASLR Miroslav Benes
2020-07-22 14:39   ` Kees Cook
2020-07-22 14:51     ` Joe Lawrence
2020-07-22 14:56       ` Joe Lawrence
2020-07-22 18:24         ` Kristen Carlson Accardi
2020-07-22 16:07     ` Josh Poimboeuf
2020-07-22 19:42       ` Kees Cook
2020-07-22 19:56         ` Kristen Carlson Accardi
2020-07-22 21:33           ` Josh Poimboeuf
2020-08-21 23:02             ` Kristen Carlson Accardi
2020-08-25 16:16               ` Joe Lawrence
2020-08-28 10:21               ` Miroslav Benes
2020-08-28 19:24                 ` Josh Poimboeuf
2021-01-23 22:59                   ` Fangrui Song
2021-01-25 17:21                     ` Josh Poimboeuf
2020-08-03 11:39   ` Evgenii Shatokhin
2020-08-03 17:45     ` Kees Cook
2020-08-03 18:17       ` Joe Lawrence
2020-08-03 19:38         ` Frank Ch. Eigler
2020-08-03 20:11           ` Kees Cook
2020-08-03 21:12             ` Frank Ch. Eigler
2020-08-03 21:41               ` Kees Cook
2020-08-04  0:48                 ` Frank Ch. Eigler
2020-08-04 17:04         ` Jessica Yu
2020-08-04 18:23 ` Joe Lawrence
2020-08-07 16:38   ` Kristen Carlson Accardi
2020-08-07 17:20     ` Kees Cook
2020-08-10 16:10       ` Kristen Carlson Accardi
2020-08-12 17:18   ` Kristen Carlson Accardi
2020-08-06 15:32 ` Ingo Molnar
2020-08-06 19:24   ` Kristen Carlson Accardi
2020-08-06 19:27   ` Kees Cook

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).