Kernel-hardening archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 00/11] Finer grained kernel address space randomization
@ 2020-02-05 22:39 Kristen Carlson Accardi
  2020-02-05 22:39 ` [RFC PATCH 01/11] modpost: Support >64K sections Kristen Carlson Accardi
                   ` (10 more replies)
  0 siblings, 11 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

Hello! This is an RFC for a proof of concept I've been working on to
improve KASLR by making it finer grained. I hope you will take a look at this
and give me some feedback on the general concept as well as the early
implementation. At this point in time, the code is functional, although
there is lots of room for optimization and improvement. This patchset applies
to 5.5.0-rc7

The TL;DR summary is: This patch set rearranges your kernel code at load time 
on a per-function level granularity, with only around a second added to
boot time.

Background
----------
KASLR was merged into the kernel with the objective of increasing the
difficulty of code reuse attacks. Code reuse attacks reused existing code
snippets to get around existing memory protections. They exploit software bugs
which expose addresses of useful code snippets to control the flow of
execution for their own nefarious purposes. KASLR moves the entire kernel
code text as a unit at boot time in order to make addresses less predictable.
The order of the code within the segment is unchanged - only the base address
is shifted. There are a few shortcomings to this algorithm.

1. Low Entropy - there are only so many locations the kernel can fit in. This
   means an attacker could guess without too much trouble.
2. Knowledge of a single address can reveal the offset of the base address,
   exposing all other locations for a published/known kernel image.
3. Info leaks abound.

Finer grained ASLR has been proposed as a way to make ASLR more resistant
to info leaks. It is not a new concept at all, and there are many variations
possible.

Proposed Improvement
--------------------
This patch set proposes adding function reordering on top of the existing
KASLR base address randomization. The over-arching objective is incremental
improvement over what we already have, as well as something that can be
merged and deployed with as little disruption to our existing kernel/ecosystem
as possible. It is designed to work with the existing solution, although it
can be used independently (not sure why you would do that though...). The
implementation is really pretty simple, and there are 2 main area where
changes occur:

* Build time

GCC has an option to place functions into individual .text sections. We can
use this option to implement function reordering at load time. The final
compiled vmlinux retains all the section headers, which can be used to help
us find the address ranges of each function. Using this information and
an expanded table of relocation addresses, we can shuffle the individual
text sections immediately after decompression. You are probably asking
yourself how this could possibly work given the number of tables of
addresses that exist inside the kernel today for features such as exception
handling and kprobes. Most of these tables generate relocations and
require a simple update, and some tables that have assumptions about order
require sorting after the update. In order to modify these tables, we
preserve a few key symbols from the objcopy symbol stripping process for use
after shuffling the text segments.

Some highlights from the build time changes to look for:

The top level kernel Makefile was modified to add the gcc flag if it
is supported. For this RFC, I am applying this flag to everything it is
possible to randomize. Future work could turn off this flags for selected
files or even entire subsystems, although obviously at the cost of security.

The relocs tool is updated to add relative relocations. This information
previously wasn't included because it wasn't necessary when moving the
entire .text segment as a unit. 

A new file was created to contain a list of symbols that objcopy should
keep. We use those symbols at load time as described below.

* Load time

The boot kernel was modified to parse the vmlinux elf file after
decompression to check for our interesting symbols that we kept, and to
look for any .text.* sections to randomize. We then shuffle the sections
and update any tables that need to be updated or resorted. The existing
code which updated relocation addresses was modified to account for not
just a fixed delta from the load address, but the offset that the function
section was moved to. This requires inspection of each address to see if
it was impacted by a randomization. We use a bsearch to make this less
horrible on performce.

In this patch we utilize a pseudo-random number generator in order to allow
for known seeds to be used to create repeatable images across multiple
boots. This is purely for debugging - obviously not recommended for use
in production. This pseudo-random number generator code is very much just
an experiment and not ready for merging yet. We also block access to 
/proc/kallsyms for any non-privileged user so that we don't give away our new
layout.

Does this improve security though?
----------------------------------
The objective of this patch set is to improve a technology that is already
merged into the kernel (KASLR). Obviously, this code is not a one stop
shopping place for blocking code reuse attacks, but should instead be
considered as one of several tools that can be used. In particular, this
code is meant to make KASLR more effective in the presence of info leaks.
A key point to note is that we are basically accepting that there are
many and various ways to leak address today and in the future, and rather
than assume that we can stop them all, we should find a method which makes
individual info leaks less important. Many people claim that standard KASLR
is good enough protection if the attacker does not have access to the host
machine, but for example, CVE-2019-0688 demonstrated that addresses can
be obtained even with remote attacks. So if you define a threat model in which
the kernel has W^X (executable memory is not writable), and ideally XOM
(execute only memory, neither readable nor writable), and does have info leaks,
this patch will make derandomizing the kernel considerably harder. How much
harder will depend on how much entropy we add to the existing entropy of
standard KASLR. There are some variables that determine this. Firstly and most
obviously, the number of functions you randomize matters. This implementation
keeps the existing .text section for code that cannot be randomized - for
example, because it was assembly code, or we opted out of randomization for
performance reasons. The less sections to randomize, the less entropy. In
addition, due to alignment (16 bytes for x86_64), the number of bits in a
address that the attacker needs to guess is reduced, as the lower bits are
identical. For future work, we could explore randomizing the starting
position of the function and padding with INT3s if we wanted to make the
lower bits unique. Having a XOM solution for the kernel such as the one
for VM guests on X86 platforms that is currently under discussion makes
finer grained randomization more effective against JIT-ROP and other
variations.

Other solutions
---------------
CFI is another method of mitigating code reuse attacks. CFI attempts to
prevent control flow from being diverted to gadgets (snippets of code
anywhere in the text section) by restricting calls to validated
function entry points.

* Forward Edge CFI
Common forward edge CFI implementations will check the function signature to
make sure that control flow matches the expected function signature. This
reduced the number of calls that will pass a forward edge CFI check to those
that match the original function's signature. Unfortunately, the kernel has
many functions with the same signature, so forward CFI in and of itself still
allows an attacker to potentially change to a different valid function with
the same signature.

In theory, finer grained randomization can be combined with CFI to make this
even harder, so CFI and finer grained KASLR do not need to be
competing solutions. For a kernel with both forward edge CFI and function
level randomization, an attacker would have to call to a function which not
only matched the function signature, but they would also need to take the
extra step to find the address of the new function they want to call.

In practice, I have not tested this combination, as I used standard kernel
build tools (gcc, not clang) and CFI was not an option.

* Backward edge CFI
Backward edge CFI is not available for X86 at all today, so in the case of
modifying a return address, there is no CFI based solution for today's
X86 hardware running upstream Linux.

Performance Impact
------------------
There are two areas where function reordering can impact performance: boot
time latency, and run time performance.

* Boot time latency
This implementation of finer grained KASLR impacts the boot time of the kernel
in several places. It requires additional parsing of the kernel ELF file to
obtain the section headers of the sections to be randomized. It calls the
random number generator for each section to be randomized to determine that
section's new memory location. It copies the decompressed kernel into a new
area of memory to avoid corruption when laying out the newly randomized
sections. It increases the number of relocations the kernel has to perform at
boot time vs. standard KASLR, and it also requires a lookup on each address
that needs to be relocated to see if it was in a randomized section and needs
to be adjusted by a new offset.

Booting a test VM on a modern, well appointed system showed an increase in
latency of approximately 1 second.

* Run time
The performance impact at run-time of function reordering varies by workload.
Using kcbench, a kernel compilation benchmark, the performance of a kernel
build with finer grained KASLR was about 1% slower than a kernel with standard
KASLR. Analysis with perf showed a slightly higher percentage of 
L1-icache-load-misses. Other workloads were examined as well, with varied
results. Some workloads performed significantly worse under FGKASLR, while
others stayed the same or were mysteriously better. In general, it will
depend on the code flow whether or not finer grained KASLR will impact
your workload, and how the underlying code was designed. Future work could
identify hot areas that may not be randomized and either leave them in the
.text section or group them together into a single section that may be
randomized. If grouping things together helps, one other thing to consider
is that if we could identify text blobs that should be grouped together
to benefit a particular code flow, it could be interesting to explore
whether this security feature could be also be used as a performance
feature if you are interested in optimizing your kernel layout for a
particular workload at boot time. Optimizing function layout for a particular
workload has been researched and proven effective - for more information
read the Facebook paper "Optimizing Function Placement for Large-Scale
Data-Center Applications" (see references section below).

Image Size
----------
Adding additional section headers as a result of compiling with
-ffunction-sections will increase the size of the vmlinux ELF file. In
addition, the vmlinux.bin file generated in arch/x86/boot/compressed by
objcopy grows significantly with the current POC implementation. This is
because the boot heap size must be dramatically increased to support shuffling
the sections and re-sorting kallsyms. With a sample kernel compilation using a
stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR was enabled.
This is because the boot heap area is included in the image itself.

It is possible to mitigate this issue by moving the boot heap out of .bss.
Kees Cook has a prototype of this working, and it is included in this
patchset. 

Building
--------
To enable fine grained KASLR in the prototype code, you need to have the
following config options set (including all the ones you would use to build
normal KASLR)

CONFIG_FG_KASLR=y

Modules
-------
The same technique can be applied to modules very easily. This patchset
demonstrates how you can randomize modules at load time by shuffling
the sections prior to moving them into memory. The module code has
some TODOs left in it and has been tested less than the kernel code.

References
----------
There are a lot of academic papers which explore finer grained ASLR and
CFI. This paper in particular contributed the most to my implementation design
as well as my overall understanding of the problem space:

Selfrando: Securing the Tor Browser against De-anonymization Exploits,
M. Conti, S. Crane, T. Frassetto, et al.

For more information on how function layout impacts performance, see:

Optimizing Function Placement for Large-Scale Data-Center Applications,
G. Ottoni, B. Maher

Kees Cook (3):
  x86/boot: Allow a "silent" kaslr random byte fetch
  x86/boot/KASLR: Introduce PRNG for faster shuffling
  x86/boot: Move "boot heap" out of .bss

Kristen Carlson Accardi (8):
  modpost: Support >64K sections
  x86: tools/relocs: Support >64K section headers
  x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  x86: make sure _etext includes function sections
  x86/tools: Adding relative relocs for randomized functions
  x86: Add support for finer grained KASLR
  kallsyms: hide layout and expose seed
  module: Reorder functions

 Makefile                                 |   4 +
 arch/x86/Kconfig                         |  13 +
 arch/x86/boot/compressed/Makefile        |   8 +-
 arch/x86/boot/compressed/fgkaslr.c       | 751 +++++++++++++++++++++++
 arch/x86/boot/compressed/head_32.S       |   5 +-
 arch/x86/boot/compressed/head_64.S       |   7 +-
 arch/x86/boot/compressed/kaslr.c         |   6 +-
 arch/x86/boot/compressed/misc.c          | 109 +++-
 arch/x86/boot/compressed/misc.h          |  26 +
 arch/x86/boot/compressed/vmlinux.lds.S   |   1 +
 arch/x86/boot/compressed/vmlinux.symbols |  15 +
 arch/x86/include/asm/boot.h              |  15 +-
 arch/x86/include/asm/kaslr.h             |   4 +-
 arch/x86/kernel/vmlinux.lds.S            |  18 +-
 arch/x86/lib/kaslr.c                     |  83 ++-
 arch/x86/mm/init.c                       |   2 +-
 arch/x86/mm/kaslr.c                      |   2 +-
 arch/x86/tools/relocs.c                  | 143 ++++-
 arch/x86/tools/relocs.h                  |   4 +-
 arch/x86/tools/relocs_common.c           |  15 +-
 include/asm-generic/vmlinux.lds.h        |   2 +-
 kernel/kallsyms.c                        |  30 +-
 kernel/module.c                          |  85 +++
 scripts/kallsyms.c                       |  14 +-
 scripts/link-vmlinux.sh                  |   4 +
 scripts/mod/modpost.c                    |  16 +-
 26 files changed, 1312 insertions(+), 70 deletions(-)
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols

-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 01/11] modpost: Support >64K sections
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 12:38   ` Kees Cook
  2020-02-05 22:39 ` [RFC PATCH 02/11] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

According to the ELF specification, if the value of st_shndx
contains SH_XINDEX, the actual section header index is too
large to fit in the st_shndx field and you should use the
value out of the SHT_SYMTAB_SHNDX section instead. This table
was already being parsed and saved into symtab_shndx_start, however
it was not being used, causing segfaults when the number of
sections is greater than 64K. Check the st_shndx field for
SHN_XINDEX prior to using.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 scripts/mod/modpost.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 6e892c93d104..5ce7e9dc2f04 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -305,9 +305,23 @@ static const char *sec_name(struct elf_info *elf, int secindex)
 	return sech_name(elf, &elf->sechdrs[secindex]);
 }
 
+static int sym_index(const Elf_Sym *sym, const struct elf_info *info)
+{
+	unsigned long offset;
+	int index;
+
+	if (sym->st_shndx != SHN_XINDEX)
+		return sym->st_shndx;
+
+	offset = (unsigned long)sym - (unsigned long)info->symtab_start;
+	index = offset/(sizeof(*sym));
+
+	return TO_NATIVE(info->symtab_shndx_start[index]);
+}
+
 static void *sym_get_data(const struct elf_info *info, const Elf_Sym *sym)
 {
-	Elf_Shdr *sechdr = &info->sechdrs[sym->st_shndx];
+	Elf_Shdr *sechdr = &info->sechdrs[sym_index(sym, info)];
 	unsigned long offset;
 
 	offset = sym->st_value;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 02/11] x86: tools/relocs: Support >64K section headers
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
  2020-02-05 22:39 ` [RFC PATCH 01/11] modpost: Support >64K sections Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 12:39   ` Kees Cook
  2020-02-05 22:39 ` [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

While it is already supported to find the total number of section
headers if we exceed 64K sections, we need to support the
extended symbol table to get section header indexes for symbols
when there are > 64K sections. Parse the elf file to read
the extended symbol table info, and then replace all direct references
to st_shndx with calls to sym_index(), which will determine whether
we can read the value directly or whether we need to pull it out of
the extended table.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/tools/relocs.c | 104 ++++++++++++++++++++++++++++++----------
 1 file changed, 78 insertions(+), 26 deletions(-)

diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index ce7188cbdae5..a00dc133f109 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -14,6 +14,10 @@
 static Elf_Ehdr		ehdr;
 static unsigned long	shnum;
 static unsigned int	shstrndx;
+static unsigned int	shsymtabndx;
+static unsigned int	shxsymtabndx;
+
+static int sym_index(Elf_Sym *sym);
 
 struct relocs {
 	uint32_t	*offset;
@@ -32,6 +36,7 @@ struct section {
 	Elf_Shdr       shdr;
 	struct section *link;
 	Elf_Sym        *symtab;
+	Elf32_Word     *xsymtab;
 	Elf_Rel        *reltab;
 	char           *strtab;
 };
@@ -265,7 +270,7 @@ static const char *sym_name(const char *sym_strtab, Elf_Sym *sym)
 		name = sym_strtab + sym->st_name;
 	}
 	else {
-		name = sec_name(sym->st_shndx);
+		name = sec_name(sym_index(sym));
 	}
 	return name;
 }
@@ -335,6 +340,23 @@ static uint64_t elf64_to_cpu(uint64_t val)
 #define elf_xword_to_cpu(x)	elf32_to_cpu(x)
 #endif
 
+static int sym_index(Elf_Sym *sym)
+{
+	Elf_Sym *symtab = secs[shsymtabndx].symtab;
+	Elf32_Word *xsymtab = secs[shxsymtabndx].xsymtab;
+	unsigned long offset;
+	int index;
+
+	if (sym->st_shndx != SHN_XINDEX)
+		return sym->st_shndx;
+
+	/* calculate offset of sym from head of table. */
+	offset = (unsigned long) sym - (unsigned long) symtab;
+	index = offset/sizeof(*sym);
+
+	return elf32_to_cpu(xsymtab[index]);
+}
+
 static void read_ehdr(FILE *fp)
 {
 	if (fread(&ehdr, sizeof(ehdr), 1, fp) != 1) {
@@ -468,31 +490,60 @@ static void read_strtabs(FILE *fp)
 static void read_symtabs(FILE *fp)
 {
 	int i,j;
+
 	for (i = 0; i < shnum; i++) {
 		struct section *sec = &secs[i];
-		if (sec->shdr.sh_type != SHT_SYMTAB) {
+		int num_syms;
+
+		switch (sec->shdr.sh_type) {
+		case SHT_SYMTAB_SHNDX:
+			sec->xsymtab = malloc(sec->shdr.sh_size);
+			if (!sec->xsymtab) {
+				die("malloc of %d bytes for xsymtab failed\n",
+					sec->shdr.sh_size);
+			}
+			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+				die("Seek to %d failed: %s\n",
+					sec->shdr.sh_offset, strerror(errno));
+			}
+			if (fread(sec->xsymtab, 1, sec->shdr.sh_size, fp)
+					!= sec->shdr.sh_size) {
+				die("Cannot read extended symbol table: %s\n",
+					strerror(errno));
+			}
+			shxsymtabndx = i;
+			continue;
+
+		case SHT_SYMTAB:
+			num_syms = sec->shdr.sh_size/sizeof(Elf_Sym);
+
+			sec->symtab = malloc(sec->shdr.sh_size);
+			if (!sec->symtab) {
+				die("malloc of %d bytes for symtab failed\n",
+					sec->shdr.sh_size);
+			}
+			if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
+				die("Seek to %d failed: %s\n",
+					sec->shdr.sh_offset, strerror(errno));
+			}
+			if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
+					!= sec->shdr.sh_size) {
+				die("Cannot read symbol table: %s\n",
+					strerror(errno));
+			}
+			for (j = 0; j < num_syms; j++) {
+				Elf_Sym *sym = &sec->symtab[j];
+
+				sym->st_name  = elf_word_to_cpu(sym->st_name);
+				sym->st_value = elf_addr_to_cpu(sym->st_value);
+				sym->st_size  = elf_xword_to_cpu(sym->st_size);
+				sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
+			}
+			shsymtabndx = i;
+			continue;
+
+		default:
 			continue;
-		}
-		sec->symtab = malloc(sec->shdr.sh_size);
-		if (!sec->symtab) {
-			die("malloc of %d bytes for symtab failed\n",
-				sec->shdr.sh_size);
-		}
-		if (fseek(fp, sec->shdr.sh_offset, SEEK_SET) < 0) {
-			die("Seek to %d failed: %s\n",
-				sec->shdr.sh_offset, strerror(errno));
-		}
-		if (fread(sec->symtab, 1, sec->shdr.sh_size, fp)
-		    != sec->shdr.sh_size) {
-			die("Cannot read symbol table: %s\n",
-				strerror(errno));
-		}
-		for (j = 0; j < sec->shdr.sh_size/sizeof(Elf_Sym); j++) {
-			Elf_Sym *sym = &sec->symtab[j];
-			sym->st_name  = elf_word_to_cpu(sym->st_name);
-			sym->st_value = elf_addr_to_cpu(sym->st_value);
-			sym->st_size  = elf_xword_to_cpu(sym->st_size);
-			sym->st_shndx = elf_half_to_cpu(sym->st_shndx);
 		}
 	}
 }
@@ -759,13 +810,14 @@ static void percpu_init(void)
  */
 static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 {
-	return (sym->st_shndx == per_cpu_shndx) &&
+	int shndx = sym_index(sym);
+
+	return (shndx == per_cpu_shndx) &&
 		strcmp(symname, "__init_begin") &&
 		strcmp(symname, "__per_cpu_load") &&
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
-
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
@@ -1088,7 +1140,7 @@ static int do_reloc_info(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		sec_name(sec->shdr.sh_info),
 		rel_type(ELF_R_TYPE(rel->r_info)),
 		symname,
-		sec_name(sym->st_shndx));
+		sec_name(sym_index(sym)));
 	return 0;
 }
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
  2020-02-05 22:39 ` [RFC PATCH 01/11] modpost: Support >64K sections Kristen Carlson Accardi
  2020-02-05 22:39 ` [RFC PATCH 02/11] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06  1:08   ` Andy Lutomirski
  2020-02-05 22:39 ` [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling Kristen Carlson Accardi
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

From: Kees Cook <keescook@chromium.org>

Under earlyprintk, each RNG call produces a debug report line. When
shuffling hundreds of functions, this is not useful information (each
line is identical and tells us nothing new). Instead, allow for a NULL
"purpose" to suppress the debug reporting.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/lib/kaslr.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index a53665116458..2b3eb8c948a3 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -56,11 +56,14 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	unsigned long raw, random = get_boot_seed();
 	bool use_i8254 = true;
 
-	debug_putstr(purpose);
-	debug_putstr(" KASLR using");
+	if (purpose) {
+		debug_putstr(purpose);
+		debug_putstr(" KASLR using");
+	}
 
 	if (has_cpuflag(X86_FEATURE_RDRAND)) {
-		debug_putstr(" RDRAND");
+		if (purpose)
+			debug_putstr(" RDRAND");
 		if (rdrand_long(&raw)) {
 			random ^= raw;
 			use_i8254 = false;
@@ -68,7 +71,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	}
 
 	if (has_cpuflag(X86_FEATURE_TSC)) {
-		debug_putstr(" RDTSC");
+		if (purpose)
+			debug_putstr(" RDTSC");
 		raw = rdtsc();
 
 		random ^= raw;
@@ -76,7 +80,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	}
 
 	if (use_i8254) {
-		debug_putstr(" i8254");
+		if (purpose)
+			debug_putstr(" i8254");
 		random ^= i8254();
 	}
 
@@ -86,7 +91,8 @@ unsigned long kaslr_get_random_long(const char *purpose)
 	    : "a" (random), "rm" (mix_const));
 	random += raw;
 
-	debug_putstr("...\n");
+	if (purpose)
+		debug_putstr("...\n");
 
 	return random;
 }
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (2 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06  1:11   ` Andy Lutomirski
  2020-02-06 15:10   ` Jason A. Donenfeld
  2020-02-05 22:39 ` [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
                   ` (6 subsequent siblings)
  10 siblings, 2 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

From: Kees Cook <keescook@chromium.org>

This *might* improve shuffling speed at boot. Possibly only marginally.
This has not yet been tested, and would need to have some performance
tests run to determine if it helps before merging.

(notes from Kristen) - initial performance tests suggest that any
improvement is indeed marginal. However, this code is useful
for using a known seed.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/boot/compressed/kaslr.c |  4 +--
 arch/x86/include/asm/kaslr.h     |  3 +-
 arch/x86/lib/kaslr.c             | 50 +++++++++++++++++++++++++++++++-
 arch/x86/mm/init.c               |  2 +-
 arch/x86/mm/kaslr.c              |  2 +-
 5 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..ae4dce76a9bd 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -598,7 +598,7 @@ static unsigned long slots_fetch_random(void)
 	if (slot_max == 0)
 		return 0;
 
-	slot = kaslr_get_random_long("Physical") % slot_max;
+	slot = kaslr_get_random_seed("Physical") % slot_max;
 
 	for (i = 0; i < slot_area_index; i++) {
 		if (slot >= slot_areas[i].num) {
@@ -880,7 +880,7 @@ static unsigned long find_random_virt_addr(unsigned long minimum,
 	slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
 		 CONFIG_PHYSICAL_ALIGN + 1;
 
-	random_addr = kaslr_get_random_long("Virtual") % slots;
+	random_addr = kaslr_get_random_seed("Virtual") % slots;
 
 	return random_addr * CONFIG_PHYSICAL_ALIGN + minimum;
 }
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index db7ba2feb947..47d5b25e5b11 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -2,7 +2,8 @@
 #ifndef _ASM_KASLR_H_
 #define _ASM_KASLR_H_
 
-unsigned long kaslr_get_random_long(const char *purpose);
+unsigned long kaslr_get_random_seed(const char *purpose);
+unsigned long kaslr_get_prandom_long(void);
 
 #ifdef CONFIG_RANDOMIZE_MEMORY
 void kernel_randomize_memory(void);
diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index 2b3eb8c948a3..41b5610855a3 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -46,7 +46,7 @@ static inline u16 i8254(void)
 	return timer;
 }
 
-unsigned long kaslr_get_random_long(const char *purpose)
+unsigned long kaslr_get_random_seed(const char *purpose)
 {
 #ifdef CONFIG_X86_64
 	const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
@@ -96,3 +96,51 @@ unsigned long kaslr_get_random_long(const char *purpose)
 
 	return random;
 }
+
+/*
+ * 64bit variant of Bob Jenkins' public domain PRNG
+ * 256 bits of internal state
+ */
+struct prng_state {
+	u64 a, b, c, d;
+};
+
+static struct prng_state state;
+static bool initialized;
+
+#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
+static u64 prng_u64(struct prng_state *x)
+{
+	u64 e;
+
+	e = x->a - rot(x->b, 7);
+	x->a = x->b ^ rot(x->c, 13);
+	x->b = x->c + rot(x->d, 37);
+	x->c = x->d + e;
+	x->d = e + x->a;
+
+	return x->d;
+}
+
+static void prng_init(struct prng_state *state)
+{
+	int i;
+
+	state->a = kaslr_get_random_seed(NULL);
+	state->b = kaslr_get_random_seed(NULL);
+	state->c = kaslr_get_random_seed(NULL);
+	state->d = kaslr_get_random_seed(NULL);
+
+	for (i = 0; i < 30; ++i)
+		(void)prng_u64(state);
+
+	initialized = true;
+}
+
+unsigned long kaslr_get_prandom_long(void)
+{
+	if (!initialized)
+		prng_init(&state);
+
+	return prng_u64(&state);
+}
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e7bb483557c9..c974dbab2293 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -722,7 +722,7 @@ void __init poking_init(void)
 	 */
 	poking_addr = TASK_UNMAPPED_BASE;
 	if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
-		poking_addr += (kaslr_get_random_long("Poking") & PAGE_MASK) %
+		poking_addr += (kaslr_get_random_seed("Poking") & PAGE_MASK) %
 			(TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE);
 
 	if (((poking_addr + PAGE_SIZE) & ~PMD_MASK) == 0)
diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
index dc6182eecefa..b30bd1db7543 100644
--- a/arch/x86/mm/kaslr.c
+++ b/arch/x86/mm/kaslr.c
@@ -123,7 +123,7 @@ void __init kernel_randomize_memory(void)
 	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
 		remain_entropy -= get_padding(&kaslr_regions[i]);
 
-	prandom_seed_state(&rand_state, kaslr_get_random_long("Memory"));
+	prandom_seed_state(&rand_state, kaslr_get_random_seed("Memory"));
 
 	for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
 		unsigned long entropy;
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (3 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 10:30   ` Peter Zijlstra
  2020-02-05 22:39 ` [RFC PATCH 06/11] x86: make sure _etext includes function sections Kristen Carlson Accardi
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
the make file to build with -ffunction-sections if CONFIG_FG_KASLR

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 Makefile         |  4 ++++
 arch/x86/Kconfig | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/Makefile b/Makefile
index c50ef91f6136..41438a921666 100644
--- a/Makefile
+++ b/Makefile
@@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
 KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
 endif
 
+ifdef CONFIG_FG_KASLR
+KBUILD_CFLAGS += -ffunction-sections
+endif
+
 # arch Makefile may override CC so keep this after arch Makefile is included
 NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 5e8949953660..b4b1b17076a8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2208,6 +2208,19 @@ config RANDOMIZE_BASE
 
 	  If unsure, say Y.
 
+config FG_KASLR
+	bool "Finer grained Kernel Address Space Layout Randomization"
+	depends on $(cc-option, -ffunction-sections)
+	depends on RANDOMIZE_BASE && X86_64
+	help
+	  This option improves the randomness of the kernel text
+	  over basic Kernel Address Space Layout Randomization (KASLR)
+	  by reordering the kernel text at boot time. This feature
+	  uses information generated at compile time to re-layout the
+	  kernel text section at boot time at function level granularity.
+
+	  If unsure, say N.
+
 # Relocation on x86 needs some additional build support
 config X86_NEED_RELOCS
 	def_bool y
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (4 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 12:26   ` Kees Cook
  2020-02-05 22:39 ` [RFC PATCH 07/11] x86/tools: Adding relative relocs for randomized functions Kristen Carlson Accardi
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

We will be using -ffunction-sections to place each function in
it's own text section so it can be randomized at load time. The
linker considers these .text.* sections "orphaned sections", and
will place them after the first similar section (.text). However,
we need to move _etext so that it is after both .text and .text.*
We also need to calculate text size to include .text AND .text.*

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/kernel/vmlinux.lds.S     | 18 +++++++++++++++++-
 include/asm-generic/vmlinux.lds.h |  2 +-
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 3a1a819da137..e54e9ac5b429 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -146,8 +146,24 @@ SECTIONS
 #endif
 	} :text =0xcccc
 
-	/* End of text section, which should occupy whole number of pages */
+#ifdef CONFIG_FG_KASLR
+	/*
+	 * -ffunction-sections creates .text.* sections, which are considered
+	 * "orphan sections" and added after the first similar section (.text).
+	 * Adding this ALIGN statement causes the address of _etext
+	 * to be below that of all the .text.* orphaned sections
+	 */
+	. = ALIGN(PAGE_SIZE);
+#endif
 	_etext = .;
+
+	/*
+	 * the size of the .text section is used to calculate the address
+	 * range for orc lookups. If we just use SIZEOF(.text), we will
+	 * miss all the .text.* sections. Calculate the size using _etext
+	 * and _stext and save the value for later.
+	 */
+	text_size = _etext - _stext;
 	. = ALIGN(PAGE_SIZE);
 
 	X86_ALIGN_RODATA_BEGIN
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index e00f41aa8ec4..edf19f4296e2 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -798,7 +798,7 @@
 	. = ALIGN(4);							\
 	.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) {		\
 		orc_lookup = .;						\
-		. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) /	\
+		. += (((text_size + LOOKUP_BLOCK_SIZE - 1) /	\
 			LOOKUP_BLOCK_SIZE) + 1) * 4;			\
 		orc_lookup_end = .;					\
 	}
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 07/11] x86/tools: Adding relative relocs for randomized functions
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (5 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 06/11] x86: make sure _etext includes function sections Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 12:37   ` Kees Cook
  2020-02-05 22:39 ` [RFC PATCH 08/11] x86: Add support for finer grained KASLR Kristen Carlson Accardi
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

If we are randomizing function sections, we are going to need
to recalculate relative offsets for relocs that are either in
the randomized sections, or refer to the randomized sections.
Add code to detect whether a reloc satisifies these cases, and if
so, add them to the appropriate reloc list.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/boot/compressed/Makefile |  7 +++++-
 arch/x86/tools/relocs.c           | 41 ++++++++++++++++++++++++++++---
 arch/x86/tools/relocs.h           |  4 +--
 arch/x86/tools/relocs_common.c    | 15 +++++++----
 4 files changed, 55 insertions(+), 12 deletions(-)

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index 1dac210f7d44..b7e5ea757ef4 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -119,6 +119,11 @@ $(obj)/vmlinux: $(vmlinux-objs-y) FORCE
 	$(call if_changed,check-and-link-vmlinux)
 
 OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
+
+ifdef CONFIG_FG_KASLR
+	RELOCS_ARGS += --fg-kaslr
+endif
+
 $(obj)/vmlinux.bin: vmlinux FORCE
 	$(call if_changed,objcopy)
 
@@ -126,7 +131,7 @@ targets += $(patsubst $(obj)/%,%,$(vmlinux-objs-y)) vmlinux.bin.all vmlinux.relo
 
 CMD_RELOCS = arch/x86/tools/relocs
 quiet_cmd_relocs = RELOCS  $@
-      cmd_relocs = $(CMD_RELOCS) $< > $@;$(CMD_RELOCS) --abs-relocs $<
+      cmd_relocs = $(CMD_RELOCS) $(RELOCS_ARGS) $< > $@;$(CMD_RELOCS) $(RELOCS_ARGS) --abs-relocs $<
 $(obj)/vmlinux.relocs: vmlinux FORCE
 	$(call if_changed,relocs)
 
diff --git a/arch/x86/tools/relocs.c b/arch/x86/tools/relocs.c
index a00dc133f109..2641eec5f2de 100644
--- a/arch/x86/tools/relocs.c
+++ b/arch/x86/tools/relocs.c
@@ -42,6 +42,8 @@ struct section {
 };
 static struct section *secs;
 
+static int fg_kaslr;
+
 static const char * const sym_regex_kernel[S_NSYMTYPES] = {
 /*
  * Following symbols have been audited. There values are constant and do
@@ -818,6 +820,32 @@ static int is_percpu_sym(ElfW(Sym) *sym, const char *symname)
 		strncmp(symname, "init_per_cpu_", 13);
 }
 
+static int is_function_section(struct section *sec)
+{
+	const char *name;
+
+	if (!fg_kaslr)
+		return 0;
+
+	name = sec_name(sec->shdr.sh_info);
+
+	return(!strncmp(name, ".text.", 6));
+}
+
+static int is_randomized_sym(ElfW(Sym *sym))
+{
+	const char *name;
+
+	if (!fg_kaslr)
+		return 0;
+
+	if (sym->st_shndx > shnum)
+		return 0;
+
+	name = sec_name(sym_index(sym));
+	return(!strncmp(name, ".text.", 6));
+}
+
 static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 		      const char *symname)
 {
@@ -842,13 +870,17 @@ static int do_reloc64(struct section *sec, Elf_Rel *rel, ElfW(Sym) *sym,
 	case R_X86_64_PC32:
 	case R_X86_64_PLT32:
 		/*
-		 * PC relative relocations don't need to be adjusted unless
-		 * referencing a percpu symbol.
+		 * we need to keep pc relative relocations for sections which
+		 * might be randomized, and for the percpu section.
+		 * We also need to keep relocations for any offset which might
+		 * reference an address in a section which has been randomized.
 		 *
 		 * NB: R_X86_64_PLT32 can be treated as R_X86_64_PC32.
 		 */
-		if (is_percpu_sym(sym, symname))
+		if (is_function_section(sec) || is_randomized_sym(sym) ||
+		    is_percpu_sym(sym, symname))
 			add_reloc(&relocs32neg, offset);
+
 		break;
 
 	case R_X86_64_PC64:
@@ -1158,8 +1190,9 @@ static void print_reloc_info(void)
 
 void process(FILE *fp, int use_real_mode, int as_text,
 	     int show_absolute_syms, int show_absolute_relocs,
-	     int show_reloc_info)
+	     int show_reloc_info, int fgkaslr)
 {
+	fg_kaslr = fgkaslr;
 	regex_init(use_real_mode);
 	read_ehdr(fp);
 	read_shdrs(fp);
diff --git a/arch/x86/tools/relocs.h b/arch/x86/tools/relocs.h
index 43c83c0fd22c..05504052c846 100644
--- a/arch/x86/tools/relocs.h
+++ b/arch/x86/tools/relocs.h
@@ -31,8 +31,8 @@ enum symtype {
 
 void process_32(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int fg_kaslr);
 void process_64(FILE *fp, int use_real_mode, int as_text,
 		int show_absolute_syms, int show_absolute_relocs,
-		int show_reloc_info);
+		int show_reloc_info, int fg_kaslr);
 #endif /* RELOCS_H */
diff --git a/arch/x86/tools/relocs_common.c b/arch/x86/tools/relocs_common.c
index 6634352a20bc..f76cd6f53961 100644
--- a/arch/x86/tools/relocs_common.c
+++ b/arch/x86/tools/relocs_common.c
@@ -12,14 +12,14 @@ void die(char *fmt, ...)
 
 static void usage(void)
 {
-	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode]" \
-	    " vmlinux\n");
+	die("relocs [--abs-syms|--abs-relocs|--reloc-info|--text|--realmode"
+		"|--fg-kaslr] vmlinux\n");
 }
 
 int main(int argc, char **argv)
 {
 	int show_absolute_syms, show_absolute_relocs, show_reloc_info;
-	int as_text, use_real_mode;
+	int as_text, use_real_mode, fg_kaslr;
 	const char *fname;
 	FILE *fp;
 	int i;
@@ -30,6 +30,7 @@ int main(int argc, char **argv)
 	show_reloc_info = 0;
 	as_text = 0;
 	use_real_mode = 0;
+	fg_kaslr = 0;
 	fname = NULL;
 	for (i = 1; i < argc; i++) {
 		char *arg = argv[i];
@@ -54,6 +55,10 @@ int main(int argc, char **argv)
 				use_real_mode = 1;
 				continue;
 			}
+			if (strcmp(arg, "--fg-kaslr") == 0) {
+				fg_kaslr = 1;
+				continue;
+			}
 		}
 		else if (!fname) {
 			fname = arg;
@@ -75,11 +80,11 @@ int main(int argc, char **argv)
 	if (e_ident[EI_CLASS] == ELFCLASS64)
 		process_64(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, fg_kaslr);
 	else
 		process_32(fp, use_real_mode, as_text,
 			   show_absolute_syms, show_absolute_relocs,
-			   show_reloc_info);
+			   show_reloc_info, fg_kaslr);
 	fclose(fp);
 	return 0;
 }
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (6 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 07/11] x86/tools: Adding relative relocs for randomized functions Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06  1:17   ` Andy Lutomirski
  2020-02-06 10:38   ` Peter Zijlstra
  2020-02-05 22:39 ` [RFC PATCH 09/11] kallsyms: hide layout and expose seed Kristen Carlson Accardi
                   ` (2 subsequent siblings)
  10 siblings, 2 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

At boot time, find all the function sections that have separate .text
sections, shuffle them, and then copy them to new locations. Adjust
any relocations accordingly.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/boot/compressed/Makefile        |   1 +
 arch/x86/boot/compressed/fgkaslr.c       | 751 +++++++++++++++++++++++
 arch/x86/boot/compressed/misc.c          | 106 +++-
 arch/x86/boot/compressed/misc.h          |  26 +
 arch/x86/boot/compressed/vmlinux.symbols |  15 +
 arch/x86/include/asm/boot.h              |  15 +-
 arch/x86/include/asm/kaslr.h             |   1 +
 arch/x86/lib/kaslr.c                     |  15 +
 scripts/kallsyms.c                       |  14 +-
 scripts/link-vmlinux.sh                  |   4 +
 10 files changed, 939 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/boot/compressed/fgkaslr.c
 create mode 100644 arch/x86/boot/compressed/vmlinux.symbols

diff --git a/arch/x86/boot/compressed/Makefile b/arch/x86/boot/compressed/Makefile
index b7e5ea757ef4..60d4c4e59c05 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -122,6 +122,7 @@ OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
 
 ifdef CONFIG_FG_KASLR
 	RELOCS_ARGS += --fg-kaslr
+	OBJCOPYFLAGS += --keep-symbols=$(obj)/vmlinux.symbols
 endif
 
 $(obj)/vmlinux.bin: vmlinux FORCE
diff --git a/arch/x86/boot/compressed/fgkaslr.c b/arch/x86/boot/compressed/fgkaslr.c
new file mode 100644
index 000000000000..fa4e15488a6e
--- /dev/null
+++ b/arch/x86/boot/compressed/fgkaslr.c
@@ -0,0 +1,751 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * fgkaslr.c
+ *
+ * This contains the routines needed to reorder the kernel text section
+ * at boot time.
+ */
+#define __DISABLE_EXPORTS
+#define _LINUX_KPROBES_H
+#define NOKPROBE_SYMBOL(fname)
+
+#include "misc.h"
+#include "error.h"
+#include "pgtable.h"
+#include "../string.h"
+#include "../voffset.h"
+#include <linux/sort.h>
+#include <linux/bsearch.h>
+#include "../../include/asm/extable.h"
+
+/* Macros used by the included decompressor code below. */
+#define STATIC		static
+
+/*
+ * Use normal definitions of mem*() from string.c. There are already
+ * included header files which expect a definition of memset() and by
+ * the time we define memset macro, it is too late.
+ */
+#undef memcpy
+#undef memset
+#define memzero(s, n)	memset((s), 0, (n))
+#define memmove		memmove
+
+/* Functions used by the included decompressor code below. */
+#include <linux/decompress/mm.h>
+void *memmove(void *dest, const void *src, size_t n);
+
+static unsigned long percpu_start;
+static unsigned long percpu_end;
+
+static long kallsyms_names;
+static long kallsyms_offsets;
+static long kallsyms_num_syms;
+static long kallsyms_relative_base;
+static long kallsyms_markers;
+static long __start___ex_table_addr;
+static long __stop___ex_table_addr;
+static long _stext;
+static long _etext;
+static long _sinittext;
+static long _einittext;
+static long fgkaslr_seed;
+
+/* addresses in mapped address space */
+static int *base;
+static u8 *names;
+static unsigned long relative_base;
+static unsigned int *markers_addr;
+static unsigned long long seed[4];
+
+struct kallsyms_name {
+	u8 len;
+	u8 indecis[256];
+};
+struct kallsyms_name *names_table;
+
+/*
+ * This is an array of pointers to sections headers for randomized sections
+ */
+Elf64_Shdr **sections;
+
+/*
+ * This is an array of all section headers, randomized or otherwise.
+ */
+Elf64_Shdr *sechdrs;
+
+/*
+ * The number of elements in the randomized section header array (sections)
+ */
+int sections_size;
+
+static void parse_prandom_seed(void)
+{
+	char optstr[128], *options;
+	unsigned long long a, b, c, d;
+
+	a = b = c = d = 0;
+
+	/*
+	 * passing the fgkaslr_seed option should only be used for
+	 * debugging since the cmdline is exposed to the user
+	 * via /proc/cmdline.
+	 */
+	if (cmdline_find_option("fgkaslr_seed", optstr, sizeof(optstr)) > 0) {
+		int retval;
+
+		options = optstr;
+
+		a = simple_strtoull(options, &options, 16);
+
+		if (options && options[0] == ',')
+			b = simple_strtoull(options+1, &options, 16);
+		if (options && options[0] == ',')
+			c = simple_strtoull(options+1, &options, 16);
+		if (options && options[0] == ',')
+			d = simple_strtoull(options+1, &options, 16);
+	}
+
+	if (a == 0)
+		a = kaslr_get_random_seed(NULL);
+	if (b == 0)
+		b = kaslr_get_random_seed(NULL);
+	if (c == 0)
+		c = kaslr_get_random_seed(NULL);
+	if (d == 0)
+		d = kaslr_get_random_seed(NULL);
+
+	prng_init_seed(a, b, c, d);
+	seed[0] = a;
+	seed[1] = b;
+	seed[2] = c;
+	seed[3] = d;
+}
+
+static bool is_text(long addr)
+{
+	if ((addr >= _stext && addr < _etext) ||
+	   (addr >= _sinittext && addr < _einittext))
+		return true;
+	return false;
+}
+
+bool is_percpu_addr(long pc, long offset)
+{
+	unsigned long ptr;
+	long address;
+
+	address = pc + offset + 4;
+
+	ptr = (unsigned long) address;
+
+	if (ptr >= percpu_start && ptr < percpu_end)
+		return true;
+
+	return false;
+}
+
+static int cmp_section_addr(const void *a, const void *b)
+{
+	unsigned long ptr = (unsigned long)a;
+	Elf64_Shdr *s = *(Elf64_Shdr **)b;
+	unsigned long end = s->sh_addr + s->sh_size;
+
+	if (ptr >= s->sh_addr && ptr < end)
+		return 0;
+
+	if (ptr < s->sh_addr)
+		return -1;
+
+	return 1;
+}
+
+/*
+ * Discover if the address is in a randomized section and if so, adjust
+ * by the saved offset.
+ */
+Elf64_Shdr *adjust_address(long *address)
+{
+	Elf64_Shdr **s;
+	Elf64_Shdr *shdr;
+
+	if (sections == NULL) {
+		debug_putstr("\nsections is null\n");
+		return NULL;
+	}
+
+	s = bsearch((const void *)*address, sections, sections_size, sizeof(*s),
+			cmp_section_addr);
+	if (s != NULL) {
+		shdr = *s;
+		*address += shdr->sh_offset;
+		return shdr;
+	}
+
+	return NULL;
+}
+
+void adjust_relative_offset(long pc, long *value, Elf64_Shdr *section)
+{
+	Elf64_Shdr *s;
+	long address;
+
+	/*
+	 * sometimes we are updating a relative offset that would
+	 * normally be relative to the next instruction (such as a call).
+	 * In this case to calculate the target, you need to add 32bits to
+	 * the pc to get the next instruction value. However, sometimes
+	 * targets are just data that was stored in a table such as ksymtab
+	 * or cpu alternatives. In this case our target is not relative to
+	 * the next instruction.
+	 */
+
+	/*
+	 * Calculate the address that this offset would call.
+	 */
+	if (!is_text(pc))
+		address = pc + *value;
+	else
+		address = pc + *value + 4;
+
+	/*
+	 * if the address is in section that was randomized,
+	 * we need to adjust the offset.
+	 */
+	s = adjust_address(&address);
+	if (s != NULL)
+		*value += s->sh_offset;
+
+	/*
+	 * If the PC that this offset was calculated for was in a section
+	 * that has been randomized, the value needs to be adjusted by the
+	 * same amount as the randomized section was adjusted from it's original
+	 * location.
+	 */
+	if (section != NULL)
+		*value -= section->sh_offset;
+
+}
+
+static void kallsyms_swp(void *a, void *b, int size)
+{
+	int idx1, idx2;
+	int temp;
+	struct kallsyms_name name_a;
+
+	/* determine our index into the array */
+	idx1 = (int *)a - base;
+	idx2 = (int *)b - base;
+	temp = base[idx1];
+	base[idx1] = base[idx2];
+	base[idx2] = temp;
+
+	/* also swap the names table */
+	memcpy(&name_a, &names_table[idx1], sizeof(name_a));
+	memcpy(&names_table[idx1], &names_table[idx2],
+		sizeof(struct kallsyms_name));
+	memcpy(&names_table[idx2], &name_a, sizeof(struct kallsyms_name));
+}
+
+static int kallsyms_cmp(const void *a, const void *b)
+{
+	int addr_a, addr_b;
+	unsigned long uaddr_a, uaddr_b;
+
+	addr_a = *(int *)a;
+	addr_b = *(int *)b;
+
+	if (addr_a >= 0)
+		uaddr_a = addr_a;
+	if (addr_b >= 0)
+		uaddr_b = addr_b;
+
+	if (addr_a < 0)
+		uaddr_a = relative_base - 1 - addr_a;
+	if (addr_b < 0)
+		uaddr_b = relative_base - 1 - addr_b;
+
+	if (uaddr_b > uaddr_a)
+		return -1;
+
+	return 0;
+}
+
+static void deal_with_names(int num_syms)
+{
+	int num_bytes;
+	int i, j;
+	int offset;
+
+
+	/* we should have num_syms kallsyms_name entries */
+	num_bytes = num_syms * sizeof(*names_table);
+	names_table = malloc(num_syms * sizeof(*names_table));
+	if (names_table == NULL) {
+		debug_putstr("\nbytes requested: ");
+		debug_puthex(num_bytes);
+		error("\nunable to allocate space for names table\n");
+	}
+
+	/* read all the names entries */
+	offset = 0;
+	for (i = 0; i < num_syms; i++) {
+		names_table[i].len = names[offset];
+		offset++;
+		for (j = 0; j < names_table[i].len; j++) {
+			names_table[i].indecis[j] = names[offset];
+			offset++;
+		}
+	}
+}
+
+static void write_sorted_names(int num_syms)
+{
+	int i, j;
+	int offset = 0;
+	unsigned int *markers;
+
+	/*
+	 * we are going to need to regenerate the markers table, which is a
+	 * table of offsets into the compressed stream every 256 symbols.
+	 * this code copied almost directly from scripts/kallsyms.c
+	 */
+	markers = malloc(sizeof(unsigned int) * ((num_syms + 255) / 256));
+	if (!markers) {
+		debug_putstr("\nfailed to allocate heap space of ");
+		debug_puthex(((num_syms + 255) / 256));
+		debug_putstr(" bytes\n");
+		error("Unable to allocate space for markers table");
+	}
+
+	for (i = 0; i < num_syms; i++) {
+		if ((i & 0xFF) == 0)
+			markers[i >> 8] = offset;
+
+		names[offset] = (u8) names_table[i].len;
+		offset++;
+		for (j = 0; j < names_table[i].len; j++) {
+			names[offset] = (u8) names_table[i].indecis[j];
+			offset++;
+		}
+	}
+
+	/* write new markers table over old one */
+	for (i = 0; i < ((num_syms + 255) >> 8); i++)
+		markers_addr[i] = markers[i];
+
+	free(markers);
+	free(names_table);
+}
+
+static void sort_kallsyms(unsigned long map)
+{
+	int num_syms;
+	int i;
+
+	debug_putstr("\nRe-sorting kallsyms ...");
+
+	num_syms = *(int *)(kallsyms_num_syms + map);
+	base = (int *)(kallsyms_offsets + map);
+	relative_base = *(unsigned long *)(kallsyms_relative_base + map);
+	markers_addr = (unsigned int *)(kallsyms_markers + map);
+	names = (u8 *)(kallsyms_names + map);
+
+	/*
+	 * the kallsyms table was generated prior to any randomization.
+	 * it is a bunch of offsets from "relative base". In order for
+	 * us to check if a symbol has an address that was in a randomized
+	 * section, we need to reconstruct the address to it's original
+	 * value prior to handle_relocations.
+	 */
+	for (i = 0; i < num_syms; i++) {
+		unsigned long addr;
+		int new_base;
+
+		/*
+		 * according to kernel/kallsyms.c, positive offsets are absolute
+		 * values and negative offsets are relative to the base.
+		 *
+		 * TBD: I think we can just continue if positive value
+		 *      since that would be in the percpu section.
+		 */
+		if (base[i] >= 0)
+			addr = base[i];
+		else
+			addr = relative_base - 1 - base[i];
+
+		if (adjust_address(&addr)) {
+			/* here we need to recalcuate the offset */
+			new_base = relative_base - 1 - addr;
+			base[i] = new_base;
+		}
+	}
+
+	/*
+	 * here we need to read in all the kallsyms_names info
+	 * so that we can regenerate it.
+	 */
+	deal_with_names(num_syms);
+
+	sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);
+
+	/* write the newly sorted names table over the old one */
+	write_sorted_names(num_syms);
+}
+
+#define ARCH_HAS_SEARCH_EXTABLE
+#include "../../../../lib/extable.c"
+
+static inline unsigned long
+ex_fixup_handler(const struct exception_table_entry *x)
+{
+	return ((unsigned long)&x->handler + x->handler);
+}
+
+static inline unsigned long
+ex_fixup_addr(const struct exception_table_entry *x)
+{
+	return ((unsigned long)&x->fixup + x->fixup);
+}
+
+static void update_ex_table(unsigned long map)
+{
+	struct exception_table_entry *start_ex_table = (struct exception_table_entry *) (__start___ex_table_addr + map);
+	struct exception_table_entry *stop_ex_table = (struct exception_table_entry *) (__stop___ex_table_addr + map);
+	int num_entries = ( __stop___ex_table_addr - __start___ex_table_addr ) / sizeof(struct exception_table_entry);
+	int i;
+
+	debug_putstr("\nUpdating exception table...\n");
+	for (i = 0; i < num_entries; i++) {
+		unsigned long insn = ex_to_insn(&start_ex_table[i]);
+		unsigned long fixup = ex_fixup_addr(&start_ex_table[i]);
+		unsigned long handler = ex_fixup_handler(&start_ex_table[i]);
+		unsigned long addr;
+		Elf64_Shdr *s;
+
+		/* check each address to see if it needs adjusting */
+		addr = insn - map;
+		s = adjust_address(&addr);
+		if (s != NULL)
+			start_ex_table[i].insn += s->sh_offset;
+
+		addr = fixup - map;
+		s = adjust_address(&addr);
+		if (s != NULL)
+			start_ex_table[i].fixup += s->sh_offset;
+
+		addr = handler - map;
+		s = adjust_address(&addr);
+		if (s != NULL)
+			start_ex_table[i].handler += s->sh_offset;
+	}
+}
+
+static void sort_ex_table(unsigned long map)
+{
+	struct exception_table_entry *start_ex_table = (struct exception_table_entry *) (__start___ex_table_addr + map);
+	struct exception_table_entry *stop_ex_table = (struct exception_table_entry *) (__stop___ex_table_addr + map);
+
+	debug_putstr("\nRe-sorting exception table...\n");
+
+	sort_extable(start_ex_table, stop_ex_table);
+}
+
+static void write_seed(unsigned long map)
+{
+	unsigned long long *ptr;
+	int i;
+
+	ptr = (unsigned long long *)(fgkaslr_seed + map);
+	for (i = 0; i < 4; i++)
+		ptr[i] = seed[i];
+}
+
+void post_relocations_cleanup(unsigned long map)
+{
+	update_ex_table(map);
+	sort_ex_table(map);
+	write_seed(map);
+	free(sections);
+	free(sechdrs);
+}
+
+void pre_relocations_cleanup(unsigned long map)
+{
+	sort_kallsyms(map);
+}
+
+static void shuffle_sections(int *list, int size)
+{
+	int i;
+	unsigned long j;
+	int temp;
+
+	parse_prandom_seed();
+
+	for (i = size - 1; i > 0; i--) {
+		j = kaslr_get_prandom_long() % (i + 1);
+
+		temp = list[i];
+		list[i] = list[j];
+		list[j] = temp;
+	}
+}
+
+static void move_text(int num_sections, char *secstrings, Elf64_Shdr *text,
+			void *source, void *dest, Elf64_Phdr *phdr)
+{
+	unsigned long adjusted_addr;
+	int copy_bytes;
+	void *stash;
+	Elf64_Shdr **sorted_sections;
+	int *index_list;
+
+	memmove(dest, source + text->sh_offset, text->sh_size);
+	copy_bytes = text->sh_size;
+	dest += text->sh_size;
+	adjusted_addr = text->sh_addr + text->sh_size;
+
+	/*
+	 * we leave the sections sorted in their original order
+	 * by s->sh_addr, but shuffle the indexes in a random
+	 * order for copying.
+	 */
+	index_list = malloc(sizeof(int) * num_sections);
+	if (!index_list)
+		error("Failed to allocate space for index list");
+
+	for (int i = 0; i < num_sections; i++)
+		index_list[i] = i;
+
+	shuffle_sections(index_list, num_sections);
+
+	/*
+	 * to avoid overwriting earlier sections before they can get
+	 * copied to dest, stash everything into a buffer first.
+	 * this will cause our source address to be off by
+	 * phdr->p_offset though, so we'll adjust s->sh_offset below.
+	 *
+	 * TBD: ideally we'd simply decompress higher up so that our
+	 * copy wasn't in danger of overwriting anything important.
+	 */
+	stash = malloc(phdr->p_filesz);
+	if (!stash)
+		error("Failed to allocate space for text stash");
+
+	memcpy(stash, source + phdr->p_offset, phdr->p_filesz);
+
+	/* now we'd walk through the sections. */
+	for (int j = 0; j < num_sections; j++) {
+		unsigned long aligned_addr;
+		Elf64_Shdr *s;
+		const char *sname;
+		void *src;
+
+		s = sections[index_list[j]];
+
+		sname = secstrings + s->sh_name;
+
+		/* align addr for this section */
+		aligned_addr = ALIGN(adjusted_addr, s->sh_addralign);
+		dest = (void *) ALIGN((unsigned long)dest, s->sh_addralign);
+
+		/*
+		 * copy out of stash, so adjust offset
+		 */
+		src = stash + s->sh_offset - phdr->p_offset;
+
+		/* tbd - fill pad bytes with int3 */
+		memmove(dest, src, s->sh_size);
+
+		dest += s->sh_size;
+		copy_bytes += s->sh_size + aligned_addr - adjusted_addr;
+		adjusted_addr = aligned_addr + s->sh_size;
+
+		/* we can blow away sh_offset for our own uses */
+		s->sh_offset = aligned_addr - s->sh_addr;
+	}
+
+	free(index_list);
+
+	/*
+	 * move remainder of text segment. Ok to just use original source
+	 * here since this area is untouched.
+	 */
+	memmove(dest, source + text->sh_offset + copy_bytes, phdr->p_filesz - copy_bytes);
+	free(stash);
+}
+
+#define GET_SYM(name)					\
+	if (strcmp(#name, strtab + sym->st_name) == 0) {\
+		name = sym->st_value;			\
+		continue;				\
+	}
+
+static void parse_symtab(Elf64_Sym *symtab, char *strtab, long num_syms)
+{
+	Elf64_Sym *sym;
+
+	if (symtab == NULL || strtab == NULL)
+		return;
+
+	debug_putstr("\nLooking for symbols... ");
+
+	/*
+	 * walk through the symbol table looking for the symbols
+	 * that we care about.
+	 */
+	for (sym = symtab; --num_syms >= 0; sym++) {
+		if (!sym->st_name)
+			continue;
+
+		GET_SYM(kallsyms_num_syms);
+		GET_SYM(kallsyms_offsets);
+		GET_SYM(kallsyms_relative_base);
+		GET_SYM(kallsyms_names);
+		GET_SYM(kallsyms_markers);
+		GET_SYM(_stext);
+		GET_SYM(_etext);
+		GET_SYM(_sinittext);
+		GET_SYM(_einittext);
+		GET_SYM(fgkaslr_seed);
+
+		/* these have to be renamed */
+		if (strcmp("__start___ex_table", strtab + sym->st_name) == 0) {
+			 __start___ex_table_addr = sym->st_value;
+			continue;
+		}
+
+		if (strcmp("__stop___ex_table", strtab + sym->st_name) == 0) {
+			 __stop___ex_table_addr = sym->st_value;
+			continue;
+		}
+	}
+}
+
+void parse_sections_headers(void *output, Elf64_Ehdr *ehdr, Elf64_Phdr *phdrs)
+{
+	Elf64_Phdr *phdr;
+	Elf64_Shdr *s;
+	Elf64_Shdr *text = NULL;
+	Elf64_Shdr *percpu = NULL;
+	char *secstrings;
+	const char *sname;
+	int num_sections = 0;
+	Elf64_Sym *symtab = NULL;
+	char *strtab = NULL;
+	long num_syms = 0;
+	void *dest;
+	int i;
+
+	debug_putstr("\nParsing ELF section headers... ");
+
+	/*
+	 * TBD: support more than 64K section headers.
+	 */
+
+	/* we are going to need to allocate space for the section headers */
+	sechdrs = malloc(sizeof(*sechdrs) * ehdr->e_shnum);
+	if (!sechdrs)
+		error("Failed to allocate space for shdrs");
+
+	sections = malloc(sizeof(*sections) * ehdr->e_shnum);
+	if (!sections)
+		error("Failed to allocate space for section pointers");
+
+	memcpy(sechdrs, output + ehdr->e_shoff,
+		sizeof(*sechdrs) * ehdr->e_shnum);
+
+	/* we need to allocate space for the section string table */
+	s = &sechdrs[ehdr->e_shstrndx];
+
+	secstrings = malloc(s->sh_size);
+	if (!secstrings)
+		error("Failed to allocate space for shstr");
+
+	memcpy(secstrings, output + s->sh_offset, s->sh_size);
+
+	/*
+	 * now we need to walk through the section headers and collect the
+	 * sizes of the .text sections to be randomized.
+	 */
+	for (i = 0; i < ehdr->e_shnum; i++) {
+		s = &sechdrs[i];
+		sname = secstrings + s->sh_name;
+
+		if (s->sh_type == SHT_SYMTAB) {
+			/* only one symtab per image */
+			symtab = malloc(s->sh_size);
+			if (!symtab)
+				error("Failed to allocate space for symtab");
+
+			memcpy(symtab, output + s->sh_offset, s->sh_size);
+			num_syms = s->sh_size/sizeof(*symtab);
+			continue;
+		}
+
+		if (s->sh_type == SHT_STRTAB && (i != ehdr->e_shstrndx)) {
+			strtab = malloc(s->sh_size);
+			if (!strtab)
+				error("Failed to allocate space for strtab");
+
+			memcpy(strtab, output + s->sh_offset, s->sh_size);
+		}
+
+		if (!strcmp(sname, ".text")) {
+			text = s;
+			continue;
+		}
+
+		if (!strcmp(sname, ".data..percpu")) {
+			/* get start addr for later */
+			percpu = s;
+		}
+
+		if (!(s->sh_flags & SHF_ALLOC) ||
+		    !(s->sh_flags & SHF_EXECINSTR) ||
+		    !(strstarts(sname, ".text")))
+			continue;
+
+		sections[num_sections] = s;
+
+		num_sections++;
+	}
+	sections[num_sections] = NULL;
+	sections_size = num_sections;
+
+	parse_symtab(symtab, strtab, num_syms);
+
+	for (i = 0; i < ehdr->e_phnum; i++) {
+		phdr = &phdrs[i];
+
+		switch (phdr->p_type) {
+		case PT_LOAD:
+			if ((phdr->p_align % 0x200000) != 0)
+				error("Alignment of LOAD segment isn't multiple of 2MB");
+			dest = output;
+			dest += (phdr->p_paddr - LOAD_PHYSICAL_ADDR);
+			if (text && (phdr->p_offset == text->sh_offset)) {
+				move_text(num_sections, secstrings, text, output, dest, phdr);
+			} else {
+				if (percpu && (phdr->p_offset == percpu->sh_offset)) {
+					percpu_start = percpu->sh_addr;
+					percpu_end = percpu_start + phdr->p_filesz;
+				}
+				memmove(dest, output + phdr->p_offset,
+					phdr->p_filesz);
+			}
+			break;
+		default: /* Ignore other PT_* */
+			break;
+		}
+	}
+
+	/* we need to keep the section info to redo relocs */
+	free(secstrings);
+
+	free(phdrs);
+}
+
+#include "../../../../lib/sort.c"
+#include "../../../../lib/bsearch.c"
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 9652d5c2afda..977da0911ce7 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -203,10 +203,20 @@ static void handle_relocations(void *output, unsigned long output_len,
 	if (IS_ENABLED(CONFIG_X86_64))
 		delta = virt_addr - LOAD_PHYSICAL_ADDR;
 
-	if (!delta) {
-		debug_putstr("No relocation needed... ");
-		return;
+	/*
+	 * it is possible to have delta be zero and
+	 * still have enabled fg kaslr. We need to perform relocations
+	 * for fgkaslr regardless of whether the base address has moved.
+	 */
+	if (!IS_ENABLED(CONFIG_FG_KASLR)) {
+		if (!delta) {
+			debug_putstr("No relocation needed... ");
+			return;
+		}
 	}
+
+	pre_relocations_cleanup(map);
+
 	debug_putstr("Performing relocations... ");
 
 	/*
@@ -230,35 +240,106 @@ static void handle_relocations(void *output, unsigned long output_len,
 	 */
 	for (reloc = output + output_len - sizeof(*reloc); *reloc; reloc--) {
 		long extended = *reloc;
+		long value;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		(void) adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("32-bit relocation outside of kernel!\n");
 
-		*(uint32_t *)ptr += delta;
+		value = *(int32_t *)ptr;
+
+		/*
+		 * If using fgkaslr, the value of the relocation
+		 * might need to be changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_address(&value);
+
+		value += delta;
+
+		*(uint32_t *)ptr = value;
 	}
 #ifdef CONFIG_X86_64
 	while (*--reloc) {
 		long extended = *reloc;
+		long value;
+		long oldvalue;
+		Elf64_Shdr *s;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		s = adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("inverse 32-bit relocation outside of kernel!\n");
 
-		*(int32_t *)ptr -= delta;
+		value = *(int32_t *)ptr;
+		oldvalue = value;
+
+		/*
+		 * If using fgkaslr, these relocs will contain
+		 * relative offsets which might need to be
+		 * changed because it referred
+		 * to an address that has moved.
+		 */
+		adjust_relative_offset(*reloc, &value, s);
+
+		/*
+		 * only percpu symbols need to have their values adjusted for
+		 * base address kaslr since relative offsets within the .text
+		 * and .text.* sections are ok wrt each other.
+		 */
+		if (is_percpu_addr(*reloc, oldvalue))
+			value -= delta;
+
+		*(int32_t *)ptr = value;
 	}
 	for (reloc--; *reloc; reloc--) {
 		long extended = *reloc;
+		long value;
+
+		/*
+		 * if using fgkaslr, we might have moved the address
+		 * of the relocation. Check it to see if it needs adjusting
+		 * from the original address.
+		 */
+		(void) adjust_address(&extended);
+
 		extended += map;
 
 		ptr = (unsigned long)extended;
 		if (ptr < min_addr || ptr > max_addr)
 			error("64-bit relocation outside of kernel!\n");
 
-		*(uint64_t *)ptr += delta;
+		value = *(int64_t *)ptr;
+
+		/*
+		 * If using fgkaslr, the value of the relocation
+		 * might need to be changed because it referred
+		 * to an address that has moved.
+		 */
+		(void) adjust_address(&value);
+
+		value += delta;
+
+		*(uint64_t *)ptr = value;
 	}
+	post_relocations_cleanup(map);
 #endif
 }
 #else
@@ -296,6 +377,11 @@ static void parse_elf(void *output)
 
 	memcpy(phdrs, output + ehdr.e_phoff, sizeof(*phdrs) * ehdr.e_phnum);
 
+	if (IS_ENABLED(CONFIG_FG_KASLR)) {
+		parse_sections_headers(output, &ehdr, phdrs);
+		return;
+	}
+
 	for (i = 0; i < ehdr.e_phnum; i++) {
 		phdr = &phdrs[i];
 
@@ -448,3 +534,11 @@ void fortify_panic(const char *name)
 {
 	error("detected buffer overflow");
 }
+
+/*
+ * TBD: why does including the .c file in this way work, but building
+ * a separate fgkaslr.o file cause memory reads to fail (garbage)?
+ */
+#ifdef CONFIG_FG_KASLR
+#include "fgkaslr.c"
+#endif
diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index c8181392f70d..60ceb277596d 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -74,6 +74,32 @@ struct mem_vector {
 	unsigned long long size;
 };
 
+#ifdef CONFIG_X86_64
+#define Elf_Ehdr Elf64_Ehdr
+#define Elf_Phdr Elf64_Phdr
+#define Elf_Shdr Elf64_Shdr
+#else
+#define Elf_Ehdr Elf32_Ehdr
+#define Elf_Phdr Elf32_Phdr
+#define Elf_Shdr Elf32_Shdr
+#endif
+
+#if CONFIG_FG_KASLR
+void parse_sections_headers(void *output, Elf_Ehdr *ehdr, Elf_Phdr *phdrs);
+void pre_relocations_cleanup(unsigned long map);
+void post_relocations_cleanup(unsigned long map);
+Elf_Shdr *adjust_address(long *address);
+void adjust_relative_offset(long pc, long *value, Elf_Shdr *section);
+bool is_percpu_addr(long pc, long offset);
+#else
+static inline void parse_sections_headers(void *output, Elf_Ehdr *ehdr, Elf_Phdr *phdrs) { }
+static inline void pre_relocations_cleanup(unsigned long map) { }
+static inline void post_relocations_cleanup(unsigned long map) { }
+static inline Elf_Shdr *adjust_address(long *address) { return NULL; }
+static inline void adjust_relative_offset(long pc, long *value, Elf_Shdr *section) { }
+static inline bool is_percpu_addr(long pc, long offset) { return true; }
+#endif /* CONFIG_FG_KASLR */
+
 #if CONFIG_RANDOMIZE_BASE
 /* kaslr.c */
 void choose_random_location(unsigned long input,
diff --git a/arch/x86/boot/compressed/vmlinux.symbols b/arch/x86/boot/compressed/vmlinux.symbols
new file mode 100644
index 000000000000..13ef31a0aaf2
--- /dev/null
+++ b/arch/x86/boot/compressed/vmlinux.symbols
@@ -0,0 +1,15 @@
+kallsyms_offsets
+kallsyms_addresses
+kallsyms_num_syms
+kallsyms_relative_base
+kallsyms_names
+kallsyms_token_table
+kallsyms_token_index
+kallsyms_markers
+__start___ex_table
+__stop___ex_table
+_sinittext
+_einittext
+_stext
+_etext
+fgkaslr_seed
diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h
index 680c320363db..6918d33eb5ef 100644
--- a/arch/x86/include/asm/boot.h
+++ b/arch/x86/include/asm/boot.h
@@ -26,8 +26,19 @@
 
 #ifdef CONFIG_KERNEL_BZIP2
 # define BOOT_HEAP_SIZE		0x400000
-#else /* !CONFIG_KERNEL_BZIP2 */
-# define BOOT_HEAP_SIZE		 0x10000
+#elif CONFIG_FG_KASLR
+/*
+ * We need extra boot heap when using fgkaslr because we make a copy
+ * of the original decompressed kernel to avoid issues with writing
+ * over ourselves when shuffling the sections. We also need extra
+ * space for resorting kallsyms after shuffling. This value could
+ * be decreased if free() would release memory properly, or if we
+ * could avoid the kernel copy. It would need to be increased if we
+ * find additional tables that need to be resorted.
+ */
+# define BOOT_HEAP_SIZE		0x4000000
+#else /* !CONFIG_KERNEL_BZIP2 && !CONFIG_FG_KASLR */
+# define BOOT_HEAP_SIZE		0x10000
 #endif
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
index 47d5b25e5b11..f35a9831aaec 100644
--- a/arch/x86/include/asm/kaslr.h
+++ b/arch/x86/include/asm/kaslr.h
@@ -4,6 +4,7 @@
 
 unsigned long kaslr_get_random_seed(const char *purpose);
 unsigned long kaslr_get_prandom_long(void);
+void prng_init_seed(unsigned long a, unsigned long b, unsigned long c, unsigned long d);
 
 #ifdef CONFIG_RANDOMIZE_MEMORY
 void kernel_randomize_memory(void);
diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
index 41b5610855a3..950299d64e1e 100644
--- a/arch/x86/lib/kaslr.c
+++ b/arch/x86/lib/kaslr.c
@@ -144,3 +144,18 @@ unsigned long kaslr_get_prandom_long(void)
 
 	return prng_u64(&state);
 }
+
+void prng_init_seed(unsigned long a, unsigned long b, unsigned long c, unsigned long d)
+{
+	int i;
+
+	state.a = a;
+	state.b = b;
+	state.c = c;
+	state.d = d;
+
+	for (i = 0; i < 30; ++i)
+		(void)prng_u64(&state);
+
+	initialized = true;
+}
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
index 94153732ec00..2f42c14df0f3 100644
--- a/scripts/kallsyms.c
+++ b/scripts/kallsyms.c
@@ -60,6 +60,7 @@ static unsigned int table_size, table_cnt;
 static int all_symbols;
 static int absolute_percpu;
 static int base_relative;
+static int fg_kaslr;
 
 static int token_profit[0x10000];
 
@@ -71,7 +72,7 @@ static unsigned char best_table_len[256];
 static void usage(void)
 {
 	fprintf(stderr, "Usage: kallsyms [--all-symbols] "
-			"[--base-relative] < in.map > out.S\n");
+			"[--base-relative] [--fg-kaslr] < in.map > out.S\n");
 	exit(1);
 }
 
@@ -98,6 +99,7 @@ static bool is_ignored_symbol(const char *name, char type)
 		"kallsyms_markers",
 		"kallsyms_token_table",
 		"kallsyms_token_index",
+		"fgkaslr_seed",
 		/* Exclude linker generated symbols which vary between passes */
 		"_SDA_BASE_",		/* ppc */
 		"_SDA2_BASE_",		/* ppc */
@@ -466,6 +468,14 @@ static void write_src(void)
 	output_label("kallsyms_token_index");
 	for (i = 0; i < 256; i++)
 		printf("\t.short\t%d\n", best_idx[i]);
+
+	if (fg_kaslr) {
+		output_label("fgkaslr_seed");
+		printf("\t.quad\t%d\n", 0);
+		printf("\t.quad\t%d\n", 0);
+		printf("\t.quad\t%d\n", 0);
+		printf("\t.quad\t%d\n", 0);
+	}
 	printf("\n");
 }
 
@@ -743,6 +753,8 @@ int main(int argc, char **argv)
 				absolute_percpu = 1;
 			else if (strcmp(argv[i], "--base-relative") == 0)
 				base_relative = 1;
+			else if (strcmp(argv[i], "--fg-kaslr") == 0)
+				fg_kaslr = 1;
 			else
 				usage();
 		}
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 436379940356..33882bbf95cc 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -152,6 +152,10 @@ kallsyms()
 		kallsymopt="${kallsymopt} --base-relative"
 	fi
 
+	if [ -n "${CONFIG_FG_KASLR}" ]; then
+		kallsymopt="${kallsymopt} --fg-kaslr"
+	fi
+
 	local aflags="${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL}               \
 		      ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS}"
 
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 09/11] kallsyms: hide layout and expose seed
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (7 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 08/11] x86: Add support for finer grained KASLR Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 12:32   ` Kees Cook
  2020-02-05 22:39 ` [RFC PATCH 10/11] module: Reorder functions Kristen Carlson Accardi
  2020-02-05 22:39 ` [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss Kristen Carlson Accardi
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

To support finer grained kaslr (fgkaslr), we need to make a couple changes
to kallsyms. Firstly, we need to hide our sorted list of symbols, since
this will give away our new layout. Secondly, we will export the seed used
for randomizing the layout so that it can be used to make a particular
layout persist across boots for debug purposes.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 kernel/kallsyms.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 136ce049c4ad..432b13a3a033 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -698,6 +698,21 @@ const char *kdb_walk_kallsyms(loff_t *pos)
 }
 #endif	/* CONFIG_KGDB_KDB */
 
+#ifdef CONFIG_FG_KASLR
+extern const u64 fgkaslr_seed[] __weak;
+
+static int proc_fgkaslr_show(struct seq_file *m, void *v)
+{
+	seq_printf(m, "%llx\n", fgkaslr_seed[0]);
+	seq_printf(m, "%llx\n", fgkaslr_seed[1]);
+	seq_printf(m, "%llx\n", fgkaslr_seed[2]);
+	seq_printf(m, "%llx\n", fgkaslr_seed[3]);
+	return 0;
+}
+#else
+static inline int proc_fgkaslr_show(struct seq_file *m, void *v) { return 0; }
+#endif
+
 static const struct file_operations kallsyms_operations = {
 	.open = kallsyms_open,
 	.read = seq_read,
@@ -707,7 +722,20 @@ static const struct file_operations kallsyms_operations = {
 
 static int __init kallsyms_init(void)
 {
-	proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
+	/*
+	 * When fine grained kaslr is enabled, we don't want to
+	 * print out the symbols even with zero pointers because
+	 * this reveals the randomization order. If fg kaslr is
+	 * enabled, make kallsyms available only to privileged
+	 * users.
+	 */
+	if (!IS_ENABLED(CONFIG_FG_KASLR))
+		proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
+	else {
+		proc_create_single("fgkaslr_seed", 0400, NULL,
+					proc_fgkaslr_show);
+		proc_create("kallsyms", 0400, NULL, &kallsyms_operations);
+	}
 	return 0;
 }
 device_initcall(kallsyms_init);
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 10/11] module: Reorder functions
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (8 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 09/11] kallsyms: hide layout and expose seed Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06 12:41   ` Kees Cook
  2020-02-05 22:39 ` [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss Kristen Carlson Accardi
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

If a module has functions split out into separate text sections
(i.e. compiled with the -ffunction-sections flag), reorder the
functions to provide some code diversification to modules.

Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 kernel/module.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)

diff --git a/kernel/module.c b/kernel/module.c
index b56f3224b161..231563e95e61 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -53,6 +53,8 @@
 #include <linux/bsearch.h>
 #include <linux/dynamic_debug.h>
 #include <linux/audit.h>
+#include <linux/random.h>
+#include <asm/setup.h>
 #include <uapi/linux/module.h>
 #include "module-internal.h"
 
@@ -3245,6 +3247,87 @@ static int find_module_sections(struct module *mod, struct load_info *info)
 	return 0;
 }
 
+/*
+ * shuffle_text_list()
+ * Use a Fisher Yates algorithm to shuffle a list of text sections.
+ */
+static void shuffle_text_list(Elf_Shdr **list, int size)
+{
+	int i;
+	unsigned int j;
+	Elf_Shdr *temp;
+
+	for (i = size - 1; i > 0; i--) {
+		/*
+		 * TBD - seed. We need to be able to use a known
+		 * seed so that we can non-randomly randomize for
+		 * debugging.
+		 */
+
+		// pick a random index from 0 to i
+		get_random_bytes(&j, sizeof(j));
+		j = j % (i + 1);
+
+		temp = list[i];
+		list[i] = list[j];
+		list[j] = temp;
+	}
+}
+
+/*
+ * randomize_text()
+ * Look through the core section looking for executable code sections.
+ * Store sections in an array and then shuffle the sections
+ * to reorder the functions.
+ */
+static void randomize_text(struct module *mod, struct load_info *info)
+{
+	int i;
+	int num_text_sections = 0;
+	Elf_Shdr **text_list;
+	int size = 0;
+	int max_sections = info->hdr->e_shnum;
+	unsigned int sec = find_sec(info, ".text");
+
+	if (!IS_ENABLED(CONFIG_FG_KASLR) || !kaslr_enabled())
+		return;
+
+	if (sec == 0)
+		return;
+
+	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
+	if (text_list == NULL)
+		return;
+
+	for (i = 0; i < max_sections; i++) {
+		Elf_Shdr *shdr = &info->sechdrs[i];
+		const char *sname = info->secstrings + shdr->sh_name;
+
+		if (!(shdr->sh_flags & SHF_ALLOC) ||
+		    !(shdr->sh_flags & SHF_EXECINSTR) ||
+		    strstarts(sname, ".init"))
+			continue;
+
+		text_list[num_text_sections] = shdr;
+		num_text_sections++;
+	}
+
+	shuffle_text_list(text_list, num_text_sections);
+
+	for (i = 0; i < num_text_sections; i++) {
+		Elf_Shdr *shdr = text_list[i];
+		unsigned int infosec;
+		const char *sname;
+
+		sname = info->secstrings + shdr->sh_name;
+		infosec	= shdr->sh_info;
+
+		shdr->sh_entsize = get_offset(mod, &size, shdr, infosec);
+	}
+
+	kfree(text_list);
+}
+
 static int move_module(struct module *mod, struct load_info *info)
 {
 	int i;
@@ -3282,6 +3365,8 @@ static int move_module(struct module *mod, struct load_info *info)
 	} else
 		mod->init_layout.base = NULL;
 
+	randomize_text(mod, info);
+
 	/* Transfer each section which specifies SHF_ALLOC */
 	pr_debug("final section addresses:\n");
 	for (i = 0; i < info->hdr->e_shnum; i++) {
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss
  2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
                   ` (9 preceding siblings ...)
  2020-02-05 22:39 ` [RFC PATCH 10/11] module: Reorder functions Kristen Carlson Accardi
@ 2020-02-05 22:39 ` Kristen Carlson Accardi
  2020-02-06  0:11   ` Arvind Sankar
  10 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-05 22:39 UTC (permalink / raw)
  To: tglx, mingo, bp, hpa, arjan, keescook
  Cc: rick.p.edgecombe, x86, linux-kernel, kernel-hardening,
	Kristen Carlson Accardi

From: Kees Cook <keescook@chromium.org>

Currently the on-disk decompression image includes the "dynamic" heap
region that is used for malloc() during kernel extraction, relocation,
and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
section). It makes no sense to balloon the bzImage with "boot_heap"
as it is zeroed at boot, and acts much more like a "brk" region.

This seems to be a trivial change because head_{64,32}.S already only
copies up to the start of the .bss section, so any growth in the .bss
area was already not meaningful when placing the image in memory. The
.bss size is, however, reflected in the boot params "init_size", so the
memory range calculations included the "boot_heap" region. Instead of
wasting the on-disk image size bytes, just account for this heap area
when identifying the mem_avoid ranges, and leave it out of the .bss
section entirely. For good measure, also zero initialize it, as this
was already happening for when zeroing the entire .bss section.

While the bzImage size is dominated by the compressed vmlinux, the
difference removes 64KB for all compressors except bzip2, which removes
4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:

-rw-r--r-- 1 kees kees 7813168 Feb  2 23:39 arch/x86/boot/bzImage.stock-xz
-rw-r--r-- 1 kees kees 7747632 Feb  2 23:42 arch/x86/boot/bzImage.brk-xz

but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):

-rw-r--r-- 1 kees kees 15231024 Feb  2 23:44 arch/x86/boot/bzImage.stock-bzip2
-rw-r--r-- 1 kees kees 11036720 Feb  2 23:47 arch/x86/boot/bzImage.brk-bzip2

For the future fine-grain KASLR work, this will avoid significant pain,
as the ELF section parser will use much more memory during boot and
filling the bzImage with megabytes of zeros seemed like a poor idea. :)

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
---
 arch/x86/boot/compressed/head_32.S     | 5 ++---
 arch/x86/boot/compressed/head_64.S     | 7 +++----
 arch/x86/boot/compressed/kaslr.c       | 2 +-
 arch/x86/boot/compressed/misc.c        | 3 +++
 arch/x86/boot/compressed/vmlinux.lds.S | 1 +
 5 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/boot/compressed/head_32.S b/arch/x86/boot/compressed/head_32.S
index f2dfd6d083ef..1f3de8efd40e 100644
--- a/arch/x86/boot/compressed/head_32.S
+++ b/arch/x86/boot/compressed/head_32.S
@@ -59,6 +59,7 @@
 	.hidden _ebss
 	.hidden _got
 	.hidden _egot
+	.hidden _brk
 
 	__HEAD
 SYM_FUNC_START(startup_32)
@@ -249,7 +250,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
-	leal	boot_heap(%ebx), %eax
+	leal	_brk(%ebx), %eax
 	pushl	%eax		/* heap area */
 	pushl	%esi		/* real mode pointer */
 	call	extract_kernel	/* returns kernel location in %eax */
@@ -276,8 +277,6 @@ efi32_config:
  */
 	.bss
 	.balign 4
-boot_heap:
-	.fill BOOT_HEAP_SIZE, 1, 0
 boot_stack:
 	.fill BOOT_STACK_SIZE, 1, 0
 boot_stack_end:
diff --git a/arch/x86/boot/compressed/head_64.S b/arch/x86/boot/compressed/head_64.S
index ee60b81944a7..850bc5220a8d 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -42,6 +42,7 @@
 	.hidden _ebss
 	.hidden _got
 	.hidden _egot
+	.hidden _brk
 
 	__HEAD
 	.code32
@@ -534,7 +535,7 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
  */
 	pushq	%rsi			/* Save the real mode argument */
 	movq	%rsi, %rdi		/* real mode address */
-	leaq	boot_heap(%rip), %rsi	/* malloc area for uncompression */
+	leaq	_brk(%rip), %rsi	/* malloc area for uncompression */
 	leaq	input_data(%rip), %rdx  /* input_data */
 	movl	$z_input_len, %ecx	/* input_len */
 	movq	%rbp, %r8		/* output target address */
@@ -701,12 +702,10 @@ SYM_DATA_END(efi64_config)
 #endif /* CONFIG_EFI_STUB */
 
 /*
- * Stack and heap for uncompression
+ * Stack for placement and uncompression
  */
 	.bss
 	.balign 4
-SYM_DATA_LOCAL(boot_heap,	.fill BOOT_HEAP_SIZE, 1, 0)
-
 SYM_DATA_START_LOCAL(boot_stack)
 	.fill BOOT_STACK_SIZE, 1, 0
 SYM_DATA_END_LABEL(boot_stack, SYM_L_LOCAL, boot_stack_end)
diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index ae4dce76a9bd..da64d2cdbb42 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -397,7 +397,7 @@ static void handle_mem_options(void)
 static void mem_avoid_init(unsigned long input, unsigned long input_size,
 			   unsigned long output)
 {
-	unsigned long init_size = boot_params->hdr.init_size;
+	unsigned long init_size = boot_params->hdr.init_size + BOOT_HEAP_SIZE;
 	u64 initrd_start, initrd_size;
 	u64 cmd_line, cmd_line_size;
 	char *ptr;
diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index 977da0911ce7..cb12da264b59 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -463,6 +463,9 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 
 	debug_putstr("early console in extract_kernel\n");
 
+	/* Zero what is effectively our .brk section. */
+	memset((void *)heap, 0, BOOT_HEAP_SIZE);
+	debug_putaddr(heap);
 	free_mem_ptr     = heap;	/* Heap */
 	free_mem_end_ptr = heap + BOOT_HEAP_SIZE;
 
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index 508cfa6828c5..3ce690474940 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -73,4 +73,5 @@ SECTIONS
 #endif
 	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
+	_brk = .;
 }
-- 
2.24.1


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss
  2020-02-05 22:39 ` [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss Kristen Carlson Accardi
@ 2020-02-06  0:11   ` Arvind Sankar
  2020-02-06  0:33     ` Kristen Carlson Accardi
  2020-02-06 11:13     ` Kees Cook
  0 siblings, 2 replies; 59+ messages in thread
From: Arvind Sankar @ 2020-02-06  0:11 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Wed, Feb 05, 2020 at 02:39:50PM -0800, Kristen Carlson Accardi wrote:
> From: Kees Cook <keescook@chromium.org>
> 
> Currently the on-disk decompression image includes the "dynamic" heap
> region that is used for malloc() during kernel extraction, relocation,
> and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
> section). It makes no sense to balloon the bzImage with "boot_heap"
> as it is zeroed at boot, and acts much more like a "brk" region.
> 
> This seems to be a trivial change because head_{64,32}.S already only
> copies up to the start of the .bss section, so any growth in the .bss
> area was already not meaningful when placing the image in memory. The
> .bss size is, however, reflected in the boot params "init_size", so the
> memory range calculations included the "boot_heap" region. Instead of
> wasting the on-disk image size bytes, just account for this heap area
> when identifying the mem_avoid ranges, and leave it out of the .bss
> section entirely. For good measure, also zero initialize it, as this
> was already happening for when zeroing the entire .bss section.
> 
> While the bzImage size is dominated by the compressed vmlinux, the
> difference removes 64KB for all compressors except bzip2, which removes
> 4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:
> 
> -rw-r--r-- 1 kees kees 7813168 Feb  2 23:39 arch/x86/boot/bzImage.stock-xz
> -rw-r--r-- 1 kees kees 7747632 Feb  2 23:42 arch/x86/boot/bzImage.brk-xz
> 
> but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):
> 
> -rw-r--r-- 1 kees kees 15231024 Feb  2 23:44 arch/x86/boot/bzImage.stock-bzip2
> -rw-r--r-- 1 kees kees 11036720 Feb  2 23:47 arch/x86/boot/bzImage.brk-bzip2
> 
> For the future fine-grain KASLR work, this will avoid significant pain,
> as the ELF section parser will use much more memory during boot and
> filling the bzImage with megabytes of zeros seemed like a poor idea. :)
> 

I'm not sure I follow this: the reason the bzImage currently contains
.bss and a fix for it is in a patch I have out for review at
https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu

This alone shouldn't make much of a difference across compressors. The
entire .bss is just stored uncompressed as 0's in bzImage currently.
The only thing that gets compressed is the original kernel ELF file. Is
the difference above just from this patch, or is it including the
overhead of function-sections?

It is not necessary for it to contain .bss to get the correct init_size.
The latter is calculated (in x86/boot/header.S) based on the offset of
the _end symbol in the compressed vmlinux, so storing the .bss is just a
bug.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559

From the cover letter:
> Image Size
> ----------
> Adding additional section headers as a result of compiling with
> -ffunction-sections will increase the size of the vmlinux ELF file. In
> addition, the vmlinux.bin file generated in arch/x86/boot/compressed by
> objcopy grows significantly with the current POC implementation. This is
> because the boot heap size must be dramatically increased to support shuffling
> the sections and re-sorting kallsyms. With a sample kernel compilation using a
> stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR was enabled.
> This is because the boot heap area is included in the image itself.
> 
> It is possible to mitigate this issue by moving the boot heap out of .bss.
> Kees Cook has a prototype of this working, and it is included in this
> patchset.

I am also confused by this -- the boot heap is not part of the
vmlinux.bin in arch/x86/boot/compressed: that's a stripped copy of the
decompressed kernel, just before we apply the selected compression to it
and vmlinux.relocs.

Do you mean arch/x86/boot/vmlinux.bin? That is an objcopy of
compressed/vmlinux, and it grows in size with increasing .bss for the
same reason as above (rather it's the cause of bzImage growing).

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss
  2020-02-06  0:11   ` Arvind Sankar
@ 2020-02-06  0:33     ` Kristen Carlson Accardi
  2020-02-06 11:13     ` Kees Cook
  1 sibling, 0 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06  0:33 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Wed, 2020-02-05 at 19:11 -0500, Arvind Sankar wrote:
> On Wed, Feb 05, 2020 at 02:39:50PM -0800, Kristen Carlson Accardi
> wrote:
> > From: Kees Cook <keescook@chromium.org>
> > 
> > Currently the on-disk decompression image includes the "dynamic"
> > heap
> > region that is used for malloc() during kernel extraction,
> > relocation,
> > and decompression ("boot_heap" of BOOT_HEAP_SIZE bytes in the .bss
> > section). It makes no sense to balloon the bzImage with "boot_heap"
> > as it is zeroed at boot, and acts much more like a "brk" region.
> > 
> > This seems to be a trivial change because head_{64,32}.S already
> > only
> > copies up to the start of the .bss section, so any growth in the
> > .bss
> > area was already not meaningful when placing the image in memory.
> > The
> > .bss size is, however, reflected in the boot params "init_size", so
> > the
> > memory range calculations included the "boot_heap" region. Instead
> > of
> > wasting the on-disk image size bytes, just account for this heap
> > area
> > when identifying the mem_avoid ranges, and leave it out of the .bss
> > section entirely. For good measure, also zero initialize it, as
> > this
> > was already happening for when zeroing the entire .bss section.
> > 
> > While the bzImage size is dominated by the compressed vmlinux, the
> > difference removes 64KB for all compressors except bzip2, which
> > removes
> > 4MB. For example, this is less than 1% under CONFIG_KERNEL_XZ:
> > 
> > -rw-r--r-- 1 kees kees 7813168 Feb  2 23:39
> > arch/x86/boot/bzImage.stock-xz
> > -rw-r--r-- 1 kees kees 7747632 Feb  2 23:42
> > arch/x86/boot/bzImage.brk-xz
> > 
> > but much more pronounced under CONFIG_KERNEL_BZIP2 (~27%):
> > 
> > -rw-r--r-- 1 kees kees 15231024 Feb  2 23:44
> > arch/x86/boot/bzImage.stock-bzip2
> > -rw-r--r-- 1 kees kees 11036720 Feb  2 23:47
> > arch/x86/boot/bzImage.brk-bzip2
> > 
> > For the future fine-grain KASLR work, this will avoid significant
> > pain,
> > as the ELF section parser will use much more memory during boot and
> > filling the bzImage with megabytes of zeros seemed like a poor
> > idea. :)
> > 
> 
> I'm not sure I follow this: the reason the bzImage currently contains
> .bss and a fix for it is in a patch I have out for review at
> https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu
> 
> This alone shouldn't make much of a difference across compressors.
> The
> entire .bss is just stored uncompressed as 0's in bzImage currently.
> The only thing that gets compressed is the original kernel ELF file.
> Is
> the difference above just from this patch, or is it including the
> overhead of function-sections?
> 
> It is not necessary for it to contain .bss to get the correct
> init_size.
> The latter is calculated (in x86/boot/header.S) based on the offset
> of
> the _end symbol in the compressed vmlinux, so storing the .bss is
> just a
> bug.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
> 
> From the cover letter:
> > Image Size
> > ----------
> > Adding additional section headers as a result of compiling with
> > -ffunction-sections will increase the size of the vmlinux ELF file.
> > In
> > addition, the vmlinux.bin file generated in
> > arch/x86/boot/compressed by
> > objcopy grows significantly with the current POC implementation.
> > This is
> > because the boot heap size must be dramatically increased to
> > support shuffling
> > the sections and re-sorting kallsyms. With a sample kernel
> > compilation using a
> > stock Fedora config, bzImage grew about 7.5X when CONFIG_FG_KASLR
> > was enabled.
> > This is because the boot heap area is included in the image itself.
> > 
> > It is possible to mitigate this issue by moving the boot heap out
> > of .bss.
> > Kees Cook has a prototype of this working, and it is included in
> > this
> > patchset.
> 
> I am also confused by this -- the boot heap is not part of the
> vmlinux.bin in arch/x86/boot/compressed: that's a stripped copy of
> the
> decompressed kernel, just before we apply the selected compression to
> it
> and vmlinux.relocs.
> 
> Do you mean arch/x86/boot/vmlinux.bin? That is an objcopy of
> compressed/vmlinux, and it grows in size with increasing .bss for the
> same reason as above (rather it's the cause of bzImage growing).

Right, sorry for the confusion - I see now that I could have worded
that better. the cover letter should say "In addition, the vmlinux.bin
file generated by the objcopy in arch/x86/boot/compressed/Makefile
grows significantly with the current POC implementation."



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch
  2020-02-05 22:39 ` [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
@ 2020-02-06  1:08   ` Andy Lutomirski
  2020-02-06 11:48     ` Kees Cook
  2020-02-06 16:58     ` Kristen Carlson Accardi
  0 siblings, 2 replies; 59+ messages in thread
From: Andy Lutomirski @ 2020-02-06  1:08 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening



> On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> 
> From: Kees Cook <keescook@chromium.org>
> 
> Under earlyprintk, each RNG call produces a debug report line. When
> shuffling hundreds of functions, this is not useful information (each
> line is identical and tells us nothing new). Instead, allow for a NULL
> "purpose" to suppress the debug reporting.

Have you counted how many RDRAND calls this causes?  RDRAND is exceedingly slow on all CPUs I’ve looked at. The whole “RDRAND has great bandwidth” marketing BS actually means that it has decent bandwidth if all CPUs hammer it at the same time. The latency is abysmal.  I have asked Intel to improve this, but the latency of that request will be quadrillions of cycles :)

It wouldn’t shock me if just the RDRAND calls account for a respectable fraction of total time. The RDTSC fallback, on the other hand, may be so predictable as to be useless.

I would suggest adding a little ChaCha20 DRBG or similar to the KASLR environment instead. What crypto primitives are available there?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling
  2020-02-05 22:39 ` [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling Kristen Carlson Accardi
@ 2020-02-06  1:11   ` Andy Lutomirski
  2020-02-06 15:10   ` Jason A. Donenfeld
  1 sibling, 0 replies; 59+ messages in thread
From: Andy Lutomirski @ 2020-02-06  1:11 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening



> On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> 
> From: Kees Cook <keescook@chromium.org>
> 
> This *might* improve shuffling speed at boot. Possibly only marginally.
> This has not yet been tested, and would need to have some performance
> tests run to determine if it helps before merging.

Ugh, don’t do this. Use a real DRBG.  Someone is going to break the construction in your patch just to prove they can.

ChaCha20 is a good bet.

> 
> (notes from Kristen) - initial performance tests suggest that any
> improvement is indeed marginal. However, this code is useful
> for using a known seed.
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
> arch/x86/boot/compressed/kaslr.c |  4 +--
> arch/x86/include/asm/kaslr.h     |  3 +-
> arch/x86/lib/kaslr.c             | 50 +++++++++++++++++++++++++++++++-
> arch/x86/mm/init.c               |  2 +-
> arch/x86/mm/kaslr.c              |  2 +-
> 5 files changed, 55 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
> index d7408af55738..ae4dce76a9bd 100644
> --- a/arch/x86/boot/compressed/kaslr.c
> +++ b/arch/x86/boot/compressed/kaslr.c
> @@ -598,7 +598,7 @@ static unsigned long slots_fetch_random(void)
>    if (slot_max == 0)
>        return 0;
> 
> -    slot = kaslr_get_random_long("Physical") % slot_max;
> +    slot = kaslr_get_random_seed("Physical") % slot_max;
> 
>    for (i = 0; i < slot_area_index; i++) {
>        if (slot >= slot_areas[i].num) {
> @@ -880,7 +880,7 @@ static unsigned long find_random_virt_addr(unsigned long minimum,
>    slots = (KERNEL_IMAGE_SIZE - minimum - image_size) /
>         CONFIG_PHYSICAL_ALIGN + 1;
> 
> -    random_addr = kaslr_get_random_long("Virtual") % slots;
> +    random_addr = kaslr_get_random_seed("Virtual") % slots;
> 
>    return random_addr * CONFIG_PHYSICAL_ALIGN + minimum;
> }
> diff --git a/arch/x86/include/asm/kaslr.h b/arch/x86/include/asm/kaslr.h
> index db7ba2feb947..47d5b25e5b11 100644
> --- a/arch/x86/include/asm/kaslr.h
> +++ b/arch/x86/include/asm/kaslr.h
> @@ -2,7 +2,8 @@
> #ifndef _ASM_KASLR_H_
> #define _ASM_KASLR_H_
> 
> -unsigned long kaslr_get_random_long(const char *purpose);
> +unsigned long kaslr_get_random_seed(const char *purpose);
> +unsigned long kaslr_get_prandom_long(void);
> 
> #ifdef CONFIG_RANDOMIZE_MEMORY
> void kernel_randomize_memory(void);
> diff --git a/arch/x86/lib/kaslr.c b/arch/x86/lib/kaslr.c
> index 2b3eb8c948a3..41b5610855a3 100644
> --- a/arch/x86/lib/kaslr.c
> +++ b/arch/x86/lib/kaslr.c
> @@ -46,7 +46,7 @@ static inline u16 i8254(void)
>    return timer;
> }
> 
> -unsigned long kaslr_get_random_long(const char *purpose)
> +unsigned long kaslr_get_random_seed(const char *purpose)
> {
> #ifdef CONFIG_X86_64
>    const unsigned long mix_const = 0x5d6008cbf3848dd3UL;
> @@ -96,3 +96,51 @@ unsigned long kaslr_get_random_long(const char *purpose)
> 
>    return random;
> }
> +
> +/*
> + * 64bit variant of Bob Jenkins' public domain PRNG
> + * 256 bits of internal state
> + */
> +struct prng_state {
> +    u64 a, b, c, d;
> +};
> +
> +static struct prng_state state;
> +static bool initialized;
> +
> +#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
> +static u64 prng_u64(struct prng_state *x)
> +{
> +    u64 e;
> +
> +    e = x->a - rot(x->b, 7);
> +    x->a = x->b ^ rot(x->c, 13);
> +    x->b = x->c + rot(x->d, 37);
> +    x->c = x->d + e;
> +    x->d = e + x->a;
> +
> +    return x->d;
> +}
> +
> +static void prng_init(struct prng_state *state)
> +{
> +    int i;
> +
> +    state->a = kaslr_get_random_seed(NULL);
> +    state->b = kaslr_get_random_seed(NULL);
> +    state->c = kaslr_get_random_seed(NULL);
> +    state->d = kaslr_get_random_seed(NULL);
> +
> +    for (i = 0; i < 30; ++i)
> +        (void)prng_u64(state);
> +
> +    initialized = true;
> +}
> +
> +unsigned long kaslr_get_prandom_long(void)
> +{
> +    if (!initialized)
> +        prng_init(&state);
> +
> +    return prng_u64(&state);
> +}
> diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
> index e7bb483557c9..c974dbab2293 100644
> --- a/arch/x86/mm/init.c
> +++ b/arch/x86/mm/init.c
> @@ -722,7 +722,7 @@ void __init poking_init(void)
>     */
>    poking_addr = TASK_UNMAPPED_BASE;
>    if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> -        poking_addr += (kaslr_get_random_long("Poking") & PAGE_MASK) %
> +        poking_addr += (kaslr_get_random_seed("Poking") & PAGE_MASK) %
>            (TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE);
> 
>    if (((poking_addr + PAGE_SIZE) & ~PMD_MASK) == 0)
> diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c
> index dc6182eecefa..b30bd1db7543 100644
> --- a/arch/x86/mm/kaslr.c
> +++ b/arch/x86/mm/kaslr.c
> @@ -123,7 +123,7 @@ void __init kernel_randomize_memory(void)
>    for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++)
>        remain_entropy -= get_padding(&kaslr_regions[i]);
> 
> -    prandom_seed_state(&rand_state, kaslr_get_random_long("Memory"));
> +    prandom_seed_state(&rand_state, kaslr_get_random_seed("Memory"));
> 
>    for (i = 0; i < ARRAY_SIZE(kaslr_regions); i++) {
>        unsigned long entropy;
> -- 
> 2.24.1
> 

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-05 22:39 ` [RFC PATCH 08/11] x86: Add support for finer grained KASLR Kristen Carlson Accardi
@ 2020-02-06  1:17   ` Andy Lutomirski
  2020-02-06 11:56     ` Kees Cook
  2020-02-06 10:38   ` Peter Zijlstra
  1 sibling, 1 reply; 59+ messages in thread
From: Andy Lutomirski @ 2020-02-06  1:17 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Arjan van de Ven, Kees Cook, Rick Edgecombe, X86 ML, LKML,
	Kernel Hardening

On Wed, Feb 5, 2020 at 2:39 PM Kristen Carlson Accardi
<kristen@linux.intel.com> wrote:
>
> At boot time, find all the function sections that have separate .text
> sections, shuffle them, and then copy them to new locations. Adjust
> any relocations accordingly.
>

> +       sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);

Hah, here's a huge bottleneck.  Unless you are severely
memory-constrained, never do a sort with an expensive swap function
like this.  Instead allocate an array of indices that starts out as
[0, 1, 2, ...].  Sort *that* where the swap function just swaps the
indices.  Then use the sorted list of indices to permute the actual
data.  The result is exactly one expensive swap per item instead of
one expensive swap per swap.

--Andy

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  2020-02-05 22:39 ` [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
@ 2020-02-06 10:30   ` Peter Zijlstra
  2020-02-06 11:52     ` Kees Cook
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2020-02-06 10:30 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi wrote:
> Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
> the make file to build with -ffunction-sections if CONFIG_FG_KASLR
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
>  Makefile         |  4 ++++
>  arch/x86/Kconfig | 13 +++++++++++++
>  2 files changed, 17 insertions(+)
> 
> diff --git a/Makefile b/Makefile
> index c50ef91f6136..41438a921666 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
>  KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
>  endif
>  
> +ifdef CONFIG_FG_KASLR
> +KBUILD_CFLAGS += -ffunction-sections
> +endif

The GCC manual has:

  -ffunction-sections
  -fdata-sections

      Place each function or data item into its own section in the output
      file if the target supports arbitrary sections. The name of the
      function or the name of the data item determines the section’s name
      in the output file.

      Use these options on systems where the linker can perform
      optimizations to improve locality of reference in the instruction
      space. Most systems using the ELF object format have linkers with
      such optimizations. On AIX, the linker rearranges sections (CSECTs)
      based on the call graph. The performance impact varies.

      Together with a linker garbage collection (linker --gc-sections
      option) these options may lead to smaller statically-linked
      executables (after stripping).

      On ELF/DWARF systems these options do not degenerate the quality of
      the debug information. There could be issues with other object
      files/debug info formats.

      Only use these options when there are significant benefits from
      doing so. When you specify these options, the assembler and linker
      create larger object and executable files and are also slower. These
      options affect code generation. They prevent optimizations by the
      compiler and assembler using relative locations inside a translation
      unit since the locations are unknown until link time. An example of
      such an optimization is relaxing calls to short call instructions.

In particular:

  "They prevent optimizations by the compiler and assembler using
  relative locations inside a translation unit since the locations are
  unknown until link time."

I suppose in practise this only means tail-calls are affected and will
no longer use JMP.d8. Or are more things affected?

(Also, should not the next patch come before this one?)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-05 22:39 ` [RFC PATCH 08/11] x86: Add support for finer grained KASLR Kristen Carlson Accardi
  2020-02-06  1:17   ` Andy Lutomirski
@ 2020-02-06 10:38   ` Peter Zijlstra
  2020-02-06 12:06     ` Kees Cook
  1 sibling, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2020-02-06 10:38 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Wed, Feb 05, 2020 at 02:39:47PM -0800, Kristen Carlson Accardi wrote:
> +static long __start___ex_table_addr;
> +static long __stop___ex_table_addr;
> +static long _stext;
> +static long _etext;
> +static long _sinittext;
> +static long _einittext;

Should you not also adjust __jump_table, __mcount_loc,
__kprobe_blacklist and possibly others that include text addresses?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss
  2020-02-06  0:11   ` Arvind Sankar
  2020-02-06  0:33     ` Kristen Carlson Accardi
@ 2020-02-06 11:13     ` Kees Cook
  2020-02-06 14:25       ` Arvind Sankar
  1 sibling, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-06 11:13 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Wed, Feb 05, 2020 at 07:11:05PM -0500, Arvind Sankar wrote:
> From: Kees Cook <keescook@chromium.org>
> > This seems to be a trivial change because head_{64,32}.S already only
> > copies up to the start of the .bss section, so any growth in the .bss
> > area was already not meaningful when placing the image in memory. The
> > .bss size is, however, reflected in the boot params "init_size", so the
> > memory range calculations included the "boot_heap" region. Instead of
> > wasting the on-disk image size bytes, just account for this heap area
> > when identifying the mem_avoid ranges, and leave it out of the .bss
> > section entirely. For good measure, also zero initialize it, as this
> > was already happening for when zeroing the entire .bss section.
> 
> I'm not sure I follow this: the reason the bzImage currently contains
> .bss and a fix for it is in a patch I have out for review at
> https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu

Ah! Thank you. Yes, that's _much_ cleaner. I could not figure out why
the linker was actually keeping the .bss section allocated in the
on-disk image. :) We've only had this bug for 10 years. ;)

> This alone shouldn't make much of a difference across compressors. The
> entire .bss is just stored uncompressed as 0's in bzImage currently.
> The only thing that gets compressed is the original kernel ELF file. Is
> the difference above just from this patch, or is it including the
> overhead of function-sections?

With bzip2, it's a 4MB heap in .bss. Other compressors are 64KB. With
fg-kaslr, the heap is 64MB in .bss. It made the bzImage huge. ;) Another
thought I had to deal with the memory utilization in the fg-kaslr shuffle
was to actually choose _two_ kernel locations in memory (via a refactoring
of choose_random_location()). One to decompress into and the other to
write out during the shuffle. Though the symbol table still needs to be
reconstructed, etc, so probably just best to leave it all in the regular
heap (or improve the ZO heap allocator which doesn't really implement
free()).

> It is not necessary for it to contain .bss to get the correct init_size.
> The latter is calculated (in x86/boot/header.S) based on the offset of
> the _end symbol in the compressed vmlinux, so storing the .bss is just a
> bug.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559

Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
was staring at this, since I only looked around the compressed/ directory.
:)

I should add this:

diff --git a/arch/x86/boot/compressed/kaslr.c b/arch/x86/boot/compressed/kaslr.c
index d7408af55738..346e36ae163e 100644
--- a/arch/x86/boot/compressed/kaslr.c
+++ b/arch/x86/boot/compressed/kaslr.c
@@ -346,7 +346,7 @@ static void handle_mem_options(void)
  * in header.S, and the memory diagram is based on the one found in
  * misc.c.
  *
  * The following conditions are already enforced by the image layouts
  * and
- * associated code:
+ * associated code (see ../boot/header.S):
  *  - input + input_size >= output + output_size
  *  - kernel_total_size <= init_size
  *  - kernel_total_size <= output_size (see Note below)


-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch
  2020-02-06  1:08   ` Andy Lutomirski
@ 2020-02-06 11:48     ` Kees Cook
  2020-02-06 16:58     ` Kristen Carlson Accardi
  1 sibling, 0 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 11:48 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Wed, Feb 05, 2020 at 05:08:55PM -0800, Andy Lutomirski wrote:
> 
> 
> > On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> > 
> > From: Kees Cook <keescook@chromium.org>
> > 
> > Under earlyprintk, each RNG call produces a debug report line. When
> > shuffling hundreds of functions, this is not useful information (each
> > line is identical and tells us nothing new). Instead, allow for a NULL
> > "purpose" to suppress the debug reporting.
> 
> Have you counted how many RDRAND calls this causes?  RDRAND is
> exceedingly slow on all CPUs I’ve looked at. The whole “RDRAND
> has great bandwidth” marketing BS actually means that it has decent
> bandwidth if all CPUs hammer it at the same time. The latency is abysmal.
> I have asked Intel to improve this, but the latency of that request will
> be quadrillions of cycles :)

In an earlier version of this series, it was called once per function
section (so, about 50,000 times). The (lack of) speed was quite
measurable.

> I would suggest adding a little ChaCha20 DRBG or similar to the KASLR
> environment instead. What crypto primitives are available there?

Agreed. The simple PRNG in the next patch was most just a POC initially,
but Kristen kept it due to its debugging properties (specifying an
external seed). Pulling in ChaCha20 seems like a good approach.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR
  2020-02-06 10:30   ` Peter Zijlstra
@ 2020-02-06 11:52     ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 11:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 11:30:55AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 05, 2020 at 02:39:44PM -0800, Kristen Carlson Accardi wrote:
> > Allow user to select CONFIG_FG_KASLR if dependencies are met. Change
> > the make file to build with -ffunction-sections if CONFIG_FG_KASLR
> > 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > ---
> >  Makefile         |  4 ++++
> >  arch/x86/Kconfig | 13 +++++++++++++
> >  2 files changed, 17 insertions(+)
> > 
> > diff --git a/Makefile b/Makefile
> > index c50ef91f6136..41438a921666 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -846,6 +846,10 @@ ifdef CONFIG_LIVEPATCH
> >  KBUILD_CFLAGS += $(call cc-option, -flive-patching=inline-clone)
> >  endif
> >  
> > +ifdef CONFIG_FG_KASLR
> > +KBUILD_CFLAGS += -ffunction-sections
> > +endif
> [...]
> In particular:
> 
>   "They prevent optimizations by the compiler and assembler using
>   relative locations inside a translation unit since the locations are
>   unknown until link time."

I think this mainly a feature of this flag, since it's those relocations
that are used to do the post-shuffle fixups. But yes, I would imagine
this has some negative impact on code generation.

> I suppose in practise this only means tail-calls are affected and will
> no longer use JMP.d8. Or are more things affected?

It's worth looking at. I'm also curious to see how this will interact
with Link Time Optimization.

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06  1:17   ` Andy Lutomirski
@ 2020-02-06 11:56     ` Kees Cook
  2020-02-06 17:36       ` Kristen Carlson Accardi
  0 siblings, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-06 11:56 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kristen Carlson Accardi, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H. Peter Anvin, Arjan van de Ven,
	Rick Edgecombe, X86 ML, LKML, Kernel Hardening

On Wed, Feb 05, 2020 at 05:17:11PM -0800, Andy Lutomirski wrote:
> On Wed, Feb 5, 2020 at 2:39 PM Kristen Carlson Accardi
> <kristen@linux.intel.com> wrote:
> >
> > At boot time, find all the function sections that have separate .text
> > sections, shuffle them, and then copy them to new locations. Adjust
> > any relocations accordingly.
> >
> 
> > +       sort(base, num_syms, sizeof(int), kallsyms_cmp, kallsyms_swp);
> 
> Hah, here's a huge bottleneck.  Unless you are severely
> memory-constrained, never do a sort with an expensive swap function
> like this.  Instead allocate an array of indices that starts out as
> [0, 1, 2, ...].  Sort *that* where the swap function just swaps the
> indices.  Then use the sorted list of indices to permute the actual
> data.  The result is exactly one expensive swap per item instead of
> one expensive swap per swap.

I think there are few places where memory-vs-speed need to be examined.
I remain surprised about how much memory the entire series already uses
(58MB in my local tests), but I suspect this is likely dominated by the
two factors: a full copy of the decompressed kernel, and that the
"allocator" in the image doesn't really implement free():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/decompress/mm.h#n55

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06 10:38   ` Peter Zijlstra
@ 2020-02-06 12:06     ` Kees Cook
  2020-02-06 14:52       ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:06 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 11:38:30AM +0100, Peter Zijlstra wrote:
> On Wed, Feb 05, 2020 at 02:39:47PM -0800, Kristen Carlson Accardi wrote:
> > +static long __start___ex_table_addr;
> > +static long __stop___ex_table_addr;
> > +static long _stext;
> > +static long _etext;
> > +static long _sinittext;
> > +static long _einittext;
> 
> Should you not also adjust __jump_table, __mcount_loc,
> __kprobe_blacklist and possibly others that include text addresses?

These don't appear to be sorted at build time. AIUI, the problem with
ex_table and kallsyms is that they're preprocessed at build time and
opaque to the linker's relocation generation.

For example, looking at __jump_table, it gets sorted at runtime:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/jump_label.c#n474

As you're likely aware, we have a number of "special"
sections like this, currently collected manually, see *_TEXT:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/vmlinux.lds.S#n128

I think we can actually add (most of) these to fg-kaslr's awareness (at
which point their order will be shuffled respective to other sections,
but with their content order unchanged), but it'll require a bit of
linker work. I'll mention this series's dependency on the linker's
orphaned section handling in another thread...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-05 22:39 ` [RFC PATCH 06/11] x86: make sure _etext includes function sections Kristen Carlson Accardi
@ 2020-02-06 12:26   ` Kees Cook
  2020-02-06 13:15     ` Jann Horn
                       ` (3 more replies)
  0 siblings, 4 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:26 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> We will be using -ffunction-sections to place each function in
> it's own text section so it can be randomized at load time. The
> linker considers these .text.* sections "orphaned sections", and
> will place them after the first similar section (.text). However,
> we need to move _etext so that it is after both .text and .text.*
> We also need to calculate text size to include .text AND .text.*

The dependency on the linker's orphan section handling is, I feel,
rather fragile (during work on CFI and generally building kernels with
Clang's LLD linker, we keep tripping over difference between how BFD and
LLD handle orphans). However, this is currently no way to perform a
section "pass through" where input sections retain their name as an
output section. (If anyone knows a way to do this, I'm all ears).

Right now, you can only collect sections like this:

        .text :  AT(ADDR(.text) - LOAD_OFFSET) {
		*(.text.*)
	}

or let them be orphans, which then the linker attempts to find a
"similar" (code, data, etc) section to put them near:
https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html

So, basically, yes, this works, but I'd like to see BFD and LLD grow
some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
a linker script specify _where_ these sections should roughly live.

Related thoughts:

I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
what function start alignment should be. It seems the linker is 16 byte
aligning these functions, when I think no alignment is needed for
function starts, so we're wasting some memory (average 8 bytes per
function, at say 50,000 functions, so approaching 512KB) between
functions. If we can specify a 1 byte alignment for these orphan
sections, that would be nice, as mentioned in the cover letter: we lose
a 4 bits of entropy to this alignment, since all randomized function
addresses will have their low bits set to zero.

And we can't adjust function section alignment, or there is some
benefit to a larger alignment, I would like to have a way for the linker
to specify the inter-section padding (or fill) bytes. Right now, the
FILL(0xnn) (or =0xnn) can be used to specify the padding bytes _within_
a section, but not between sections. Right now, BFD appears to 0-pad. I'd
like that to be 0xCC so "guessing" addresses incorrectly will trigger
a trap.

-Kees

> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
>  arch/x86/kernel/vmlinux.lds.S     | 18 +++++++++++++++++-
>  include/asm-generic/vmlinux.lds.h |  2 +-
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
> index 3a1a819da137..e54e9ac5b429 100644
> --- a/arch/x86/kernel/vmlinux.lds.S
> +++ b/arch/x86/kernel/vmlinux.lds.S
> @@ -146,8 +146,24 @@ SECTIONS
>  #endif
>  	} :text =0xcccc
>  
> -	/* End of text section, which should occupy whole number of pages */
> +#ifdef CONFIG_FG_KASLR
> +	/*
> +	 * -ffunction-sections creates .text.* sections, which are considered
> +	 * "orphan sections" and added after the first similar section (.text).
> +	 * Adding this ALIGN statement causes the address of _etext
> +	 * to be below that of all the .text.* orphaned sections
> +	 */
> +	. = ALIGN(PAGE_SIZE);
> +#endif
>  	_etext = .;
> +
> +	/*
> +	 * the size of the .text section is used to calculate the address
> +	 * range for orc lookups. If we just use SIZEOF(.text), we will
> +	 * miss all the .text.* sections. Calculate the size using _etext
> +	 * and _stext and save the value for later.
> +	 */
> +	text_size = _etext - _stext;
>  	. = ALIGN(PAGE_SIZE);
>  
>  	X86_ALIGN_RODATA_BEGIN
> diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
> index e00f41aa8ec4..edf19f4296e2 100644
> --- a/include/asm-generic/vmlinux.lds.h
> +++ b/include/asm-generic/vmlinux.lds.h
> @@ -798,7 +798,7 @@
>  	. = ALIGN(4);							\
>  	.orc_lookup : AT(ADDR(.orc_lookup) - LOAD_OFFSET) {		\
>  		orc_lookup = .;						\
> -		. += (((SIZEOF(.text) + LOOKUP_BLOCK_SIZE - 1) /	\
> +		. += (((text_size + LOOKUP_BLOCK_SIZE - 1) /	\
>  			LOOKUP_BLOCK_SIZE) + 1) * 4;			\
>  		orc_lookup_end = .;					\
>  	}
> -- 
> 2.24.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 09/11] kallsyms: hide layout and expose seed
  2020-02-05 22:39 ` [RFC PATCH 09/11] kallsyms: hide layout and expose seed Kristen Carlson Accardi
@ 2020-02-06 12:32   ` Kees Cook
  2020-02-06 17:51     ` Kristen Carlson Accardi
  0 siblings, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:32 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Wed, Feb 05, 2020 at 02:39:48PM -0800, Kristen Carlson Accardi wrote:
> To support finer grained kaslr (fgkaslr), we need to make a couple changes
> to kallsyms. Firstly, we need to hide our sorted list of symbols, since
> this will give away our new layout. Secondly, we will export the seed used
> for randomizing the layout so that it can be used to make a particular
> layout persist across boots for debug purposes.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> ---
>  kernel/kallsyms.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> index 136ce049c4ad..432b13a3a033 100644
> --- a/kernel/kallsyms.c
> +++ b/kernel/kallsyms.c
> @@ -698,6 +698,21 @@ const char *kdb_walk_kallsyms(loff_t *pos)
>  }
>  #endif	/* CONFIG_KGDB_KDB */
>  
> +#ifdef CONFIG_FG_KASLR
> +extern const u64 fgkaslr_seed[] __weak;
> +
> +static int proc_fgkaslr_show(struct seq_file *m, void *v)
> +{
> +	seq_printf(m, "%llx\n", fgkaslr_seed[0]);
> +	seq_printf(m, "%llx\n", fgkaslr_seed[1]);
> +	seq_printf(m, "%llx\n", fgkaslr_seed[2]);
> +	seq_printf(m, "%llx\n", fgkaslr_seed[3]);
> +	return 0;
> +}
> +#else
> +static inline int proc_fgkaslr_show(struct seq_file *m, void *v) { return 0; }
> +#endif
> +

I'd like to put the fgkaslr seed exposure behind a separate DEBUG
config, since it shouldn't be normally exposed. As such, its
infrastructure should be likely extracted from this and the main fgkaslr
patches and added back separately (and maybe it will entirely vanish
once the RNG is switched to ChaCha20).

>  static const struct file_operations kallsyms_operations = {
>  	.open = kallsyms_open,
>  	.read = seq_read,
> @@ -707,7 +722,20 @@ static const struct file_operations kallsyms_operations = {
>  
>  static int __init kallsyms_init(void)
>  {
> -	proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
> +	/*
> +	 * When fine grained kaslr is enabled, we don't want to
> +	 * print out the symbols even with zero pointers because
> +	 * this reveals the randomization order. If fg kaslr is
> +	 * enabled, make kallsyms available only to privileged
> +	 * users.
> +	 */
> +	if (!IS_ENABLED(CONFIG_FG_KASLR))
> +		proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
> +	else {
> +		proc_create_single("fgkaslr_seed", 0400, NULL,
> +					proc_fgkaslr_show);
> +		proc_create("kallsyms", 0400, NULL, &kallsyms_operations);
> +	}
>  	return 0;
>  }
>  device_initcall(kallsyms_init);
> -- 
> 2.24.1

In the past, making kallsyms entirely unreadable seemed to break weird
stuff in userspace. How about having an alternative view that just
contains a alphanumeric sort of the symbol names (and they will continue
to have zeroed addresses for unprivileged users)?

Or perhaps we wait to hear about this causing a problem, and deal with
it then? :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 07/11] x86/tools: Adding relative relocs for randomized functions
  2020-02-05 22:39 ` [RFC PATCH 07/11] x86/tools: Adding relative relocs for randomized functions Kristen Carlson Accardi
@ 2020-02-06 12:37   ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:37 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Wed, Feb 05, 2020 at 02:39:46PM -0800, Kristen Carlson Accardi wrote:
> If we are randomizing function sections, we are going to need
> to recalculate relative offsets for relocs that are either in
> the randomized sections, or refer to the randomized sections.
> Add code to detect whether a reloc satisifies these cases, and if
> so, add them to the appropriate reloc list.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>

Looks good to me. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 01/11] modpost: Support >64K sections
  2020-02-05 22:39 ` [RFC PATCH 01/11] modpost: Support >64K sections Kristen Carlson Accardi
@ 2020-02-06 12:38   ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:38 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Wed, Feb 05, 2020 at 02:39:40PM -0800, Kristen Carlson Accardi wrote:
> According to the ELF specification, if the value of st_shndx
> contains SH_XINDEX, the actual section header index is too
> large to fit in the st_shndx field and you should use the
> value out of the SHT_SYMTAB_SHNDX section instead. This table
> was already being parsed and saved into symtab_shndx_start, however
> it was not being used, causing segfaults when the number of
> sections is greater than 64K. Check the st_shndx field for
> SHN_XINDEX prior to using.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>

Looking at "readelf" output continues to make me laugh. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 02/11] x86: tools/relocs: Support >64K section headers
  2020-02-05 22:39 ` [RFC PATCH 02/11] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
@ 2020-02-06 12:39   ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:39 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Wed, Feb 05, 2020 at 02:39:41PM -0800, Kristen Carlson Accardi wrote:
> While it is already supported to find the total number of section
> headers if we exceed 64K sections, we need to support the
> extended symbol table to get section header indexes for symbols
> when there are > 64K sections. Parse the elf file to read
> the extended symbol table info, and then replace all direct references
> to st_shndx with calls to sym_index(), which will determine whether
> we can read the value directly or whether we need to pull it out of
> the extended table.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>

Looks good to me.

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 10/11] module: Reorder functions
  2020-02-05 22:39 ` [RFC PATCH 10/11] module: Reorder functions Kristen Carlson Accardi
@ 2020-02-06 12:41   ` Kees Cook
  2020-02-11 12:39     ` Jessica Yu
  0 siblings, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-06 12:41 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening, Jessica Yu

On Wed, Feb 05, 2020 at 02:39:49PM -0800, Kristen Carlson Accardi wrote:
> If a module has functions split out into separate text sections
> (i.e. compiled with the -ffunction-sections flag), reorder the
> functions to provide some code diversification to modules.
> 
> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>

Probably a good idea to add Jessica to CC in next version:
	Jessica Yu <jeyu@kernel.org>

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  kernel/module.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 85 insertions(+)
> 
> diff --git a/kernel/module.c b/kernel/module.c
> index b56f3224b161..231563e95e61 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -53,6 +53,8 @@
>  #include <linux/bsearch.h>
>  #include <linux/dynamic_debug.h>
>  #include <linux/audit.h>
> +#include <linux/random.h>
> +#include <asm/setup.h>
>  #include <uapi/linux/module.h>
>  #include "module-internal.h"
>  
> @@ -3245,6 +3247,87 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>  	return 0;
>  }
>  
> +/*
> + * shuffle_text_list()
> + * Use a Fisher Yates algorithm to shuffle a list of text sections.
> + */
> +static void shuffle_text_list(Elf_Shdr **list, int size)
> +{
> +	int i;
> +	unsigned int j;
> +	Elf_Shdr *temp;
> +
> +	for (i = size - 1; i > 0; i--) {
> +		/*
> +		 * TBD - seed. We need to be able to use a known
> +		 * seed so that we can non-randomly randomize for
> +		 * debugging.
> +		 */
> +
> +		// pick a random index from 0 to i
> +		get_random_bytes(&j, sizeof(j));
> +		j = j % (i + 1);
> +
> +		temp = list[i];
> +		list[i] = list[j];
> +		list[j] = temp;
> +	}
> +}
> +
> +/*
> + * randomize_text()
> + * Look through the core section looking for executable code sections.
> + * Store sections in an array and then shuffle the sections
> + * to reorder the functions.
> + */
> +static void randomize_text(struct module *mod, struct load_info *info)
> +{
> +	int i;
> +	int num_text_sections = 0;
> +	Elf_Shdr **text_list;
> +	int size = 0;
> +	int max_sections = info->hdr->e_shnum;
> +	unsigned int sec = find_sec(info, ".text");
> +
> +	if (!IS_ENABLED(CONFIG_FG_KASLR) || !kaslr_enabled())
> +		return;
> +
> +	if (sec == 0)
> +		return;
> +
> +	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
> +	if (text_list == NULL)
> +		return;
> +
> +	for (i = 0; i < max_sections; i++) {
> +		Elf_Shdr *shdr = &info->sechdrs[i];
> +		const char *sname = info->secstrings + shdr->sh_name;
> +
> +		if (!(shdr->sh_flags & SHF_ALLOC) ||
> +		    !(shdr->sh_flags & SHF_EXECINSTR) ||
> +		    strstarts(sname, ".init"))
> +			continue;
> +
> +		text_list[num_text_sections] = shdr;
> +		num_text_sections++;
> +	}
> +
> +	shuffle_text_list(text_list, num_text_sections);
> +
> +	for (i = 0; i < num_text_sections; i++) {
> +		Elf_Shdr *shdr = text_list[i];
> +		unsigned int infosec;
> +		const char *sname;
> +
> +		sname = info->secstrings + shdr->sh_name;
> +		infosec	= shdr->sh_info;
> +
> +		shdr->sh_entsize = get_offset(mod, &size, shdr, infosec);
> +	}
> +
> +	kfree(text_list);
> +}
> +
>  static int move_module(struct module *mod, struct load_info *info)
>  {
>  	int i;
> @@ -3282,6 +3365,8 @@ static int move_module(struct module *mod, struct load_info *info)
>  	} else
>  		mod->init_layout.base = NULL;
>  
> +	randomize_text(mod, info);
> +
>  	/* Transfer each section which specifies SHF_ALLOC */
>  	pr_debug("final section addresses:\n");
>  	for (i = 0; i < info->hdr->e_shnum; i++) {
> -- 
> 2.24.1
> 

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 12:26   ` Kees Cook
@ 2020-02-06 13:15     ` Jann Horn
  2020-02-06 16:27       ` David Laight
  2020-02-06 14:39     ` Arvind Sankar
                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 59+ messages in thread
From: Jann Horn @ 2020-02-06 13:15 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Arjan van de Ven,
	Rick Edgecombe, the arch/x86 maintainers, kernel list,
	Kernel Hardening

On Thu, Feb 6, 2020 at 1:26 PM Kees Cook <keescook@chromium.org> wrote:
> I know x86_64 stack alignment is 16 bytes.

That's true for the standard sysv ABI that is used in userspace; but
the kernel uses a custom ABI with 8-byte stack alignment. See
arch/x86/Makefile:

# For gcc stack alignment is specified with -mpreferred-stack-boundary,
# clang has the option -mstack-alignment for that purpose.
ifneq ($(call cc-option, -mpreferred-stack-boundary=4),)
      cc_stack_align4 := -mpreferred-stack-boundary=2
      cc_stack_align8 := -mpreferred-stack-boundary=3
else ifneq ($(call cc-option, -mstack-alignment=16),)
      cc_stack_align4 := -mstack-alignment=4
      cc_stack_align8 := -mstack-alignment=8
endif
[...]
        # By default gcc and clang use a stack alignment of 16 bytes for x86.
        # However the standard kernel entry on x86-64 leaves the stack on an
        # 8-byte boundary. If the compiler isn't informed about the actual
        # alignment it will generate extra alignment instructions for the
        # default alignment which keep the stack *mis*aligned.
        # Furthermore an alignment to the register width reduces stack usage
        # and the number of alignment instructions.
        KBUILD_CFLAGS += $(call cc-option,$(cc_stack_align8))

> I cannot find evidence for
> what function start alignment should be.

There is no architecturally required alignment for functions, but
Intel's Optimization Manual
(<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf>)
recommends in section 3.4.1.5, "Code Alignment":

| Assembly/Compiler Coding Rule 12. (M impact, H generality)
| All branch targets should be 16-byte aligned.

AFAIK this is recommended because, as documented in section 2.3.2.1,
"Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
used as the base for newer microarchitectures):

| An instruction fetch is a 16-byte aligned lookup through the ITLB
and into the instruction cache.
| The instruction cache can deliver every cycle 16 bytes to the
instruction pre-decoder.

AFAIK this means that if a branch ends close to the end of a 16-byte
block, the frontend is less efficient because it may have to run two
instruction fetches before the first instruction can even be decoded.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss
  2020-02-06 11:13     ` Kees Cook
@ 2020-02-06 14:25       ` Arvind Sankar
  2020-02-06 21:32         ` Kees Cook
  0 siblings, 1 reply; 59+ messages in thread
From: Arvind Sankar @ 2020-02-06 14:25 UTC (permalink / raw)
  To: Kees Cook
  Cc: Arvind Sankar, Kristen Carlson Accardi, tglx, mingo, bp, hpa,
	arjan, rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 03:13:12AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 07:11:05PM -0500, Arvind Sankar wrote:
> > From: Kees Cook <keescook@chromium.org>
> > > This seems to be a trivial change because head_{64,32}.S already only
> > > copies up to the start of the .bss section, so any growth in the .bss
> > > area was already not meaningful when placing the image in memory. The
> > > .bss size is, however, reflected in the boot params "init_size", so the
> > > memory range calculations included the "boot_heap" region. Instead of
> > > wasting the on-disk image size bytes, just account for this heap area
> > > when identifying the mem_avoid ranges, and leave it out of the .bss
> > > section entirely. For good measure, also zero initialize it, as this
> > > was already happening for when zeroing the entire .bss section.
> > 
> > I'm not sure I follow this: the reason the bzImage currently contains
> > .bss and a fix for it is in a patch I have out for review at
> > https://lore.kernel.org/lkml/20200109150218.16544-1-nivedita@alum.mit.edu
> 
> Ah! Thank you. Yes, that's _much_ cleaner. I could not figure out why
> the linker was actually keeping the .bss section allocated in the
> on-disk image. :) We've only had this bug for 10 years. ;)
> 
> > This alone shouldn't make much of a difference across compressors. The
> > entire .bss is just stored uncompressed as 0's in bzImage currently.
> > The only thing that gets compressed is the original kernel ELF file. Is
> > the difference above just from this patch, or is it including the
> > overhead of function-sections?
> 
> With bzip2, it's a 4MB heap in .bss. Other compressors are 64KB. With
> fg-kaslr, the heap is 64MB in .bss. It made the bzImage huge. ;) Another

Ah, I just saw that. Makes more sense now -- so my patch actually saves
~4MiB even now for bz2-compressed bzImages.

> thought I had to deal with the memory utilization in the fg-kaslr shuffle
> was to actually choose _two_ kernel locations in memory (via a refactoring
> of choose_random_location()). One to decompress into and the other to
> write out during the shuffle. Though the symbol table still needs to be
> reconstructed, etc, so probably just best to leave it all in the regular
> heap (or improve the ZO heap allocator which doesn't really implement
> free()).
> 
> > It is not necessary for it to contain .bss to get the correct init_size.
> > The latter is calculated (in x86/boot/header.S) based on the offset of
> > the _end symbol in the compressed vmlinux, so storing the .bss is just a
> > bug.
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/boot/header.S#n559
> 
> Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
> was staring at this, since I only looked around the compressed/ directory.
> :)
> 

There's another thing I noticed -- you would need to ensure that the
init_size in the header covers your boot heap even if you did split it
out. The reason is that the bootloader will only know to reserve enough
memory for init_size: it's possible it might put the initrd or something
else following the kernel, or theoretically there might be reserved
memory regions or the end of physical RAM immediately following, so you
can't assume that area will be available when you get to extract_kernel.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 12:26   ` Kees Cook
  2020-02-06 13:15     ` Jann Horn
@ 2020-02-06 14:39     ` Arvind Sankar
  2020-02-06 15:29       ` Arvind Sankar
  2020-02-06 14:57     ` Arvind Sankar
  2020-02-06 19:41     ` Kristen Carlson Accardi
  3 siblings, 1 reply; 59+ messages in thread
From: Arvind Sankar @ 2020-02-06 14:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> what function start alignment should be. It seems the linker is 16 byte
> aligning these functions, when I think no alignment is needed for
> function starts, so we're wasting some memory (average 8 bytes per
> function, at say 50,000 functions, so approaching 512KB) between
> functions. If we can specify a 1 byte alignment for these orphan
> sections, that would be nice, as mentioned in the cover letter: we lose
> a 4 bits of entropy to this alignment, since all randomized function
> addresses will have their low bits set to zero.
> 

The default function alignment is 16-bytes for x64 at least with gcc.
You can use -falign-functions to specify a different alignment.

There was some old discussion on reducing it [1] but it doesn't seem to
have been merged.

[1] https://lore.kernel.org/lkml/tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org/

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06 12:06     ` Kees Cook
@ 2020-02-06 14:52       ` Peter Zijlstra
  2020-02-06 17:25         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2020-02-06 14:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 04:06:17AM -0800, Kees Cook wrote:
> On Thu, Feb 06, 2020 at 11:38:30AM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 05, 2020 at 02:39:47PM -0800, Kristen Carlson Accardi wrote:
> > > +static long __start___ex_table_addr;
> > > +static long __stop___ex_table_addr;
> > > +static long _stext;
> > > +static long _etext;
> > > +static long _sinittext;
> > > +static long _einittext;
> > 
> > Should you not also adjust __jump_table, __mcount_loc,
> > __kprobe_blacklist and possibly others that include text addresses?
> 
> These don't appear to be sorted at build time. 

The ORC tables are though:

  57fa18994285 ("scripts/sorttable: Implement build-time ORC unwind table sorting")

> AIUI, the problem with
> ex_table and kallsyms is that they're preprocessed at build time and
> opaque to the linker's relocation generation.

I was under the impression these tables no longer had relocation data;
that since they're part of the main kernel, the final link stage could
completely resolve them.

That said, I now see we actually have .rela__extable .rela.orc_unwind_ip
etc.

> For example, looking at __jump_table, it gets sorted at runtime:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/jump_label.c#n474

For now, yes. Depends a bit on how hard people are pushing on getting
jump_labels working earlier and ealier in boot.

> As you're likely aware, we have a number of "special"
> sections like this, currently collected manually, see *_TEXT:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/kernel/vmlinux.lds.S#n128

Right.

> I think we can actually add (most of) these to fg-kaslr's awareness (at
> which point their order will be shuffled respective to other sections,
> but with their content order unchanged), but it'll require a bit of
> linker work. I'll mention this series's dependency on the linker's
> orphaned section handling in another thread...

I have some patches pending where we rely on link script order. That's
data sections though, so I suppose that's safe for the moment.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 12:26   ` Kees Cook
  2020-02-06 13:15     ` Jann Horn
  2020-02-06 14:39     ` Arvind Sankar
@ 2020-02-06 14:57     ` Arvind Sankar
  2020-02-06 15:45       ` Arvind Sankar
  2020-02-06 19:41     ` Kristen Carlson Accardi
  3 siblings, 1 reply; 59+ messages in thread
From: Arvind Sankar @ 2020-02-06 14:57 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> > We will be using -ffunction-sections to place each function in
> > it's own text section so it can be randomized at load time. The
> > linker considers these .text.* sections "orphaned sections", and
> > will place them after the first similar section (.text). However,
> > we need to move _etext so that it is after both .text and .text.*
> > We also need to calculate text size to include .text AND .text.*
> 
> The dependency on the linker's orphan section handling is, I feel,
> rather fragile (during work on CFI and generally building kernels with
> Clang's LLD linker, we keep tripping over difference between how BFD and
> LLD handle orphans). However, this is currently no way to perform a
> section "pass through" where input sections retain their name as an
> output section. (If anyone knows a way to do this, I'm all ears).
> 
> Right now, you can only collect sections like this:
> 
>         .text :  AT(ADDR(.text) - LOAD_OFFSET) {
> 		*(.text.*)
> 	}
> 
> or let them be orphans, which then the linker attempts to find a
> "similar" (code, data, etc) section to put them near:
> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> 
> So, basically, yes, this works, but I'd like to see BFD and LLD grow
> some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> a linker script specify _where_ these sections should roughly live.
> 

You could go through the objects that are being linked and find the
individual text sections, and generate the linker script using that?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling
  2020-02-05 22:39 ` [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling Kristen Carlson Accardi
  2020-02-06  1:11   ` Andy Lutomirski
@ 2020-02-06 15:10   ` Jason A. Donenfeld
  2020-02-07  7:23     ` Jean-Philippe Aumasson
  1 sibling, 1 reply; 59+ messages in thread
From: Jason A. Donenfeld @ 2020-02-06 15:10 UTC (permalink / raw)
  To: Kristen Carlson Accardi, keescook
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening, jeanphilippe.aumasson

Hey Kees,

On Wed, Feb 05, 2020 at 02:39:43PM -0800, Kristen Carlson Accardi wrote:
> +#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
> +static u64 prng_u64(struct prng_state *x)
> +{
> +	u64 e;
> +
> +	e = x->a - rot(x->b, 7);
> +	x->a = x->b ^ rot(x->c, 13);
> +	x->b = x->c + rot(x->d, 37);
> +	x->c = x->d + e;
> +	x->d = e + x->a;
> +
> +	return x->d;
> +}

I haven't looked closely at where the original entropy sources are
coming from and how all this works, but on first glance, this prng
doesn't look like an especially cryptographically secure one. I realize
that isn't necessarily your intention (you're focused on speed), but
actually might this be sort of important? If I understand correctly, the
objective of this patch set is so that leaking the address of one
function doesn't leak the address of all other functions, as is the case
with fixed-offset kaslr. But if you leak the addresses of _some_ set of
functions, and your prng is bogus, might it be possible to figure out
the rest? For some prngs, if you give me the output stream of a few
numbers, I can predict the rest. For others, it's not this straight
forward, but there are some varieties of similar attacks. If any of that
set of concerns turns out to apply to your prng_u64 here, would that
undermine kaslr in similar ways as the current fixed-offset variety? Or
does it not matter because it's some kind of blinded fixed-size shuffle
with complex reasoning that makes this not a problem?

Jason

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 14:39     ` Arvind Sankar
@ 2020-02-06 15:29       ` Arvind Sankar
  2020-02-06 16:11         ` Andy Lutomirski
  0 siblings, 1 reply; 59+ messages in thread
From: Arvind Sankar @ 2020-02-06 15:29 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 09:39:43AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> > I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> > what function start alignment should be. It seems the linker is 16 byte
> > aligning these functions, when I think no alignment is needed for
> > function starts, so we're wasting some memory (average 8 bytes per
> > function, at say 50,000 functions, so approaching 512KB) between
> > functions. If we can specify a 1 byte alignment for these orphan
> > sections, that would be nice, as mentioned in the cover letter: we lose
> > a 4 bits of entropy to this alignment, since all randomized function
> > addresses will have their low bits set to zero.
> > 
> 
> The default function alignment is 16-bytes for x64 at least with gcc.
> You can use -falign-functions to specify a different alignment.
> 
> There was some old discussion on reducing it [1] but it doesn't seem to
> have been merged.
> 
> [1] https://lore.kernel.org/lkml/tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org/

Though I don't think the entropy loss is real. With 50k functions, you
can use at most log(50k!) = ~35 KiB worth of entropy in permuting them,
no matter what the alignment is. The only way you can get more is if you
have more than 50k slots to put them in.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 14:57     ` Arvind Sankar
@ 2020-02-06 15:45       ` Arvind Sankar
  0 siblings, 0 replies; 59+ messages in thread
From: Arvind Sankar @ 2020-02-06 15:45 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Kees Cook, Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 09:57:40AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
> > On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi wrote:
> > > We will be using -ffunction-sections to place each function in
> > > it's own text section so it can be randomized at load time. The
> > > linker considers these .text.* sections "orphaned sections", and
> > > will place them after the first similar section (.text). However,
> > > we need to move _etext so that it is after both .text and .text.*
> > > We also need to calculate text size to include .text AND .text.*
> > 
> > The dependency on the linker's orphan section handling is, I feel,
> > rather fragile (during work on CFI and generally building kernels with
> > Clang's LLD linker, we keep tripping over difference between how BFD and
> > LLD handle orphans). However, this is currently no way to perform a
> > section "pass through" where input sections retain their name as an
> > output section. (If anyone knows a way to do this, I'm all ears).
> > 
> > Right now, you can only collect sections like this:
> > 
> >         .text :  AT(ADDR(.text) - LOAD_OFFSET) {
> > 		*(.text.*)
> > 	}
> > 
> > or let them be orphans, which then the linker attempts to find a
> > "similar" (code, data, etc) section to put them near:
> > https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> > 
> > So, basically, yes, this works, but I'd like to see BFD and LLD grow
> > some kind of /PASSTHRU/ special section (like /DISCARD/), that would let
> > a linker script specify _where_ these sections should roughly live.
> > 
> 
> You could go through the objects that are being linked and find the
> individual text sections, and generate the linker script using that?

Also, one thing to note about the orphan section handling -- by default
ld will combine multiple orphan sections with the same name into a
single output section. So if you have sections corresponding to static
functions with the same name but from different files, they will get
unnecessarily combined. You may want to add --unique to the ld options
to keep them separate. That will create multiple sections with the same
name instead of merging them.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 15:29       ` Arvind Sankar
@ 2020-02-06 16:11         ` Andy Lutomirski
  0 siblings, 0 replies; 59+ messages in thread
From: Andy Lutomirski @ 2020-02-06 16:11 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Kees Cook, Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening


> On Feb 6, 2020, at 7:29 AM, Arvind Sankar <nivedita@alum.mit.edu> wrote:
> 
> On Thu, Feb 06, 2020 at 09:39:43AM -0500, Arvind Sankar wrote:
>>> On Thu, Feb 06, 2020 at 04:26:23AM -0800, Kees Cook wrote:
>>> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
>>> what function start alignment should be. It seems the linker is 16 byte
>>> aligning these functions, when I think no alignment is needed for
>>> function starts, so we're wasting some memory (average 8 bytes per
>>> function, at say 50,000 functions, so approaching 512KB) between
>>> functions. If we can specify a 1 byte alignment for these orphan
>>> sections, that would be nice, as mentioned in the cover letter: we lose
>>> a 4 bits of entropy to this alignment, since all randomized function
>>> addresses will have their low bits set to zero.
>>> 
>> 
>> The default function alignment is 16-bytes for x64 at least with gcc.
>> You can use -falign-functions to specify a different alignment.
>> 
>> There was some old discussion on reducing it [1] but it doesn't seem to
>> have been merged.
>> 
>> [1] https://lore.kernel.org/lkml/tip-4874fe1eeb40b403a8c9d0ddeb4d166cab3f37ba@git.kernel.org/
> 
> Though I don't think the entropy loss is real. With 50k functions, you
> can use at most log(50k!) = ~35 KiB worth of entropy in permuting them,
> no matter what the alignment is. The only way you can get more is if you
> have more than 50k slots to put them in.

There is a security consideration here that has nothing to do with entropy per se. If an attacker locates two functions, they learn the distance between them. This constrains what can fit in the gap. Padding reduces the strength of this type of attack, as would some degree of random padding.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* RE: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 13:15     ` Jann Horn
@ 2020-02-06 16:27       ` David Laight
  0 siblings, 0 replies; 59+ messages in thread
From: David Laight @ 2020-02-06 16:27 UTC (permalink / raw)
  To: Jann Horn, Kees Cook
  Cc: Kristen Carlson Accardi, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, H . Peter Anvin, Arjan van de Ven,
	Rick Edgecombe, the arch/x86 maintainers, kernel list,
	Kernel Hardening

From: Jann Horn
> Sent: 06 February 2020 13:16
...
> > I cannot find evidence for
> > what function start alignment should be.
> 
> There is no architecturally required alignment for functions, but
> Intel's Optimization Manual
> (<https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-
> optimization-manual.pdf>)
> recommends in section 3.4.1.5, "Code Alignment":
> 
> | Assembly/Compiler Coding Rule 12. (M impact, H generality)
> | All branch targets should be 16-byte aligned.
> 
> AFAIK this is recommended because, as documented in section 2.3.2.1,
> "Legacy Decode Pipeline" (describing the frontend of Sandy Bridge, and
> used as the base for newer microarchitectures):
> 
> | An instruction fetch is a 16-byte aligned lookup through the ITLB
> and into the instruction cache.
> | The instruction cache can deliver every cycle 16 bytes to the
> instruction pre-decoder.
> 
> AFAIK this means that if a branch ends close to the end of a 16-byte
> block, the frontend is less efficient because it may have to run two
> instruction fetches before the first instruction can even be decoded.

See also The microarchitecture of Intel, AMD and VIA CPUs from www.agner.org/optimize 

My suspicion is that reducing the cache size (so more code fits in)
will almost always be a win over aligning branch targets and entry points.
If the alignment of a function matters then there are probably other
changes to that bit of code that will give a larger benefit.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch
  2020-02-06  1:08   ` Andy Lutomirski
  2020-02-06 11:48     ` Kees Cook
@ 2020-02-06 16:58     ` Kristen Carlson Accardi
  1 sibling, 0 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06 16:58 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: tglx, mingo, bp, hpa, arjan, keescook, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Wed, 2020-02-05 at 17:08 -0800, Andy Lutomirski wrote:
> > On Feb 5, 2020, at 2:39 PM, Kristen Carlson Accardi <
> > kristen@linux.intel.com> wrote:
> > 
> > From: Kees Cook <keescook@chromium.org>
> > 
> > Under earlyprintk, each RNG call produces a debug report line. When
> > shuffling hundreds of functions, this is not useful information
> > (each
> > line is identical and tells us nothing new). Instead, allow for a
> > NULL
> > "purpose" to suppress the debug reporting.
> 
> Have you counted how many RDRAND calls this causes?  RDRAND is
> exceedingly slow on all CPUs I’ve looked at. The whole “RDRAND has
> great bandwidth” marketing BS actually means that it has decent
> bandwidth if all CPUs hammer it at the same time. The latency is
> abysmal.  I have asked Intel to improve this, but the latency of that
> request will be quadrillions of cycles :)
> 
> It wouldn’t shock me if just the RDRAND calls account for a
> respectable fraction of total time. The RDTSC fallback, on the other
> hand, may be so predictable as to be useless.

I think at the moment the calls to rdrand are really not the largest
contributor to the latency. The relocations are the real bottleneck -
each address must be inspected to see if it is in the list of function
sections that have been randomized, and the value at that address must
also be inspected to see if it's in the list of function sections.
That's a lot of lookups. That said, I tried to measure the difference
between using Kees' prng vs. the rdrand calls and found little to no
measurable difference. I think at this point it's in the noise -
hopefully we will get to a point where this matters more.

> 
> I would suggest adding a little ChaCha20 DRBG or similar to the KASLR
> environment instead. What crypto primitives are available there?

I will read up on this.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06 14:52       ` Peter Zijlstra
@ 2020-02-06 17:25         ` Kristen Carlson Accardi
  2020-02-06 17:35           ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06 17:25 UTC (permalink / raw)
  To: Peter Zijlstra, Kees Cook
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Thu, 2020-02-06 at 15:52 +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2020 at 04:06:17AM -0800, Kees Cook wrote:
> > On Thu, Feb 06, 2020 at 11:38:30AM +0100, Peter Zijlstra wrote:
> > > On Wed, Feb 05, 2020 at 02:39:47PM -0800, Kristen Carlson Accardi
> > > wrote:
> > > > +static long __start___ex_table_addr;
> > > > +static long __stop___ex_table_addr;
> > > > +static long _stext;
> > > > +static long _etext;
> > > > +static long _sinittext;
> > > > +static long _einittext;
> > > 
> > > Should you not also adjust __jump_table, __mcount_loc,
> > > __kprobe_blacklist and possibly others that include text
> > > addresses?
> > 
> > These don't appear to be sorted at build time. 
> 
> The ORC tables are though:
> 
>   57fa18994285 ("scripts/sorttable: Implement build-time ORC unwind
> table sorting")
> 
> > AIUI, the problem with
> > ex_table and kallsyms is that they're preprocessed at build time
> > and
> > opaque to the linker's relocation generation.
> 
> I was under the impression these tables no longer had relocation
> data;
> that since they're part of the main kernel, the final link stage
> could
> completely resolve them.
> 
> That said, I now see we actually have .rela__extable
> .rela.orc_unwind_ip
> etc.

That's right - all of these tables that you mention had relocs and thus
I did not have to do anything special for them. The orc_unwind_ip
tables get sorted during unwind_init(). If they are needed earlier than
that, then they could be re-sorted like we do with the exception table.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06 17:25         ` Kristen Carlson Accardi
@ 2020-02-06 17:35           ` Peter Zijlstra
  2020-02-06 17:43             ` Kristen Carlson Accardi
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2020-02-06 17:35 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Kees Cook, tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 09:25:01AM -0800, Kristen Carlson Accardi wrote:

> That's right - all of these tables that you mention had relocs and thus
> I did not have to do anything special for them. The orc_unwind_ip
> tables get sorted during unwind_init(). 

No they're not:

  f14bf6a350df ("x86/unwind/orc: Remove boot-time ORC unwind tables sorting")

Or rather, it might be you're working on an old tree.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06 11:56     ` Kees Cook
@ 2020-02-06 17:36       ` Kristen Carlson Accardi
  0 siblings, 0 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06 17:36 UTC (permalink / raw)
  To: Kees Cook, Andy Lutomirski
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, H. Peter Anvin,
	Arjan van de Ven, Rick Edgecombe, X86 ML, LKML, Kernel Hardening

On Thu, 2020-02-06 at 03:56 -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 05:17:11PM -0800, Andy Lutomirski wrote:
> > On Wed, Feb 5, 2020 at 2:39 PM Kristen Carlson Accardi
> > <kristen@linux.intel.com> wrote:
> > > At boot time, find all the function sections that have separate
> > > .text
> > > sections, shuffle them, and then copy them to new locations.
> > > Adjust
> > > any relocations accordingly.
> > > 
> > > +       sort(base, num_syms, sizeof(int), kallsyms_cmp,
> > > kallsyms_swp);
> > 
> > Hah, here's a huge bottleneck.  Unless you are severely
> > memory-constrained, never do a sort with an expensive swap function
> > like this.  Instead allocate an array of indices that starts out as
> > [0, 1, 2, ...].  Sort *that* where the swap function just swaps the
> > indices.  Then use the sorted list of indices to permute the actual
> > data.  The result is exactly one expensive swap per item instead of
> > one expensive swap per swap.
> 
> I think there are few places where memory-vs-speed need to be
> examined.
> I remain surprised about how much memory the entire series already
> uses
> (58MB in my local tests), but I suspect this is likely dominated by
> the
> two factors: a full copy of the decompressed kernel, and that the
> "allocator" in the image doesn't really implement free():
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/decompress/mm.h#n55
> 

Yes - that was a huge issue (that free() doesn't actually...). Having
to do the copy really caused me to need to bump up the boot heap.
Thankfully, this is a readily solvable problem.

I think there's a temptation to focus too hard on the boot latency.
While I measured this on a reasonably fast system, we aren't talking
minutes of latency here, just a second or a second and a half. I know
there are those who sweat the milliseconds on booting vms, but I expect
they might just turn this feature off anyway. That said, there are
absolutely a lot of great ideas for improving things here that I am
excited to try should people be interested enough in this feature for
me to take it to the next stage.




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 08/11] x86: Add support for finer grained KASLR
  2020-02-06 17:35           ` Peter Zijlstra
@ 2020-02-06 17:43             ` Kristen Carlson Accardi
  0 siblings, 0 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06 17:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kees Cook, tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening

On Thu, 2020-02-06 at 18:35 +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2020 at 09:25:01AM -0800, Kristen Carlson Accardi
> wrote:
> 
> > That's right - all of these tables that you mention had relocs and
> > thus
> > I did not have to do anything special for them. The orc_unwind_ip
> > tables get sorted during unwind_init(). 
> 
> No they're not:
> 
>   f14bf6a350df ("x86/unwind/orc: Remove boot-time ORC unwind tables
> sorting")
> 
> Or rather, it might be you're working on an old tree.

Doh! Ok, I can make a patch to add it back based on CONFIG_FG_KASLR, or
just do yet another resort at boot time.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 09/11] kallsyms: hide layout and expose seed
  2020-02-06 12:32   ` Kees Cook
@ 2020-02-06 17:51     ` Kristen Carlson Accardi
  2020-02-06 19:27       ` Jann Horn
  0 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06 17:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Thu, 2020-02-06 at 04:32 -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:48PM -0800, Kristen Carlson Accardi
> wrote:
> > To support finer grained kaslr (fgkaslr), we need to make a couple
> > changes
> > to kallsyms. Firstly, we need to hide our sorted list of symbols,
> > since
> > this will give away our new layout. Secondly, we will export the
> > seed used
> > for randomizing the layout so that it can be used to make a
> > particular
> > layout persist across boots for debug purposes.
> > 
> > Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
> > ---
> >  kernel/kallsyms.c | 30 +++++++++++++++++++++++++++++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
> > index 136ce049c4ad..432b13a3a033 100644
> > --- a/kernel/kallsyms.c
> > +++ b/kernel/kallsyms.c
> > @@ -698,6 +698,21 @@ const char *kdb_walk_kallsyms(loff_t *pos)
> >  }
> >  #endif	/* CONFIG_KGDB_KDB */
> >  
> > +#ifdef CONFIG_FG_KASLR
> > +extern const u64 fgkaslr_seed[] __weak;
> > +
> > +static int proc_fgkaslr_show(struct seq_file *m, void *v)
> > +{
> > +	seq_printf(m, "%llx\n", fgkaslr_seed[0]);
> > +	seq_printf(m, "%llx\n", fgkaslr_seed[1]);
> > +	seq_printf(m, "%llx\n", fgkaslr_seed[2]);
> > +	seq_printf(m, "%llx\n", fgkaslr_seed[3]);
> > +	return 0;
> > +}
> > +#else
> > +static inline int proc_fgkaslr_show(struct seq_file *m, void *v) {
> > return 0; }
> > +#endif
> > +
> 
> I'd like to put the fgkaslr seed exposure behind a separate DEBUG
> config, since it shouldn't be normally exposed. As such, its
> infrastructure should be likely extracted from this and the main
> fgkaslr
> patches and added back separately (and maybe it will entirely vanish
> once the RNG is switched to ChaCha20).

OK, sounds reasonable to me.

> 
> >  static const struct file_operations kallsyms_operations = {
> >  	.open = kallsyms_open,
> >  	.read = seq_read,
> > @@ -707,7 +722,20 @@ static const struct file_operations
> > kallsyms_operations = {
> >  
> >  static int __init kallsyms_init(void)
> >  {
> > -	proc_create("kallsyms", 0444, NULL, &kallsyms_operations);
> > +	/*
> > +	 * When fine grained kaslr is enabled, we don't want to
> > +	 * print out the symbols even with zero pointers because
> > +	 * this reveals the randomization order. If fg kaslr is
> > +	 * enabled, make kallsyms available only to privileged
> > +	 * users.
> > +	 */
> > +	if (!IS_ENABLED(CONFIG_FG_KASLR))
> > +		proc_create("kallsyms", 0444, NULL,
> > &kallsyms_operations);
> > +	else {
> > +		proc_create_single("fgkaslr_seed", 0400, NULL,
> > +					proc_fgkaslr_show);
> > +		proc_create("kallsyms", 0400, NULL,
> > &kallsyms_operations);
> > +	}
> >  	return 0;
> >  }
> >  device_initcall(kallsyms_init);
> > -- 
> > 2.24.1
> 
> In the past, making kallsyms entirely unreadable seemed to break
> weird
> stuff in userspace. How about having an alternative view that just
> contains a alphanumeric sort of the symbol names (and they will
> continue
> to have zeroed addresses for unprivileged users)?
> 
> Or perhaps we wait to hear about this causing a problem, and deal
> with
> it then? :)
> 

Yeah - I don't know what people want here. Clearly, we can't leave
kallsyms the way it is. Removing it entirely is a pretty fast way to
figure out how people use it though :).



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 09/11] kallsyms: hide layout and expose seed
  2020-02-06 17:51     ` Kristen Carlson Accardi
@ 2020-02-06 19:27       ` Jann Horn
  0 siblings, 0 replies; 59+ messages in thread
From: Jann Horn @ 2020-02-06 19:27 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Kees Cook, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H . Peter Anvin, Arjan van de Ven, Rick Edgecombe,
	the arch/x86 maintainers, kernel list, Kernel Hardening

On Thu, Feb 6, 2020 at 6:51 PM Kristen Carlson Accardi
<kristen@linux.intel.com> wrote:
> On Thu, 2020-02-06 at 04:32 -0800, Kees Cook wrote:
> > In the past, making kallsyms entirely unreadable seemed to break
> > weird
> > stuff in userspace. How about having an alternative view that just
> > contains a alphanumeric sort of the symbol names (and they will
> > continue
> > to have zeroed addresses for unprivileged users)?
> >
> > Or perhaps we wait to hear about this causing a problem, and deal
> > with
> > it then? :)
> >
>
> Yeah - I don't know what people want here. Clearly, we can't leave
> kallsyms the way it is. Removing it entirely is a pretty fast way to
> figure out how people use it though :).

FYI, a pretty decent way to see how people are using an API is
codesearch.debian.net, which searches through the source code of all
the packages debian ships:

https://codesearch.debian.net/search?q=%2Fproc%2Fkallsyms&literal=1

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 12:26   ` Kees Cook
                       ` (2 preceding siblings ...)
  2020-02-06 14:57     ` Arvind Sankar
@ 2020-02-06 19:41     ` Kristen Carlson Accardi
  2020-02-06 20:02       ` Andy Lutomirski
  3 siblings, 1 reply; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-06 19:41 UTC (permalink / raw)
  To: Kees Cook
  Cc: tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86, linux-kernel,
	kernel-hardening

On Thu, 2020-02-06 at 04:26 -0800, Kees Cook wrote:
> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi
> wrote:
> > We will be using -ffunction-sections to place each function in
> > it's own text section so it can be randomized at load time. The
> > linker considers these .text.* sections "orphaned sections", and
> > will place them after the first similar section (.text). However,
> > we need to move _etext so that it is after both .text and .text.*
> > We also need to calculate text size to include .text AND .text.*
> 
> The dependency on the linker's orphan section handling is, I feel,
> rather fragile (during work on CFI and generally building kernels
> with
> Clang's LLD linker, we keep tripping over difference between how BFD
> and
> LLD handle orphans). However, this is currently no way to perform a
> section "pass through" where input sections retain their name as an
> output section. (If anyone knows a way to do this, I'm all ears).
> 
> Right now, you can only collect sections like this:
> 
>         .text :  AT(ADDR(.text) - LOAD_OFFSET) {
> 		*(.text.*)
> 	}
> 
> or let them be orphans, which then the linker attempts to find a
> "similar" (code, data, etc) section to put them near:
> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
> 
> So, basically, yes, this works, but I'd like to see BFD and LLD grow
> some kind of /PASSTHRU/ special section (like /DISCARD/), that would
> let
> a linker script specify _where_ these sections should roughly live.
> 
> Related thoughts:
> 
> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
> what function start alignment should be. It seems the linker is 16
> byte
> aligning these functions, when I think no alignment is needed for
> function starts, so we're wasting some memory (average 8 bytes per
> function, at say 50,000 functions, so approaching 512KB) between
> functions. If we can specify a 1 byte alignment for these orphan
> sections, that would be nice, as mentioned in the cover letter: we
> lose
> a 4 bits of entropy to this alignment, since all randomized function
> addresses will have their low bits set to zero.

So, when I was developing this patch set, I initially ignored the value
of sh_addralign and just packed the functions in one right after
another when I did the new layout. They were byte aligned :). I later
realized that I should probably pay attention to alignment and thus
started respecting the value that was in sh_addralign. There is
actually nothing stopping me from ignoring it again, other than I am
concerned that I will make runtime performance suffer even more than I
already have.

> 
> And we can't adjust function section alignment, or there is some
> benefit to a larger alignment, I would like to have a way for the
> linker
> to specify the inter-section padding (or fill) bytes. Right now, the
> FILL(0xnn) (or =0xnn) can be used to specify the padding bytes
> _within_
> a section, but not between sections. Right now, BFD appears to 0-pad. 
> I'd
> like that to be 0xCC so "guessing" addresses incorrectly will trigger
> a trap.

Padding the space between functions with int3 is easy to add during
boot time, and I've got it on my todo list.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 19:41     ` Kristen Carlson Accardi
@ 2020-02-06 20:02       ` Andy Lutomirski
  2020-02-07  9:24         ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Andy Lutomirski @ 2020-02-06 20:02 UTC (permalink / raw)
  To: Kristen Carlson Accardi
  Cc: Kees Cook, tglx, mingo, bp, hpa, arjan, rick.p.edgecombe, x86,
	linux-kernel, kernel-hardening



> On Feb 6, 2020, at 11:41 AM, Kristen Carlson Accardi <kristen@linux.intel.com> wrote:
> 
> On Thu, 2020-02-06 at 04:26 -0800, Kees Cook wrote:
>>> On Wed, Feb 05, 2020 at 02:39:45PM -0800, Kristen Carlson Accardi
>>> wrote:
>>> We will be using -ffunction-sections to place each function in
>>> it's own text section so it can be randomized at load time. The
>>> linker considers these .text.* sections "orphaned sections", and
>>> will place them after the first similar section (.text). However,
>>> we need to move _etext so that it is after both .text and .text.*
>>> We also need to calculate text size to include .text AND .text.*
>> 
>> The dependency on the linker's orphan section handling is, I feel,
>> rather fragile (during work on CFI and generally building kernels
>> with
>> Clang's LLD linker, we keep tripping over difference between how BFD
>> and
>> LLD handle orphans). However, this is currently no way to perform a
>> section "pass through" where input sections retain their name as an
>> output section. (If anyone knows a way to do this, I'm all ears).
>> 
>> Right now, you can only collect sections like this:
>> 
>>        .text :  AT(ADDR(.text) - LOAD_OFFSET) {
>>        *(.text.*)
>>    }
>> 
>> or let them be orphans, which then the linker attempts to find a
>> "similar" (code, data, etc) section to put them near:
>> https://sourceware.org/binutils/docs-2.33.1/ld/Orphan-Sections.html
>> 
>> So, basically, yes, this works, but I'd like to see BFD and LLD grow
>> some kind of /PASSTHRU/ special section (like /DISCARD/), that would
>> let
>> a linker script specify _where_ these sections should roughly live.
>> 
>> Related thoughts:
>> 
>> I know x86_64 stack alignment is 16 bytes. I cannot find evidence for
>> what function start alignment should be. It seems the linker is 16
>> byte
>> aligning these functions, when I think no alignment is needed for
>> function starts, so we're wasting some memory (average 8 bytes per
>> function, at say 50,000 functions, so approaching 512KB) between
>> functions. If we can specify a 1 byte alignment for these orphan
>> sections, that would be nice, as mentioned in the cover letter: we
>> lose
>> a 4 bits of entropy to this alignment, since all randomized function
>> addresses will have their low bits set to zero.
> 
> So, when I was developing this patch set, I initially ignored the value
> of sh_addralign and just packed the functions in one right after
> another when I did the new layout. They were byte aligned :). I later
> realized that I should probably pay attention to alignment and thus
> started respecting the value that was in sh_addralign. There is
> actually nothing stopping me from ignoring it again, other than I am
> concerned that I will make runtime performance suffer even more than I
> already have.

If you start randomizing *data* sections, then alignment matters.

Also, in the shiny new era of Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment may actually matter.  Sigh.  The symptom will be horrible maybe-exploitable crashes on old microcode and “minimal performance impact” on new microcode. In this context, “minimal” may actually mean “huge, throw away your CPU and replace it with one from a different vendor.”

Of course, there doesn’t appear to be anything resembling credible public documentation for any of this.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss
  2020-02-06 14:25       ` Arvind Sankar
@ 2020-02-06 21:32         ` Kees Cook
  0 siblings, 0 replies; 59+ messages in thread
From: Kees Cook @ 2020-02-06 21:32 UTC (permalink / raw)
  To: Arvind Sankar
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 09:25:59AM -0500, Arvind Sankar wrote:
> On Thu, Feb 06, 2020 at 03:13:12AM -0800, Kees Cook wrote:
> > Yes, thank you for the reminder. I couldn't find the ZO_INIT_SIZE when I
> > was staring at this, since I only looked around the compressed/ directory.
> > :)
> > 
> 
> There's another thing I noticed -- you would need to ensure that the
> init_size in the header covers your boot heap even if you did split it
> out. The reason is that the bootloader will only know to reserve enough
> memory for init_size: it's possible it might put the initrd or something
> else following the kernel, or theoretically there might be reserved
> memory regions or the end of physical RAM immediately following, so you
> can't assume that area will be available when you get to extract_kernel.

Yeah, that's what I was worrying about after I wrote that patch. Yours
is the correct solution. :) (I Acked both of those now).

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling
  2020-02-06 15:10   ` Jason A. Donenfeld
@ 2020-02-07  7:23     ` Jean-Philippe Aumasson
  2020-02-07  9:05       ` Kees Cook
  0 siblings, 1 reply; 59+ messages in thread
From: Jean-Philippe Aumasson @ 2020-02-07  7:23 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Kristen Carlson Accardi, keescook, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, LKML, kernel-hardening

[-- Attachment #1: Type: text/plain, Size: 2317 bytes --]

Let me share my 2 cents:

That permutation might be safe but afaict it hasn't been analyzed wrt
modern cryptographic techniques and there might well be differential
characteristics, statistical biases, etc.

What about just using SipHash's permutation, already in the kernel? It
works on 4*u64 words too, and 6 rounds would be enough.

Doing a basic ops count, we currently have 5 group operations and 3
rotations per round or 150 and 90 for the 30 init rounds. With SipHash it'd
be 48 and 36 with the proposed 6 rounds. Probably insignificant speed wise
as init is only done once but just to show that we'd get both better
security assurance and better performance.


On Thu, Feb 6, 2020 at 4:10 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Hey Kees,
>
> On Wed, Feb 05, 2020 at 02:39:43PM -0800, Kristen Carlson Accardi wrote:
> > +#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
> > +static u64 prng_u64(struct prng_state *x)
> > +{
> > +     u64 e;
> > +
> > +     e = x->a - rot(x->b, 7);
> > +     x->a = x->b ^ rot(x->c, 13);
> > +     x->b = x->c + rot(x->d, 37);
> > +     x->c = x->d + e;
> > +     x->d = e + x->a;
> > +
> > +     return x->d;
> > +}
>
> I haven't looked closely at where the original entropy sources are
> coming from and how all this works, but on first glance, this prng
> doesn't look like an especially cryptographically secure one. I realize
> that isn't necessarily your intention (you're focused on speed), but
> actually might this be sort of important? If I understand correctly, the
> objective of this patch set is so that leaking the address of one
> function doesn't leak the address of all other functions, as is the case
> with fixed-offset kaslr. But if you leak the addresses of _some_ set of
> functions, and your prng is bogus, might it be possible to figure out
> the rest? For some prngs, if you give me the output stream of a few
> numbers, I can predict the rest. For others, it's not this straight
> forward, but there are some varieties of similar attacks. If any of that
> set of concerns turns out to apply to your prng_u64 here, would that
> undermine kaslr in similar ways as the current fixed-offset variety? Or
> does it not matter because it's some kind of blinded fixed-size shuffle
> with complex reasoning that makes this not a problem?
>
> Jason
>

[-- Attachment #2: Type: text/html, Size: 2943 bytes --]

<div dir="ltr">Let me share my 2 cents:<div><br></div><div>That permutation might be safe but afaict it hasn&#39;t been analyzed wrt modern cryptographic techniques and there might well be differential characteristics, statistical biases, etc. </div><div><br></div><div>What about just using SipHash&#39;s permutation, already in the kernel? It works on 4*u64 words too, and 6 rounds would be enough. </div><div><br></div><div>Doing a basic ops count, we currently have 5 group operations and 3 rotations per round or 150 and 90 for the 30 init rounds. With SipHash it&#39;d be 48 and 36 with the proposed 6 rounds. Probably insignificant speed wise as init is only done once but just to show that we&#39;d get both better security assurance and better performance.</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Feb 6, 2020 at 4:10 PM Jason A. Donenfeld &lt;<a href="mailto:Jason@zx2c4.com">Jason@zx2c4.com</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hey Kees,<br>
<br>
On Wed, Feb 05, 2020 at 02:39:43PM -0800, Kristen Carlson Accardi wrote:<br>
&gt; +#define rot(x, k) (((x)&lt;&lt;(k))|((x)&gt;&gt;(64-(k))))<br>
&gt; +static u64 prng_u64(struct prng_state *x)<br>
&gt; +{<br>
&gt; +     u64 e;<br>
&gt; +<br>
&gt; +     e = x-&gt;a - rot(x-&gt;b, 7);<br>
&gt; +     x-&gt;a = x-&gt;b ^ rot(x-&gt;c, 13);<br>
&gt; +     x-&gt;b = x-&gt;c + rot(x-&gt;d, 37);<br>
&gt; +     x-&gt;c = x-&gt;d + e;<br>
&gt; +     x-&gt;d = e + x-&gt;a;<br>
&gt; +<br>
&gt; +     return x-&gt;d;<br>
&gt; +}<br>
<br>
I haven&#39;t looked closely at where the original entropy sources are<br>
coming from and how all this works, but on first glance, this prng<br>
doesn&#39;t look like an especially cryptographically secure one. I realize<br>
that isn&#39;t necessarily your intention (you&#39;re focused on speed), but<br>
actually might this be sort of important? If I understand correctly, the<br>
objective of this patch set is so that leaking the address of one<br>
function doesn&#39;t leak the address of all other functions, as is the case<br>
with fixed-offset kaslr. But if you leak the addresses of _some_ set of<br>
functions, and your prng is bogus, might it be possible to figure out<br>
the rest? For some prngs, if you give me the output stream of a few<br>
numbers, I can predict the rest. For others, it&#39;s not this straight<br>
forward, but there are some varieties of similar attacks. If any of that<br>
set of concerns turns out to apply to your prng_u64 here, would that<br>
undermine kaslr in similar ways as the current fixed-offset variety? Or<br>
does it not matter because it&#39;s some kind of blinded fixed-size shuffle<br>
with complex reasoning that makes this not a problem?<br>
<br>
Jason<br>
</blockquote></div>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling
  2020-02-07  7:23     ` Jean-Philippe Aumasson
@ 2020-02-07  9:05       ` Kees Cook
  2020-02-07 16:52         ` Kristen Carlson Accardi
  0 siblings, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-07  9:05 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: Jason A. Donenfeld, Kristen Carlson Accardi, tglx, mingo, bp,
	hpa, arjan, rick.p.edgecombe, x86, LKML, kernel-hardening

On Fri, Feb 07, 2020 at 08:23:53AM +0100, Jean-Philippe Aumasson wrote:
> 
> On Thu, Feb 6, 2020 at 4:10 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> 
> > Hey Kees,
> >
> > On Wed, Feb 05, 2020 at 02:39:43PM -0800, Kristen Carlson Accardi wrote:
> > > +#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
> > > +static u64 prng_u64(struct prng_state *x)
> > > +{
> > > +     u64 e;
> > > +
> > > +     e = x->a - rot(x->b, 7);
> > > +     x->a = x->b ^ rot(x->c, 13);
> > > +     x->b = x->c + rot(x->d, 37);
> > > +     x->c = x->d + e;
> > > +     x->d = e + x->a;
> > > +
> > > +     return x->d;
> > > +}
> >
> > I haven't looked closely at where the original entropy sources are
> > coming from and how all this works, but on first glance, this prng
> > doesn't look like an especially cryptographically secure one. I realize
> > that isn't necessarily your intention (you're focused on speed), but
> > actually might this be sort of important? If I understand correctly, the
> > objective of this patch set is so that leaking the address of one
> > function doesn't leak the address of all other functions, as is the case
> > with fixed-offset kaslr. But if you leak the addresses of _some_ set of
> > functions, and your prng is bogus, might it be possible to figure out
> > the rest? For some prngs, if you give me the output stream of a few
> > numbers, I can predict the rest. For others, it's not this straight
> > forward, but there are some varieties of similar attacks. If any of that
> > set of concerns turns out to apply to your prng_u64 here, would that
> > undermine kaslr in similar ways as the current fixed-offset variety? Or
> > does it not matter because it's some kind of blinded fixed-size shuffle
> > with complex reasoning that makes this not a problem?
> 
> Let me share my 2 cents:
> 
> That permutation might be safe but afaict it hasn't been analyzed wrt
> modern cryptographic techniques and there might well be differential
> characteristics, statistical biases, etc.
> 
> What about just using SipHash's permutation, already in the kernel? It
> works on 4*u64 words too, and 6 rounds would be enough.
> 
> Doing a basic ops count, we currently have 5 group operations and 3
> rotations per round or 150 and 90 for the 30 init rounds. With SipHash it'd
> be 48 and 36 with the proposed 6 rounds. Probably insignificant speed wise
> as init is only done once but just to show that we'd get both better
> security assurance and better performance.

Yeah, this was never meant to be anything but a POC and after timing
tests, it seemed like an unneeded abstraction but was kept for this
RFC so it was possible to specify a stable seed at boot for debugging,
etc. I think this patch will not survive to v1. :)

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-06 20:02       ` Andy Lutomirski
@ 2020-02-07  9:24         ` Peter Zijlstra
  2020-02-10  1:43           ` Kees Cook
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2020-02-07  9:24 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Kristen Carlson Accardi, Kees Cook, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> Also, in the shiny new era of
> Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> may actually matter.

*groan*, indeed. I just went and looked that up. I missed this one in
all the other fuss :/

So per:

  https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf

the toolchain mitigations only work if the offset in the ifetch window
(32 bytes) is preserved. Which seems to suggest we ought to align all
functions to 32byte before randomizing it, otherwise we're almost
guaranteed to change this offset by the act of randomizing.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling
  2020-02-07  9:05       ` Kees Cook
@ 2020-02-07 16:52         ` Kristen Carlson Accardi
  0 siblings, 0 replies; 59+ messages in thread
From: Kristen Carlson Accardi @ 2020-02-07 16:52 UTC (permalink / raw)
  To: Kees Cook, Jean-Philippe Aumasson
  Cc: Jason A. Donenfeld, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, LKML, kernel-hardening

On Fri, 2020-02-07 at 01:05 -0800, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 08:23:53AM +0100, Jean-Philippe Aumasson
> wrote:
> > On Thu, Feb 6, 2020 at 4:10 PM Jason A. Donenfeld <Jason@zx2c4.com>
> > wrote:
> > 
> > > Hey Kees,
> > > 
> > > On Wed, Feb 05, 2020 at 02:39:43PM -0800, Kristen Carlson Accardi
> > > wrote:
> > > > +#define rot(x, k) (((x)<<(k))|((x)>>(64-(k))))
> > > > +static u64 prng_u64(struct prng_state *x)
> > > > +{
> > > > +     u64 e;
> > > > +
> > > > +     e = x->a - rot(x->b, 7);
> > > > +     x->a = x->b ^ rot(x->c, 13);
> > > > +     x->b = x->c + rot(x->d, 37);
> > > > +     x->c = x->d + e;
> > > > +     x->d = e + x->a;
> > > > +
> > > > +     return x->d;
> > > > +}
> > > 
> > > I haven't looked closely at where the original entropy sources
> > > are
> > > coming from and how all this works, but on first glance, this
> > > prng
> > > doesn't look like an especially cryptographically secure one. I
> > > realize
> > > that isn't necessarily your intention (you're focused on speed),
> > > but
> > > actually might this be sort of important? If I understand
> > > correctly, the
> > > objective of this patch set is so that leaking the address of one
> > > function doesn't leak the address of all other functions, as is
> > > the case
> > > with fixed-offset kaslr. But if you leak the addresses of _some_
> > > set of
> > > functions, and your prng is bogus, might it be possible to figure
> > > out
> > > the rest? For some prngs, if you give me the output stream of a
> > > few
> > > numbers, I can predict the rest. For others, it's not this
> > > straight
> > > forward, but there are some varieties of similar attacks. If any
> > > of that
> > > set of concerns turns out to apply to your prng_u64 here, would
> > > that
> > > undermine kaslr in similar ways as the current fixed-offset
> > > variety? Or
> > > does it not matter because it's some kind of blinded fixed-size
> > > shuffle
> > > with complex reasoning that makes this not a problem?
> > 
> > Let me share my 2 cents:
> > 
> > That permutation might be safe but afaict it hasn't been analyzed
> > wrt
> > modern cryptographic techniques and there might well be
> > differential
> > characteristics, statistical biases, etc.
> > 
> > What about just using SipHash's permutation, already in the kernel?
> > It
> > works on 4*u64 words too, and 6 rounds would be enough.
> > 
> > Doing a basic ops count, we currently have 5 group operations and 3
> > rotations per round or 150 and 90 for the 30 init rounds. With
> > SipHash it'd
> > be 48 and 36 with the proposed 6 rounds. Probably insignificant
> > speed wise
> > as init is only done once but just to show that we'd get both
> > better
> > security assurance and better performance.
> 
> Yeah, this was never meant to be anything but a POC and after timing
> tests, it seemed like an unneeded abstraction but was kept for this
> RFC so it was possible to specify a stable seed at boot for
> debugging,
> etc. I think this patch will not survive to v1. :)

That's right, I'm going to drop it and go with the ChaCha20
implementation as was suggested.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-07  9:24         ` Peter Zijlstra
@ 2020-02-10  1:43           ` Kees Cook
  2020-02-10 10:51             ` Peter Zijlstra
  0 siblings, 1 reply; 59+ messages in thread
From: Kees Cook @ 2020-02-10  1:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andy Lutomirski, Kristen Carlson Accardi, tglx, mingo, bp, hpa,
	arjan, rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Fri, Feb 07, 2020 at 10:24:23AM +0100, Peter Zijlstra wrote:
> On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> > Also, in the shiny new era of
> > Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> > may actually matter.
> 
> *groan*, indeed. I just went and looked that up. I missed this one in
> all the other fuss :/
> 
> So per:
> 
>   https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
> 
> the toolchain mitigations only work if the offset in the ifetch window
> (32 bytes) is preserved. Which seems to suggest we ought to align all
> functions to 32byte before randomizing it, otherwise we're almost
> guaranteed to change this offset by the act of randomizing.

Wheee! This sounds like in needs to be fixed generally, yes? (And I see
"FUNCTION_ALIGN" macro is currently 16 bytes...

-- 
Kees Cook

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-10  1:43           ` Kees Cook
@ 2020-02-10 10:51             ` Peter Zijlstra
  2020-02-10 15:54               ` Arjan van de Ven
  0 siblings, 1 reply; 59+ messages in thread
From: Peter Zijlstra @ 2020-02-10 10:51 UTC (permalink / raw)
  To: Kees Cook
  Cc: Andy Lutomirski, Kristen Carlson Accardi, tglx, mingo, bp, hpa,
	arjan, rick.p.edgecombe, x86, linux-kernel, kernel-hardening

On Sun, Feb 09, 2020 at 05:43:40PM -0800, Kees Cook wrote:
> On Fri, Feb 07, 2020 at 10:24:23AM +0100, Peter Zijlstra wrote:
> > On Thu, Feb 06, 2020 at 12:02:36PM -0800, Andy Lutomirski wrote:
> > > Also, in the shiny new era of
> > > Intel-CPUs-can’t-handle-Jcc-spanning-a-cacheline, function alignment
> > > may actually matter.
> > 
> > *groan*, indeed. I just went and looked that up. I missed this one in
> > all the other fuss :/
> > 
> > So per:
> > 
> >   https://www.intel.com/content/dam/support/us/en/documents/processors/mitigations-jump-conditional-code-erratum.pdf
> > 
> > the toolchain mitigations only work if the offset in the ifetch window
> > (32 bytes) is preserved. Which seems to suggest we ought to align all
> > functions to 32byte before randomizing it, otherwise we're almost
> > guaranteed to change this offset by the act of randomizing.
> 
> Wheee! This sounds like in needs to be fixed generally, yes? (And I see
> "FUNCTION_ALIGN" macro is currently 16 bytes...

It depends a bit on how it all works I suppose (I'm not too clear on the
details).

Suppose the linker appends translation units at (at least) 32 bytes
alignment, but the function alignment inside the translation unit is
smaller, then it could still work, because the assembler (which is going
to insert NOPs to avoid instructions being in the 'wrong' place) can
still know the offset.

If the linker is going to be fancy (say LTO) and move code around inside
sections/translation units, then this goes out the window obviously.

The same with this fine-grained-randomization, if the section alignment
is smaller than 32 bytes, the offset is going to change and the
mitigation will be nullified.

I'll leave it to others to figure out the exact details. But afaict it
should be possible to have fine-grained-randomization and preserve the
workaround in the end.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-10 10:51             ` Peter Zijlstra
@ 2020-02-10 15:54               ` Arjan van de Ven
  2020-02-10 16:36                 ` Arvind Sankar
  0 siblings, 1 reply; 59+ messages in thread
From: Arjan van de Ven @ 2020-02-10 15:54 UTC (permalink / raw)
  To: Peter Zijlstra, Kees Cook
  Cc: Andy Lutomirski, Kristen Carlson Accardi, tglx, mingo, bp, hpa,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

> 
> I'll leave it to others to figure out the exact details. But afaict it
> should be possible to have fine-grained-randomization and preserve the
> workaround in the end.
> 

the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
and then in the randomizer preserve the offset within 32 bytes, no matter what it is

that would get you an average padding of 16 bytes which is a bit more than now but not too insane
(queue Kees' argument that tiny bits of padding are actually good)





^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 06/11] x86: make sure _etext includes function sections
  2020-02-10 15:54               ` Arjan van de Ven
@ 2020-02-10 16:36                 ` Arvind Sankar
  0 siblings, 0 replies; 59+ messages in thread
From: Arvind Sankar @ 2020-02-10 16:36 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Peter Zijlstra, Kees Cook, Andy Lutomirski,
	Kristen Carlson Accardi, tglx, mingo, bp, hpa, rick.p.edgecombe,
	x86, linux-kernel, kernel-hardening

On Mon, Feb 10, 2020 at 07:54:58AM -0800, Arjan van de Ven wrote:
> > 
> > I'll leave it to others to figure out the exact details. But afaict it
> > should be possible to have fine-grained-randomization and preserve the
> > workaround in the end.
> > 
> 
> the most obvious "solution" is to compile with an alignment of 4 bytes (so tight packing)
> and then in the randomizer preserve the offset within 32 bytes, no matter what it is
> 
> that would get you an average padding of 16 bytes which is a bit more than now but not too insane
> (queue Kees' argument that tiny bits of padding are actually good)
> 

With the patchset for adding the mbranches-within-32B-boundaries option,
the section alignment gets forced to 32. With function-sections that
means function alignment has to be 32 too.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [RFC PATCH 10/11] module: Reorder functions
  2020-02-06 12:41   ` Kees Cook
@ 2020-02-11 12:39     ` Jessica Yu
  0 siblings, 0 replies; 59+ messages in thread
From: Jessica Yu @ 2020-02-11 12:39 UTC (permalink / raw)
  To: Kees Cook
  Cc: Kristen Carlson Accardi, tglx, mingo, bp, hpa, arjan,
	rick.p.edgecombe, x86, linux-kernel, kernel-hardening

+++ Kees Cook [06/02/20 04:41 -0800]:
>On Wed, Feb 05, 2020 at 02:39:49PM -0800, Kristen Carlson Accardi wrote:
>> If a module has functions split out into separate text sections
>> (i.e. compiled with the -ffunction-sections flag), reorder the
>> functions to provide some code diversification to modules.
>>
>> Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com>
>
>Probably a good idea to add Jessica to CC in next version:
>	Jessica Yu <jeyu@kernel.org>

Thanks :)

>Reviewed-by: Kees Cook <keescook@chromium.org>
>
>-Kees
>
>> ---
>>  kernel/module.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 85 insertions(+)
>>
>> diff --git a/kernel/module.c b/kernel/module.c
>> index b56f3224b161..231563e95e61 100644
>> --- a/kernel/module.c
>> +++ b/kernel/module.c
>> @@ -53,6 +53,8 @@
>>  #include <linux/bsearch.h>
>>  #include <linux/dynamic_debug.h>
>>  #include <linux/audit.h>
>> +#include <linux/random.h>
>> +#include <asm/setup.h>
>>  #include <uapi/linux/module.h>
>>  #include "module-internal.h"
>>
>> @@ -3245,6 +3247,87 @@ static int find_module_sections(struct module *mod, struct load_info *info)
>>  	return 0;
>>  }
>>
>> +/*
>> + * shuffle_text_list()
>> + * Use a Fisher Yates algorithm to shuffle a list of text sections.
>> + */
>> +static void shuffle_text_list(Elf_Shdr **list, int size)
>> +{
>> +	int i;
>> +	unsigned int j;
>> +	Elf_Shdr *temp;
>> +
>> +	for (i = size - 1; i > 0; i--) {
>> +		/*
>> +		 * TBD - seed. We need to be able to use a known
>> +		 * seed so that we can non-randomly randomize for
>> +		 * debugging.
>> +		 */
>> +
>> +		// pick a random index from 0 to i
>> +		get_random_bytes(&j, sizeof(j));
>> +		j = j % (i + 1);
>> +
>> +		temp = list[i];
>> +		list[i] = list[j];
>> +		list[j] = temp;
>> +	}
>> +}
>> +
>> +/*
>> + * randomize_text()
>> + * Look through the core section looking for executable code sections.
>> + * Store sections in an array and then shuffle the sections
>> + * to reorder the functions.
>> + */
>> +static void randomize_text(struct module *mod, struct load_info *info)
>> +{
>> +	int i;
>> +	int num_text_sections = 0;
>> +	Elf_Shdr **text_list;
>> +	int size = 0;
>> +	int max_sections = info->hdr->e_shnum;
>> +	unsigned int sec = find_sec(info, ".text");
>> +
>> +	if (!IS_ENABLED(CONFIG_FG_KASLR) || !kaslr_enabled())
>> +		return;

Maybe put this conditional before the call to randomize_text() in
move_module()? Just for code readability reasons (i.e., we only call
randomize_text() when CONFIG_FG_KASLR || kaslr_enabled()). But this is
just a matter of preference.

>> +
>> +	if (sec == 0)
>> +		return;
>> +
>> +	text_list = kmalloc_array(max_sections, sizeof(*text_list), GFP_KERNEL);
>> +	if (text_list == NULL)
>> +		return;
>> +
>> +	for (i = 0; i < max_sections; i++) {
>> +		Elf_Shdr *shdr = &info->sechdrs[i];
>> +		const char *sname = info->secstrings + shdr->sh_name;
>> +
>> +		if (!(shdr->sh_flags & SHF_ALLOC) ||
>> +		    !(shdr->sh_flags & SHF_EXECINSTR) ||
>> +		    strstarts(sname, ".init"))
>> +			continue;
>> +
>> +		text_list[num_text_sections] = shdr;
>> +		num_text_sections++;
>> +	}
>> +
>> +	shuffle_text_list(text_list, num_text_sections);
>> +
>> +	for (i = 0; i < num_text_sections; i++) {
>> +		Elf_Shdr *shdr = text_list[i];
>> +		unsigned int infosec;
>> +		const char *sname;
>> +
>> +		sname = info->secstrings + shdr->sh_name;

sname doesn't appear to be used after this (?)

>> +		infosec	= shdr->sh_info;
>> +
>> +		shdr->sh_entsize = get_offset(mod, &size, shdr, infosec);

get_offset() expects a section index as the last argument, but sh_info
is 0 for text sections (SHT_PROGBITS). It's really only used in
arch_mod_section_prepend() though, which is only defined for parisc.
We could perhaps save the section index in sh_info in the previous for
loop (hacky, I have to double check if this is actually safe, but it
seems sh_info only has signficance for SHT_REL* and SHT_SYMTAB
sections).

>> +	}
>> +
>> +	kfree(text_list);
>> +}
>> +
>>  static int move_module(struct module *mod, struct load_info *info)
>>  {
>>  	int i;
>> @@ -3282,6 +3365,8 @@ static int move_module(struct module *mod, struct load_info *info)
>>  	} else
>>  		mod->init_layout.base = NULL;
>>
>> +	randomize_text(mod, info);

Hm, I wonder if we should not incorporate the randomize_text() logic into
layout_sections() instead of move_module(), since logically speaking
layout_sections() determines all the section offsets and stores them
in sh_entsize, whereas move_module() does all the allocations and
copying to final destination. Just a thought.


Thanks!

Jessica

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, back to index

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-05 22:39 [RFC PATCH 00/11] Finer grained kernel address space randomization Kristen Carlson Accardi
2020-02-05 22:39 ` [RFC PATCH 01/11] modpost: Support >64K sections Kristen Carlson Accardi
2020-02-06 12:38   ` Kees Cook
2020-02-05 22:39 ` [RFC PATCH 02/11] x86: tools/relocs: Support >64K section headers Kristen Carlson Accardi
2020-02-06 12:39   ` Kees Cook
2020-02-05 22:39 ` [RFC PATCH 03/11] x86/boot: Allow a "silent" kaslr random byte fetch Kristen Carlson Accardi
2020-02-06  1:08   ` Andy Lutomirski
2020-02-06 11:48     ` Kees Cook
2020-02-06 16:58     ` Kristen Carlson Accardi
2020-02-05 22:39 ` [RFC PATCH 04/11] x86/boot/KASLR: Introduce PRNG for faster shuffling Kristen Carlson Accardi
2020-02-06  1:11   ` Andy Lutomirski
2020-02-06 15:10   ` Jason A. Donenfeld
2020-02-07  7:23     ` Jean-Philippe Aumasson
2020-02-07  9:05       ` Kees Cook
2020-02-07 16:52         ` Kristen Carlson Accardi
2020-02-05 22:39 ` [RFC PATCH 05/11] x86: Makefile: Add build and config option for CONFIG_FG_KASLR Kristen Carlson Accardi
2020-02-06 10:30   ` Peter Zijlstra
2020-02-06 11:52     ` Kees Cook
2020-02-05 22:39 ` [RFC PATCH 06/11] x86: make sure _etext includes function sections Kristen Carlson Accardi
2020-02-06 12:26   ` Kees Cook
2020-02-06 13:15     ` Jann Horn
2020-02-06 16:27       ` David Laight
2020-02-06 14:39     ` Arvind Sankar
2020-02-06 15:29       ` Arvind Sankar
2020-02-06 16:11         ` Andy Lutomirski
2020-02-06 14:57     ` Arvind Sankar
2020-02-06 15:45       ` Arvind Sankar
2020-02-06 19:41     ` Kristen Carlson Accardi
2020-02-06 20:02       ` Andy Lutomirski
2020-02-07  9:24         ` Peter Zijlstra
2020-02-10  1:43           ` Kees Cook
2020-02-10 10:51             ` Peter Zijlstra
2020-02-10 15:54               ` Arjan van de Ven
2020-02-10 16:36                 ` Arvind Sankar
2020-02-05 22:39 ` [RFC PATCH 07/11] x86/tools: Adding relative relocs for randomized functions Kristen Carlson Accardi
2020-02-06 12:37   ` Kees Cook
2020-02-05 22:39 ` [RFC PATCH 08/11] x86: Add support for finer grained KASLR Kristen Carlson Accardi
2020-02-06  1:17   ` Andy Lutomirski
2020-02-06 11:56     ` Kees Cook
2020-02-06 17:36       ` Kristen Carlson Accardi
2020-02-06 10:38   ` Peter Zijlstra
2020-02-06 12:06     ` Kees Cook
2020-02-06 14:52       ` Peter Zijlstra
2020-02-06 17:25         ` Kristen Carlson Accardi
2020-02-06 17:35           ` Peter Zijlstra
2020-02-06 17:43             ` Kristen Carlson Accardi
2020-02-05 22:39 ` [RFC PATCH 09/11] kallsyms: hide layout and expose seed Kristen Carlson Accardi
2020-02-06 12:32   ` Kees Cook
2020-02-06 17:51     ` Kristen Carlson Accardi
2020-02-06 19:27       ` Jann Horn
2020-02-05 22:39 ` [RFC PATCH 10/11] module: Reorder functions Kristen Carlson Accardi
2020-02-06 12:41   ` Kees Cook
2020-02-11 12:39     ` Jessica Yu
2020-02-05 22:39 ` [RFC PATCH 11/11] x86/boot: Move "boot heap" out of .bss Kristen Carlson Accardi
2020-02-06  0:11   ` Arvind Sankar
2020-02-06  0:33     ` Kristen Carlson Accardi
2020-02-06 11:13     ` Kees Cook
2020-02-06 14:25       ` Arvind Sankar
2020-02-06 21:32         ` Kees Cook

Kernel-hardening archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kernel-hardening/0 kernel-hardening/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kernel-hardening kernel-hardening/ https://lore.kernel.org/kernel-hardening \
		kernel-hardening@lists.openwall.com
	public-inbox-index kernel-hardening

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/com.openwall.lists.kernel-hardening


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git