* [PATCH 0/6] move mmap_area and PIE binaries away from the stack
@ 2017-06-02 15:20 riel
  2017-06-02 15:20 ` [PATCH 1/6] binfmt_elf: document load_bias a little bit riel
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

There are a few bugs that cause the kernel to sometimes map PIE
binaries and the mmap_area on top of where the stack is supposed to go.

This series fixes them for x86, ARM64, and PPC.
S390 seems to be ok.

If people are fine with this approach, I can work my way
through other architectures, too.


* [PATCH 1/6] binfmt_elf: document load_bias a little bit
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
@ 2017-06-02 15:20 ` riel
  2017-06-02 19:27   ` [kernel-hardening] " Kees Cook
  2017-06-02 15:20 ` [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB riel
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

From: Rik van Riel <riel@redhat.com>

After another (unnamed) developer and I got confused by the subtraction
of vaddr in this branch of the code, followed by vaddr being added back
in a little bit later, for the third time, maybe it is time to document
this quirky bit of code.
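
For reference, a sketch of the arithmetic the new comment is describing
(illustrative only; the ELF_PAGESTART() and elf_map() lines are recalled
from the surrounding load_elf_binary() code, not taken from this hunk):

	load_bias = ELF_ET_DYN_BASE - vaddr;	/* vaddr subtracted here... */
	if (current->flags & PF_RANDOMIZE)
		load_bias += arch_mmap_rnd();
	load_bias = ELF_PAGESTART(load_bias);
	...
	/* ...and added back when the segment is mapped, so the segment
	 * ends up at its p_vaddr offset from the chosen base:
	 */
	error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
			elf_prot, elf_flags, total_size);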

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 fs/binfmt_elf.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 5075fd5c62c8..8c3f4dbc7603 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -930,10 +930,16 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
 			elf_flags |= MAP_FIXED;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
-			/* Try and get dynamic programs out of the way of the
+			/*
+			 * Try and get dynamic programs out of the way of the
 			 * default mmap base, as well as whatever program they
 			 * might try to exec.  This is because the brk will
-			 * follow the loader, and is not movable.  */
+			 * follow the loader, and is not movable.
+			 *
+			 * The load_bias is the difference between the address
+			 * in the elf header and the address where the binary
+			 * is mmapped.
+			 */
 			load_bias = ELF_ET_DYN_BASE - vaddr;
 			if (current->flags & PF_RANDOMIZE)
 				load_bias += arch_mmap_rnd();
-- 
2.9.3


* [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
  2017-06-02 15:20 ` [PATCH 1/6] binfmt_elf: document load_bias a little bit riel
@ 2017-06-02 15:20 ` riel
  2017-06-03  4:22   ` Kees Cook
  2017-06-02 15:20 ` [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base riel
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

From: Rik van Riel <riel@redhat.com>

When setting up mmap_base, we take care to start the mmap base
below the maximum extent to which the stack will grow. However,
we take no such precautions with PIE binaries, which are placed
at 2/3 of TASK_SIZE plus a random offset. As a result, 32 bit PIE
binaries can end up smack in the middle of where the stack (which
is randomized down) is supposed to go.

That problem can be avoided by putting the 32 bit ELF_ET_DYN_BASE
at 256MB, which is a value linux-hardened and grsecurity have used
for a long time now without any known (to me) bug reports.
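
To make the collision concrete, a rough back-of-the-envelope sketch for
the 32-bit case (numbers are illustrative, assuming a 3GB TASK_SIZE and
the current TASK_SIZE / 3 * 2 base):

	ELF_ET_DYN_BASE   = 0xC0000000 * 2/3         = 0x80000000
	PIE load address  = 0x80000000 + rnd         (rnd up to ~256MB)
	stack top        ~= 0xC0000000 - stack_rnd
	stack bottom      = stack top - RLIMIT_STACK

With RLIMIT_STACK set to 1GB, the stack bottom comes down to roughly
0x80000000, i.e. right on top of where the PIE binary was just mapped.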

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/include/asm/elf.h | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index e8ab9a46bc68..dafa098cc05a 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -245,12 +245,23 @@ extern int force_personality32;
 #define CORE_DUMP_USE_REGSET
 #define ELF_EXEC_PAGESIZE	4096
 
+/*
+ * True on X86_32 or when emulating IA32 on X86_64
+ */
+static inline int mmap_is_ia32(void)
+{
+	return IS_ENABLED(CONFIG_X86_32) ||
+	       (IS_ENABLED(CONFIG_COMPAT) &&
+		test_thread_flag(TIF_ADDR32));
+}
+
 /* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
    use of this is to invoke "./ld.so someprog" to test out a new version of
    the loader.  We need to make sure that it is out of the way of the program
    that it will "exec", and that there is sufficient room for the brk.  */
 
-#define ELF_ET_DYN_BASE		(TASK_SIZE / 3 * 2)
+#define ELF_ET_DYN_BASE		(mmap_is_ia32() ? 0x10000000UL : \
+				(TASK_SIZE / 3 * 2))
 
 /* This yields a mask that user programs can use to figure out what
    instruction set this CPU supports.  This could be done in user space,
@@ -293,16 +304,6 @@ do {									\
 	}								\
 } while (0)
 
-/*
- * True on X86_32 or when emulating IA32 on X86_64
- */
-static inline int mmap_is_ia32(void)
-{
-	return IS_ENABLED(CONFIG_X86_32) ||
-	       (IS_ENABLED(CONFIG_COMPAT) &&
-		test_thread_flag(TIF_ADDR32));
-}
-
 extern unsigned long tasksize_32bit(void);
 extern unsigned long tasksize_64bit(void);
 extern unsigned long get_mmap_base(int is_legacy);
-- 
2.9.3


* [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
  2017-06-02 15:20 ` [PATCH 1/6] binfmt_elf: document load_bias a little bit riel
  2017-06-02 15:20 ` [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB riel
@ 2017-06-02 15:20 ` riel
  2017-06-03  4:46   ` [kernel-hardening] " Kees Cook
  2017-06-02 15:20 ` [PATCH 4/6] arm64/mmap: " riel
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

From: Rik van Riel <riel@redhat.com>

When RLIMIT_STACK is, for example, 256MB, the current code results in
a gap between the top of the task and mmap_base of 256MB, failing to
take into account the amount by which the stack address was randomized.
In other words, the stack gets less than RLIMIT_STACK space.

Ensure that the gap between the stack and mmap_base always takes stack
randomization into account.
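
To illustrate (rough numbers, ignoring the mmap base randomization
itself): with RLIMIT_STACK = 256MB and the stack base randomized down
by stack_rnd,

	mmap_base   ~= task top - 256MB
	stack top   ~= task top - stack_rnd
	stack space  = stack top - mmap_base ~= 256MB - stack_rnd

With this change the gap becomes RLIMIT_STACK + stack_maxrandom_size(),
so the stack keeps its full rlimit no matter where its base was
randomized to. The "gap + pad > gap" check simply skips the addition
when RLIMIT_STACK is at or near RLIM_INFINITY, where adding pad would
wrap around to a small value.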

From Daniel Micay's linux-hardened tree.

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/x86/mm/mmap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 19ad095b41df..8c7ba1adb27b 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -95,13 +95,18 @@ unsigned long arch_mmap_rnd(void)
 static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
+	unsigned long pad = stack_maxrandom_size(task_size);
 	unsigned long gap_min, gap_max;
 
+	/* Values close to RLIM_INFINITY can overflow. */
+	if (gap + pad > gap)
+		gap += pad;
+
 	/*
 	 * Top of mmap area (just below the process stack).
 	 * Leave an at least ~128 MB hole with possible stack randomization.
 	 */
-	gap_min = SIZE_128M + stack_maxrandom_size(task_size);
+	gap_min = SIZE_128M;
 	gap_max = (task_size / 6) * 5;
 
 	if (gap < gap_min)
-- 
2.9.3


* [PATCH 4/6] arm64/mmap: properly account for stack randomization in mmap_base
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
                   ` (2 preceding siblings ...)
  2017-06-02 15:20 ` [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base riel
@ 2017-06-02 15:20 ` riel
  2017-06-02 15:20 ` [PATCH 5/6] arm64: move COMPAT_ELF_ET_DYN_BASE lower in the address space riel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

From: Rik van Riel <riel@redhat.com>

When RLIMIT_STACK is, for example, 256MB, the current code results in
a gap between the top of the task and mmap_base of 256MB, failing to
take into account the amount by which the stack address was randomized.
In other words, the stack gets less than RLIMIT_STACK space.

Ensure that the gap between the stack and mmap_base always takes stack
randomization into account.

From Daniel Micay's linux-hardened tree.

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/arm64/mm/mmap.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/mm/mmap.c b/arch/arm64/mm/mmap.c
index 7b0d55756eb1..3f15eb193e33 100644
--- a/arch/arm64/mm/mmap.c
+++ b/arch/arm64/mm/mmap.c
@@ -34,7 +34,7 @@
  * Leave enough space between the mmap area and the stack to honour ulimit in
  * the face of randomisation.
  */
-#define MIN_GAP (SZ_128M + ((STACK_RND_MASK << PAGE_SHIFT) + 1))
+#define MIN_GAP (SZ_128M)
 #define MAX_GAP	(STACK_TOP/6*5)
 
 static int mmap_is_legacy(void)
@@ -64,6 +64,11 @@ unsigned long arch_mmap_rnd(void)
 static unsigned long mmap_base(unsigned long rnd)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
+	unsigned long pad = STACK_RND_MASK << PAGE_SHIFT;
+
+	/* Values close to RLIM_INFINITY can overflow. */
+	if (gap + pad > gap)
+		gap += pad;
 
 	if (gap < MIN_GAP)
 		gap = MIN_GAP;
-- 
2.9.3


* [PATCH 5/6] arm64: move COMPAT_ELF_ET_DYN_BASE lower in the address space
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
                   ` (3 preceding siblings ...)
  2017-06-02 15:20 ` [PATCH 4/6] arm64/mmap: " riel
@ 2017-06-02 15:20 ` riel
  2017-06-02 15:20 ` [PATCH 6/6] powerpc,mmap: properly account for stack randomization in mmap_base riel
  2017-06-03  4:37 ` [kernel-hardening] [PATCH 0/6] move mmap_area and PIE binaries away from the stack Kees Cook
  6 siblings, 0 replies; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

From: Rik van Riel <riel@redhat.com>

When setting up mmap_base, we take care to start the mmap base
below the maximum extent to which the stack will grow. However,
we take no such precautions with PIE binaries, which are placed
at 2/3 of TASK_SIZE plus a random offset. As a result, 32 bit PIE
binaries can end up smack in the middle of where the stack (which
is randomized down) is supposed to go.

That problem can be avoided by putting the 32 bit ELF_ET_DYN_BASE
at 256MB, which is a value linux-hardened and grsecurity have used
for a long time now without any known (to me) bug reports.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
---
 arch/arm64/include/asm/elf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/elf.h b/arch/arm64/include/asm/elf.h
index 5d1700425efe..88808a761816 100644
--- a/arch/arm64/include/asm/elf.h
+++ b/arch/arm64/include/asm/elf.h
@@ -173,7 +173,7 @@ extern int arch_setup_additional_pages(struct linux_binprm *bprm,
 
 #ifdef CONFIG_COMPAT
 
-#define COMPAT_ELF_ET_DYN_BASE		(2 * TASK_SIZE_32 / 3)
+#define COMPAT_ELF_ET_DYN_BASE		(0x10000000UL)
 
 /* AArch32 registers. */
 #define COMPAT_ELF_NGREG		18
-- 
2.9.3


* [PATCH 6/6] powerpc,mmap: properly account for stack randomization in mmap_base
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
                   ` (4 preceding siblings ...)
  2017-06-02 15:20 ` [PATCH 5/6] arm64: move COMPAT_ELF_ET_DYN_BASE lower in the address space riel
@ 2017-06-02 15:20 ` riel
  2017-06-03  4:37 ` [kernel-hardening] [PATCH 0/6] move mmap_area and PIE binaries away from the stack Kees Cook
  6 siblings, 0 replies; 15+ messages in thread
From: riel @ 2017-06-02 15:20 UTC (permalink / raw)
  To: linux-kernel
  Cc: kernel-hardening, akpm, mingo, oleg, lwoodman, mhocko,
	danielmicay, will.deacon, benh

From: Rik van Riel <riel@redhat.com>

When RLIMIT_STACK is, for example, 256MB, the current code results in
a gap between the top of the task and mmap_base of 256MB, failing to
take into account the amount by which the stack address was randomized.
In other words, the stack gets less than RLIMIT_STACK space.

Ensure that the gap between the stack and mmap_base always takes stack
randomization into account.

Inspired by Daniel Micay's linux-hardened tree.

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
---
 arch/powerpc/mm/mmap.c | 28 +++++++++++++++++++---------
 1 file changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/mm/mmap.c b/arch/powerpc/mm/mmap.c
index 9dbd2a733d6b..354cfa22e5d4 100644
--- a/arch/powerpc/mm/mmap.c
+++ b/arch/powerpc/mm/mmap.c
@@ -34,16 +34,9 @@
 /*
  * Top of mmap area (just below the process stack).
  *
- * Leave at least a ~128 MB hole on 32bit applications.
- *
- * On 64bit applications we randomise the stack by 1GB so we need to
- * space our mmap start address by a further 1GB, otherwise there is a
- * chance the mmap area will end up closer to the stack than our ulimit
- * requires.
+ * Leave at least a ~128 MB hole.
  */
-#define MIN_GAP32 (128*1024*1024)
-#define MIN_GAP64 ((128 + 1024)*1024*1024UL)
-#define MIN_GAP ((is_32bit_task()) ? MIN_GAP32 : MIN_GAP64)
+#define MIN_GAP (128*1024*1024)
 #define MAX_GAP (TASK_SIZE/6*5)
 
 static inline int mmap_is_legacy(void)
@@ -71,9 +64,26 @@ unsigned long arch_mmap_rnd(void)
 	return rnd << PAGE_SHIFT;
 }
 
+static inline unsigned long stack_maxrandom_size(void)
+{
+	if (!(current->flags & PF_RANDOMIZE))
+		return 0;
+
+	/* 8MB for 32bit, 1GB for 64bit */
+	if (is_32bit_task())
+		return (1<<23);
+	else
+		return (1<<30);
+}
+
 static inline unsigned long mmap_base(unsigned long rnd)
 {
 	unsigned long gap = rlimit(RLIMIT_STACK);
+	unsigned long pad = stack_maxrandom_size();
+
+	/* Values close to RLIM_INFINITY can overflow. */
+	if (gap + pad > gap)
+		gap += pad;
 
 	if (gap < MIN_GAP)
 		gap = MIN_GAP;
-- 
2.9.3


* Re: [kernel-hardening] [PATCH 1/6] binfmt_elf: document load_bias a little bit
  2017-06-02 15:20 ` [PATCH 1/6] binfmt_elf: document load_bias a little bit riel
@ 2017-06-02 19:27   ` Kees Cook
  0 siblings, 0 replies; 15+ messages in thread
From: Kees Cook @ 2017-06-02 19:27 UTC (permalink / raw)
  To: Rik van Riel
  Cc: LKML, kernel-hardening, Andrew Morton, Ingo Molnar,
	Oleg Nesterov, Larry Woodman, mhocko, Daniel Micay, Will Deacon,
	benh

On Fri, Jun 2, 2017 at 8:20 AM,  <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> After another (unnamed) developer and I got confused by the subtraction
> of vaddr in this branch of the code, followed by vaddr being added back
> in a little bit later, for the third time, maybe it is time to document
> this quirky bit of code.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>

Acked-by: Kees Cook <keescook@chromium.org>

Any and all improvements to this code and its documentation are welcome. :)

-Kees

> ---
>  fs/binfmt_elf.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 5075fd5c62c8..8c3f4dbc7603 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -930,10 +930,16 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                 if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
>                         elf_flags |= MAP_FIXED;
>                 } else if (loc->elf_ex.e_type == ET_DYN) {
> -                       /* Try and get dynamic programs out of the way of the
> +                       /*
> +                        * Try and get dynamic programs out of the way of the
>                          * default mmap base, as well as whatever program they
>                          * might try to exec.  This is because the brk will
> -                        * follow the loader, and is not movable.  */
> +                        * follow the loader, and is not movable.
> +                        *
> +                        * The load_bias is the difference between the address
> +                        * in the elf header and the address where the binary
> +                        * is mmapped.
> +                        */
>                         load_bias = ELF_ET_DYN_BASE - vaddr;
>                         if (current->flags & PF_RANDOMIZE)
>                                 load_bias += arch_mmap_rnd();
> --
> 2.9.3
>



-- 
Kees Cook
Pixel Security


* Re: [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB
  2017-06-02 15:20 ` [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB riel
@ 2017-06-03  4:22   ` Kees Cook
  2017-06-03 11:57     ` Daniel Micay
  2017-06-05 13:54     ` Rik van Riel
  0 siblings, 2 replies; 15+ messages in thread
From: Kees Cook @ 2017-06-03  4:22 UTC (permalink / raw)
  To: Rik van Riel
  Cc: LKML, kernel-hardening, Andrew Morton, Ingo Molnar,
	Oleg Nesterov, Larry Woodman, mhocko, Daniel Micay, Will Deacon,
	benh

On Fri, Jun 2, 2017 at 8:20 AM,  <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> When setting up mmap_base, we take care to start the mmap base
> below the maximum extent to which the stack will grow. However,
> we take no such precautions with PIE binaries, which are placed
> at 2/3 of TASK_SIZE plus a random offset. As a result, 32 bit PIE
> binaries can end up smack in the middle of where the stack (which
> is randomized down) is supposed to go.
>
> That problem can be avoided by putting the 32 bit ELF_ET_DYN_BASE
> at 256MB, which is a value linux-hardened and grsecurity have used
> for a long time now without any known (to me) bug reports.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  arch/x86/include/asm/elf.h | 23 ++++++++++++-----------
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> index e8ab9a46bc68..dafa098cc05a 100644
> --- a/arch/x86/include/asm/elf.h
> +++ b/arch/x86/include/asm/elf.h
> @@ -245,12 +245,23 @@ extern int force_personality32;
>  #define CORE_DUMP_USE_REGSET
>  #define ELF_EXEC_PAGESIZE      4096
>
> +/*
> + * True on X86_32 or when emulating IA32 on X86_64
> + */
> +static inline int mmap_is_ia32(void)
> +{
> +       return IS_ENABLED(CONFIG_X86_32) ||
> +              (IS_ENABLED(CONFIG_COMPAT) &&
> +               test_thread_flag(TIF_ADDR32));
> +}
> +
>  /* This is the location that an ET_DYN program is loaded if exec'ed.  Typical
>     use of this is to invoke "./ld.so someprog" to test out a new version of
>     the loader.  We need to make sure that it is out of the way of the program
>     that it will "exec", and that there is sufficient room for the brk.  */
>
> -#define ELF_ET_DYN_BASE                (TASK_SIZE / 3 * 2)
> +#define ELF_ET_DYN_BASE                (mmap_is_ia32() ? 0x10000000UL : \
> +                               (TASK_SIZE / 3 * 2))

I like the idea of this change, but I don't like this implementation
for a few reasons.

Before that, let me just state what I think our obvious goal is: we
want to have the userspace memory layout wasting as little space as
possible, and have regions separated enough to maximize the amount of
ASLR entropy we can use, all without allowing them to collide.

In the previous non-PIE world's common execution case, the kernel loaded
statically located ET_EXEC programs and had brk, stack, and mmap
randomized. The only ET_DYN program that got directly executed was the
loader, but that was rare. In normal execution, it would get mapped
after the ET_EXEC. When testing a new loader by running it directly,
as the comment above hints at (e.g. "/lib64/ld-linux-x86-64.so.2
/bin/some-et-exec"), the loader would get loaded first, so it might
collide with the static ET_EXEC if it wasn't moved away from the
ET_EXEC range, which is why ELF_ET_DYN_BASE exists and a test for
ET_DYN is done. (It would be better to force ET_DYN into the mmap base
when no PT_INTERP is found; see binfmt_elf.c's elf_interpreter.)

For the PIE case we end up with the ET_DYN at one end of the mmap
range, and mmap base at the high-address end (growing down toward the
ET_DYN). As a result, I'd like to see the executable ET_DYN (PIE) case
use the lowest possible base address. The (TASK_SIZE / 3 * 2) thing
has always bothered me as being highly wasteful of address space now
that most userspaces are doing full PIE. It made sense for the rare
directly loaded loaders, but now it's not great. (My first step toward
fixing this was getting all the architectures to agree on how
randomization was happening, and I think the next step is to fix the
loader collision issue and then bring all the architectures'
ELF_ET_DYN_BASE way, way down (and likely rename the define to be more
clear).)

As a result of these things, I'm not sure I like the 256MB change, as
I'd rather we get the PIE base as low as possible. I *think* this
should match the ET_EXEC addresses: 32-bit x86 ET_EXEC loads at
0x8048000. 64-bit x86 ET_EXEC loads at 0x400000 (though maybe we can
take 32-bit lower still). However, the kernel needs to notice that
when running a loader, it should go into the mmap base like a .so,
which will keep it out of the way no matter what. However, then the
question becomes "can the brk be placed sanely?" since the brk is
allocated with whatever is mapped first.

So, specifically:

I don't believe this is safe in the face of a loader running an
ET_EXEC. If the loader is mapped at 0x10000000 (or an otherwise very low
PIE ASLR value) and the ET_EXEC is mapped at 0x8048000, then the
ET_EXEC had better not be 128MB or larger...

I think loading an ET_DYN that lacks PT_INTERP should be mapped into
the mmap base so that we can lower ELF_ET_DYN_BASE as far as possible
in the address space without concern over ET_EXEC collisions.
(ELF_ET_DYN_BASE should be named ELF_PIE_BASE maybe, since it
shouldn't be exclusively used for ET_DYN, it should only be for ET_DYN
with PT_INTERP. ET_DYN without PT_INTERP should get loaded into mmap
like the .so it is.)
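
A rough sketch of that idea, purely illustrative and untested (names as
in load_elf_binary(); this is not a proposed patch):

	} else if (loc->elf_ex.e_type == ET_DYN) {
		if (elf_interpreter) {
			/* A normal PIE: put it at the PIE base. */
			load_bias = ELF_ET_DYN_BASE;
			if (current->flags & PF_RANDOMIZE)
				load_bias += arch_mmap_rnd();
		} else {
			/*
			 * A directly executed loader (or static PIE):
			 * no fixed base, let it fall into the mmap
			 * area like any other shared object.
			 */
			load_bias = 0;
		}
		load_bias = ELF_PAGESTART(load_bias - vaddr);
	}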

I'm not sure what to do yet for set_brk() when an ET_DYN without
PT_INTERP is loaded into mmap base, but we need to make sure we don't
create collisions or inappropriately small brk space.

Additionally, I think we need some runtime checking of the address
space layout min/max positions to make sure we can't collide
anything. (And maybe some pretty ASCII-art to help visualize the
possible layouts, since we have several sets of variables to take into
account, including: execution style (ET_EXEC, direct loader, and PIE),
sysctl entropy sizing of the regions, and stack rlimit.)

-Kees

-- 
Kees Cook
Pixel Security


* Re: [kernel-hardening] [PATCH 0/6] move mmap_area and PIE binaries away from the stack
  2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
                   ` (5 preceding siblings ...)
  2017-06-02 15:20 ` [PATCH 6/6] powerpc,mmap: properly account for stack randomization in mmap_base riel
@ 2017-06-03  4:37 ` Kees Cook
  2017-06-03 12:14   ` Daniel Micay
  6 siblings, 1 reply; 15+ messages in thread
From: Kees Cook @ 2017-06-03  4:37 UTC (permalink / raw)
  To: Rik van Riel, Andrew Morton
  Cc: LKML, kernel-hardening, Ingo Molnar, Oleg Nesterov,
	Larry Woodman, mhocko, Daniel Micay, Will Deacon, benh

On Fri, Jun 2, 2017 at 8:20 AM,  <riel@redhat.com> wrote:
> There are a few bugs that cause the kernel to sometimes map PIE
> binaries and the mmap_area on top of where the stack is supposed to go.
>
> This series fixes them for x86, ARM64, and PPC.
> S390 seems to be ok.
>
> If people are fine with this approach, I can work my way
> through other architectures, too.

Andrew, I'd rather the ELF_ET_DYN_BASE changes not be in -mm yet. I
think it's the wrong approach. Patch 1 (comment update) is fine.
Patches 3 and 4 need some (minor?) adjustments, but the ELF_ET_DYN_BASE
patches need much more thought, IMO (I've sent a separate email about
those). And please add me explicitly to the Cc list for this series.

-Kees

-- 
Kees Cook
Pixel Security


* Re: [kernel-hardening] [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base
  2017-06-02 15:20 ` [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base riel
@ 2017-06-03  4:46   ` Kees Cook
  2017-06-03 12:16     ` Daniel Micay
  0 siblings, 1 reply; 15+ messages in thread
From: Kees Cook @ 2017-06-03  4:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: LKML, kernel-hardening, Andrew Morton, Ingo Molnar,
	Oleg Nesterov, Larry Woodman, mhocko, Daniel Micay, Will Deacon,
	benh

On Fri, Jun 2, 2017 at 8:20 AM,  <riel@redhat.com> wrote:
> From: Rik van Riel <riel@redhat.com>
>
> When RLIMIT_STACK is, for example, 256MB, the current code results in
> a gap between the top of the task and mmap_base of 256MB, failing to
> take into account the amount by which the stack address was randomized.
> In other words, the stack gets less than RLIMIT_STACK space.

Is this entirely accurate? The top of the task would be task_size, but
this code is using task_size / 6 * 5 as the bottom of stack / top of
mmap gap_max. Is there a reason for this?

>
> Ensure that the gap between the stack and mmap_base always takes stack
> randomization into account.
>
> From Daniel Micay's linux-hardened tree.
>
> Reported-by: Florian Weimer <fweimer@redhat.com>
> Signed-off-by: Daniel Micay <danielmicay@gmail.com>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  arch/x86/mm/mmap.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
> index 19ad095b41df..8c7ba1adb27b 100644
> --- a/arch/x86/mm/mmap.c
> +++ b/arch/x86/mm/mmap.c
> @@ -95,13 +95,18 @@ unsigned long arch_mmap_rnd(void)
>  static unsigned long mmap_base(unsigned long rnd, unsigned long task_size)
>  {
>         unsigned long gap = rlimit(RLIMIT_STACK);
> +       unsigned long pad = stack_maxrandom_size(task_size);
>         unsigned long gap_min, gap_max;
>
> +       /* Values close to RLIM_INFINITY can overflow. */
> +       if (gap + pad > gap)
> +               gap += pad;
> +
>         /*
>          * Top of mmap area (just below the process stack).
>          * Leave an at least ~128 MB hole with possible stack randomization.
>          */
> -       gap_min = SIZE_128M + stack_maxrandom_size(task_size);
> +       gap_min = SIZE_128M;
>         gap_max = (task_size / 6) * 5;
>
>         if (gap < gap_min)

-Kees

-- 
Kees Cook
Pixel Security


* Re: [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB
  2017-06-03  4:22   ` Kees Cook
@ 2017-06-03 11:57     ` Daniel Micay
  2017-06-05 13:54     ` Rik van Riel
  1 sibling, 0 replies; 15+ messages in thread
From: Daniel Micay @ 2017-06-03 11:57 UTC (permalink / raw)
  To: Kees Cook, Rik van Riel
  Cc: LKML, kernel-hardening, Andrew Morton, Ingo Molnar,
	Oleg Nesterov, Larry Woodman, mhocko, Will Deacon, benh

On Fri, 2017-06-02 at 21:22 -0700, Kees Cook wrote:
> On Fri, Jun 2, 2017 at 8:20 AM,  <riel@redhat.com> wrote:
> > From: Rik van Riel <riel@redhat.com>
> > 
> > When setting up mmap_base, we take care to start the mmap base
> > below the maximum extent to which the stack will grow. However,
> > we take no such precautions with PIE binaries, which are placed
> > at 2/3 of TASK_SIZE plus a random offset. As a result, 32 bit PIE
> > binaries can end up smack in the middle of where the stack (which
> > is randomized down) is supposed to go.
> > 
> > That problem can be avoided by putting the 32 bit ELF_ET_DYN_BASE
> > at 256MB, which is a value linux-hardened and grsecurity have used
> > for a long time now without any known (to me) bug reports.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > ---
> >  arch/x86/include/asm/elf.h | 23 ++++++++++++-----------
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
> > index e8ab9a46bc68..dafa098cc05a 100644
> > --- a/arch/x86/include/asm/elf.h
> > +++ b/arch/x86/include/asm/elf.h
> > @@ -245,12 +245,23 @@ extern int force_personality32;
> >  #define CORE_DUMP_USE_REGSET
> >  #define ELF_EXEC_PAGESIZE      4096
> > 
> > +/*
> > + * True on X86_32 or when emulating IA32 on X86_64
> > + */
> > +static inline int mmap_is_ia32(void)
> > +{
> > +       return IS_ENABLED(CONFIG_X86_32) ||
> > +              (IS_ENABLED(CONFIG_COMPAT) &&
> > +               test_thread_flag(TIF_ADDR32));
> > +}
> > +
> >  /* This is the location that an ET_DYN program is loaded if
> > exec'ed.  Typical
> >     use of this is to invoke "./ld.so someprog" to test out a new
> > version of
> >     the loader.  We need to make sure that it is out of the way of
> > the program
> >     that it will "exec", and that there is sufficient room for the
> > brk.  */
> > 
> > -#define ELF_ET_DYN_BASE                (TASK_SIZE / 3 * 2)
> > +#define ELF_ET_DYN_BASE                (mmap_is_ia32() ? 0x10000000UL : \
> > +                               (TASK_SIZE / 3 * 2))
> 
> I like the idea of this change, but I don't like this implementation
> for a few reasons.
> 
> Before that, let me just state what I think our obvious goal is: we
> want to have the userspace memory layout wasting as little space as
> possible, and have regions separated enough to maximize the amount of
> ASLR entropy we can use, all without allowing them to collide.
> 
> In the previous non-PIE world's common execution case, the kernel loaded
> statically located ET_EXEC programs and had brk, stack, and mmap
> randomized. The only ET_DYN program that got directly executed was the
> loader, but that was rare. In normal execution, it would get mapped
> after the ET_EXEC. When testing new loaders and running it directly
> like the comment above hints at (e.g. "/lib64/ld-linux-x86-64.so.2
> /bin/some-et-exec"), the loader would get loaded first, so it might
> collide with the static ET_EXEC if it wasn't moved away from the
> ET_EXEC range, which is why ELF_ET_DYN_BASE exists and a test for
> ET_DYN is done. It would be better to force ET_DYN into mmap base when
> no PT_INTERP is found (see binfmt_elf.c's elf_interpreter).
> 
> For the PIE case we end up with the ET_DYN at one end of the mmap
> range, and mmap base at the high-address end (growing down toward the
> ET_DYN). As a result, I'd like to see the executable ET_DYN (PIE) case
> use the lowest possible base address. The (TASK_SIZE / 3 * 2) thing
> has always bothered me as being highly wasteful of address space now
> that most userspaces are doing full PIE. It made sense for the rare
> directly loaded loaders, but now it's not great. (My first step toward
> fixing this was getting all the architectures to agree on how
> randomization was happening, and I think the next step is to fix the
> loader collision issue and then bring all the architecture's
> ELF_ET_DYN_BASE way way down (and likely rename the define to be more
> clear).)
> 
> As a result of these things, I'm not sure I like the 256MB change, as
> I'd rather we get the PIE base as low as possible. I *think* this
> should match the ET_EXEC addresses: 32-bit x86 ET_EXEC loads at
> 0x8048000. 64-bit x86 ET_EXEC loads at 0x400000 (though maybe we can
> take 32-bit lower still). However, the kernel needs to notice that
> when running a loader, it should go into the mmap base like a .so,
> which will keep it out of the way no matter what. However, then the
> question becomes "can the brk be placed sanely?" since the brk is
> allocated with whatever is mapped first.
>
> So, specifically:
> 
> I don't believe this is safe in the face of a loader running an
> ET_EXEC. If the loader is mapped at 0x10000000 (or otherwise very low
> value of PIE ASLR), and the ET_EXEC is mapped at 0x8048000, then the
> ET_EXEC had better not be 128MB or larger...
> 
> I think loading an ET_DYN that lacks PT_INTERP should be mapped into
> the mmap base so that we can lower ELF_ET_DYN_BASE as far as possible
> in the address space without concern over ET_EXEC collisions.
> (ELF_ET_DYN_BASE should be named ELF_PIE_BASE maybe, since it
> shouldn't be exclusively used for ET_DYN, it should only be for ET_DYN
> with PT_INTERP. ET_DYN without PT_INTERP should get loaded into mmap
> like the .so it is.)
> 
> I'm not sure what to do yet for set_brk() when an ET_DYN without
> PT_INTERP is loaded into mmap base, but we need to make sure we don't
> create collisions or inappropriately small brk space.

Today, allocators either know how to fall back to using mmap if the brk
heap runs out (glibc, musl) or they don't use brk at all (jemalloc, as
used by Android, Firefox, redis, mariadb, etc., and others like OpenBSD
malloc).

However, they didn't all do that in the past, so there are static
executables / old C libraries that may fail if loaded in a way where
they don't end up getting a usable area for brk. This was actually the
case until fairly recently for musl, and some other lesser-used C
libraries may not deal with it sanely yet. Isn't that already the case
today, since the interpreter simply goes in the mmap region with the
brk heap on top? Both ASLR and compatibility are broken right now,
rather than just compatibility for the loader edge case with massive
non-position-independent executables.

You could detect the interpreter case and map that into the middle of
the address space, but the current choice is too high to work properly,
and that problem gets worse if stack entropy is fixed (not included in
this set). vm.mmap_rnd_bits allows up to 16 bits on 32-bit, which is
256M, so there's already up to 512M used at the top of the address space
for mmap base ASLR + stack base ASLR, if the latter is fixed to follow
the sysctl. It seems the stack rlimit can consume 5/6 of the address
space at the moment... The executable base already has up to a 256M
shift.
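
Spelling that arithmetic out (illustrative 32-bit numbers):

	mmap base ASLR:  vm.mmap_rnd_bits max of 16 -> 2^16 pages * 4K = 256M
	stack base ASLR: if raised to follow the same sysctl           = 256M
	                                                       total  ~= 512M

That is up to half a gigabyte taken off the top of a ~3G address space
by randomization alone, before the stack rlimit itself is counted.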

On 64-bit, I think 4G is a sensible base because it avoids causing
compatibility issues with users of the low 4G address space. An example
is the Android Runtime, which still uses 32-bit pointers on 64-bit, so
it needs to put the garbage collected heap and application code in the
lower 4G, which makes lower 4G memory a scarce resource. I think it's
likely to add real 64-bit support in the near future, but it still
wouldn't be normal to use that in practice since it'd be less efficient.

x86_64 has a 128TiB address space, unlike 32-bit, so a 4G offset doesn't
really matter. The arm64 address space is as large with 4-level page
tables, which isn't typical, so it's really usually 256G for userspace.
A 4G offset is still okay there.

> Additionally, I think we need some runtime checking of the address
> space layout min/max positions to make sure we can't collide
> anything. (And maybe some pretty ASCII-art to help visualize the
> possible layouts, since we have several sets of variables to take into
> account, including: execution style (ET_EXEC, direct loader, and PIE),
> sysctl entropy sizing of the regions, and stack rlimit.)
> 
> -Kees
> 


* Re: [kernel-hardening] [PATCH 0/6] move mmap_area and PIE binaries away from the stack
  2017-06-03  4:37 ` [kernel-hardening] [PATCH 0/6] move mmap_area and PIE binaries away from the stack Kees Cook
@ 2017-06-03 12:14   ` Daniel Micay
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Micay @ 2017-06-03 12:14 UTC (permalink / raw)
  To: Kees Cook, Rik van Riel, Andrew Morton
  Cc: LKML, kernel-hardening, Ingo Molnar, Oleg Nesterov,
	Larry Woodman, mhocko, Will Deacon, benh

On Fri, 2017-06-02 at 21:37 -0700, Kees Cook wrote:
> 
> Patches 3 and 4 need some (minor?) adjustments

It's currently a bug fix. Doing something further would be more than
fixing the bug and should probably be separate.


* Re: [kernel-hardening] [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base
  2017-06-03  4:46   ` [kernel-hardening] " Kees Cook
@ 2017-06-03 12:16     ` Daniel Micay
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Micay @ 2017-06-03 12:16 UTC (permalink / raw)
  To: Kees Cook, Rik van Riel
  Cc: LKML, kernel-hardening, Andrew Morton, Ingo Molnar,
	Oleg Nesterov, Larry Woodman, mhocko, Will Deacon, benh

On Fri, 2017-06-02 at 21:46 -0700, Kees Cook wrote:
> On Fri, Jun 2, 2017 at 8:20 AM,  <riel@redhat.com> wrote:
> > From: Rik van Riel <riel@redhat.com>
> > 
> > When RLIMIT_STACK is, for example, 256MB, the current code results
> > in
> > a gap between the top of the task and mmap_base of 256MB, failing to
> > take into account the amount by which the stack address was
> > randomized.
> > In other words, the stack gets less than RLIMIT_STACK space.
> 
> Is this entirely accurate? The top of the task would be task_size, but
> this code is using task_size / 6 * 5 as the bottom of stack / top of
> mmap gap_max. Is there a reason for this?

MIN_GAP / MAX_GAP are only the boundaries that this gap is clamped to.

If it's not smaller than MIN_GAP, MIN_GAP isn't used. If it's not larger
than MAX_GAP, MAX_GAP isn't used. The stack randomization is currently
only taken into account for MIN_GAP. This only fixes that bug by always
taking it into account. It's not a subjective design change.

The MAX_GAP value is 5/6 of the address space which is overly large but
that's a separate bug.
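
(In other words, with the patch the logic amounts to something like the
following sketch, illustrative only:

	gap = rlimit(RLIMIT_STACK);
	if (gap + pad > gap)		/* skip pad if it would overflow */
		gap += pad;		/* pad = max stack randomization */
	gap = clamp(gap, MIN_GAP, MAX_GAP);
	mmap_base = PAGE_ALIGN(task_size - gap - rnd);

MIN_GAP and MAX_GAP only kick in when the padded rlimit falls outside
that range.)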


* Re: [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB
  2017-06-03  4:22   ` Kees Cook
  2017-06-03 11:57     ` Daniel Micay
@ 2017-06-05 13:54     ` Rik van Riel
  1 sibling, 0 replies; 15+ messages in thread
From: Rik van Riel @ 2017-06-05 13:54 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, kernel-hardening, Andrew Morton, Ingo Molnar,
	Oleg Nesterov, Larry Woodman, mhocko, Daniel Micay, Will Deacon,
	benh

On Fri, 2017-06-02 at 21:22 -0700, Kees Cook wrote:

> As a result of these things, I'm not sure I like the 256MB change, as
> I'd rather we get the PIE base as low as possible. I *think* this
> should match the ET_EXEC addresses: 32-bit x86 ET_EXEC loads at
> 0x8048000. 64-bit x86 ET_EXEC loads at 0x400000 (though maybe we can
> take 32-bit lower still). However, the kernel needs to notice that
> when running a loader, it should go into the mmap base like a .so,
> which will keep it out of the way no matter what. However, then the
> question becomes "can the brk be placed sanely?" since the brk is
> allocated with whatever is mapped first.

The problem is that the brk is allocated when the loader
is loaded, before the loader maps that second executable.

In other words, the problem has been changed from
"where to place the loader?" to "where to place the
brk?", but remains fundamentally the same.

> I think loading an ET_DYN that lacks PT_INTERP should be mapped into
> the mmap base so that we can lower ELF_ET_DYN_BASE as far as possible
> in the address space without concern over ET_EXEC collisions.
> (ELF_ET_DYN_BASE should be named ELF_PIE_BASE maybe, since it
> shouldn't be exclusively used for ET_DYN, it should only be for
> ET_DYN
> with PT_INTERP. ET_DYN without PT_INTERP should get loaded into mmap
> like the .so it is.)
> 
> I'm not sure what to do yet for set_brk() when an ET_DYN without
> PT_INTERP is loaded into mmap base, but we need to make sure we don't
> create collisions or inappropriately small brk space.

Exactly, we end up with placing the brk at ELF_ET_DYN_BASE +
randomization, instead of placing the interpreter and the
brk there. I don't see how it would help much.

> Additionally, I think we need some runtime checking of the address
> space layout min/max positions to make sure we can't collide
> anything. (And maybe some pretty ASCII-art to help visualize the
> possible layouts, since we have several sets of variables to take
> into
> account, including: execution style (ET_EXEC, direct loader, and
> PIE),
> sysctl entropy sizing of the regions, and stack rlimit.)

If you want maximum randomization, we should allow things
like having the stack randomized to the bottom of the
address space, and the executable / brk growing up from
some distance above the top of the stack.

Essentially allowing anything to be placed anywhere, and
switching things like executable & mmap placement as a
result of where the stack ends up, etc.


end of thread

Thread overview: 15+ messages
2017-06-02 15:20 [PATCH 0/6] move mmap_area and PIE binaries away from the stack riel
2017-06-02 15:20 ` [PATCH 1/6] binfmt_elf: document load_bias a little bit riel
2017-06-02 19:27   ` [kernel-hardening] " Kees Cook
2017-06-02 15:20 ` [PATCH 2/6] x86/elf: move 32 bit ELF_ET_DYN_BASE to 256MB riel
2017-06-03  4:22   ` Kees Cook
2017-06-03 11:57     ` Daniel Micay
2017-06-05 13:54     ` Rik van Riel
2017-06-02 15:20 ` [PATCH 3/6] x86/mmap: properly account for stack randomization in mmap_base riel
2017-06-03  4:46   ` [kernel-hardening] " Kees Cook
2017-06-03 12:16     ` Daniel Micay
2017-06-02 15:20 ` [PATCH 4/6] arm64/mmap: " riel
2017-06-02 15:20 ` [PATCH 5/6] arm64: move COMPAT_ELF_ET_DYN_BASE lower in the address space riel
2017-06-02 15:20 ` [PATCH 6/6] powerpc,mmap: properly account for stack randomization in mmap_base riel
2017-06-03  4:37 ` [kernel-hardening] [PATCH 0/6] move mmap_area and PIE binaries away from the stack Kees Cook
2017-06-03 12:14   ` Daniel Micay
