[RFC] fs/binfmt_elf: fix memory map for PIE applications
diff mbox series

Message ID 1380698395-5784-1-git-send-email-timo.teras@iki.fi
State New, archived
Headers show
Series
  • [RFC] fs/binfmt_elf: fix memory map for PIE applications
Related show

Commit Message

Timo Teras Oct. 2, 2013, 7:19 a.m. UTC
arch/*/include/asm/elf.h comments say:
  ELF_ET_DYN_BASE is the location that an ET_DYN program is loaded
  if exec'ed.  Typical use of this is to invoke "./ld.so someprog"
  to test out a new version of the loader.  We need to make sure
  that it is out of the way of the program that it will "exec",
  and that there is sufficient room for the brk.

In case we have main application linked as PIE, this can cause
problems as the main program itself is being loaded to this
alternate address. And this allows limited heap size. While
this is inevitable when exec'ing the interpreter directly,
we should do better for PIE applications.

This fixes the loader to detect PIE application by checking if
elf_interpreter is requested. This images are loaded to beginning
of the address space instead of the specially crafted place for elf
interpreter. This allows full heap address space for PIE applications
and fixes random "out of memory" errors.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
---
 fs/binfmt_elf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

It might make sense to define ELF_ET_DYN_APP_BASE or similar
so that architectures can specify the load address of ET_DYN
applications.

Comments

Jiri Kosina Dec. 19, 2013, 2:17 p.m. UTC | #1
On Wed, 2 Oct 2013, Timo Teräs wrote:

> arch/*/include/asm/elf.h comments say:
>   ELF_ET_DYN_BASE is the location that an ET_DYN program is loaded
>   if exec'ed.  Typical use of this is to invoke "./ld.so someprog"
>   to test out a new version of the loader.  We need to make sure
>   that it is out of the way of the program that it will "exec",
>   and that there is sufficient room for the brk.
> 
> In case we have main application linked as PIE, this can cause
> problems as the main program itself is being loaded to this
> alternate address. And this allows limited heap size. While
> this is inevitable when exec'ing the interpreter directly,
> we should do better for PIE applications.

Timo,

could you please provide more specific example of the breakage scenario 
you are talking about? It's not completely clear to me from your commit 
message.

Thanks,
Timo Teras Dec. 19, 2013, 2:42 p.m. UTC | #2
On Thu, 19 Dec 2013 15:17:03 +0100 (CET)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Wed, 2 Oct 2013, Timo Teräs wrote:
> 
> > arch/*/include/asm/elf.h comments say:
> >   ELF_ET_DYN_BASE is the location that an ET_DYN program is loaded
> >   if exec'ed.  Typical use of this is to invoke "./ld.so someprog"
> >   to test out a new version of the loader.  We need to make sure
> >   that it is out of the way of the program that it will "exec",
> >   and that there is sufficient room for the brk.
> > 
> > In case we have main application linked as PIE, this can cause
> > problems as the main program itself is being loaded to this
> > alternate address. And this allows limited heap size. While
> > this is inevitable when exec'ing the interpreter directly,
> > we should do better for PIE applications.
> 
> could you please provide more specific example of the breakage
> scenario you are talking about? It's not completely clear to me from
> your commit message.

The problem is that if PF_RANDOMIZE is on, the PIE application will be
mapped by letting mmap() to randomize the area for the main app (that
is used to base heap).

Example memory layout from x86, running PIE compiled busybox with
randomization enabled:

5756c000-57570000 rw-p 00000000 00:00 0 
57570000-575dd000 r-xp 00000000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
575dd000-575de000 r--p 0006c000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
575de000-575df000 rw-p 0006d000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
575df000-575e5000 rw-p 00000000 00:00 0 
575e5000-575f5000 r-xp 00000000 00:0f 2038       /lib/libm-0.9.33.2-git.so
575f5000-575f6000 r--p 0000f000 00:0f 2038       /lib/libm-0.9.33.2-git.so
575f6000-575f7000 rw-p 00010000 00:0f 2038       /lib/libm-0.9.33.2-git.so
575f7000-575fd000 r-xp 00000000 00:0f 2042       /lib/libcrypt-0.9.33.2-git.so
575fd000-575fe000 r--p 00005000 00:0f 2042       /lib/libcrypt-0.9.33.2-git.so
575fe000-57610000 rw-p 00000000 00:00 0 
57611000-57613000 rw-p 00000000 00:00 0 
57613000-57614000 r-xp 00000000 00:00 0          [vdso]
57614000-5761e000 r-xp 00000000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
5761e000-5761f000 r--p 00009000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
5761f000-57620000 rw-p 0000a000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
57620000-57708000 r-xp 00000000 00:0f 234487046  /root/busybox
57708000-5770a000 rw-p 000e8000 00:0f 234487046  /root/busybox
5770a000-5770c000 rw-p 00000000 00:00 0          [heap]
5ffdf000-60000000 rw-p 00000000 00:00 0          [stack]
b7570000-b75dd000 r-xp 00000000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
b75e5000-b75f5000 r-xp 00000000 00:0f 2038       /lib/libm-0.9.33.2-git.so
b75f7000-b75fd000 r-xp 00000000 00:0f 2042       /lib/libcrypt-0.9.33.2-git.so
b7613000-b7614000 r-xp 00000000 00:00 0          [vdso]
b7614000-b761e000 r-xp 00000000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
b7620000-b7708000 r-xp 00000000 00:0f 234487046  /root/busybox

As you see, the main executable is mapped 57620000-57708000 and
57708000-5770a000. Heap follow immediately after that
5770a000-5770c000 followed by anything mmaped after it (stack or some
other libraries). Heap can grow only up to 5ffdf000 meaning the
application is limited to 140 megs or so in this instance. This limit
can go much lower depending how the randomization went. And even 140
megs is very little for big apps.

Hope this clarifies the issue.

- Timo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jiri Kosina Dec. 19, 2013, 3:02 p.m. UTC | #3
On Thu, 19 Dec 2013, Timo Teras wrote:

> 5756c000-57570000 rw-p 00000000 00:00 0 
> 57570000-575dd000 r-xp 00000000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
> 575dd000-575de000 r--p 0006c000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
> 575de000-575df000 rw-p 0006d000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
> 575df000-575e5000 rw-p 00000000 00:00 0 
> 575e5000-575f5000 r-xp 00000000 00:0f 2038       /lib/libm-0.9.33.2-git.so
> 575f5000-575f6000 r--p 0000f000 00:0f 2038       /lib/libm-0.9.33.2-git.so
> 575f6000-575f7000 rw-p 00010000 00:0f 2038       /lib/libm-0.9.33.2-git.so
> 575f7000-575fd000 r-xp 00000000 00:0f 2042       /lib/libcrypt-0.9.33.2-git.so
> 575fd000-575fe000 r--p 00005000 00:0f 2042       /lib/libcrypt-0.9.33.2-git.so
> 575fe000-57610000 rw-p 00000000 00:00 0 
> 57611000-57613000 rw-p 00000000 00:00 0 
> 57613000-57614000 r-xp 00000000 00:00 0          [vdso]
> 57614000-5761e000 r-xp 00000000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
> 5761e000-5761f000 r--p 00009000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
> 5761f000-57620000 rw-p 0000a000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
> 57620000-57708000 r-xp 00000000 00:0f 234487046  /root/busybox
> 57708000-5770a000 rw-p 000e8000 00:0f 234487046  /root/busybox
> 5770a000-5770c000 rw-p 00000000 00:00 0          [heap]
> 5ffdf000-60000000 rw-p 00000000 00:00 0          [stack]
> b7570000-b75dd000 r-xp 00000000 00:0f 2039       /lib/libuClibc-0.9.33.2-git.so
> b75e5000-b75f5000 r-xp 00000000 00:0f 2038       /lib/libm-0.9.33.2-git.so
> b75f7000-b75fd000 r-xp 00000000 00:0f 2042       /lib/libcrypt-0.9.33.2-git.so
> b7613000-b7614000 r-xp 00000000 00:00 0          [vdso]
> b7614000-b761e000 r-xp 00000000 00:0f 2044       /lib/ld-uClibc-0.9.33.2-git.so
> b7620000-b7708000 r-xp 00000000 00:0f 234487046  /root/busybox
> 
> As you see, the main executable is mapped 57620000-57708000 and
> 57708000-5770a000. Heap follow immediately after that
> 5770a000-5770c000 followed by anything mmaped after it (stack or some
> other libraries). Heap can grow only up to 5ffdf000 meaning the
> application is limited to 140 megs or so in this instance. This limit
> can go much lower depending how the randomization went. And even 140
> megs is very little for big apps.

Right. And why is that a problem? 

Area marked [heap] is basically just an area reserved for brk() calls. 
There is no guarantee how big this area is going to be, there is always 
going to be some mapping getting in the way, that'll prevent it from 
growing indefinitely, and the userspace allocator will have to switch from 
using brk() to mmap() instead.

Glibc is doing this properly, and any allocator that wants to make sure to 
be able to make use of as much virtual address space as possible has 
either give up on brk() completely, or at least switch from brk() to 
mmap() when brk() reaches first mapping.

So what is the real problem again (i.e. the actual symptoms), please? Is 
it that your userspace memory allocator doesn't use mmap() for allocations 
at all?

Thanks,
Timo Teras Dec. 19, 2013, 3:26 p.m. UTC | #4
On Thu, 19 Dec 2013 16:02:19 +0100 (CET)
Jiri Kosina <jkosina@suse.cz> wrote:

> On Thu, 19 Dec 2013, Timo Teras wrote:
> 
> > As you see, the main executable is mapped 57620000-57708000 and
> > 57708000-5770a000. Heap follow immediately after that
> > 5770a000-5770c000 followed by anything mmaped after it (stack or
> > some other libraries). Heap can grow only up to 5ffdf000 meaning the
> > application is limited to 140 megs or so in this instance. This
> > limit can go much lower depending how the randomization went. And
> > even 140 megs is very little for big apps.
> 
> So what is the real problem again (i.e. the actual symptoms), please?
> Is it that your userspace memory allocator doesn't use mmap() for
> allocations at all?

Random application failures with uclibc and musl c-libraries. Both seem
to use mmap() for large allocations, and brk() for small ones. IIRC,
there was also some minor breakage with other applications that use
brk() to do some self-accounting / other funny stuff.

While this is not strictly a bug, I would still hope that the memory
layout is configured for maximum compatibility... or do you see it
introducing unwanted side effects?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Jiri Kosina Dec. 19, 2013, 3:33 p.m. UTC | #5
On Thu, 19 Dec 2013, Timo Teras wrote:

> > > As you see, the main executable is mapped 57620000-57708000 and
> > > 57708000-5770a000. Heap follow immediately after that
> > > 5770a000-5770c000 followed by anything mmaped after it (stack or
> > > some other libraries). Heap can grow only up to 5ffdf000 meaning the
> > > application is limited to 140 megs or so in this instance. This
> > > limit can go much lower depending how the randomization went. And
> > > even 140 megs is very little for big apps.
> > 
> > So what is the real problem again (i.e. the actual symptoms), please?
> > Is it that your userspace memory allocator doesn't use mmap() for
> > allocations at all?
> 
> Random application failures with uclibc and musl c-libraries. Both seem
> to use mmap() for large allocations, and brk() for small ones. IIRC,
> there was also some minor breakage with other applications that use
> brk() to do some self-accounting / other funny stuff.

Hmm, that's really unfortunate. Applications really should be prepared for 
failing brk(), as there is no guarantee provided by anyobody whatsoever 
about the space allocated for the program break.

Heck, even my manpage tells me:

"Avoid using brk() and sbrk(): the malloc(3) memory allocation package is 
 the portable and comfortable  way  of allocating memory."

Patch
diff mbox series

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 100edcc..f1508c7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -802,21 +802,19 @@  static int load_elf_binary(struct linux_binprm *bprm)
 			 * default mmap base, as well as whatever program they
 			 * might try to exec.  This is because the brk will
 			 * follow the loader, and is not movable.  */
+			if (elf_interpreter)
+				load_bias = 0x00400000UL;
+			else
+				load_bias = ELF_ET_DYN_BASE;
 #ifdef CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE
 			/* Memory randomization might have been switched off
 			 * in runtime via sysctl or explicit setting of
 			 * personality flags.
-			 * If that is the case, retain the original non-zero
-			 * load_bias value in order to establish proper
-			 * non-randomized mappings.
 			 */
 			if (current->flags & PF_RANDOMIZE)
-				load_bias = 0;
-			else
-				load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
-#else
-			load_bias = ELF_PAGESTART(ELF_ET_DYN_BASE - vaddr);
+				load_bias += (get_random_int() & STACK_RND_MASK) << PAGE_SHIFT;
 #endif
+			load_bias = ELF_PAGESTART(vaddr + load_bias);
 		}
 
 		error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,