All of lore.kernel.org
 help / color / mirror / Atom feed
* MAP_FIXED for ELF mappings
@ 2017-10-04  7:50 Michal Hocko
  2017-10-04  7:59 ` Michal Hocko
                   ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-04  7:50 UTC (permalink / raw)
  To: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, Baoquan He
  Cc: LKML

Hi,
while studying CVE-2017-1000253 and the MAP_FIXED usage in load_elf*
code paths I have stumbled over MAP_FIXED usage for elf segments
mapping. I am not really familiar with this area much so I might draw
completely incorrect conclusions here but I am really wondering why we
are doing MAP_FIXED there at all.

I can see why some segments really have to be mapped at a specific
address but I wonder whether MAP_FIXED is the right tool to achieve
that. It seems to me that MAP_FIXED is fundamentally dangerous because
it unmaps any existing mapping. I assume that nothing should be really
mapped in the requested range that early so we can only stumble over
something when the address space randomization place things unexpectedly
(which was the case of the above mentioned CVE AFAIU).

So my primary question is whether we can/should simply drop MAP_FIXED
from elf_map at all. Instead we should test whether the mapping was
successful for the requested address and fail otherwise. I realize that
failing due to something that a user has no idea about sucks a lot but
it seems to me safer to simply complain into the log and fail is a safer
option.

Something like the following completely untested diff. Or am I
completely missing the point of the MAP_FIXED purpose?
---
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6466153f2bf0..09456e2add18 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 
 #ifndef elf_map
 
+static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
+		unsigned long size, int prot, int type, unsigned long off)
+{
+	unsigned long map_addr;
+
+	/*
+	 * If caller requests the mapping at a specific place, make sure we fail
+	 * rather than potentially clobber an existing mapping which can have
+	 * security consequences (e.g. smash over the stack area).
+	 */
+	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (BAD_ADDR(map_addr))
+		return map_addr;
+
+	if ((type & MAP_FIXED) && map_addr != addr) {
+		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				(void*)addr);
+		return -EAGAIN;
+	}
+
+	return map_addr;
+}
+
 static unsigned long elf_map(struct file *filep, unsigned long addr,
 		struct elf_phdr *eppnt, int prot, int type,
 		unsigned long total_size)
@@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	*/
 	if (total_size) {
 		total_size = ELF_PAGEALIGN(total_size);
-		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
 		if (!BAD_ADDR(map_addr))
 			vm_munmap(map_addr+size, total_size-size);
 	} else
-		map_addr = vm_mmap(filep, addr, size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
 
 	return(map_addr);
 }
@@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
 		eppnt++;
 
 	/* Now use mmap to map the library into memory. */
-	error = vm_mmap(file,
+	error = elf_vm_mmap(file,
 			ELF_PAGESTART(eppnt->p_vaddr),
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04  7:50 MAP_FIXED for ELF mappings Michal Hocko
@ 2017-10-04  7:59 ` Michal Hocko
  2017-10-04 15:03 ` Baoquan He
  2017-10-16 13:44 ` [PATCH 0/2] fs, elf: get rid of MAP_FIXED from the loader Michal Hocko
  2 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-04  7:59 UTC (permalink / raw)
  To: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, Baoquan He
  Cc: LKML

Dohh, screwed up From. Sorry for spamming.

On Wed 04-10-17 09:50:59, Michal Hocko wrote:
> Hi,
> while studying CVE-2017-1000253 and the MAP_FIXED usage in load_elf*
> code paths I have stumbled over MAP_FIXED usage for elf segments
> mapping. I am not really familiar with this area much so I might draw
> completely incorrect conclusions here but I am really wondering why we
> are doing MAP_FIXED there at all.
> 
> I can see why some segments really have to be mapped at a specific
> address but I wonder whether MAP_FIXED is the right tool to achieve
> that. It seems to me that MAP_FIXED is fundamentally dangerous because
> it unmaps any existing mapping. I assume that nothing should be really
> mapped in the requested range that early so we can only stumble over
> something when the address space randomization place things unexpectedly
> (which was the case of the above mentioned CVE AFAIU).
> 
> So my primary question is whether we can/should simply drop MAP_FIXED
> from elf_map at all. Instead we should test whether the mapping was
> successful for the requested address and fail otherwise. I realize that
> failing due to something that a user has no idea about sucks a lot but
> it seems to me safer to simply complain into the log and fail is a safer
> option.
> 
> Something like the following completely untested diff. Or am I
> completely missing the point of the MAP_FIXED purpose?
> ---
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 6466153f2bf0..09456e2add18 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>  
>  #ifndef elf_map
>  
> +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> +		unsigned long size, int prot, int type, unsigned long off)
> +{
> +	unsigned long map_addr;
> +
> +	/*
> +	 * If caller requests the mapping at a specific place, make sure we fail
> +	 * rather than potentially clobber an existing mapping which can have
> +	 * security consequences (e.g. smash over the stack area).
> +	 */
> +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> +	if (BAD_ADDR(map_addr))
> +		return map_addr;
> +
> +	if ((type & MAP_FIXED) && map_addr != addr) {
> +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +				(void*)addr);
> +		return -EAGAIN;
> +	}
> +
> +	return map_addr;
> +}
> +
>  static unsigned long elf_map(struct file *filep, unsigned long addr,
>  		struct elf_phdr *eppnt, int prot, int type,
>  		unsigned long total_size)
> @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>  	*/
>  	if (total_size) {
>  		total_size = ELF_PAGEALIGN(total_size);
> -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
>  		if (!BAD_ADDR(map_addr))
>  			vm_munmap(map_addr+size, total_size-size);
>  	} else
> -		map_addr = vm_mmap(filep, addr, size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
>  
>  	return(map_addr);
>  }
> @@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
>  		eppnt++;
>  
>  	/* Now use mmap to map the library into memory. */
> -	error = vm_mmap(file,
> +	error = elf_vm_mmap(file,
>  			ELF_PAGESTART(eppnt->p_vaddr),
>  			(eppnt->p_filesz +
>  			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04  7:50 MAP_FIXED for ELF mappings Michal Hocko
  2017-10-04  7:59 ` Michal Hocko
@ 2017-10-04 15:03 ` Baoquan He
  2017-10-04 15:11   ` Michal Hocko
  2017-10-04 15:12   ` Baoquan He
  2017-10-16 13:44 ` [PATCH 0/2] fs, elf: get rid of MAP_FIXED from the loader Michal Hocko
  2 siblings, 2 replies; 39+ messages in thread
From: Baoquan He @ 2017-10-04 15:03 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On 10/04/17 at 09:50am, Michal Hocko wrote:
> Hi,
> while studying CVE-2017-1000253 and the MAP_FIXED usage in load_elf*
> code paths I have stumbled over MAP_FIXED usage for elf segments
> mapping. I am not really familiar with this area much so I might draw
> completely incorrect conclusions here but I am really wondering why we
> are doing MAP_FIXED there at all.
> 
> I can see why some segments really have to be mapped at a specific
> address but I wonder whether MAP_FIXED is the right tool to achieve
> that. It seems to me that MAP_FIXED is fundamentally dangerous because
> it unmaps any existing mapping. I assume that nothing should be really
> mapped in the requested range that early so we can only stumble over
> something when the address space randomization place things unexpectedly
> (which was the case of the above mentioned CVE AFAIU).
> 
> So my primary question is whether we can/should simply drop MAP_FIXED
> from elf_map at all. Instead we should test whether the mapping was
> successful for the requested address and fail otherwise. I realize that
> failing due to something that a user has no idea about sucks a lot but
> it seems to me safer to simply complain into the log and fail is a safer
> option.

Sorry to interrupt. I tried below example.c and example2.c files and
compile and link them with fpie and pie. Seems in the final PIE
executable, the local global is pc relative. Means the data segment has
to be put after the code segment though PIE program. Then MAP_FIXED is a
good to have flag, especially we have counted in the total_size to
search an area to cover the whole dynamic program and get the load_bias.
It's no way to fail to get map agrea in case load_addr_set == 1.
So with MAP_FIXED set, we won't take time to search the mm vma rb
tree when load_addr_set == 1, but just return the specified addr directly,
looks more efficient.

########### example2.c
int local_global_var=0x10;
int local_global_func(void)
{
        return 0x40;
}
########## example.c
#include <stdio.h>  
extern int local_global_var;
extern int local_global_func(void);

int main(void)  
{  
        int x = local_global_func();  
        local_global_var= 0x20;  
 
        while(getchar()!= 'q')  
                continue;  
        return 0;  
}

$gcc -fpie -c example.c example2.c
$ls example*.o
example.o example2.o
$objdump -d -r example.o
...
0000000000000000 <main>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 83 ec 10             sub    $0x10,%rsp
   8:   e8 00 00 00 00          callq  d <main+0xd>
                        9: R_X86_64_PLT32       local_global_func-0x4
   d:   89 45 fc                mov    %eax,-0x4(%rbp)
  10:   c7 05 00 00 00 00 20    movl   $0x20,0x0(%rip)        # 1a
<main+0x1a>
  17:   00 00 00 
                        12: R_X86_64_PC32       local_global_var-0x8
  1a:   eb 01                   jmp    1d <main+0x1d>
  1c:   90                      nop
  1d:   e8 00 00 00 00          callq  22 <main+0x22>
                        1e: R_X86_64_PLT32      getchar-0x4
  22:   83 f8 71                cmp    $0x71,%eax
  25:   75 f5                   jne    1c <main+0x1c>
  27:   b8 00 00 00 00          mov    $0x0,%eax
  2c:   c9                      leaveq 
  2d:   c3                      retq
...
$gcc -pie -o test example.o example2.o
$objdump -d -r test
...
0000000000000655 <main>:
 655:   55                      push   %rbp
 656:   48 89 e5                mov    %rsp,%rbp
 659:   48 83 ec 10             sub    $0x10,%rsp
 65d:   e8 e8 ff ff ff          callq  64a <local_global_func>
 662:   89 45 fc                mov    %eax,-0x4(%rbp)
 665:   c7 05 b5 09 20 00 20    movl   $0x20,0x2009b5(%rip)
#201024 <local_global_var>
 66c:   00 00 00 
 66f:   eb 01                   jmp    672 <main+0x1d>
 671:   90                      nop
 672:   e8 a9 fe ff ff          callq  520 <getchar@plt>
 677:   83 f8 71                cmp    $0x71,%eax
 67a:   75 f5                   jne    671 <main+0x1c>
 67c:   b8 00 00 00 00          mov    $0x0,%eax
 681:   c9                      leaveq 
 682:   c3                      retq   
 683:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
 68a:   00 00 00 
 68d:   0f 1f 00                nopl   (%rax)
...
 

> 
> Something like the following completely untested diff. Or am I
> completely missing the point of the MAP_FIXED purpose?
> ---
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 6466153f2bf0..09456e2add18 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>  
>  #ifndef elf_map
>  
> +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> +		unsigned long size, int prot, int type, unsigned long off)
> +{
> +	unsigned long map_addr;
> +
> +	/*
> +	 * If caller requests the mapping at a specific place, make sure we fail
> +	 * rather than potentially clobber an existing mapping which can have
> +	 * security consequences (e.g. smash over the stack area).
> +	 */
> +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> +	if (BAD_ADDR(map_addr))
> +		return map_addr;
> +
> +	if ((type & MAP_FIXED) && map_addr != addr) {
> +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +				(void*)addr);
> +		return -EAGAIN;
> +	}
> +
> +	return map_addr;
> +}
> +
>  static unsigned long elf_map(struct file *filep, unsigned long addr,
>  		struct elf_phdr *eppnt, int prot, int type,
>  		unsigned long total_size)
> @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>  	*/
>  	if (total_size) {
>  		total_size = ELF_PAGEALIGN(total_size);
> -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
>  		if (!BAD_ADDR(map_addr))
>  			vm_munmap(map_addr+size, total_size-size);
>  	} else
> -		map_addr = vm_mmap(filep, addr, size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
>  
>  	return(map_addr);
>  }
> @@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
>  		eppnt++;
>  
>  	/* Now use mmap to map the library into memory. */
> -	error = vm_mmap(file,
> +	error = elf_vm_mmap(file,
>  			ELF_PAGESTART(eppnt->p_vaddr),
>  			(eppnt->p_filesz +
>  			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 15:03 ` Baoquan He
@ 2017-10-04 15:11   ` Michal Hocko
  2017-10-04 15:12   ` Baoquan He
  1 sibling, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-04 15:11 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On Wed 04-10-17 23:03:34, Baoquan He wrote:
> On 10/04/17 at 09:50am, Michal Hocko wrote:
> > Hi,
> > while studying CVE-2017-1000253 and the MAP_FIXED usage in load_elf*
> > code paths I have stumbled over MAP_FIXED usage for elf segments
> > mapping. I am not really familiar with this area much so I might draw
> > completely incorrect conclusions here but I am really wondering why we
> > are doing MAP_FIXED there at all.
> > 
> > I can see why some segments really have to be mapped at a specific
> > address but I wonder whether MAP_FIXED is the right tool to achieve
> > that. It seems to me that MAP_FIXED is fundamentally dangerous because
> > it unmaps any existing mapping. I assume that nothing should be really
> > mapped in the requested range that early so we can only stumble over
> > something when the address space randomization place things unexpectedly
> > (which was the case of the above mentioned CVE AFAIU).
> > 
> > So my primary question is whether we can/should simply drop MAP_FIXED
> > from elf_map at all. Instead we should test whether the mapping was
> > successful for the requested address and fail otherwise. I realize that
> > failing due to something that a user has no idea about sucks a lot but
> > it seems to me safer to simply complain into the log and fail is a safer
> > option.
> 
> Sorry to interrupt. I tried below example.c and example2.c files and
> compile and link them with fpie and pie. Seems in the final PIE
> executable, the local global is pc relative. Means the data segment has
> to be put after the code segment though PIE program. Then MAP_FIXED is a
> good to have flag, especially we have counted in the total_size to
> search an area to cover the whole dynamic program and get the load_bias.
> It's no way to fail to get map agrea in case load_addr_set == 1.
> So with MAP_FIXED set, we won't take time to search the mm vma rb
> tree when load_addr_set == 1, but just return the specified addr directly,
> looks more efficient.

I am sorry, but I do not follow. elf_map should get the address hint to
use. MAP_FIXED merely unmaps anything underneath if there is something
which is exactly what I called dangerous. Without the flag we would just
fail in that case. Or, are you suggesting that MAP_FIXED is a
performance optimization because we are not doing find_vma in that case?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 15:03 ` Baoquan He
  2017-10-04 15:11   ` Michal Hocko
@ 2017-10-04 15:12   ` Baoquan He
  2017-10-04 15:17     ` Michal Hocko
  1 sibling, 1 reply; 39+ messages in thread
From: Baoquan He @ 2017-10-04 15:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

I made a clean up patch according to Oleg's suggestion. It's trying to
get an map area to cover total_size, then do mmap for for the 1st
program segment only. Not sure if this way is correct.

>From 40f231bb78a74caebcb4a898089a9fa5323be05f Mon Sep 17 00:00:00 2001
From: Baoquan He <bhe@redhat.com>
Date: Fri, 29 Sep 2017 21:35:30 +0800
Subject: [PATCH] binfmt_elf: Clean up the elf_map

Oleg pointed out that it's really ugly to do mmap of the total_size, then
unmap the region excluding the 1st segment. The right way should be search
an unmapped area which can cover region of total_size, then map the 1st
segment only.

And also update the code comment accordingly. In below commit, the relevant
code comment is not changed to cover the ELF binary image case.
commit a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 fs/binfmt_elf.c | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 72b7ecba7ead..43a47b2aa3f6 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -357,22 +357,25 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 		return addr;
 
 	/*
-	* total_size is the size of the ELF (interpreter) image.
-	* The _first_ mmap needs to know the full size, otherwise
-	* randomization might put this image into an overlapping
-	* position with the ELF binary image. (since size < total_size)
-	* So we first map the 'big' image - and unmap the remainder at
-	* the end. (which unmap is needed for ELF images with holes.)
+	* total_size is the size of the ELF binary image or the ELF loader
+	* image. For loader image, the _first_ mmap needs to know the full
+	* size, otherwise randomization might put image into an overlapping
+	* position with the ELF binary image.(since size < total_size)
+	* So we use total_size to get an area to cover the whole loader image,
+	* then map the 1st progment segment only with its own size. For binary
+	* image, similarly, the _first_ mmap also needs to know the full size,
+	* otherwise randomization might put image above mm->mmap_base.
 	*/
 	if (total_size) {
 		total_size = ELF_PAGEALIGN(total_size);
-		map_addr = vm_mmap(filep, addr, total_size, prot, flags, off);
-		if (!BAD_ADDR(map_addr))
-			vm_munmap(map_addr+size, total_size-size);
-	} else
-		map_addr = vm_mmap(filep, addr, size, prot, flags, off);
+		addr = get_unmapped_area(file, addr, total_size, off, flags);
+		if (offset_in_page(addr))
+			return addr;
+	}
+
+	map_addr = vm_mmap(filep, addr, size, prot, flags, off);
 
-	return(map_addr);
+	return (map_addr);
 }
 
 #endif /* !elf_map */
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 15:12   ` Baoquan He
@ 2017-10-04 15:17     ` Michal Hocko
  2017-10-04 15:37       ` Baoquan He
  2017-10-05 16:33       ` Oleg Nesterov
  0 siblings, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-04 15:17 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On Wed 04-10-17 23:12:38, Baoquan He wrote:
> I made a clean up patch according to Oleg's suggestion. It's trying to
> get an map area to cover total_size, then do mmap for for the 1st
> program segment only. Not sure if this way is correct.
> 
> >From 40f231bb78a74caebcb4a898089a9fa5323be05f Mon Sep 17 00:00:00 2001
> From: Baoquan He <bhe@redhat.com>
> Date: Fri, 29 Sep 2017 21:35:30 +0800
> Subject: [PATCH] binfmt_elf: Clean up the elf_map
> 
> Oleg pointed out that it's really ugly to do mmap of the total_size, then
> unmap the region excluding the 1st segment. The right way should be search
> an unmapped area which can cover region of total_size, then map the 1st
> segment only.
> 
> And also update the code comment accordingly. In below commit, the relevant
> code comment is not changed to cover the ELF binary image case.
> commit a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
> 
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  fs/binfmt_elf.c | 27 +++++++++++++++------------
>  1 file changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 72b7ecba7ead..43a47b2aa3f6 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -357,22 +357,25 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>  		return addr;
>  
>  	/*
> -	* total_size is the size of the ELF (interpreter) image.
> -	* The _first_ mmap needs to know the full size, otherwise
> -	* randomization might put this image into an overlapping
> -	* position with the ELF binary image. (since size < total_size)
> -	* So we first map the 'big' image - and unmap the remainder at
> -	* the end. (which unmap is needed for ELF images with holes.)
> +	* total_size is the size of the ELF binary image or the ELF loader
> +	* image. For loader image, the _first_ mmap needs to know the full
> +	* size, otherwise randomization might put image into an overlapping
> +	* position with the ELF binary image.(since size < total_size)
> +	* So we use total_size to get an area to cover the whole loader image,
> +	* then map the 1st progment segment only with its own size. For binary
> +	* image, similarly, the _first_ mmap also needs to know the full size,
> +	* otherwise randomization might put image above mm->mmap_base.
>  	*/
>  	if (total_size) {
>  		total_size = ELF_PAGEALIGN(total_size);
> -		map_addr = vm_mmap(filep, addr, total_size, prot, flags, off);
> -		if (!BAD_ADDR(map_addr))
> -			vm_munmap(map_addr+size, total_size-size);
> -	} else
> -		map_addr = vm_mmap(filep, addr, size, prot, flags, off);
> +		addr = get_unmapped_area(file, addr, total_size, off, flags);

So how does this prevent clobbering an existing VMA when flags contains
MAP_FIXED?

> +		if (offset_in_page(addr))
> +			return addr;
> +	}
> +
> +	map_addr = vm_mmap(filep, addr, size, prot, flags, off);
>  
> -	return(map_addr);
> +	return (map_addr);
>  }
>  
>  #endif /* !elf_map */
> -- 
> 2.5.5

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 15:17     ` Michal Hocko
@ 2017-10-04 15:37       ` Baoquan He
  2017-10-04 17:12         ` Michal Hocko
  2017-10-05 16:33       ` Oleg Nesterov
  1 sibling, 1 reply; 39+ messages in thread
From: Baoquan He @ 2017-10-04 15:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On 10/04/17 at 05:17pm, Michal Hocko wrote:
> On Wed 04-10-17 23:12:38, Baoquan He wrote:
> > I made a clean up patch according to Oleg's suggestion. It's trying to
> > get an map area to cover total_size, then do mmap for for the 1st
> > program segment only. Not sure if this way is correct.
> > 
> > >From 40f231bb78a74caebcb4a898089a9fa5323be05f Mon Sep 17 00:00:00 2001
> > From: Baoquan He <bhe@redhat.com>
> > Date: Fri, 29 Sep 2017 21:35:30 +0800
> > Subject: [PATCH] binfmt_elf: Clean up the elf_map
> > 
> > Oleg pointed out that it's really ugly to do mmap of the total_size, then
> > unmap the region excluding the 1st segment. The right way should be search
> > an unmapped area which can cover region of total_size, then map the 1st
> > segment only.
> > 
> > And also update the code comment accordingly. In below commit, the relevant
> > code comment is not changed to cover the ELF binary image case.
> > commit a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
> > 
> > Signed-off-by: Baoquan He <bhe@redhat.com>
> > ---
> >  fs/binfmt_elf.c | 27 +++++++++++++++------------
> >  1 file changed, 15 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 72b7ecba7ead..43a47b2aa3f6 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -357,22 +357,25 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> >  		return addr;
> >  
> >  	/*
> > -	* total_size is the size of the ELF (interpreter) image.
> > -	* The _first_ mmap needs to know the full size, otherwise
> > -	* randomization might put this image into an overlapping
> > -	* position with the ELF binary image. (since size < total_size)
> > -	* So we first map the 'big' image - and unmap the remainder at
> > -	* the end. (which unmap is needed for ELF images with holes.)
> > +	* total_size is the size of the ELF binary image or the ELF loader
> > +	* image. For loader image, the _first_ mmap needs to know the full
> > +	* size, otherwise randomization might put image into an overlapping
> > +	* position with the ELF binary image.(since size < total_size)
> > +	* So we use total_size to get an area to cover the whole loader image,
> > +	* then map the 1st progment segment only with its own size. For binary
> > +	* image, similarly, the _first_ mmap also needs to know the full size,
> > +	* otherwise randomization might put image above mm->mmap_base.

Oh, no, here, it won't include the PIE binary case. Since it must be
from ELF_ET_DYN_BASE. Here should be the "ld.so program" case.

> >  	*/
> >  	if (total_size) {
> >  		total_size = ELF_PAGEALIGN(total_size);
> > -		map_addr = vm_mmap(filep, addr, total_size, prot, flags, off);
> > -		if (!BAD_ADDR(map_addr))
> > -			vm_munmap(map_addr+size, total_size-size);
> > -	} else
> > -		map_addr = vm_mmap(filep, addr, size, prot, flags, off);
> > +		addr = get_unmapped_area(file, addr, total_size, off, flags);
> 
> So how does this prevent clobbering an existing VMA when flags contains
> MAP_FIXED?

Earlier flush_old_exec() is called to clean all old VMAs. Here if
total_size is non-zero only if it's the 1st prgoram segment of dynamic
loader, either from load_elf_interp() or the "ld.so program" case. With
my understanding, it can't be meeting an existing VMA. Not sure if
it's correct. Currently, it haven't passed to ld.so to link other .so.

> 
> > +		if (offset_in_page(addr))
> > +			return addr;
> > +	}
> > +
> > +	map_addr = vm_mmap(filep, addr, size, prot, flags, off);
> >  
> > -	return(map_addr);
> > +	return (map_addr);
> >  }
> >  
> >  #endif /* !elf_map */
> > -- 
> > 2.5.5
> 
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 15:37       ` Baoquan He
@ 2017-10-04 17:12         ` Michal Hocko
  2017-10-04 17:15           ` Linus Torvalds
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-04 17:12 UTC (permalink / raw)
  To: Baoquan He
  Cc: Linus Torvalds, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On Wed 04-10-17 23:37:41, Baoquan He wrote:
> On 10/04/17 at 05:17pm, Michal Hocko wrote:
> > On Wed 04-10-17 23:12:38, Baoquan He wrote:
> > > I made a clean up patch according to Oleg's suggestion. It's trying to
> > > get an map area to cover total_size, then do mmap for for the 1st
> > > program segment only. Not sure if this way is correct.
> > > 
> > > >From 40f231bb78a74caebcb4a898089a9fa5323be05f Mon Sep 17 00:00:00 2001
> > > From: Baoquan He <bhe@redhat.com>
> > > Date: Fri, 29 Sep 2017 21:35:30 +0800
> > > Subject: [PATCH] binfmt_elf: Clean up the elf_map
> > > 
> > > Oleg pointed out that it's really ugly to do mmap of the total_size, then
> > > unmap the region excluding the 1st segment. The right way should be search
> > > an unmapped area which can cover region of total_size, then map the 1st
> > > segment only.
> > > 
> > > And also update the code comment accordingly. In below commit, the relevant
> > > code comment is not changed to cover the ELF binary image case.
> > > commit a87938b2e2 ("fs/binfmt_elf.c: fix bug in loading of PIE binaries")
> > > 
> > > Signed-off-by: Baoquan He <bhe@redhat.com>
> > > ---
> > >  fs/binfmt_elf.c | 27 +++++++++++++++------------
> > >  1 file changed, 15 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > > index 72b7ecba7ead..43a47b2aa3f6 100644
> > > --- a/fs/binfmt_elf.c
> > > +++ b/fs/binfmt_elf.c
> > > @@ -357,22 +357,25 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > >  		return addr;
> > >  
> > >  	/*
> > > -	* total_size is the size of the ELF (interpreter) image.
> > > -	* The _first_ mmap needs to know the full size, otherwise
> > > -	* randomization might put this image into an overlapping
> > > -	* position with the ELF binary image. (since size < total_size)
> > > -	* So we first map the 'big' image - and unmap the remainder at
> > > -	* the end. (which unmap is needed for ELF images with holes.)
> > > +	* total_size is the size of the ELF binary image or the ELF loader
> > > +	* image. For loader image, the _first_ mmap needs to know the full
> > > +	* size, otherwise randomization might put image into an overlapping
> > > +	* position with the ELF binary image.(since size < total_size)
> > > +	* So we use total_size to get an area to cover the whole loader image,
> > > +	* then map the 1st progment segment only with its own size. For binary
> > > +	* image, similarly, the _first_ mmap also needs to know the full size,
> > > +	* otherwise randomization might put image above mm->mmap_base.
> 
> Oh, no, here, it won't include the PIE binary case. Since it must be
> from ELF_ET_DYN_BASE. Here should be the "ld.so program" case.
> 
> > >  	*/
> > >  	if (total_size) {
> > >  		total_size = ELF_PAGEALIGN(total_size);
> > > -		map_addr = vm_mmap(filep, addr, total_size, prot, flags, off);
> > > -		if (!BAD_ADDR(map_addr))
> > > -			vm_munmap(map_addr+size, total_size-size);
> > > -	} else
> > > -		map_addr = vm_mmap(filep, addr, size, prot, flags, off);
> > > +		addr = get_unmapped_area(file, addr, total_size, off, flags);
> > 
> > So how does this prevent clobbering an existing VMA when flags contains
> > MAP_FIXED?
> 
> Earlier flush_old_exec() is called to clean all old VMAs.

Yes, but we already have a new stack mapped and that was the point of
the referenced CVE where the binary segments got mapped over the stack
AFAIU.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 17:12         ` Michal Hocko
@ 2017-10-04 17:15           ` Linus Torvalds
  2017-10-04 17:28             ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Linus Torvalds @ 2017-10-04 17:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Baoquan He, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On Wed, Oct 4, 2017 at 10:12 AM, Michal Hocko <mhocko@kernel.org> wrote:
>
> Yes, but we already have a new stack mapped and that was the point of
> the referenced CVE where the binary segments got mapped over the stack
> AFAIU.

Well, if you control the binary to the point where you just make the
ELF section map on top of the stack, what's the problem?

I mean, it's not a security issue. You could just have written the
code to do mmap() instead.

So I think this is a "crazy users can do crazy things, we're not
arbiters of taste" thing.

                       Linus

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 17:15           ` Linus Torvalds
@ 2017-10-04 17:28             ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-04 17:28 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Baoquan He, Kees Cook, Oleg Nesterov, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On Wed 04-10-17 10:15:31, Linus Torvalds wrote:
> On Wed, Oct 4, 2017 at 10:12 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >
> > Yes, but we already have a new stack mapped and that was the point of
> > the referenced CVE where the binary segments got mapped over the stack
> > AFAIU.
> 
> Well, if you control the binary to the point where you just make the
> ELF section map on top of the stack, what's the problem?
> 
> I mean, it's not a security issue. You could just have written the
> code to do mmap() instead.
> 
> So I think this is a "crazy users can do crazy things, we're not
> arbiters of taste" thing.

Well, my understanding of the CVE (fixed by a87938b2e246
("fs/binfmt_elf.c: fix bug in loading of PIE binaries")) is that the elf
and stack randomization worked against each other.
7ffbad995000-7ffbad996000 rw-p 00000000 00:00 0 
7ffbad996000-7ffbad998000 r-xp 00000000 fd:00 4194375                    /tmp/PIE
7ffbad999000-7ffbadb97000 rw-p 00000000 00:00 0                          [stack]
7ffbadb97000-7ffbadb98000 r--p 00001000 fd:00 4194375                    /tmp/PIE
7ffbadb98000-7ffbadbb9000 rw-p 00002000 fd:00 4194375                    /tmp/PIE
7ffbadbba000-7ffc0d9ba000 rw-p 00000000 00:00 0                          [heap] # incorrectly marked stack

So while the issue is already fixed that made me think how MAP_FIXED
silently corrupted the underlying stack and how it is inherently
dangerous and also why do we need it in the first place. I would rather
make any issues like that impossible for ever.

But as I've said I have hard time to wrap my head around the whole
load_elf_binary so I can be missing something easily here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-04 15:17     ` Michal Hocko
  2017-10-04 15:37       ` Baoquan He
@ 2017-10-05 16:33       ` Oleg Nesterov
  2017-10-05 16:42         ` Michal Hocko
  1 sibling, 1 reply; 39+ messages in thread
From: Oleg Nesterov @ 2017-10-05 16:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Baoquan He, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On 10/04, Michal Hocko wrote:
>
> On Wed 04-10-17 23:12:38, Baoquan He wrote:
> >  	if (total_size) {
> >  		total_size = ELF_PAGEALIGN(total_size);
> > -		map_addr = vm_mmap(filep, addr, total_size, prot, flags, off);
> > -		if (!BAD_ADDR(map_addr))
> > -			vm_munmap(map_addr+size, total_size-size);
> > -	} else
> > -		map_addr = vm_mmap(filep, addr, size, prot, flags, off);
> > +		addr = get_unmapped_area(file, addr, total_size, off, flags);
>
> So how does this prevent clobbering an existing VMA when flags contains
> MAP_FIXED?

I got lost...

this is just cleanup, it should not change the behaviour, with or without
MAP_FIXED. It just avoids the mmap(total_size) + munmap(extra_size).

Oleg.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: MAP_FIXED for ELF mappings
  2017-10-05 16:33       ` Oleg Nesterov
@ 2017-10-05 16:42         ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-05 16:42 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Baoquan He, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Ingo Molnar, LKML

On Thu 05-10-17 18:33:20, Oleg Nesterov wrote:
> On 10/04, Michal Hocko wrote:
> >
> > On Wed 04-10-17 23:12:38, Baoquan He wrote:
> > >  	if (total_size) {
> > >  		total_size = ELF_PAGEALIGN(total_size);
> > > -		map_addr = vm_mmap(filep, addr, total_size, prot, flags, off);
> > > -		if (!BAD_ADDR(map_addr))
> > > -			vm_munmap(map_addr+size, total_size-size);
> > > -	} else
> > > -		map_addr = vm_mmap(filep, addr, size, prot, flags, off);
> > > +		addr = get_unmapped_area(file, addr, total_size, off, flags);
> >
> > So how does this prevent clobbering an existing VMA when flags contains
> > MAP_FIXED?
> 
> I got lost...
> 
> this is just cleanup, it should not change the behaviour, with or without
> MAP_FIXED. It just avoids the mmap(total_size) + munmap(extra_size).

OK, I have clearly misunderstood the intention then.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 0/2] fs, elf: get rid of MAP_FIXED from the loader
  2017-10-04  7:50 MAP_FIXED for ELF mappings Michal Hocko
  2017-10-04  7:59 ` Michal Hocko
  2017-10-04 15:03 ` Baoquan He
@ 2017-10-16 13:44 ` Michal Hocko
  2017-10-16 13:44   ` [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
  2017-10-16 13:44   ` [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment Michal Hocko
  2 siblings, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-16 13:44 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

Hi,
the previous discussion didn't really show any hard requirement for
MAP_FIXED usage for the elf segments mapping. I have spent some more
time studying the code (thanks to Qualys for their insight) and
concluded that the current MAP_FIXED usage is rather fragile and not
really needed. The first patch replaces it by a hint mmaping and failing
rather than silently corrupt an existing memory and the second patch
removes MAP_FIXED for the initial segment mapping because this shouldn't
be really needed either, I would even call it wrong.
Anyway, more details are in the changelog of patches. I will really
appreciate any feedback.

This has passed some testing with PIE/PIC binaries running in the loop
without any negative side effects detected.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-16 13:44 ` [PATCH 0/2] fs, elf: get rid of MAP_FIXED from the loader Michal Hocko
@ 2017-10-16 13:44   ` Michal Hocko
  2017-10-16 16:39     ` Kees Cook
  2017-10-17 12:26     ` Baoquan He
  2017-10-16 13:44   ` [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment Michal Hocko
  1 sibling, 2 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-16 13:44 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by adding a helper elf_vm_mmap used by elf_map which
drops MAP_FIXED when asking for the mapping and check whether the
returned address really matches what the caller asked for and complain
loudly if this is not the case and fail. Such a failure would be a
kernel bug and it should alarm us to look what has gone wrong.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/binfmt_elf.c | 29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6466153f2bf0..09456e2add18 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
 
 #ifndef elf_map
 
+static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
+		unsigned long size, int prot, int type, unsigned long off)
+{
+	unsigned long map_addr;
+
+	/*
+	 * If caller requests the mapping at a specific place, make sure we fail
+	 * rather than potentially clobber an existing mapping which can have
+	 * security consequences (e.g. smash over the stack area).
+	 */
+	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (BAD_ADDR(map_addr))
+		return map_addr;
+
+	if ((type & MAP_FIXED) && map_addr != addr) {
+		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				(void*)addr);
+		return -EAGAIN;
+	}
+
+	return map_addr;
+}
+
 static unsigned long elf_map(struct file *filep, unsigned long addr,
 		struct elf_phdr *eppnt, int prot, int type,
 		unsigned long total_size)
@@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	*/
 	if (total_size) {
 		total_size = ELF_PAGEALIGN(total_size);
-		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
 		if (!BAD_ADDR(map_addr))
 			vm_munmap(map_addr+size, total_size-size);
 	} else
-		map_addr = vm_mmap(filep, addr, size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
 
 	return(map_addr);
 }
@@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
 		eppnt++;
 
 	/* Now use mmap to map the library into memory. */
-	error = vm_mmap(file,
+	error = elf_vm_mmap(file,
 			ELF_PAGESTART(eppnt->p_vaddr),
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-16 13:44 ` [PATCH 0/2] fs, elf: get rid of MAP_FIXED from the loader Michal Hocko
  2017-10-16 13:44   ` [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
@ 2017-10-16 13:44   ` Michal Hocko
  2017-10-16 16:44     ` Kees Cook
  1 sibling, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-16 13:44 UTC (permalink / raw)
  To: LKML
  Cc: Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") has added
MAP_FIXED flag to the initial ET_DYN segment mapping which defines the
randomized base for the PIE ELF segments. The thing is that MAP_FIXED
shouldn't be really needed because the address is essentially random
anyway. All other segments are mapped relatively to this base. elf_map
makes sure that all segments will fit into the address space by
enforcing total_mapping_size initial map.

Why do we want to drop MAP_FIXED in the first place? Because it is error
prone. If we happen to have an existing mapping in the requested range
then we do not want to corrupt it silently. Without MAP_FIXED vm_mmap
will simply fallback to another range. In reality there shouldn't be
any conflicting mappings at this early exec stage so the mmap should
succeed even without MAP_FIXED but subtle changes to the randomization
can break this assumption so we should rather be careful here.

Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 fs/binfmt_elf.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 09456e2add18..244cc30dfa24 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -988,7 +988,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
 			} else
 				load_bias = 0;
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-16 13:44   ` [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
@ 2017-10-16 16:39     ` Kees Cook
  2017-10-16 19:00         ` Michal Hocko
  2017-10-17 12:26     ` Baoquan He
  1 sibling, 1 reply; 39+ messages in thread
From: Kees Cook @ 2017-10-16 16:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He, Michal Hocko

On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Both load_elf_interp and load_elf_binary rely on elf_map to map segments
> on a controlled address and they use MAP_FIXED to enforce that. This is
> however dangerous thing prone to silent data corruption which can be
> even exploitable. Let's take CVE-2017-1000253 as an example. At the time
> (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
> ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
> the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
> we could end up mapping over the existing stack with some luck.
>
> The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
> fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
> further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
> revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
> stack consumption early during execve fully stopped by da029c11e6b1
> ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
> safe and any attack should be impractical. On the other hand this is
> just too subtle assumption so it can break quite easily and hard to
> spot.
>
> I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
> fundamentally dangerous. Moreover it shouldn't be even needed. We are
> at the early process stage and so there shouldn't be unrelated mappings
> (except for stack and loader) existing so mmap for a given address
> should succeed even without MAP_FIXED. Something is terribly wrong if
> this is not the case and we should rather fail than silently corrupt the
> underlying mapping.
>
> Address this issue by adding a helper elf_vm_mmap used by elf_map which
> drops MAP_FIXED when asking for the mapping and check whether the
> returned address really matches what the caller asked for and complain
> loudly if this is not the case and fail. Such a failure would be a
> kernel bug and it should alarm us to look what has gone wrong.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/binfmt_elf.c | 29 ++++++++++++++++++++++++++---
>  1 file changed, 26 insertions(+), 3 deletions(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 6466153f2bf0..09456e2add18 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>
>  #ifndef elf_map
>
> +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> +               unsigned long size, int prot, int type, unsigned long off)
> +{
> +       unsigned long map_addr;
> +
> +       /*
> +        * If caller requests the mapping at a specific place, make sure we fail
> +        * rather than potentially clobber an existing mapping which can have
> +        * security consequences (e.g. smash over the stack area).
> +        */
> +       map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> +       if (BAD_ADDR(map_addr))
> +               return map_addr;
> +
> +       if ((type & MAP_FIXED) && map_addr != addr) {
> +               pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +                               (void*)addr);

Is "info" loud enough? I actually think this should be a WARN_ONCE().

> +               return -EAGAIN;
> +       }
> +
> +       return map_addr;
> +}
> +
>  static unsigned long elf_map(struct file *filep, unsigned long addr,
>                 struct elf_phdr *eppnt, int prot, int type,
>                 unsigned long total_size)
> @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,

elf_map is redirected on metag -- it should probably have its vm_mmap
calls adjust too.

>         */
>         if (total_size) {
>                 total_size = ELF_PAGEALIGN(total_size);
> -               map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> +               map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
>                 if (!BAD_ADDR(map_addr))
>                         vm_munmap(map_addr+size, total_size-size);
>         } else
> -               map_addr = vm_mmap(filep, addr, size, prot, type, off);
> +               map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
>
>         return(map_addr);
>  }
> @@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
>                 eppnt++;
>
>         /* Now use mmap to map the library into memory. */
> -       error = vm_mmap(file,
> +       error = elf_vm_mmap(file,
>                         ELF_PAGESTART(eppnt->p_vaddr),
>                         (eppnt->p_filesz +
>                          ELF_PAGEOFFSET(eppnt->p_vaddr)),
> --
> 2.14.2
>

Otherwise, yeah, this should be good.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-16 13:44   ` [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment Michal Hocko
@ 2017-10-16 16:44     ` Kees Cook
  2017-10-16 18:43       ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2017-10-16 16:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He, Michal Hocko

On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") has added
> MAP_FIXED flag to the initial ET_DYN segment mapping which defines the
> randomized base for the PIE ELF segments. The thing is that MAP_FIXED
> shouldn't be really needed because the address is essentially random
> anyway. All other segments are mapped relatively to this base. elf_map
> makes sure that all segments will fit into the address space by
> enforcing total_mapping_size initial map.
>
> Why do we want to drop MAP_FIXED in the first place? Because it is error
> prone. If we happen to have an existing mapping in the requested range
> then we do not want to corrupt it silently. Without MAP_FIXED vm_mmap
> will simply fallback to another range. In reality there shouldn't be
> any conflicting mappings at this early exec stage so the mmap should
> succeed even without MAP_FIXED but subtle changes to the randomization
> can break this assumption so we should rather be careful here.
>
> Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  fs/binfmt_elf.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 09456e2add18..244cc30dfa24 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -988,7 +988,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                                 load_bias = ELF_ET_DYN_BASE;
>                                 if (current->flags & PF_RANDOMIZE)
>                                         load_bias += arch_mmap_rnd();
> -                               elf_flags |= MAP_FIXED;

If MAP_FIXED is being masked out in patch 1 (but used as a check for
correct position, I think this MAP_FIXED should _not_ be removed).
This provides for checking for the initial mapping. The failure mode
here would be to allow an attack to "push" a mapping away from some
overlapping region. This should not be allowed either: if the initial
mapping is "wrong", we should absolutely fail, otherwise we can be
introducing a silent reduction in PIE entropy.

So, yes to patch 1, that makes sense to not allow silent overlap, but
no to patch 2: we do not want silent entropy reduction.

-Kees

>                         } else
>                                 load_bias = 0;
>
> --
> 2.14.2
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-16 16:44     ` Kees Cook
@ 2017-10-16 18:43       ` Michal Hocko
  2017-10-16 19:38         ` Kees Cook
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-16 18:43 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Mon 16-10-17 09:44:31, Kees Cook wrote:
> On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") has added
> > MAP_FIXED flag to the initial ET_DYN segment mapping which defines the
> > randomized base for the PIE ELF segments. The thing is that MAP_FIXED
> > shouldn't be really needed because the address is essentially random
> > anyway. All other segments are mapped relatively to this base. elf_map
> > makes sure that all segments will fit into the address space by
> > enforcing total_mapping_size initial map.
> >
> > Why do we want to drop MAP_FIXED in the first place? Because it is error
> > prone. If we happen to have an existing mapping in the requested range
> > then we do not want to corrupt it silently. Without MAP_FIXED vm_mmap
> > will simply fallback to another range. In reality there shouldn't be
> > any conflicting mappings at this early exec stage so the mmap should
> > succeed even without MAP_FIXED but subtle changes to the randomization
> > can break this assumption so we should rather be careful here.
> >
> > Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  fs/binfmt_elf.c | 1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 09456e2add18..244cc30dfa24 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -988,7 +988,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
> >                                 load_bias = ELF_ET_DYN_BASE;
> >                                 if (current->flags & PF_RANDOMIZE)
> >                                         load_bias += arch_mmap_rnd();
> > -                               elf_flags |= MAP_FIXED;
> 
> If MAP_FIXED is being masked out in patch 1 (but used as a check for
> correct position, I think this MAP_FIXED should _not_ be removed).
> This provides for checking for the initial mapping. The failure mode
> here would be to allow an attack to "push" a mapping away from some
> overlapping region. This should not be allowed either: if the initial
> mapping is "wrong", we should absolutely fail, otherwise we can be
> introducing a silent reduction in PIE entropy.

Do we really lose any entropy? We are using standard randomized mmap in
that case. So we are randomized in either case. Are you worried that
an attacker could tell the two cases and abuse some sort of offset2lib
attack?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-10-16 19:00         ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-16 19:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He, James Hogan, linux-metag

[CCing metag people for the metag elf_map implementation specific. The thread
starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko@kernel.org]

On Mon 16-10-17 09:39:14, Kees Cook wrote:
> On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> > +               unsigned long size, int prot, int type, unsigned long off)
> > +{
> > +       unsigned long map_addr;
> > +
> > +       /*
> > +        * If caller requests the mapping at a specific place, make sure we fail
> > +        * rather than potentially clobber an existing mapping which can have
> > +        * security consequences (e.g. smash over the stack area).
> > +        */
> > +       map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> > +       if (BAD_ADDR(map_addr))
> > +               return map_addr;
> > +
> > +       if ((type & MAP_FIXED) && map_addr != addr) {
> > +               pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> > +                               (void*)addr);
> 
> Is "info" loud enough? I actually think this should be a WARN_ONCE().

Does it make any sense to taint the kernel just because of this
condition? I can make it pr_error...

> > +               return -EAGAIN;
> > +       }
> > +
> > +       return map_addr;
> > +}
> > +
> >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> >                 struct elf_phdr *eppnt, int prot, int type,
> >                 unsigned long total_size)
> > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> 
> elf_map is redirected on metag -- it should probably have its vm_mmap
> calls adjust too.

Thanks for spotting this. I am not really familiar with metag. It seems
to clear MAP_FIXED already
	tcm_tag = tcm_lookup_tag(addr);

	if (tcm_tag != TCM_INVALID_TAG)
		type &= ~MAP_FIXED;

So if there is a tag the flag is cleared. I do not understand this code
(and git log doesn't help) but why is this MAP_FIXED code really needed?

> >         */
> >         if (total_size) {
> >                 total_size = ELF_PAGEALIGN(total_size);
> > -               map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> > +               map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
> >                 if (!BAD_ADDR(map_addr))
> >                         vm_munmap(map_addr+size, total_size-size);
> >         } else
> > -               map_addr = vm_mmap(filep, addr, size, prot, type, off);
> > +               map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
> >
> >         return(map_addr);
> >  }
> > @@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
> >                 eppnt++;
> >
> >         /* Now use mmap to map the library into memory. */
> > -       error = vm_mmap(file,
> > +       error = elf_vm_mmap(file,
> >                         ELF_PAGESTART(eppnt->p_vaddr),
> >                         (eppnt->p_filesz +
> >                          ELF_PAGEOFFSET(eppnt->p_vaddr)),
> > --
> > 2.14.2
> >
> 
> Otherwise, yeah, this should be good.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-10-16 19:00         ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-16 19:00 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He, James Hogan,
	linux-metag-u79uwXL29TY76Z2rM5mHXA

[CCing metag people for the metag elf_map implementation specific. The thread
starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]

On Mon 16-10-17 09:39:14, Kees Cook wrote:
> On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
[...]
> > +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> > +               unsigned long size, int prot, int type, unsigned long off)
> > +{
> > +       unsigned long map_addr;
> > +
> > +       /*
> > +        * If caller requests the mapping at a specific place, make sure we fail
> > +        * rather than potentially clobber an existing mapping which can have
> > +        * security consequences (e.g. smash over the stack area).
> > +        */
> > +       map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> > +       if (BAD_ADDR(map_addr))
> > +               return map_addr;
> > +
> > +       if ((type & MAP_FIXED) && map_addr != addr) {
> > +               pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> > +                               (void*)addr);
> 
> Is "info" loud enough? I actually think this should be a WARN_ONCE().

Does it make any sense to taint the kernel just because of this
condition? I can make it pr_error...

> > +               return -EAGAIN;
> > +       }
> > +
> > +       return map_addr;
> > +}
> > +
> >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> >                 struct elf_phdr *eppnt, int prot, int type,
> >                 unsigned long total_size)
> > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> 
> elf_map is redirected on metag -- it should probably have its vm_mmap
> calls adjust too.

Thanks for spotting this. I am not really familiar with metag. It seems
to clear MAP_FIXED already
	tcm_tag = tcm_lookup_tag(addr);

	if (tcm_tag != TCM_INVALID_TAG)
		type &= ~MAP_FIXED;

So if there is a tag the flag is cleared. I do not understand this code
(and git log doesn't help) but why is this MAP_FIXED code really needed?

> >         */
> >         if (total_size) {
> >                 total_size = ELF_PAGEALIGN(total_size);
> > -               map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> > +               map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
> >                 if (!BAD_ADDR(map_addr))
> >                         vm_munmap(map_addr+size, total_size-size);
> >         } else
> > -               map_addr = vm_mmap(filep, addr, size, prot, type, off);
> > +               map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
> >
> >         return(map_addr);
> >  }
> > @@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
> >                 eppnt++;
> >
> >         /* Now use mmap to map the library into memory. */
> > -       error = vm_mmap(file,
> > +       error = elf_vm_mmap(file,
> >                         ELF_PAGESTART(eppnt->p_vaddr),
> >                         (eppnt->p_filesz +
> >                          ELF_PAGEOFFSET(eppnt->p_vaddr)),
> > --
> > 2.14.2
> >
> 
> Otherwise, yeah, this should be good.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-16 18:43       ` Michal Hocko
@ 2017-10-16 19:38         ` Kees Cook
  2017-10-17  9:04           ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2017-10-16 19:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Mon, Oct 16, 2017 at 11:43 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 16-10-17 09:44:31, Kees Cook wrote:
>> On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > From: Michal Hocko <mhocko@suse.com>
>> >
>> > eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") has added
>> > MAP_FIXED flag to the initial ET_DYN segment mapping which defines the
>> > randomized base for the PIE ELF segments. The thing is that MAP_FIXED
>> > shouldn't be really needed because the address is essentially random
>> > anyway. All other segments are mapped relatively to this base. elf_map
>> > makes sure that all segments will fit into the address space by
>> > enforcing total_mapping_size initial map.
>> >
>> > Why do we want to drop MAP_FIXED in the first place? Because it is error
>> > prone. If we happen to have an existing mapping in the requested range
>> > then we do not want to corrupt it silently. Without MAP_FIXED vm_mmap
>> > will simply fallback to another range. In reality there shouldn't be
>> > any conflicting mappings at this early exec stage so the mmap should
>> > succeed even without MAP_FIXED but subtle changes to the randomization
>> > can break this assumption so we should rather be careful here.
>> >
>> > Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
>> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>> > ---
>> >  fs/binfmt_elf.c | 1 -
>> >  1 file changed, 1 deletion(-)
>> >
>> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> > index 09456e2add18..244cc30dfa24 100644
>> > --- a/fs/binfmt_elf.c
>> > +++ b/fs/binfmt_elf.c
>> > @@ -988,7 +988,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
>> >                                 load_bias = ELF_ET_DYN_BASE;
>> >                                 if (current->flags & PF_RANDOMIZE)
>> >                                         load_bias += arch_mmap_rnd();
>> > -                               elf_flags |= MAP_FIXED;
>>
>> If MAP_FIXED is being masked out in patch 1 (but used as a check for
>> correct position, I think this MAP_FIXED should _not_ be removed).
>> This provides for checking for the initial mapping. The failure mode
>> here would be to allow an attack to "push" a mapping away from some
>> overlapping region. This should not be allowed either: if the initial
>> mapping is "wrong", we should absolutely fail, otherwise we can be
>> introducing a silent reduction in PIE entropy.
>
> Do we really lose any entropy? We are using standard randomized mmap in
> that case. So we are randomized in either case. Are you worried that
> an attacker could tell the two cases and abuse some sort of offset2lib
> attack?

Not in the regular case. I'm suggesting that what your changes are
preparing for is an _unknown_ way to collide mappings. In that case,
we should be as defensive as we know how. And if we were to remove
MAP_FIXED here, it would allow an attacker (with some future method)
to potentially collapse a range of ASLR for execution, since missing
MAP_FIXED here would silently move a mapping somewhere else. So we
should keep MAP_FIXED, as any collision would indicate an unknown
method of crashing an exec into something else.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-16 19:00         ` Michal Hocko
@ 2017-10-16 20:02           ` James Hogan
  -1 siblings, 0 replies; 39+ messages in thread
From: James Hogan @ 2017-10-16 20:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He, linux-metag

[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]

On Mon, Oct 16, 2017 at 09:00:47PM +0200, Michal Hocko wrote:
> [CCing metag people for the metag elf_map implementation specific. The thread
> starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko@kernel.org]
> 
> On Mon 16-10-17 09:39:14, Kees Cook wrote:
> > On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > +               return -EAGAIN;
> > > +       }
> > > +
> > > +       return map_addr;
> > > +}
> > > +
> > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > >                 struct elf_phdr *eppnt, int prot, int type,
> > >                 unsigned long total_size)
> > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > 
> > elf_map is redirected on metag -- it should probably have its vm_mmap
> > calls adjust too.
> 
> Thanks for spotting this. I am not really familiar with metag. It seems
> to clear MAP_FIXED already
> 	tcm_tag = tcm_lookup_tag(addr);
> 
> 	if (tcm_tag != TCM_INVALID_TAG)
> 		type &= ~MAP_FIXED;
> 
> So if there is a tag the flag is cleared. I do not understand this code
> (and git log doesn't help) but why is this MAP_FIXED code really needed?

This function was added to the metag port in mid-2010 to support ELFs
with tightly coupled memory (TCM) segments, for example metag "core"
memories are at fixed virtual addresses and aren't MMU mappable (i.e.
globally accessible), and are outside of the usual userland address
range, but are as fast as cache. The commit message says this:

> Override the definition of the elf_map() function to special case
> sections that are loaded at the address of the internal memories.
> If we have such a section, map it at a different address and copy
> the contents of the section into the appropriate memory.

So yeh, it looks like if the section is meant to use TCM based on the
virtual address, it drops MAP_FIXED so that the vm_mmap can succeed
(because its outside the normally valid range), and then copies it
directly to the desired TCM so the program can use it.

Hope that helps add some context to understand whats needed.

There was some description of this in an ELCE-2010 talk by the original
author Will Newton that may also be of interest [1].

Cheers
James

[1] http://free-electrons.com/blog/elce-2010-videos/
    "Exploiting On-chip Memories in Embedded Linux Applications"
    See slides about "core memories".

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-10-16 20:02           ` James Hogan
  0 siblings, 0 replies; 39+ messages in thread
From: James Hogan @ 2017-10-16 20:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He, linux-metag

[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]

On Mon, Oct 16, 2017 at 09:00:47PM +0200, Michal Hocko wrote:
> [CCing metag people for the metag elf_map implementation specific. The thread
> starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko@kernel.org]
> 
> On Mon 16-10-17 09:39:14, Kees Cook wrote:
> > On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > +               return -EAGAIN;
> > > +       }
> > > +
> > > +       return map_addr;
> > > +}
> > > +
> > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > >                 struct elf_phdr *eppnt, int prot, int type,
> > >                 unsigned long total_size)
> > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > 
> > elf_map is redirected on metag -- it should probably have its vm_mmap
> > calls adjust too.
> 
> Thanks for spotting this. I am not really familiar with metag. It seems
> to clear MAP_FIXED already
> 	tcm_tag = tcm_lookup_tag(addr);
> 
> 	if (tcm_tag != TCM_INVALID_TAG)
> 		type &= ~MAP_FIXED;
> 
> So if there is a tag the flag is cleared. I do not understand this code
> (and git log doesn't help) but why is this MAP_FIXED code really needed?

This function was added to the metag port in mid-2010 to support ELFs
with tightly coupled memory (TCM) segments, for example metag "core"
memories are at fixed virtual addresses and aren't MMU mappable (i.e.
globally accessible), and are outside of the usual userland address
range, but are as fast as cache. The commit message says this:

> Override the definition of the elf_map() function to special case
> sections that are loaded at the address of the internal memories.
> If we have such a section, map it at a different address and copy
> the contents of the section into the appropriate memory.

So yeh, it looks like if the section is meant to use TCM based on the
virtual address, it drops MAP_FIXED so that the vm_mmap can succeed
(because its outside the normally valid range), and then copies it
directly to the desired TCM so the program can use it.

Hope that helps add some context to understand whats needed.

There was some description of this in an ELCE-2010 talk by the original
author Will Newton that may also be of interest [1].

Cheers
James

[1] http://free-electrons.com/blog/elce-2010-videos/
    "Exploiting On-chip Memories in Embedded Linux Applications"
    See slides about "core memories".

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-16 20:02           ` James Hogan
@ 2017-10-17  7:37             ` Michal Hocko
  -1 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-17  7:37 UTC (permalink / raw)
  To: James Hogan
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He, linux-metag

On Mon 16-10-17 21:02:09, James Hogan wrote:
> On Mon, Oct 16, 2017 at 09:00:47PM +0200, Michal Hocko wrote:
> > [CCing metag people for the metag elf_map implementation specific. The thread
> > starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko@kernel.org]
> > 
> > On Mon 16-10-17 09:39:14, Kees Cook wrote:
> > > On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > > +               return -EAGAIN;
> > > > +       }
> > > > +
> > > > +       return map_addr;
> > > > +}
> > > > +
> > > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > >                 struct elf_phdr *eppnt, int prot, int type,
> > > >                 unsigned long total_size)
> > > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > 
> > > elf_map is redirected on metag -- it should probably have its vm_mmap
> > > calls adjust too.
> > 
> > Thanks for spotting this. I am not really familiar with metag. It seems
> > to clear MAP_FIXED already
> > 	tcm_tag = tcm_lookup_tag(addr);
> > 
> > 	if (tcm_tag != TCM_INVALID_TAG)
> > 		type &= ~MAP_FIXED;
> > 
> > So if there is a tag the flag is cleared. I do not understand this code
> > (and git log doesn't help) but why is this MAP_FIXED code really needed?
> 
> This function was added to the metag port in mid-2010 to support ELFs
> with tightly coupled memory (TCM) segments, for example metag "core"
> memories are at fixed virtual addresses and aren't MMU mappable (i.e.
> globally accessible), and are outside of the usual userland address
> range, but are as fast as cache. The commit message says this:
> 
> > Override the definition of the elf_map() function to special case
> > sections that are loaded at the address of the internal memories.
> > If we have such a section, map it at a different address and copy
> > the contents of the section into the appropriate memory.
> 
> So yeh, it looks like if the section is meant to use TCM based on the
> virtual address, it drops MAP_FIXED so that the vm_mmap can succeed
> (because its outside the normally valid range), and then copies it
> directly to the desired TCM so the program can use it.
> 
> Hope that helps add some context to understand whats needed.

Hmm, so IIUC then we need the same fix as
http://lkml.kernel.org/r/20171016134446.19910-2-mhocko@kernel.org,
right?

This would be something like. I wanted to share elf_vm_mmap but didn't
find a proper place to not cause include dependency hell so I balied out
to c&p.
---
diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index c4606ce743d2..b20596b4c4c2 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -378,6 +378,29 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
 
 #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
+static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
+		unsigned long size, int prot, int type, unsigned long off)
+{
+	unsigned long map_addr;
+
+	/*
+	 * If caller requests the mapping at a specific place, make sure we fail
+	 * rather than potentially clobber an existing mapping which can have
+	 * security consequences (e.g. smash over the stack area).
+	 */
+	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (BAD_ADDR(map_addr))
+		return map_addr;
+
+	if ((type & MAP_FIXED) && map_addr != addr) {
+		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				(void*)addr);
+		return -EAGAIN;
+	}
+
+	return map_addr;
+}
+
 unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 			      struct elf_phdr *eppnt, int prot, int type,
 			      unsigned long total_size)
@@ -410,11 +433,11 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	*/
 	if (total_size) {
 		total_size = ELF_PAGEALIGN(total_size);
-		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
 		if (!BAD_ADDR(map_addr))
 			vm_munmap(map_addr+size, total_size-size);
 	} else
-		map_addr = vm_mmap(filep, addr, size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
 
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-10-17  7:37             ` Michal Hocko
  0 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-17  7:37 UTC (permalink / raw)
  To: James Hogan
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He,
	linux-metag-u79uwXL29TY76Z2rM5mHXA

On Mon 16-10-17 21:02:09, James Hogan wrote:
> On Mon, Oct 16, 2017 at 09:00:47PM +0200, Michal Hocko wrote:
> > [CCing metag people for the metag elf_map implementation specific. The thread
> > starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org]
> > 
> > On Mon 16-10-17 09:39:14, Kees Cook wrote:
> > > On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > > > +               return -EAGAIN;
> > > > +       }
> > > > +
> > > > +       return map_addr;
> > > > +}
> > > > +
> > > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > >                 struct elf_phdr *eppnt, int prot, int type,
> > > >                 unsigned long total_size)
> > > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > 
> > > elf_map is redirected on metag -- it should probably have its vm_mmap
> > > calls adjust too.
> > 
> > Thanks for spotting this. I am not really familiar with metag. It seems
> > to clear MAP_FIXED already
> > 	tcm_tag = tcm_lookup_tag(addr);
> > 
> > 	if (tcm_tag != TCM_INVALID_TAG)
> > 		type &= ~MAP_FIXED;
> > 
> > So if there is a tag the flag is cleared. I do not understand this code
> > (and git log doesn't help) but why is this MAP_FIXED code really needed?
> 
> This function was added to the metag port in mid-2010 to support ELFs
> with tightly coupled memory (TCM) segments, for example metag "core"
> memories are at fixed virtual addresses and aren't MMU mappable (i.e.
> globally accessible), and are outside of the usual userland address
> range, but are as fast as cache. The commit message says this:
> 
> > Override the definition of the elf_map() function to special case
> > sections that are loaded at the address of the internal memories.
> > If we have such a section, map it at a different address and copy
> > the contents of the section into the appropriate memory.
> 
> So yeh, it looks like if the section is meant to use TCM based on the
> virtual address, it drops MAP_FIXED so that the vm_mmap can succeed
> (because its outside the normally valid range), and then copies it
> directly to the desired TCM so the program can use it.
> 
> Hope that helps add some context to understand whats needed.

Hmm, so IIUC then we need the same fix as
http://lkml.kernel.org/r/20171016134446.19910-2-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
right?

This would be something like. I wanted to share elf_vm_mmap but didn't
find a proper place to not cause include dependency hell so I balied out
to c&p.
---
diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index c4606ce743d2..b20596b4c4c2 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -378,6 +378,29 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
 
 #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
 
+static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
+		unsigned long size, int prot, int type, unsigned long off)
+{
+	unsigned long map_addr;
+
+	/*
+	 * If caller requests the mapping at a specific place, make sure we fail
+	 * rather than potentially clobber an existing mapping which can have
+	 * security consequences (e.g. smash over the stack area).
+	 */
+	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (BAD_ADDR(map_addr))
+		return map_addr;
+
+	if ((type & MAP_FIXED) && map_addr != addr) {
+		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				(void*)addr);
+		return -EAGAIN;
+	}
+
+	return map_addr;
+}
+
 unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 			      struct elf_phdr *eppnt, int prot, int type,
 			      unsigned long total_size)
@@ -410,11 +433,11 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	*/
 	if (total_size) {
 		total_size = ELF_PAGEALIGN(total_size);
-		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
 		if (!BAD_ADDR(map_addr))
 			vm_munmap(map_addr+size, total_size-size);
 	} else
-		map_addr = vm_mmap(filep, addr, size, prot, type, off);
+		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
 
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-17  7:37             ` Michal Hocko
@ 2017-10-17  8:35               ` James Hogan
  -1 siblings, 0 replies; 39+ messages in thread
From: James Hogan @ 2017-10-17  8:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He, linux-metag

[-- Attachment #1: Type: text/plain, Size: 4747 bytes --]

On Tue, Oct 17, 2017 at 09:37:48AM +0200, Michal Hocko wrote:
> On Mon 16-10-17 21:02:09, James Hogan wrote:
> > On Mon, Oct 16, 2017 at 09:00:47PM +0200, Michal Hocko wrote:
> > > [CCing metag people for the metag elf_map implementation specific. The thread
> > > starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko@kernel.org]
> > > 
> > > On Mon 16-10-17 09:39:14, Kees Cook wrote:
> > > > On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > > > +               return -EAGAIN;
> > > > > +       }
> > > > > +
> > > > > +       return map_addr;
> > > > > +}
> > > > > +
> > > > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > > >                 struct elf_phdr *eppnt, int prot, int type,
> > > > >                 unsigned long total_size)
> > > > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > > 
> > > > elf_map is redirected on metag -- it should probably have its vm_mmap
> > > > calls adjust too.
> > > 
> > > Thanks for spotting this. I am not really familiar with metag. It seems
> > > to clear MAP_FIXED already
> > > 	tcm_tag = tcm_lookup_tag(addr);
> > > 
> > > 	if (tcm_tag != TCM_INVALID_TAG)
> > > 		type &= ~MAP_FIXED;
> > > 
> > > So if there is a tag the flag is cleared. I do not understand this code
> > > (and git log doesn't help) but why is this MAP_FIXED code really needed?
> > 
> > This function was added to the metag port in mid-2010 to support ELFs
> > with tightly coupled memory (TCM) segments, for example metag "core"
> > memories are at fixed virtual addresses and aren't MMU mappable (i.e.
> > globally accessible), and are outside of the usual userland address
> > range, but are as fast as cache. The commit message says this:
> > 
> > > Override the definition of the elf_map() function to special case
> > > sections that are loaded at the address of the internal memories.
> > > If we have such a section, map it at a different address and copy
> > > the contents of the section into the appropriate memory.
> > 
> > So yeh, it looks like if the section is meant to use TCM based on the
> > virtual address, it drops MAP_FIXED so that the vm_mmap can succeed
> > (because its outside the normally valid range), and then copies it
> > directly to the desired TCM so the program can use it.
> > 
> > Hope that helps add some context to understand whats needed.
> 
> Hmm, so IIUC then we need the same fix as
> http://lkml.kernel.org/r/20171016134446.19910-2-mhocko@kernel.org,
> right?
> 
> This would be something like. I wanted to share elf_vm_mmap but didn't
> find a proper place to not cause include dependency hell so I balied out
> to c&p.
> ---
> diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
> index c4606ce743d2..b20596b4c4c2 100644
> --- a/arch/metag/kernel/process.c
> +++ b/arch/metag/kernel/process.c
> @@ -378,6 +378,29 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
>  
>  #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
>  
> +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> +		unsigned long size, int prot, int type, unsigned long off)
> +{
> +	unsigned long map_addr;
> +
> +	/*
> +	 * If caller requests the mapping at a specific place, make sure we fail
> +	 * rather than potentially clobber an existing mapping which can have
> +	 * security consequences (e.g. smash over the stack area).
> +	 */
> +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> +	if (BAD_ADDR(map_addr))
> +		return map_addr;
> +
> +	if ((type & MAP_FIXED) && map_addr != addr) {
> +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +				(void*)addr);
> +		return -EAGAIN;
> +	}
> +
> +	return map_addr;
> +}
> +
>  unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>  			      struct elf_phdr *eppnt, int prot, int type,
>  			      unsigned long total_size)
> @@ -410,11 +433,11 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>  	*/
>  	if (total_size) {
>  		total_size = ELF_PAGEALIGN(total_size);
> -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
>  		if (!BAD_ADDR(map_addr))
>  			vm_munmap(map_addr+size, total_size-size);
>  	} else
> -		map_addr = vm_mmap(filep, addr, size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
>  
>  	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
>  		struct tcm_allocation *tcm;

Yeh that looks reasonable to me.

Cheers
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-10-17  8:35               ` James Hogan
  0 siblings, 0 replies; 39+ messages in thread
From: James Hogan @ 2017-10-17  8:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He, linux-metag

[-- Attachment #1: Type: text/plain, Size: 4747 bytes --]

On Tue, Oct 17, 2017 at 09:37:48AM +0200, Michal Hocko wrote:
> On Mon 16-10-17 21:02:09, James Hogan wrote:
> > On Mon, Oct 16, 2017 at 09:00:47PM +0200, Michal Hocko wrote:
> > > [CCing metag people for the metag elf_map implementation specific. The thread
> > > starts here http://lkml.kernel.org/r/20171016134446.19910-1-mhocko@kernel.org]
> > > 
> > > On Mon 16-10-17 09:39:14, Kees Cook wrote:
> > > > On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > > > +               return -EAGAIN;
> > > > > +       }
> > > > > +
> > > > > +       return map_addr;
> > > > > +}
> > > > > +
> > > > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > > >                 struct elf_phdr *eppnt, int prot, int type,
> > > > >                 unsigned long total_size)
> > > > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > > > 
> > > > elf_map is redirected on metag -- it should probably have its vm_mmap
> > > > calls adjust too.
> > > 
> > > Thanks for spotting this. I am not really familiar with metag. It seems
> > > to clear MAP_FIXED already
> > > 	tcm_tag = tcm_lookup_tag(addr);
> > > 
> > > 	if (tcm_tag != TCM_INVALID_TAG)
> > > 		type &= ~MAP_FIXED;
> > > 
> > > So if there is a tag the flag is cleared. I do not understand this code
> > > (and git log doesn't help) but why is this MAP_FIXED code really needed?
> > 
> > This function was added to the metag port in mid-2010 to support ELFs
> > with tightly coupled memory (TCM) segments, for example metag "core"
> > memories are at fixed virtual addresses and aren't MMU mappable (i.e.
> > globally accessible), and are outside of the usual userland address
> > range, but are as fast as cache. The commit message says this:
> > 
> > > Override the definition of the elf_map() function to special case
> > > sections that are loaded at the address of the internal memories.
> > > If we have such a section, map it at a different address and copy
> > > the contents of the section into the appropriate memory.
> > 
> > So yeh, it looks like if the section is meant to use TCM based on the
> > virtual address, it drops MAP_FIXED so that the vm_mmap can succeed
> > (because its outside the normally valid range), and then copies it
> > directly to the desired TCM so the program can use it.
> > 
> > Hope that helps add some context to understand whats needed.
> 
> Hmm, so IIUC then we need the same fix as
> http://lkml.kernel.org/r/20171016134446.19910-2-mhocko@kernel.org,
> right?
> 
> This would be something like. I wanted to share elf_vm_mmap but didn't
> find a proper place to not cause include dependency hell so I balied out
> to c&p.
> ---
> diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
> index c4606ce743d2..b20596b4c4c2 100644
> --- a/arch/metag/kernel/process.c
> +++ b/arch/metag/kernel/process.c
> @@ -378,6 +378,29 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
>  
>  #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
>  
> +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> +		unsigned long size, int prot, int type, unsigned long off)
> +{
> +	unsigned long map_addr;
> +
> +	/*
> +	 * If caller requests the mapping at a specific place, make sure we fail
> +	 * rather than potentially clobber an existing mapping which can have
> +	 * security consequences (e.g. smash over the stack area).
> +	 */
> +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> +	if (BAD_ADDR(map_addr))
> +		return map_addr;
> +
> +	if ((type & MAP_FIXED) && map_addr != addr) {
> +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +				(void*)addr);
> +		return -EAGAIN;
> +	}
> +
> +	return map_addr;
> +}
> +
>  unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>  			      struct elf_phdr *eppnt, int prot, int type,
>  			      unsigned long total_size)
> @@ -410,11 +433,11 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>  	*/
>  	if (total_size) {
>  		total_size = ELF_PAGEALIGN(total_size);
> -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
>  		if (!BAD_ADDR(map_addr))
>  			vm_munmap(map_addr+size, total_size-size);
>  	} else
> -		map_addr = vm_mmap(filep, addr, size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
>  
>  	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
>  		struct tcm_allocation *tcm;

Yeh that looks reasonable to me.

Cheers
James

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-17  8:35               ` James Hogan
  (?)
@ 2017-10-17  8:56               ` Michal Hocko
  -1 siblings, 0 replies; 39+ messages in thread
From: Michal Hocko @ 2017-10-17  8:56 UTC (permalink / raw)
  To: James Hogan
  Cc: Kees Cook, LKML, Linus Torvalds, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Baoquan He, linux-metag

On Tue 17-10-17 09:35:57, James Hogan wrote:
> On Tue, Oct 17, 2017 at 09:37:48AM +0200, Michal Hocko wrote:
[...]
> > This would be something like. I wanted to share elf_vm_mmap but didn't
> > find a proper place to not cause include dependency hell so I balied out
> > to c&p.
> > ---
> > diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
> > index c4606ce743d2..b20596b4c4c2 100644
> > --- a/arch/metag/kernel/process.c
> > +++ b/arch/metag/kernel/process.c
> > @@ -378,6 +378,29 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpu)
> >  
> >  #define BAD_ADDR(x) ((unsigned long)(x) >= TASK_SIZE)
> >  
> > +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> > +		unsigned long size, int prot, int type, unsigned long off)
> > +{
> > +	unsigned long map_addr;
> > +
> > +	/*
> > +	 * If caller requests the mapping at a specific place, make sure we fail
> > +	 * rather than potentially clobber an existing mapping which can have
> > +	 * security consequences (e.g. smash over the stack area).
> > +	 */
> > +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> > +	if (BAD_ADDR(map_addr))
> > +		return map_addr;
> > +
> > +	if ((type & MAP_FIXED) && map_addr != addr) {
> > +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> > +				(void*)addr);
> > +		return -EAGAIN;
> > +	}
> > +
> > +	return map_addr;
> > +}
> > +
> >  unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
> >  			      struct elf_phdr *eppnt, int prot, int type,
> >  			      unsigned long total_size)
> > @@ -410,11 +433,11 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
> >  	*/
> >  	if (total_size) {
> >  		total_size = ELF_PAGEALIGN(total_size);
> > -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> > +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
> >  		if (!BAD_ADDR(map_addr))
> >  			vm_munmap(map_addr+size, total_size-size);
> >  	} else
> > -		map_addr = vm_mmap(filep, addr, size, prot, type, off);
> > +		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
> >  
> >  	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
> >  		struct tcm_allocation *tcm;
> 
> Yeh that looks reasonable to me.

Thanks for double checking. I will make sure to CC you when reposting
the patch after other concerns sort out.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-16 19:38         ` Kees Cook
@ 2017-10-17  9:04           ` Michal Hocko
  2017-10-17 20:01             ` Kees Cook
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-17  9:04 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Mon 16-10-17 12:38:19, Kees Cook wrote:
> On Mon, Oct 16, 2017 at 11:43 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Mon 16-10-17 09:44:31, Kees Cook wrote:
> >> On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
> >> > From: Michal Hocko <mhocko@suse.com>
> >> >
> >> > eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") has added
> >> > MAP_FIXED flag to the initial ET_DYN segment mapping which defines the
> >> > randomized base for the PIE ELF segments. The thing is that MAP_FIXED
> >> > shouldn't be really needed because the address is essentially random
> >> > anyway. All other segments are mapped relatively to this base. elf_map
> >> > makes sure that all segments will fit into the address space by
> >> > enforcing total_mapping_size initial map.
> >> >
> >> > Why do we want to drop MAP_FIXED in the first place? Because it is error
> >> > prone. If we happen to have an existing mapping in the requested range
> >> > then we do not want to corrupt it silently. Without MAP_FIXED vm_mmap
> >> > will simply fallback to another range. In reality there shouldn't be
> >> > any conflicting mappings at this early exec stage so the mmap should
> >> > succeed even without MAP_FIXED but subtle changes to the randomization
> >> > can break this assumption so we should rather be careful here.
> >> >
> >> > Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
> >> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> >> > ---
> >> >  fs/binfmt_elf.c | 1 -
> >> >  1 file changed, 1 deletion(-)
> >> >
> >> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> >> > index 09456e2add18..244cc30dfa24 100644
> >> > --- a/fs/binfmt_elf.c
> >> > +++ b/fs/binfmt_elf.c
> >> > @@ -988,7 +988,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
> >> >                                 load_bias = ELF_ET_DYN_BASE;
> >> >                                 if (current->flags & PF_RANDOMIZE)
> >> >                                         load_bias += arch_mmap_rnd();
> >> > -                               elf_flags |= MAP_FIXED;
> >>
> >> If MAP_FIXED is being masked out in patch 1 (but used as a check for
> >> correct position, I think this MAP_FIXED should _not_ be removed).
> >> This provides for checking for the initial mapping. The failure mode
> >> here would be to allow an attack to "push" a mapping away from some
> >> overlapping region. This should not be allowed either: if the initial
> >> mapping is "wrong", we should absolutely fail, otherwise we can be
> >> introducing a silent reduction in PIE entropy.
> >
> > Do we really lose any entropy? We are using standard randomized mmap in
> > that case. So we are randomized in either case. Are you worried that
> > an attacker could tell the two cases and abuse some sort of offset2lib
> > attack?
> 
> Not in the regular case. I'm suggesting that what your changes are
> preparing for is an _unknown_ way to collide mappings. In that case,
> we should be as defensive as we know how. And if we were to remove
> MAP_FIXED here, it would allow an attacker (with some future method)
> to potentially collapse a range of ASLR for execution, since missing
> MAP_FIXED here would silently move a mapping somewhere else. So we
> should keep MAP_FIXED, as any collision would indicate an unknown
> method of crashing an exec into something else.

I am sorry but I do not follow. I could see how offset2lib would be a
concern but you seem to be thinking about a different scenario. Could
you be more specific please.

I am not insisting on this patch but it seems to me is just makes a
recoverable state a failure.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-16 13:44   ` [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
  2017-10-16 16:39     ` Kees Cook
@ 2017-10-17 12:26     ` Baoquan He
  2017-10-17 12:56       ` Michal Hocko
  1 sibling, 1 reply; 39+ messages in thread
From: Baoquan He @ 2017-10-17 12:26 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar, Michal Hocko

Hi Michal,

Earlier I posted a patchset to clean up these in a different way, but
patches sent to your below mail address are all rejected.

<mhocko@kerne.org>: host mail.kerne.org[104.131.33.237] said: 454 4.1.1
    <mhocko@kerne.org>: Recipient address rejected: User unknown in virtual
    mailbox table (in reply to RCPT TO command)

On 10/16/17 at 03:44pm, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 6466153f2bf0..09456e2add18 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
>  
>  #ifndef elf_map
>  
> +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> +		unsigned long size, int prot, int type, unsigned long off)
> +{
> +	unsigned long map_addr;
> +
> +	/*
> +	 * If caller requests the mapping at a specific place, make sure we fail
> +	 * rather than potentially clobber an existing mapping which can have
> +	 * security consequences (e.g. smash over the stack area).
> +	 */
> +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> +	if (BAD_ADDR(map_addr))
> +		return map_addr;
> +
> +	if ((type & MAP_FIXED) && map_addr != addr) {
> +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +				(void*)addr);
> +		return -EAGAIN;
> +	}
> +
> +	return map_addr;
> +}
> +
>  static unsigned long elf_map(struct file *filep, unsigned long addr,
>  		struct elf_phdr *eppnt, int prot, int type,
>  		unsigned long total_size)
> @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>  	*/
>  	if (total_size) {
>  		total_size = ELF_PAGEALIGN(total_size);
> -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
>  		if (!BAD_ADDR(map_addr))
>  			vm_munmap(map_addr+size, total_size-size);

Here we will still take the map total, then unmap the rest way.

I am wondering why we don't fix those issues we figured out, but add 
another level of wrapper elf_vm_mmap() to hack it. So we will have
elf_map() -> elf_vm_mmap() -> vm_mmap(), not even counting
vm_mmap_pgoff(), then finally enter into do_mmap_pgoff(), to do the
maping for elf program.

Thanks
Baoquan


>  	} else
> -		map_addr = vm_mmap(filep, addr, size, prot, type, off);
> +		map_addr = elf_vm_mmap(filep, addr, size, prot, type, off);
>  
>  	return(map_addr);
>  }
> @@ -1215,7 +1238,7 @@ static int load_elf_library(struct file *file)
>  		eppnt++;
>  
>  	/* Now use mmap to map the library into memory. */
> -	error = vm_mmap(file,
> +	error = elf_vm_mmap(file,
>  			ELF_PAGESTART(eppnt->p_vaddr),
>  			(eppnt->p_filesz +
>  			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
> -- 
> 2.14.2
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-17 12:26     ` Baoquan He
@ 2017-10-17 12:56       ` Michal Hocko
  2017-10-17 13:22         ` Baoquan He
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-17 12:56 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar

On Tue 17-10-17 20:26:14, Baoquan He wrote:
> Hi Michal,
> 
> Earlier I posted a patchset to clean up these in a different way, but
> patches sent to your below mail address are all rejected.
> 
> <mhocko@kerne.org>: host mail.kerne.org[104.131.33.237] said: 454 4.1.1
>     <mhocko@kerne.org>: Recipient address rejected: User unknown in virtual
>     mailbox table (in reply to RCPT TO command)

yes, there was a typo in the email address which I've fixed up in the
reply... Your cleanup was mostly about replacing vm_mmap by
get_unmaped_area which is an independent issue to what I am trying to
achieve here.

> On 10/16/17 at 03:44pm, Michal Hocko wrote:
> > From: Michal Hocko <mhocko@suse.com>
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > index 6466153f2bf0..09456e2add18 100644
> > --- a/fs/binfmt_elf.c
> > +++ b/fs/binfmt_elf.c
> > @@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
> >  
> >  #ifndef elf_map
> >  
> > +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> > +		unsigned long size, int prot, int type, unsigned long off)
> > +{
> > +	unsigned long map_addr;
> > +
> > +	/*
> > +	 * If caller requests the mapping at a specific place, make sure we fail
> > +	 * rather than potentially clobber an existing mapping which can have
> > +	 * security consequences (e.g. smash over the stack area).
> > +	 */
> > +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> > +	if (BAD_ADDR(map_addr))
> > +		return map_addr;
> > +
> > +	if ((type & MAP_FIXED) && map_addr != addr) {
> > +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> > +				(void*)addr);
> > +		return -EAGAIN;
> > +	}
> > +
> > +	return map_addr;
> > +}
> > +
> >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> >  		struct elf_phdr *eppnt, int prot, int type,
> >  		unsigned long total_size)
> > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> >  	*/
> >  	if (total_size) {
> >  		total_size = ELF_PAGEALIGN(total_size);
> > -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> > +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
> >  		if (!BAD_ADDR(map_addr))
> >  			vm_munmap(map_addr+size, total_size-size);
> 
> Here we will still take the map total, then unmap the rest way.
> 
> I am wondering why we don't fix those issues we figured out, but add 
> another level of wrapper elf_vm_mmap() to hack it.

Could you be more specific please?

> So we will have
> elf_map() -> elf_vm_mmap() -> vm_mmap(), not even counting
> vm_mmap_pgoff(), then finally enter into do_mmap_pgoff(), to do the
> maping for elf program.

I've added another level of helper to keep the code in elf_map saner. It
is quite complex already.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-17 12:56       ` Michal Hocko
@ 2017-10-17 13:22         ` Baoquan He
  2017-10-17 13:33           ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Baoquan He @ 2017-10-17 13:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar

On 10/17/17 at 02:56pm, Michal Hocko wrote:
> On Tue 17-10-17 20:26:14, Baoquan He wrote:
> > Hi Michal,
> > 
> > Earlier I posted a patchset to clean up these in a different way, but
> > patches sent to your below mail address are all rejected.
> > 
> > <mhocko@kerne.org>: host mail.kerne.org[104.131.33.237] said: 454 4.1.1
> >     <mhocko@kerne.org>: Recipient address rejected: User unknown in virtual
> >     mailbox table (in reply to RCPT TO command)
> 
> yes, there was a typo in the email address which I've fixed up in the
> reply... Your cleanup was mostly about replacing vm_mmap by
> get_unmaped_area which is an independent issue to what I am trying to
> achieve here.

Oops. Yes, I must have typed your address by hand, but not copying
from the previous discussion thread. Sorry for that.

> 
> > On 10/16/17 at 03:44pm, Michal Hocko wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> > > index 6466153f2bf0..09456e2add18 100644
> > > --- a/fs/binfmt_elf.c
> > > +++ b/fs/binfmt_elf.c
> > > @@ -341,6 +341,29 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
> > >  
> > >  #ifndef elf_map
> > >  
> > > +static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
> > > +		unsigned long size, int prot, int type, unsigned long off)
> > > +{
> > > +	unsigned long map_addr;
> > > +
> > > +	/*
> > > +	 * If caller requests the mapping at a specific place, make sure we fail
> > > +	 * rather than potentially clobber an existing mapping which can have
> > > +	 * security consequences (e.g. smash over the stack area).
> > > +	 */
> > > +	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
> > > +	if (BAD_ADDR(map_addr))
> > > +		return map_addr;
> > > +
> > > +	if ((type & MAP_FIXED) && map_addr != addr) {
> > > +		pr_info("Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> > > +				(void*)addr);
> > > +		return -EAGAIN;
> > > +	}
> > > +
> > > +	return map_addr;
> > > +}
> > > +
> > >  static unsigned long elf_map(struct file *filep, unsigned long addr,
> > >  		struct elf_phdr *eppnt, int prot, int type,
> > >  		unsigned long total_size)
> > > @@ -366,11 +389,11 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
> > >  	*/
> > >  	if (total_size) {
> > >  		total_size = ELF_PAGEALIGN(total_size);
> > > -		map_addr = vm_mmap(filep, addr, total_size, prot, type, off);
> > > +		map_addr = elf_vm_mmap(filep, addr, total_size, prot, type, off);
> > >  		if (!BAD_ADDR(map_addr))
> > >  			vm_munmap(map_addr+size, total_size-size);
> > 
> > Here we will still take the map total, then unmap the rest way.
> > 
> > I am wondering why we don't fix those issues we figured out, but add 
> > another level of wrapper elf_vm_mmap() to hack it.
> 
> Could you be more specific please?

In the current code, obviously the total size for PIE loading is
unnecessary since it's MAP_FIXED, and it's initial mapping from
ELF_ET_DYN_BASE, I can't see any chance it will fail to map. Though you
use a trick in this new elf_vm_mmap() to check if it failed on mapping,
in essence it's a MAP_FIXED.

And as I said before, then we will still have the
ungraceful 'mapping total'|'unmapping the rest' method.

If from a new code reader's point of view, the newly added elf_vm_mmap()
plus the existing elf_map() may confuse people more.

Just personnal opinion. Anyway, your patches are better, my patches have
been resting there for several days but no comment. :-) I may still need
strength the understanding about elf loading code.

Thanks
Baoquan

> 
> > So we will have
> > elf_map() -> elf_vm_mmap() -> vm_mmap(), not even counting
> > vm_mmap_pgoff(), then finally enter into do_mmap_pgoff(), to do the
> > maping for elf program.
> 
> I've added another level of helper to keep the code in elf_map saner. It
> is quite complex already.
> -- 
> Michal Hocko
> SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-17 13:22         ` Baoquan He
@ 2017-10-17 13:33           ` Michal Hocko
  2017-10-17 13:42             ` Baoquan He
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-17 13:33 UTC (permalink / raw)
  To: Baoquan He
  Cc: LKML, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar

On Tue 17-10-17 21:22:46, Baoquan He wrote:
[...]
> And as I said before, then we will still have the
> ungraceful 'mapping total'|'unmapping the rest' method.

yes I haven't touched this part and I guess you are right it can be
removed now, but this was not my primary concern to adddress so there is
still some room for improvements on top. If you want to refactor that
part, then feel free to do so but I think that we should address the
MAP_FIXED part first.

> If from a new code reader's point of view, the newly added elf_vm_mmap()
> plus the existing elf_map() may confuse people more.

Heh, I strongly suspect that this indirection is the last thing that
would make the code hard to grasp. The whole load_elf_binary is a giant
and incomprehensible mess.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-10-17 13:33           ` Michal Hocko
@ 2017-10-17 13:42             ` Baoquan He
  0 siblings, 0 replies; 39+ messages in thread
From: Baoquan He @ 2017-10-17 13:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Kees Cook, Jiri Kosina, Al Viro,
	Oleg Nesterov, Ingo Molnar

On 10/17/17 at 03:33pm, Michal Hocko wrote:
> On Tue 17-10-17 21:22:46, Baoquan He wrote:
> [...]
> > And as I said before, then we will still have the
> > ungraceful 'mapping total'|'unmapping the rest' method.
> 
> yes I haven't touched this part and I guess you are right it can be
> removed now, but this was not my primary concern to adddress so there is
> still some room for improvements on top. If you want to refactor that
> part, then feel free to do so but I think that we should address the
> MAP_FIXED part first.

OK, I am fine. Then let's focus on this issue you are addressing. Sorry
for the noise. :-)

Thanks
Baoquan

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-17  9:04           ` Michal Hocko
@ 2017-10-17 20:01             ` Kees Cook
  2017-10-19 11:20               ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2017-10-17 20:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Tue, Oct 17, 2017 at 2:04 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Mon 16-10-17 12:38:19, Kees Cook wrote:
>> On Mon, Oct 16, 2017 at 11:43 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Mon 16-10-17 09:44:31, Kees Cook wrote:
>> >> On Mon, Oct 16, 2017 at 6:44 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> >> > From: Michal Hocko <mhocko@suse.com>
>> >> >
>> >> > eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE") has added
>> >> > MAP_FIXED flag to the initial ET_DYN segment mapping which defines the
>> >> > randomized base for the PIE ELF segments. The thing is that MAP_FIXED
>> >> > shouldn't be really needed because the address is essentially random
>> >> > anyway. All other segments are mapped relatively to this base. elf_map
>> >> > makes sure that all segments will fit into the address space by
>> >> > enforcing total_mapping_size initial map.
>> >> >
>> >> > Why do we want to drop MAP_FIXED in the first place? Because it is error
>> >> > prone. If we happen to have an existing mapping in the requested range
>> >> > then we do not want to corrupt it silently. Without MAP_FIXED vm_mmap
>> >> > will simply fallback to another range. In reality there shouldn't be
>> >> > any conflicting mappings at this early exec stage so the mmap should
>> >> > succeed even without MAP_FIXED but subtle changes to the randomization
>> >> > can break this assumption so we should rather be careful here.
>> >> >
>> >> > Fixes: eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
>> >> > Signed-off-by: Michal Hocko <mhocko@suse.com>
>> >> > ---
>> >> >  fs/binfmt_elf.c | 1 -
>> >> >  1 file changed, 1 deletion(-)
>> >> >
>> >> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
>> >> > index 09456e2add18..244cc30dfa24 100644
>> >> > --- a/fs/binfmt_elf.c
>> >> > +++ b/fs/binfmt_elf.c
>> >> > @@ -988,7 +988,6 @@ static int load_elf_binary(struct linux_binprm *bprm)
>> >> >                                 load_bias = ELF_ET_DYN_BASE;
>> >> >                                 if (current->flags & PF_RANDOMIZE)
>> >> >                                         load_bias += arch_mmap_rnd();
>> >> > -                               elf_flags |= MAP_FIXED;
>> >>
>> >> If MAP_FIXED is being masked out in patch 1 (but used as a check for
>> >> correct position, I think this MAP_FIXED should _not_ be removed).
>> >> This provides for checking for the initial mapping. The failure mode
>> >> here would be to allow an attack to "push" a mapping away from some
>> >> overlapping region. This should not be allowed either: if the initial
>> >> mapping is "wrong", we should absolutely fail, otherwise we can be
>> >> introducing a silent reduction in PIE entropy.
>> >
>> > Do we really lose any entropy? We are using standard randomized mmap in
>> > that case. So we are randomized in either case. Are you worried that
>> > an attacker could tell the two cases and abuse some sort of offset2lib
>> > attack?
>>
>> Not in the regular case. I'm suggesting that what your changes are
>> preparing for is an _unknown_ way to collide mappings. In that case,
>> we should be as defensive as we know how. And if we were to remove
>> MAP_FIXED here, it would allow an attacker (with some future method)
>> to potentially collapse a range of ASLR for execution, since missing
>> MAP_FIXED here would silently move a mapping somewhere else. So we
>> should keep MAP_FIXED, as any collision would indicate an unknown
>> method of crashing an exec into something else.
>
> I am sorry but I do not follow. I could see how offset2lib would be a
> concern but you seem to be thinking about a different scenario. Could
> you be more specific please.

Sorry, I will try a specific example below.

> I am not insisting on this patch but it seems to me is just makes a
> recoverable state a failure.

Right, I understand you're trying to make it recoverable. I'm
suggesting that making it recoverable provides a way for an attack to
abuse it, and that what we'd be recovering from is a case we should
never ever see.

Consider the case where through some future bug/feature, it's possible
to put the stack at an arbitrary location during an exec. (We've
worked to fix that already, but who knows what the future holds either
through misfeatures or bugs.) If an attacker maps the stack over a
large portion of the PIE exec range, patch 2 will result in vmmap
searching out a location that isn't already allocated. This means that
instead of the PIE ASLR choosing from the entire possible range, it
will get limited to only the area where something isn't already
overlapping. This would give an attacker the ability to control the
PIE ASLR, possibly forcing it into a fixed location.

Without patch 2, an unexpected overlap will fail the exec, which is
_good_, because we should never see an overlap. If we do, something
has gone very wrong, and we shouldn't paper over that with a silent
loss of ASLR entropy.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-17 20:01             ` Kees Cook
@ 2017-10-19 11:20               ` Michal Hocko
  2017-10-19 17:19                 ` Kees Cook
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-19 11:20 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Tue 17-10-17 13:01:04, Kees Cook wrote:
> On Tue, Oct 17, 2017 at 2:04 AM, Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > I am not insisting on this patch but it seems to me is just makes a
> > recoverable state a failure.
> 
> Right, I understand you're trying to make it recoverable. I'm
> suggesting that making it recoverable provides a way for an attack to
> abuse it, and that what we'd be recovering from is a case we should
> never ever see.
> 
> Consider the case where through some future bug/feature, it's possible
> to put the stack at an arbitrary location during an exec. (We've
> worked to fix that already, but who knows what the future holds either
> through misfeatures or bugs.) If an attacker maps the stack over a
> large portion of the PIE exec range, patch 2 will result in vmmap
> searching out a location that isn't already allocated. This means that
> instead of the PIE ASLR choosing from the entire possible range, it
> will get limited to only the area where something isn't already
> overlapping. This would give an attacker the ability to control the
> PIE ASLR, possibly forcing it into a fixed location.

Yes, I guess I understand that part. What is not clear to me exactly is
why this matters as we have the mmap_base randomized and not under the
control of the attacker.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-19 11:20               ` Michal Hocko
@ 2017-10-19 17:19                 ` Kees Cook
  2017-10-20  8:45                   ` Michal Hocko
  0 siblings, 1 reply; 39+ messages in thread
From: Kees Cook @ 2017-10-19 17:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Thu, Oct 19, 2017 at 4:20 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Tue 17-10-17 13:01:04, Kees Cook wrote:
>> On Tue, Oct 17, 2017 at 2:04 AM, Michal Hocko <mhocko@kernel.org> wrote:
> [...]
>> > I am not insisting on this patch but it seems to me is just makes a
>> > recoverable state a failure.
>>
>> Right, I understand you're trying to make it recoverable. I'm
>> suggesting that making it recoverable provides a way for an attack to
>> abuse it, and that what we'd be recovering from is a case we should
>> never ever see.
>>
>> Consider the case where through some future bug/feature, it's possible
>> to put the stack at an arbitrary location during an exec. (We've
>> worked to fix that already, but who knows what the future holds either
>> through misfeatures or bugs.) If an attacker maps the stack over a
>> large portion of the PIE exec range, patch 2 will result in vmmap
>> searching out a location that isn't already allocated. This means that
>> instead of the PIE ASLR choosing from the entire possible range, it
>> will get limited to only the area where something isn't already
>> overlapping. This would give an attacker the ability to control the
>> PIE ASLR, possibly forcing it into a fixed location.
>
> Yes, I guess I understand that part. What is not clear to me exactly is
> why this matters as we have the mmap_base randomized and not under the
> control of the attacker.

mmap_base is separate from the PIE base, so patch 2 would allow for a
reduction of the PIE ASLR entropy in the case of a novel overlap
attack.

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-19 17:19                 ` Kees Cook
@ 2017-10-20  8:45                   ` Michal Hocko
  2017-10-20 14:12                     ` Kees Cook
  0 siblings, 1 reply; 39+ messages in thread
From: Michal Hocko @ 2017-10-20  8:45 UTC (permalink / raw)
  To: Kees Cook
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Thu 19-10-17 10:19:40, Kees Cook wrote:
> On Thu, Oct 19, 2017 at 4:20 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > On Tue 17-10-17 13:01:04, Kees Cook wrote:
> >> On Tue, Oct 17, 2017 at 2:04 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > [...]
> >> > I am not insisting on this patch but it seems to me is just makes a
> >> > recoverable state a failure.
> >>
> >> Right, I understand you're trying to make it recoverable. I'm
> >> suggesting that making it recoverable provides a way for an attack to
> >> abuse it, and that what we'd be recovering from is a case we should
> >> never ever see.
> >>
> >> Consider the case where through some future bug/feature, it's possible
> >> to put the stack at an arbitrary location during an exec. (We've
> >> worked to fix that already, but who knows what the future holds either
> >> through misfeatures or bugs.) If an attacker maps the stack over a
> >> large portion of the PIE exec range, patch 2 will result in vmmap
> >> searching out a location that isn't already allocated. This means that
> >> instead of the PIE ASLR choosing from the entire possible range, it
> >> will get limited to only the area where something isn't already
> >> overlapping. This would give an attacker the ability to control the
> >> PIE ASLR, possibly forcing it into a fixed location.
> >
> > Yes, I guess I understand that part. What is not clear to me exactly is
> > why this matters as we have the mmap_base randomized and not under the
> > control of the attacker.
> 
> mmap_base is separate from the PIE base, so patch 2 would allow for a
> reduction of the PIE ASLR entropy in the case of a novel overlap
> attack.

OK, it seems that I am just too dull see through your concerns here.
Anyway, are you willing to ack the patch 1 (when metag fix is included)?
I would resubmit in that case and ask for merging without patch 2.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment
  2017-10-20  8:45                   ` Michal Hocko
@ 2017-10-20 14:12                     ` Kees Cook
  0 siblings, 0 replies; 39+ messages in thread
From: Kees Cook @ 2017-10-20 14:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: LKML, Linus Torvalds, Jiri Kosina, Al Viro, Oleg Nesterov,
	Ingo Molnar, Baoquan He

On Fri, Oct 20, 2017 at 1:45 AM, Michal Hocko <mhocko@kernel.org> wrote:
> On Thu 19-10-17 10:19:40, Kees Cook wrote:
>> On Thu, Oct 19, 2017 at 4:20 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > On Tue 17-10-17 13:01:04, Kees Cook wrote:
>> >> On Tue, Oct 17, 2017 at 2:04 AM, Michal Hocko <mhocko@kernel.org> wrote:
>> > [...]
>> >> > I am not insisting on this patch but it seems to me is just makes a
>> >> > recoverable state a failure.
>> >>
>> >> Right, I understand you're trying to make it recoverable. I'm
>> >> suggesting that making it recoverable provides a way for an attack to
>> >> abuse it, and that what we'd be recovering from is a case we should
>> >> never ever see.
>> >>
>> >> Consider the case where through some future bug/feature, it's possible
>> >> to put the stack at an arbitrary location during an exec. (We've
>> >> worked to fix that already, but who knows what the future holds either
>> >> through misfeatures or bugs.) If an attacker maps the stack over a
>> >> large portion of the PIE exec range, patch 2 will result in vmmap
>> >> searching out a location that isn't already allocated. This means that
>> >> instead of the PIE ASLR choosing from the entire possible range, it
>> >> will get limited to only the area where something isn't already
>> >> overlapping. This would give an attacker the ability to control the
>> >> PIE ASLR, possibly forcing it into a fixed location.
>> >
>> > Yes, I guess I understand that part. What is not clear to me exactly is
>> > why this matters as we have the mmap_base randomized and not under the
>> > control of the attacker.
>>
>> mmap_base is separate from the PIE base, so patch 2 would allow for a
>> reduction of the PIE ASLR entropy in the case of a novel overlap
>> attack.
>
> OK, it seems that I am just too dull see through your concerns here.

I'm probably not explaining it well enough! :(

> Anyway, are you willing to ack the patch 1 (when metag fix is included)?
> I would resubmit in that case and ask for merging without patch 2.

Yup, I really like patch 1: it protects us from "impossible"
situations, which we know rarely stay impossible. :)

-Kees

-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2017-10-20 14:12 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-04  7:50 MAP_FIXED for ELF mappings Michal Hocko
2017-10-04  7:59 ` Michal Hocko
2017-10-04 15:03 ` Baoquan He
2017-10-04 15:11   ` Michal Hocko
2017-10-04 15:12   ` Baoquan He
2017-10-04 15:17     ` Michal Hocko
2017-10-04 15:37       ` Baoquan He
2017-10-04 17:12         ` Michal Hocko
2017-10-04 17:15           ` Linus Torvalds
2017-10-04 17:28             ` Michal Hocko
2017-10-05 16:33       ` Oleg Nesterov
2017-10-05 16:42         ` Michal Hocko
2017-10-16 13:44 ` [PATCH 0/2] fs, elf: get rid of MAP_FIXED from the loader Michal Hocko
2017-10-16 13:44   ` [PATCH 1/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
2017-10-16 16:39     ` Kees Cook
2017-10-16 19:00       ` Michal Hocko
2017-10-16 19:00         ` Michal Hocko
2017-10-16 20:02         ` James Hogan
2017-10-16 20:02           ` James Hogan
2017-10-17  7:37           ` Michal Hocko
2017-10-17  7:37             ` Michal Hocko
2017-10-17  8:35             ` James Hogan
2017-10-17  8:35               ` James Hogan
2017-10-17  8:56               ` Michal Hocko
2017-10-17 12:26     ` Baoquan He
2017-10-17 12:56       ` Michal Hocko
2017-10-17 13:22         ` Baoquan He
2017-10-17 13:33           ` Michal Hocko
2017-10-17 13:42             ` Baoquan He
2017-10-16 13:44   ` [PATCH 2/2] fs, elf: drop MAP_FIXED from initial ET_DYN segment Michal Hocko
2017-10-16 16:44     ` Kees Cook
2017-10-16 18:43       ` Michal Hocko
2017-10-16 19:38         ` Kees Cook
2017-10-17  9:04           ` Michal Hocko
2017-10-17 20:01             ` Kees Cook
2017-10-19 11:20               ` Michal Hocko
2017-10-19 17:19                 ` Kees Cook
2017-10-20  8:45                   ` Michal Hocko
2017-10-20 14:12                     ` Kees Cook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.