All of lore.kernel.org
 help / color / mirror / Atom feed
* (no subject)
@ 2017-11-16 10:18 ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko

Hi,
this has started as a follow up discussion [1][2] resulting in the
runtime failure caused by hardening patch [3] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber and existing underlying mapping (e.g. stack). The
reason for the failure is that some architectures enforce an alignment
for the given address hint without MAP_FIXED used (e.g. for shared or
file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [4]. The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution. We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_SAFE which enforces the given
address but unlike MAP_FIXED it fails with ENOMEM if the given range
conflicts with an existing one. The flag is introduced as a completely
new flag rather than a MAP_FIXED extension because of the backward
compatibility. We really want a never-clobber semantic even on older
kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
flags evaluation because we do not EINVAL on unknown flags. On those
kernels we would simply use the traditional hint based semantic so the
caller can still get a different address (which sucks) but at least not
silently corrupt an existing mapping. I do not see a good way around
that. Except we won't export expose the new semantic to the userspace at
all. It seems there are users who would like to have something like that
[5], though. Atomic address range probing in the multithreaded programs
sounds like an interesting thing to me as well, although I do not have
any specific usecase in mind.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
follow. Actually real MAP_FIXED usages should be docummented properly
and they should be more of an exception.

Does anybody see any fundamental reasons why this is a wrong approach?

Diffstat says
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/metag/kernel/process.c          |  6 +++++-
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 fs/binfmt_elf.c                      | 12 ++++++++----
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 11 files changed, 36 insertions(+), 5 deletions(-)

[1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

^ permalink raw reply	[flat|nested] 76+ messages in thread

* (unknown), 
@ 2017-11-16 10:18 ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko

Hi,
this has started as a follow up discussion [1][2] resulting in the
runtime failure caused by hardening patch [3] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber and existing underlying mapping (e.g. stack). The
reason for the failure is that some architectures enforce an alignment
for the given address hint without MAP_FIXED used (e.g. for shared or
file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [4]. The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution. We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_SAFE which enforces the given
address but unlike MAP_FIXED it fails with ENOMEM if the given range
conflicts with an existing one. The flag is introduced as a completely
new flag rather than a MAP_FIXED extension because of the backward
compatibility. We really want a never-clobber semantic even on older
kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
flags evaluation because we do not EINVAL on unknown flags. On those
kernels we would simply use the traditional hint based semantic so the
caller can still get a different address (which sucks) but at least not
silently corrupt an existing mapping. I do not see a good way around
that. Except we won't export expose the new semantic to the userspace at
all. It seems there are users who would like to have something like that
[5], though. Atomic address range probing in the multithreaded programs
sounds like an interesting thing to me as well, although I do not have
any specific usecase in mind.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
follow. Actually real MAP_FIXED usages should be docummented properly
and they should be more of an exception.

Does anybody see any fundamental reasons why this is a wrong approach?

Diffstat says
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/metag/kernel/process.c          |  6 +++++-
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 fs/binfmt_elf.c                      | 12 ++++++++----
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 11 files changed, 36 insertions(+), 5 deletions(-)

[1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* (no subject)
@ 2017-11-16 10:18 ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko

Hi,
this has started as a follow up discussion [1][2] resulting in the
runtime failure caused by hardening patch [3] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber and existing underlying mapping (e.g. stack). The
reason for the failure is that some architectures enforce an alignment
for the given address hint without MAP_FIXED used (e.g. for shared or
file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [4]. The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution. We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_SAFE which enforces the given
address but unlike MAP_FIXED it fails with ENOMEM if the given range
conflicts with an existing one. The flag is introduced as a completely
new flag rather than a MAP_FIXED extension because of the backward
compatibility. We really want a never-clobber semantic even on older
kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
flags evaluation because we do not EINVAL on unknown flags. On those
kernels we would simply use the traditional hint based semantic so the
caller can still get a different address (which sucks) but at least not
silently corrupt an existing mapping. I do not see a good way around
that. Except we won't export expose the new semantic to the userspace at
all. It seems there are users who would like to have something like that
[5], though. Atomic address range probing in the multithreaded programs
sounds like an interesting thing to me as well, although I do not have
any specific usecase in mind.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
follow. Actually real MAP_FIXED usages should be docummented properly
and they should be more of an exception.

Does anybody see any fundamental reasons why this is a wrong approach?

Diffstat says
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/metag/kernel/process.c          |  6 +++++-
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 fs/binfmt_elf.c                      | 12 ++++++++----
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 11 files changed, 36 insertions(+), 5 deletions(-)

[1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

^ permalink raw reply	[flat|nested] 76+ messages in thread

* (no subject)
@ 2017-11-16 10:18 ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko

Hi,
this has started as a follow up discussion [1][2] resulting in the
runtime failure caused by hardening patch [3] which removes MAP_FIXED
from the elf loader because MAP_FIXED is inherently dangerous as it
might silently clobber and existing underlying mapping (e.g. stack). The
reason for the failure is that some architectures enforce an alignment
for the given address hint without MAP_FIXED used (e.g. for shared or
file backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [4]. The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution. We basically want a non-destructive MAP_FIXED.

The first patch introduced MAP_FIXED_SAFE which enforces the given
address but unlike MAP_FIXED it fails with ENOMEM if the given range
conflicts with an existing one. The flag is introduced as a completely
new flag rather than a MAP_FIXED extension because of the backward
compatibility. We really want a never-clobber semantic even on older
kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
flags evaluation because we do not EINVAL on unknown flags. On those
kernels we would simply use the traditional hint based semantic so the
caller can still get a different address (which sucks) but at least not
silently corrupt an existing mapping. I do not see a good way around
that. Except we won't export expose the new semantic to the userspace at
all. It seems there are users who would like to have something like that
[5], though. Atomic address range probing in the multithreaded programs
sounds like an interesting thing to me as well, although I do not have
any specific usecase in mind.

The second patch simply replaces MAP_FIXED use in elf loader by
MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
follow. Actually real MAP_FIXED usages should be docummented properly
and they should be more of an exception.

Does anybody see any fundamental reasons why this is a wrong approach?

Diffstat says
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/metag/kernel/process.c          |  6 +++++-
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 fs/binfmt_elf.c                      | 12 ++++++++----
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 11 files changed, 36 insertions(+), 5 deletions(-)

[1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-16 10:18 ` (unknown), Michal Hocko
  (?)
@ 2017-11-16 10:18   ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

MAP_FIXED is used quite often to enforce mapping at the particular
range. The main problem of this flag is, however, that it is inherently
dangerous because it unmaps existing mappings covered by the requested
range. This can cause silent memory corruptions. Some of them even with
serious security implications. While the current semantic might be
really desiderable in many cases there are others which would want to
enforce the given range but rather see a failure than a silent memory
corruption on a clashing range. Please note that there is no guarantee
that a given range is obeyed by the mmap even when it is free - e.g.
arch specific code is allowed to apply an alignment.

Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
It has the same semantic as MAP_FIXED wrt. the given address request
with a single exception that it fails with ENOMEM if the requested
address is already covered by an existing mapping. We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 9 files changed, 23 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 3b26cc62dadb..0e5724e4b4ad 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -31,6 +31,8 @@
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index da3216007fe0..fc5e61ef9fd4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -49,6 +49,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 /*
  * Flags for msync
  */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index cc9ba1d34779..c926487472fa 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -25,6 +25,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_SYNC		1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba7464f..d97342ca25b1 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -28,5 +28,6 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 9765896ecb2c..7b00477a7f9a 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -23,6 +23,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 63ee13faf17d..d5d58d2dc95e 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x8000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 
 /*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index b15b278aa314..d665bd8b7cbd 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
 #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
@@ -62,6 +63,7 @@
 # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
 
+
 /*
  * Flags for msync
  */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 7162cd4cca73..64c46047fbd3 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 680506faceae..89af0b5839a5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (!(file && path_noexec(&file->f_path)))
 			prot |= PROT_EXEC;
 
+	/* force arch specific MAP_FIXED handling in get_unmapped_area */
+	if (flags & MAP_FIXED_SAFE)
+		flags |= MAP_FIXED;
+
 	if (!(flags & MAP_FIXED))
 		addr = round_hint_to_min(addr);
 
@@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (offset_in_page(addr))
 		return addr;
 
+	if (flags & MAP_FIXED_SAFE) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+
+		if (vma && vma->vm_start <= addr)
+			return -ENOMEM;
+	}
+
 	if (prot == PROT_EXEC) {
 		pkey = execute_only_pkey(mm);
 		if (pkey < 0)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-16 10:18   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

MAP_FIXED is used quite often to enforce mapping at the particular
range. The main problem of this flag is, however, that it is inherently
dangerous because it unmaps existing mappings covered by the requested
range. This can cause silent memory corruptions. Some of them even with
serious security implications. While the current semantic might be
really desiderable in many cases there are others which would want to
enforce the given range but rather see a failure than a silent memory
corruption on a clashing range. Please note that there is no guarantee
that a given range is obeyed by the mmap even when it is free - e.g.
arch specific code is allowed to apply an alignment.

Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
It has the same semantic as MAP_FIXED wrt. the given address request
with a single exception that it fails with ENOMEM if the requested
address is already covered by an existing mapping. We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 9 files changed, 23 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 3b26cc62dadb..0e5724e4b4ad 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -31,6 +31,8 @@
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index da3216007fe0..fc5e61ef9fd4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -49,6 +49,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 /*
  * Flags for msync
  */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index cc9ba1d34779..c926487472fa 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -25,6 +25,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_SYNC		1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba7464f..d97342ca25b1 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -28,5 +28,6 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 9765896ecb2c..7b00477a7f9a 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -23,6 +23,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 63ee13faf17d..d5d58d2dc95e 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x8000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 
 /*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index b15b278aa314..d665bd8b7cbd 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
 #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
@@ -62,6 +63,7 @@
 # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
 
+
 /*
  * Flags for msync
  */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 7162cd4cca73..64c46047fbd3 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 680506faceae..89af0b5839a5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (!(file && path_noexec(&file->f_path)))
 			prot |= PROT_EXEC;
 
+	/* force arch specific MAP_FIXED handling in get_unmapped_area */
+	if (flags & MAP_FIXED_SAFE)
+		flags |= MAP_FIXED;
+
 	if (!(flags & MAP_FIXED))
 		addr = round_hint_to_min(addr);
 
@@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (offset_in_page(addr))
 		return addr;
 
+	if (flags & MAP_FIXED_SAFE) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+
+		if (vma && vma->vm_start <= addr)
+			return -ENOMEM;
+	}
+
 	if (prot == PROT_EXEC) {
 		pkey = execute_only_pkey(mm);
 		if (pkey < 0)
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-16 10:18   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:18 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

MAP_FIXED is used quite often to enforce mapping at the particular
range. The main problem of this flag is, however, that it is inherently
dangerous because it unmaps existing mappings covered by the requested
range. This can cause silent memory corruptions. Some of them even with
serious security implications. While the current semantic might be
really desiderable in many cases there are others which would want to
enforce the given range but rather see a failure than a silent memory
corruption on a clashing range. Please note that there is no guarantee
that a given range is obeyed by the mmap even when it is free - e.g.
arch specific code is allowed to apply an alignment.

Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
It has the same semantic as MAP_FIXED wrt. the given address request
with a single exception that it fails with ENOMEM if the requested
address is already covered by an existing mapping. We still do rely on
get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
check for a conflicting vma after it returns.

[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/uapi/asm/mman.h   |  2 ++
 arch/mips/include/uapi/asm/mman.h    |  2 ++
 arch/parisc/include/uapi/asm/mman.h  |  2 ++
 arch/powerpc/include/uapi/asm/mman.h |  1 +
 arch/sparc/include/uapi/asm/mman.h   |  1 +
 arch/tile/include/uapi/asm/mman.h    |  1 +
 arch/xtensa/include/uapi/asm/mman.h  |  2 ++
 include/uapi/asm-generic/mman.h      |  1 +
 mm/mmap.c                            | 11 +++++++++++
 9 files changed, 23 insertions(+)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 3b26cc62dadb..0e5724e4b4ad 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -31,6 +31,8 @@
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index da3216007fe0..fc5e61ef9fd4 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -49,6 +49,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 /*
  * Flags for msync
  */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index cc9ba1d34779..c926487472fa 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -25,6 +25,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_SYNC		1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
index 03c06ba7464f..d97342ca25b1 100644
--- a/arch/powerpc/include/uapi/asm/mman.h
+++ b/arch/powerpc/include/uapi/asm/mman.h
@@ -28,5 +28,6 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
 
 #endif /* _UAPI_ASM_POWERPC_MMAN_H */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 9765896ecb2c..7b00477a7f9a 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -23,6 +23,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
index 63ee13faf17d..d5d58d2dc95e 100644
--- a/arch/tile/include/uapi/asm/mman.h
+++ b/arch/tile/include/uapi/asm/mman.h
@@ -29,6 +29,7 @@
 #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
 #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
 #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x8000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 
 /*
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index b15b278aa314..d665bd8b7cbd 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -55,6 +55,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
 #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
@@ -62,6 +63,7 @@
 # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
 
+
 /*
  * Flags for msync
  */
diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
index 7162cd4cca73..64c46047fbd3 100644
--- a/include/uapi/asm-generic/mman.h
+++ b/include/uapi/asm-generic/mman.h
@@ -12,6 +12,7 @@
 #define MAP_NONBLOCK	0x10000		/* do not block on IO */
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
 
diff --git a/mm/mmap.c b/mm/mmap.c
index 680506faceae..89af0b5839a5 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (!(file && path_noexec(&file->f_path)))
 			prot |= PROT_EXEC;
 
+	/* force arch specific MAP_FIXED handling in get_unmapped_area */
+	if (flags & MAP_FIXED_SAFE)
+		flags |= MAP_FIXED;
+
 	if (!(flags & MAP_FIXED))
 		addr = round_hint_to_min(addr);
 
@@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (offset_in_page(addr))
 		return addr;
 
+	if (flags & MAP_FIXED_SAFE) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+
+		if (vma && vma->vm_start <= addr)
+			return -ENOMEM;
+	}
+
 	if (prot == PROT_EXEC) {
 		pkey = execute_only_pkey(mm);
 		if (pkey < 0)
-- 
2.15.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-11-16 10:18 ` (unknown), Michal Hocko
  (?)
@ 2017-11-16 10:19   ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:19 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index c4606ce743d2..2286140e54e0 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -398,7 +398,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -416,6 +416,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6466153f2bf0..12b21942ccde 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -929,7 +933,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -965,7 +969,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1220,7 +1224,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-11-16 10:19   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:19 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index c4606ce743d2..2286140e54e0 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -398,7 +398,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -416,6 +416,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6466153f2bf0..12b21942ccde 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -929,7 +933,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -965,7 +969,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1220,7 +1224,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-11-16 10:19   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 10:19 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index c4606ce743d2..2286140e54e0 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -398,7 +398,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -416,6 +416,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 6466153f2bf0..12b21942ccde 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -929,7 +933,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -965,7 +969,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1220,7 +1224,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-16 12:14   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 12:14 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

[Ups, managed to screw the subject - fix it]

On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> Hi,
> this has started as a follow up discussion [1][2] resulting in the
> runtime failure caused by hardening patch [3] which removes MAP_FIXED
> from the elf loader because MAP_FIXED is inherently dangerous as it
> might silently clobber and existing underlying mapping (e.g. stack). The
> reason for the failure is that some architectures enforce an alignment
> for the given address hint without MAP_FIXED used (e.g. for shared or
> file backed mappings).
> 
> One way around this would be excluding those archs which do alignment
> tricks from the hardening [4]. The patch is really trivial but it has
> been objected, rightfully so, that this screams for a more generic
> solution. We basically want a non-destructive MAP_FIXED.
> 
> The first patch introduced MAP_FIXED_SAFE which enforces the given
> address but unlike MAP_FIXED it fails with ENOMEM if the given range
> conflicts with an existing one. The flag is introduced as a completely
> new flag rather than a MAP_FIXED extension because of the backward
> compatibility. We really want a never-clobber semantic even on older
> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> flags evaluation because we do not EINVAL on unknown flags. On those
> kernels we would simply use the traditional hint based semantic so the
> caller can still get a different address (which sucks) but at least not
> silently corrupt an existing mapping. I do not see a good way around
> that. Except we won't export expose the new semantic to the userspace at
> all. It seems there are users who would like to have something like that
> [5], though. Atomic address range probing in the multithreaded programs
> sounds like an interesting thing to me as well, although I do not have
> any specific usecase in mind.
> 
> The second patch simply replaces MAP_FIXED use in elf loader by
> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> follow. Actually real MAP_FIXED usages should be docummented properly
> and they should be more of an exception.
> 
> Does anybody see any fundamental reasons why this is a wrong approach?
> 
> Diffstat says
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/metag/kernel/process.c          |  6 +++++-
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  fs/binfmt_elf.c                      | 12 ++++++++----
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  11 files changed, 36 insertions(+), 5 deletions(-)
> 
> [1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-16 12:14   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 12:14 UTC (permalink / raw)
  To: linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, Abdul Haleem, Joel Stanley,
	Kees Cook

[Ups, managed to screw the subject - fix it]

On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> Hi,
> this has started as a follow up discussion [1][2] resulting in the
> runtime failure caused by hardening patch [3] which removes MAP_FIXED
> from the elf loader because MAP_FIXED is inherently dangerous as it
> might silently clobber and existing underlying mapping (e.g. stack). The
> reason for the failure is that some architectures enforce an alignment
> for the given address hint without MAP_FIXED used (e.g. for shared or
> file backed mappings).
> 
> One way around this would be excluding those archs which do alignment
> tricks from the hardening [4]. The patch is really trivial but it has
> been objected, rightfully so, that this screams for a more generic
> solution. We basically want a non-destructive MAP_FIXED.
> 
> The first patch introduced MAP_FIXED_SAFE which enforces the given
> address but unlike MAP_FIXED it fails with ENOMEM if the given range
> conflicts with an existing one. The flag is introduced as a completely
> new flag rather than a MAP_FIXED extension because of the backward
> compatibility. We really want a never-clobber semantic even on older
> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> flags evaluation because we do not EINVAL on unknown flags. On those
> kernels we would simply use the traditional hint based semantic so the
> caller can still get a different address (which sucks) but at least not
> silently corrupt an existing mapping. I do not see a good way around
> that. Except we won't export expose the new semantic to the userspace at
> all. It seems there are users who would like to have something like that
> [5], though. Atomic address range probing in the multithreaded programs
> sounds like an interesting thing to me as well, although I do not have
> any specific usecase in mind.
> 
> The second patch simply replaces MAP_FIXED use in elf loader by
> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> follow. Actually real MAP_FIXED usages should be docummented properly
> and they should be more of an exception.
> 
> Does anybody see any fundamental reasons why this is a wrong approach?
> 
> Diffstat says
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/metag/kernel/process.c          |  6 +++++-
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  fs/binfmt_elf.c                      | 12 ++++++++----
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  11 files changed, 36 insertions(+), 5 deletions(-)
> 
> [1] http://lkml.kernel.org/r/20171107162217.382cd754-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org
> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel-JKZ9t1WPFCv1ENwx4SLHqw@public.gmane.org
> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org
> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf-W0DJWXSxmBNbyGPkN3NxC2scP1bn1w/D@public.gmane.org
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-16 12:14   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-16 12:14 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

[Ups, managed to screw the subject - fix it]

On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> Hi,
> this has started as a follow up discussion [1][2] resulting in the
> runtime failure caused by hardening patch [3] which removes MAP_FIXED
> from the elf loader because MAP_FIXED is inherently dangerous as it
> might silently clobber and existing underlying mapping (e.g. stack). The
> reason for the failure is that some architectures enforce an alignment
> for the given address hint without MAP_FIXED used (e.g. for shared or
> file backed mappings).
> 
> One way around this would be excluding those archs which do alignment
> tricks from the hardening [4]. The patch is really trivial but it has
> been objected, rightfully so, that this screams for a more generic
> solution. We basically want a non-destructive MAP_FIXED.
> 
> The first patch introduced MAP_FIXED_SAFE which enforces the given
> address but unlike MAP_FIXED it fails with ENOMEM if the given range
> conflicts with an existing one. The flag is introduced as a completely
> new flag rather than a MAP_FIXED extension because of the backward
> compatibility. We really want a never-clobber semantic even on older
> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> flags evaluation because we do not EINVAL on unknown flags. On those
> kernels we would simply use the traditional hint based semantic so the
> caller can still get a different address (which sucks) but at least not
> silently corrupt an existing mapping. I do not see a good way around
> that. Except we won't export expose the new semantic to the userspace at
> all. It seems there are users who would like to have something like that
> [5], though. Atomic address range probing in the multithreaded programs
> sounds like an interesting thing to me as well, although I do not have
> any specific usecase in mind.
> 
> The second patch simply replaces MAP_FIXED use in elf loader by
> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> follow. Actually real MAP_FIXED usages should be docummented properly
> and they should be more of an exception.
> 
> Does anybody see any fundamental reasons why this is a wrong approach?
> 
> Diffstat says
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/metag/kernel/process.c          |  6 +++++-
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  fs/binfmt_elf.c                      | 12 ++++++++----
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  11 files changed, 36 insertions(+), 5 deletions(-)
> 
> [1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
> 
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-16 10:18   ` Michal Hocko
@ 2017-11-17  0:27     ` Kees Cook
  -1 siblings, 0 replies; 76+ messages in thread
From: Kees Cook @ 2017-11-17  0:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML,
	linux-arch, Michal Hocko

On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> MAP_FIXED is used quite often to enforce mapping at the particular
> range. The main problem of this flag is, however, that it is inherently
> dangerous because it unmaps existing mappings covered by the requested
> range. This can cause silent memory corruptions. Some of them even with
> serious security implications. While the current semantic might be
> really desiderable in many cases there are others which would want to
> enforce the given range but rather see a failure than a silent memory
> corruption on a clashing range. Please note that there is no guarantee
> that a given range is obeyed by the mmap even when it is free - e.g.
> arch specific code is allowed to apply an alignment.
>
> Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> It has the same semantic as MAP_FIXED wrt. the given address request
> with a single exception that it fails with ENOMEM if the requested
> address is already covered by an existing mapping. We still do rely on
> get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> check for a conflicting vma after it returns.

I like this much more than special-casing the ELF loader. It is an
unusual property that MAP_FIXED does _two_ things, so I like having
this split out.

Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
MAP_FIXED, really...

At the end of the day, I don't really care about the name, but "SAFE"
isn't very descriptive for what the flag changes about FIXED.

-Kees

>
> [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  9 files changed, 23 insertions(+)
>
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 3b26cc62dadb..0e5724e4b4ad 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -31,6 +31,8 @@
>  #define MAP_STACK      0x80000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x100000        /* create a huge page mapping */
>
> +#define MAP_FIXED_SAFE 0x200000        /* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_ASYNC       1               /* sync memory asynchronously */
>  #define MS_SYNC                2               /* synchronous memory sync */
>  #define MS_INVALIDATE  4               /* invalidate the caches */
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index da3216007fe0..fc5e61ef9fd4 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -49,6 +49,8 @@
>  #define MAP_STACK      0x40000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x80000         /* create a huge page mapping */
>
> +#define MAP_FIXED_SAFE 0x100000        /* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index cc9ba1d34779..c926487472fa 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,8 @@
>  #define MAP_STACK      0x40000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x80000         /* create a huge page mapping */
>
> +#define MAP_FIXED_SAFE 0x100000        /* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_SYNC                1               /* synchronous memory sync */
>  #define MS_ASYNC       2               /* sync memory asynchronously */
>  #define MS_INVALIDATE  4               /* invalidate the caches */
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index 03c06ba7464f..d97342ca25b1 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -28,5 +28,6 @@
>  #define MAP_NONBLOCK   0x10000         /* do not block on IO */
>  #define MAP_STACK      0x20000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x40000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x800000        /* MAP_FIXED which doesn't unmap underlying mapping */
>
>  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
> index 9765896ecb2c..7b00477a7f9a 100644
> --- a/arch/sparc/include/uapi/asm/mman.h
> +++ b/arch/sparc/include/uapi/asm/mman.h
> @@ -23,6 +23,7 @@
>  #define MAP_NONBLOCK   0x10000         /* do not block on IO */
>  #define MAP_STACK      0x20000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x40000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x80000         /* MAP_FIXED which doesn't unmap underlying mapping */
>
>
>  #endif /* _UAPI__SPARC_MMAN_H__ */
> diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
> index 63ee13faf17d..d5d58d2dc95e 100644
> --- a/arch/tile/include/uapi/asm/mman.h
> +++ b/arch/tile/include/uapi/asm/mman.h
> @@ -29,6 +29,7 @@
>  #define MAP_DENYWRITE  0x0800          /* ETXTBSY */
>  #define MAP_EXECUTABLE 0x1000          /* mark it as an executable */
>  #define MAP_HUGETLB    0x4000          /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x8000          /* MAP_FIXED which doesn't unmap underlying mapping */
>
>
>  /*
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index b15b278aa314..d665bd8b7cbd 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -55,6 +55,7 @@
>  #define MAP_NONBLOCK   0x20000         /* do not block on IO */
>  #define MAP_STACK      0x40000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x80000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x100000        /* MAP_FIXED which doesn't unmap underlying mapping */
>  #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
>  # define MAP_UNINITIALIZED 0x4000000   /* For anonymous mmap, memory could be
>                                          * uninitialized */
> @@ -62,6 +63,7 @@
>  # define MAP_UNINITIALIZED 0x0         /* Don't support this flag */
>  #endif
>
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> index 7162cd4cca73..64c46047fbd3 100644
> --- a/include/uapi/asm-generic/mman.h
> +++ b/include/uapi/asm-generic/mman.h
> @@ -12,6 +12,7 @@
>  #define MAP_NONBLOCK   0x10000         /* do not block on IO */
>  #define MAP_STACK      0x20000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x40000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x80000         /* MAP_FIXED which doesn't unmap underlying mapping */
>
>  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 680506faceae..89af0b5839a5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>                 if (!(file && path_noexec(&file->f_path)))
>                         prot |= PROT_EXEC;
>
> +       /* force arch specific MAP_FIXED handling in get_unmapped_area */
> +       if (flags & MAP_FIXED_SAFE)
> +               flags |= MAP_FIXED;
> +
>         if (!(flags & MAP_FIXED))
>                 addr = round_hint_to_min(addr);
>
> @@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>         if (offset_in_page(addr))
>                 return addr;
>
> +       if (flags & MAP_FIXED_SAFE) {
> +               struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +               if (vma && vma->vm_start <= addr)
> +                       return -ENOMEM;
> +       }
> +
>         if (prot == PROT_EXEC) {
>                 pkey = execute_only_pkey(mm);
>                 if (pkey < 0)
> --
> 2.15.0
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  0:27     ` Kees Cook
  0 siblings, 0 replies; 76+ messages in thread
From: Kees Cook @ 2017-11-17  0:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML,
	linux-arch, Michal Hocko

On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> MAP_FIXED is used quite often to enforce mapping at the particular
> range. The main problem of this flag is, however, that it is inherently
> dangerous because it unmaps existing mappings covered by the requested
> range. This can cause silent memory corruptions. Some of them even with
> serious security implications. While the current semantic might be
> really desiderable in many cases there are others which would want to
> enforce the given range but rather see a failure than a silent memory
> corruption on a clashing range. Please note that there is no guarantee
> that a given range is obeyed by the mmap even when it is free - e.g.
> arch specific code is allowed to apply an alignment.
>
> Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> It has the same semantic as MAP_FIXED wrt. the given address request
> with a single exception that it fails with ENOMEM if the requested
> address is already covered by an existing mapping. We still do rely on
> get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> check for a conflicting vma after it returns.

I like this much more than special-casing the ELF loader. It is an
unusual property that MAP_FIXED does _two_ things, so I like having
this split out.

Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
MAP_FIXED, really...

At the end of the day, I don't really care about the name, but "SAFE"
isn't very descriptive for what the flag changes about FIXED.

-Kees

>
> [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  9 files changed, 23 insertions(+)
>
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 3b26cc62dadb..0e5724e4b4ad 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -31,6 +31,8 @@
>  #define MAP_STACK      0x80000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x100000        /* create a huge page mapping */
>
> +#define MAP_FIXED_SAFE 0x200000        /* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_ASYNC       1               /* sync memory asynchronously */
>  #define MS_SYNC                2               /* synchronous memory sync */
>  #define MS_INVALIDATE  4               /* invalidate the caches */
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index da3216007fe0..fc5e61ef9fd4 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -49,6 +49,8 @@
>  #define MAP_STACK      0x40000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x80000         /* create a huge page mapping */
>
> +#define MAP_FIXED_SAFE 0x100000        /* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index cc9ba1d34779..c926487472fa 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,8 @@
>  #define MAP_STACK      0x40000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x80000         /* create a huge page mapping */
>
> +#define MAP_FIXED_SAFE 0x100000        /* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_SYNC                1               /* synchronous memory sync */
>  #define MS_ASYNC       2               /* sync memory asynchronously */
>  #define MS_INVALIDATE  4               /* invalidate the caches */
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index 03c06ba7464f..d97342ca25b1 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -28,5 +28,6 @@
>  #define MAP_NONBLOCK   0x10000         /* do not block on IO */
>  #define MAP_STACK      0x20000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x40000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x800000        /* MAP_FIXED which doesn't unmap underlying mapping */
>
>  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
> index 9765896ecb2c..7b00477a7f9a 100644
> --- a/arch/sparc/include/uapi/asm/mman.h
> +++ b/arch/sparc/include/uapi/asm/mman.h
> @@ -23,6 +23,7 @@
>  #define MAP_NONBLOCK   0x10000         /* do not block on IO */
>  #define MAP_STACK      0x20000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x40000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x80000         /* MAP_FIXED which doesn't unmap underlying mapping */
>
>
>  #endif /* _UAPI__SPARC_MMAN_H__ */
> diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
> index 63ee13faf17d..d5d58d2dc95e 100644
> --- a/arch/tile/include/uapi/asm/mman.h
> +++ b/arch/tile/include/uapi/asm/mman.h
> @@ -29,6 +29,7 @@
>  #define MAP_DENYWRITE  0x0800          /* ETXTBSY */
>  #define MAP_EXECUTABLE 0x1000          /* mark it as an executable */
>  #define MAP_HUGETLB    0x4000          /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x8000          /* MAP_FIXED which doesn't unmap underlying mapping */
>
>
>  /*
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index b15b278aa314..d665bd8b7cbd 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -55,6 +55,7 @@
>  #define MAP_NONBLOCK   0x20000         /* do not block on IO */
>  #define MAP_STACK      0x40000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x80000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x100000        /* MAP_FIXED which doesn't unmap underlying mapping */
>  #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
>  # define MAP_UNINITIALIZED 0x4000000   /* For anonymous mmap, memory could be
>                                          * uninitialized */
> @@ -62,6 +63,7 @@
>  # define MAP_UNINITIALIZED 0x0         /* Don't support this flag */
>  #endif
>
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> index 7162cd4cca73..64c46047fbd3 100644
> --- a/include/uapi/asm-generic/mman.h
> +++ b/include/uapi/asm-generic/mman.h
> @@ -12,6 +12,7 @@
>  #define MAP_NONBLOCK   0x10000         /* do not block on IO */
>  #define MAP_STACK      0x20000         /* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB    0x40000         /* create a huge page mapping */
> +#define MAP_FIXED_SAFE 0x80000         /* MAP_FIXED which doesn't unmap underlying mapping */
>
>  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 680506faceae..89af0b5839a5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>                 if (!(file && path_noexec(&file->f_path)))
>                         prot |= PROT_EXEC;
>
> +       /* force arch specific MAP_FIXED handling in get_unmapped_area */
> +       if (flags & MAP_FIXED_SAFE)
> +               flags |= MAP_FIXED;
> +
>         if (!(flags & MAP_FIXED))
>                 addr = round_hint_to_min(addr);
>
> @@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>         if (offset_in_page(addr))
>                 return addr;
>
> +       if (flags & MAP_FIXED_SAFE) {
> +               struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +               if (vma && vma->vm_start <= addr)
> +                       return -ENOMEM;
> +       }
> +
>         if (prot == PROT_EXEC) {
>                 pkey = execute_only_pkey(mm);
>                 if (pkey < 0)
> --
> 2.15.0
>



-- 
Kees Cook
Pixel Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-11-16 10:19   ` Michal Hocko
@ 2017-11-17  0:30     ` Kees Cook
  -1 siblings, 0 replies; 76+ messages in thread
From: Kees Cook @ 2017-11-17  0:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML,
	linux-arch, Michal Hocko, Abdul Haleem, Joel Stanley

On Thu, Nov 16, 2017 at 2:19 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Both load_elf_interp and load_elf_binary rely on elf_map to map segments
> on a controlled address and they use MAP_FIXED to enforce that. This is
> however dangerous thing prone to silent data corruption which can be
> even exploitable. Let's take CVE-2017-1000253 as an example. At the time
> (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
> ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
> the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
> we could end up mapping over the existing stack with some luck.
>
> The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
> fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
> further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
> revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
> stack consumption early during execve fully stopped by da029c11e6b1
> ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
> safe and any attack should be impractical. On the other hand this is
> just too subtle assumption so it can break quite easily and hard to
> spot.
>
> I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
> fundamentally dangerous. Moreover it shouldn't be even needed. We are
> at the early process stage and so there shouldn't be unrelated mappings
> (except for stack and loader) existing so mmap for a given address
> should succeed even without MAP_FIXED. Something is terribly wrong if
> this is not the case and we should rather fail than silently corrupt the
> underlying mapping.
>
> Address this issue by changing MAP_FIXED to the newly added
> MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
> existing mapping clashing with the requested one without clobbering it.
>
> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Once (if?) the name gets settled, this looks good to me:

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/metag/kernel/process.c |  6 +++++-
>  fs/binfmt_elf.c             | 12 ++++++++----
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
> index c4606ce743d2..2286140e54e0 100644
> --- a/arch/metag/kernel/process.c
> +++ b/arch/metag/kernel/process.c
> @@ -398,7 +398,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>         tcm_tag = tcm_lookup_tag(addr);
>
>         if (tcm_tag != TCM_INVALID_TAG)
> -               type &= ~MAP_FIXED;
> +               type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
>
>         /*
>         * total_size is the size of the ELF (interpreter) image.
> @@ -416,6 +416,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>         } else
>                 map_addr = vm_mmap(filep, addr, size, prot, type, off);
>
> +       if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
> +               pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +                               task_pid_nr(current), tsk->comm, (void*)addr);
> +
>         if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
>                 struct tcm_allocation *tcm;
>                 unsigned long tcm_addr;
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 6466153f2bf0..12b21942ccde 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>         } else
>                 map_addr = vm_mmap(filep, addr, size, prot, type, off);
>
> +       if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
> +               pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +                               task_pid_nr(current), current->comm, (void*)addr);
> +
>         return(map_addr);
>  }
>
> @@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
>                                 elf_prot |= PROT_EXEC;
>                         vaddr = eppnt->p_vaddr;
>                         if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
> -                               elf_type |= MAP_FIXED;
> +                               elf_type |= MAP_FIXED_SAFE;
>                         else if (no_base && interp_elf_ex->e_type == ET_DYN)
>                                 load_addr = -vaddr;
>
> @@ -929,7 +933,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                  * the ET_DYN load_addr calculations, proceed normally.
>                  */
>                 if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
> -                       elf_flags |= MAP_FIXED;
> +                       elf_flags |= MAP_FIXED_SAFE;
>                 } else if (loc->elf_ex.e_type == ET_DYN) {
>                         /*
>                          * This logic is run once for the first LOAD Program
> @@ -965,7 +969,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                                 load_bias = ELF_ET_DYN_BASE;
>                                 if (current->flags & PF_RANDOMIZE)
>                                         load_bias += arch_mmap_rnd();
> -                               elf_flags |= MAP_FIXED;
> +                               elf_flags |= MAP_FIXED_SAFE;
>                         } else
>                                 load_bias = 0;
>
> @@ -1220,7 +1224,7 @@ static int load_elf_library(struct file *file)
>                         (eppnt->p_filesz +
>                          ELF_PAGEOFFSET(eppnt->p_vaddr)),
>                         PROT_READ | PROT_WRITE | PROT_EXEC,
> -                       MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
> +                       MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
>                         (eppnt->p_offset -
>                          ELF_PAGEOFFSET(eppnt->p_vaddr)));
>         if (error != ELF_PAGESTART(eppnt->p_vaddr))
> --
> 2.15.0
>



-- 
Kees Cook
Pixel Security

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-11-17  0:30     ` Kees Cook
  0 siblings, 0 replies; 76+ messages in thread
From: Kees Cook @ 2017-11-17  0:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML,
	linux-arch, Michal Hocko, Abdul Haleem, Joel Stanley

On Thu, Nov 16, 2017 at 2:19 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Both load_elf_interp and load_elf_binary rely on elf_map to map segments
> on a controlled address and they use MAP_FIXED to enforce that. This is
> however dangerous thing prone to silent data corruption which can be
> even exploitable. Let's take CVE-2017-1000253 as an example. At the time
> (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
> ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
> the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
> we could end up mapping over the existing stack with some luck.
>
> The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
> fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
> further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
> revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
> stack consumption early during execve fully stopped by da029c11e6b1
> ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
> safe and any attack should be impractical. On the other hand this is
> just too subtle assumption so it can break quite easily and hard to
> spot.
>
> I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
> fundamentally dangerous. Moreover it shouldn't be even needed. We are
> at the early process stage and so there shouldn't be unrelated mappings
> (except for stack and loader) existing so mmap for a given address
> should succeed even without MAP_FIXED. Something is terribly wrong if
> this is not the case and we should rather fail than silently corrupt the
> underlying mapping.
>
> Address this issue by changing MAP_FIXED to the newly added
> MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
> existing mapping clashing with the requested one without clobbering it.
>
> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Kees Cook <keescook@chromium.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>

Once (if?) the name gets settled, this looks good to me:

Acked-by: Kees Cook <keescook@chromium.org>

-Kees

> ---
>  arch/metag/kernel/process.c |  6 +++++-
>  fs/binfmt_elf.c             | 12 ++++++++----
>  2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
> index c4606ce743d2..2286140e54e0 100644
> --- a/arch/metag/kernel/process.c
> +++ b/arch/metag/kernel/process.c
> @@ -398,7 +398,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>         tcm_tag = tcm_lookup_tag(addr);
>
>         if (tcm_tag != TCM_INVALID_TAG)
> -               type &= ~MAP_FIXED;
> +               type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
>
>         /*
>         * total_size is the size of the ELF (interpreter) image.
> @@ -416,6 +416,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
>         } else
>                 map_addr = vm_mmap(filep, addr, size, prot, type, off);
>
> +       if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
> +               pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +                               task_pid_nr(current), tsk->comm, (void*)addr);
> +
>         if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
>                 struct tcm_allocation *tcm;
>                 unsigned long tcm_addr;
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 6466153f2bf0..12b21942ccde 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>         } else
>                 map_addr = vm_mmap(filep, addr, size, prot, type, off);
>
> +       if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
> +               pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
> +                               task_pid_nr(current), current->comm, (void*)addr);
> +
>         return(map_addr);
>  }
>
> @@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
>                                 elf_prot |= PROT_EXEC;
>                         vaddr = eppnt->p_vaddr;
>                         if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
> -                               elf_type |= MAP_FIXED;
> +                               elf_type |= MAP_FIXED_SAFE;
>                         else if (no_base && interp_elf_ex->e_type == ET_DYN)
>                                 load_addr = -vaddr;
>
> @@ -929,7 +933,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                  * the ET_DYN load_addr calculations, proceed normally.
>                  */
>                 if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
> -                       elf_flags |= MAP_FIXED;
> +                       elf_flags |= MAP_FIXED_SAFE;
>                 } else if (loc->elf_ex.e_type == ET_DYN) {
>                         /*
>                          * This logic is run once for the first LOAD Program
> @@ -965,7 +969,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
>                                 load_bias = ELF_ET_DYN_BASE;
>                                 if (current->flags & PF_RANDOMIZE)
>                                         load_bias += arch_mmap_rnd();
> -                               elf_flags |= MAP_FIXED;
> +                               elf_flags |= MAP_FIXED_SAFE;
>                         } else
>                                 load_bias = 0;
>
> @@ -1220,7 +1224,7 @@ static int load_elf_library(struct file *file)
>                         (eppnt->p_filesz +
>                          ELF_PAGEOFFSET(eppnt->p_vaddr)),
>                         PROT_READ | PROT_WRITE | PROT_EXEC,
> -                       MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
> +                       MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
>                         (eppnt->p_offset -
>                          ELF_PAGEOFFSET(eppnt->p_vaddr)));
>         if (error != ELF_PAGESTART(eppnt->p_vaddr))
> --
> 2.15.0
>



-- 
Kees Cook
Pixel Security

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-16 10:18   ` Michal Hocko
@ 2017-11-17  7:30     ` Florian Weimer
  -1 siblings, 0 replies; 76+ messages in thread
From: Florian Weimer @ 2017-11-17  7:30 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

On 11/16/2017 11:18 AM, Michal Hocko wrote:
> +	if (flags & MAP_FIXED_SAFE) {
> +		struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +		if (vma && vma->vm_start <= addr)
> +			return -ENOMEM;
> +	}

Could you pick a different error code which cannot also be caused by a 
an unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?

This would definitely help with application-based randomization of 
mappings, and there, actual ENOMEM and this error would have to be 
handled differently.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  7:30     ` Florian Weimer
  0 siblings, 0 replies; 76+ messages in thread
From: Florian Weimer @ 2017-11-17  7:30 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

On 11/16/2017 11:18 AM, Michal Hocko wrote:
> +	if (flags & MAP_FIXED_SAFE) {
> +		struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +		if (vma && vma->vm_start <= addr)
> +			return -ENOMEM;
> +	}

Could you pick a different error code which cannot also be caused by a 
an unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?

This would definitely help with application-based randomization of 
mappings, and there, actual ENOMEM and this error would have to be 
handled differently.

Thanks,
Florian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-16 10:18   ` Michal Hocko
  (?)
@ 2017-11-17  8:37     ` John Hubbard
  -1 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:37 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

On 11/16/2017 02:18 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> MAP_FIXED is used quite often to enforce mapping at the particular
> range. The main problem of this flag is, however, that it is inherently
> dangerous because it unmaps existing mappings covered by the requested
> range. This can cause silent memory corruptions. Some of them even with
> serious security implications. While the current semantic might be
> really desiderable in many cases there are others which would want to
> enforce the given range but rather see a failure than a silent memory
> corruption on a clashing range. Please note that there is no guarantee
> that a given range is obeyed by the mmap even when it is free - e.g.
> arch specific code is allowed to apply an alignment.
> 
> Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> It has the same semantic as MAP_FIXED wrt. the given address request
> with a single exception that it fails with ENOMEM if the requested
> address is already covered by an existing mapping. We still do rely on
> get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> check for a conflicting vma after it returns.
> 
> [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  9 files changed, 23 insertions(+)
> 
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 3b26cc62dadb..0e5724e4b4ad 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -31,6 +31,8 @@
>  #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_ASYNC	1		/* sync memory asynchronously */
>  #define MS_SYNC		2		/* synchronous memory sync */
>  #define MS_INVALIDATE	4		/* invalidate the caches */
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index da3216007fe0..fc5e61ef9fd4 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -49,6 +49,8 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index cc9ba1d34779..c926487472fa 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,8 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_SYNC		1		/* synchronous memory sync */
>  #define MS_ASYNC	2		/* sync memory asynchronously */
>  #define MS_INVALIDATE	4		/* invalidate the caches */
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index 03c06ba7464f..d97342ca25b1 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -28,5 +28,6 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
 

Hi Michal,

1. The powerpc change, above, has one too many zeroes. It should be 0x80000, 
not 0x800000.

2. For the one-line comments, if you phrase them like this:

/* Like MAP_FIXED, except that it doesn't unmap pre-existing mappings */

...I think that would be a little clearer.

more below:

>  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
> index 9765896ecb2c..7b00477a7f9a 100644
> --- a/arch/sparc/include/uapi/asm/mman.h
> +++ b/arch/sparc/include/uapi/asm/mman.h
> @@ -23,6 +23,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  
>  #endif /* _UAPI__SPARC_MMAN_H__ */
> diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
> index 63ee13faf17d..d5d58d2dc95e 100644
> --- a/arch/tile/include/uapi/asm/mman.h
> +++ b/arch/tile/include/uapi/asm/mman.h
> @@ -29,6 +29,7 @@
>  #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
>  #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
>  #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x8000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  
>  /*
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index b15b278aa314..d665bd8b7cbd 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -55,6 +55,7 @@
>  #define MAP_NONBLOCK	0x20000		/* do not block on IO */
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
>  #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
>  # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> @@ -62,6 +63,7 @@
>  # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
>  #endif
>  
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> index 7162cd4cca73..64c46047fbd3 100644
> --- a/include/uapi/asm-generic/mman.h
> +++ b/include/uapi/asm-generic/mman.h
> @@ -12,6 +12,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
>  
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 680506faceae..89af0b5839a5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  		if (!(file && path_noexec(&file->f_path)))
>  			prot |= PROT_EXEC;
>  
> +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> +	if (flags & MAP_FIXED_SAFE)
> +		flags |= MAP_FIXED;

Hooking in at this point is a nice way to solve the problem. :)

For the naming and implementation, I see a couple of things that might improve
it slightly:

a) Change MAP_FIXED_SAFE to MAP_NO_CLOBBER (as per Kees' idea), but keep the
new flag independent, by omitting the above two lines. Instead of forcing
MAP_FIXED as a result of the new flag, you could simply fail to take any action
on MAP_NO_CLOBBER *unless* MAP_FIXED is set.

This is a bit easier to explain and reason about, as compared to a flag that
auto-sets another flag. I like this approach best.

  or

b) Change MAP_FIXED_SAFE to MAP_FIXED_NO_CLOBBER (also a variation on Kees' name
idea, but a little longer, a bit uglier, and clearer), and leave the implementation
the same.

thanks,
John Hubbard

> +
>  	if (!(flags & MAP_FIXED))
>  		addr = round_hint_to_min(addr);
>  
> @@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (offset_in_page(addr))
>  		return addr;
>  
> +	if (flags & MAP_FIXED_SAFE) {
> +		struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +		if (vma && vma->vm_start <= addr)
> +			return -ENOMEM;
> +	}
> +
>  	if (prot == PROT_EXEC) {
>  		pkey = execute_only_pkey(mm);
>  		if (pkey < 0)
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:37     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:37 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

On 11/16/2017 02:18 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> MAP_FIXED is used quite often to enforce mapping at the particular
> range. The main problem of this flag is, however, that it is inherently
> dangerous because it unmaps existing mappings covered by the requested
> range. This can cause silent memory corruptions. Some of them even with
> serious security implications. While the current semantic might be
> really desiderable in many cases there are others which would want to
> enforce the given range but rather see a failure than a silent memory
> corruption on a clashing range. Please note that there is no guarantee
> that a given range is obeyed by the mmap even when it is free - e.g.
> arch specific code is allowed to apply an alignment.
> 
> Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> It has the same semantic as MAP_FIXED wrt. the given address request
> with a single exception that it fails with ENOMEM if the requested
> address is already covered by an existing mapping. We still do rely on
> get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> check for a conflicting vma after it returns.
> 
> [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  9 files changed, 23 insertions(+)
> 
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 3b26cc62dadb..0e5724e4b4ad 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -31,6 +31,8 @@
>  #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_ASYNC	1		/* sync memory asynchronously */
>  #define MS_SYNC		2		/* synchronous memory sync */
>  #define MS_INVALIDATE	4		/* invalidate the caches */
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index da3216007fe0..fc5e61ef9fd4 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -49,6 +49,8 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index cc9ba1d34779..c926487472fa 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,8 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_SYNC		1		/* synchronous memory sync */
>  #define MS_ASYNC	2		/* sync memory asynchronously */
>  #define MS_INVALIDATE	4		/* invalidate the caches */
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index 03c06ba7464f..d97342ca25b1 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -28,5 +28,6 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
 

Hi Michal,

1. The powerpc change, above, has one too many zeroes. It should be 0x80000, 
not 0x800000.

2. For the one-line comments, if you phrase them like this:

/* Like MAP_FIXED, except that it doesn't unmap pre-existing mappings */

...I think that would be a little clearer.

more below:

>  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
> index 9765896ecb2c..7b00477a7f9a 100644
> --- a/arch/sparc/include/uapi/asm/mman.h
> +++ b/arch/sparc/include/uapi/asm/mman.h
> @@ -23,6 +23,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  
>  #endif /* _UAPI__SPARC_MMAN_H__ */
> diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
> index 63ee13faf17d..d5d58d2dc95e 100644
> --- a/arch/tile/include/uapi/asm/mman.h
> +++ b/arch/tile/include/uapi/asm/mman.h
> @@ -29,6 +29,7 @@
>  #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
>  #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
>  #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x8000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  
>  /*
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index b15b278aa314..d665bd8b7cbd 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -55,6 +55,7 @@
>  #define MAP_NONBLOCK	0x20000		/* do not block on IO */
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
>  #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
>  # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> @@ -62,6 +63,7 @@
>  # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
>  #endif
>  
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> index 7162cd4cca73..64c46047fbd3 100644
> --- a/include/uapi/asm-generic/mman.h
> +++ b/include/uapi/asm-generic/mman.h
> @@ -12,6 +12,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
>  
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 680506faceae..89af0b5839a5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  		if (!(file && path_noexec(&file->f_path)))
>  			prot |= PROT_EXEC;
>  
> +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> +	if (flags & MAP_FIXED_SAFE)
> +		flags |= MAP_FIXED;

Hooking in at this point is a nice way to solve the problem. :)

For the naming and implementation, I see a couple of things that might improve
it slightly:

a) Change MAP_FIXED_SAFE to MAP_NO_CLOBBER (as per Kees' idea), but keep the
new flag independent, by omitting the above two lines. Instead of forcing
MAP_FIXED as a result of the new flag, you could simply fail to take any action
on MAP_NO_CLOBBER *unless* MAP_FIXED is set.

This is a bit easier to explain and reason about, as compared to a flag that
auto-sets another flag. I like this approach best.

  or

b) Change MAP_FIXED_SAFE to MAP_FIXED_NO_CLOBBER (also a variation on Kees' name
idea, but a little longer, a bit uglier, and clearer), and leave the implementation
the same.

thanks,
John Hubbard

> +
>  	if (!(flags & MAP_FIXED))
>  		addr = round_hint_to_min(addr);
>  
> @@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (offset_in_page(addr))
>  		return addr;
>  
> +	if (flags & MAP_FIXED_SAFE) {
> +		struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +		if (vma && vma->vm_start <= addr)
> +			return -ENOMEM;
> +	}
> +
>  	if (prot == PROT_EXEC) {
>  		pkey = execute_only_pkey(mm);
>  		if (pkey < 0)
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:37     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:37 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Michal Hocko

On 11/16/2017 02:18 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> MAP_FIXED is used quite often to enforce mapping at the particular
> range. The main problem of this flag is, however, that it is inherently
> dangerous because it unmaps existing mappings covered by the requested
> range. This can cause silent memory corruptions. Some of them even with
> serious security implications. While the current semantic might be
> really desiderable in many cases there are others which would want to
> enforce the given range but rather see a failure than a silent memory
> corruption on a clashing range. Please note that there is no guarantee
> that a given range is obeyed by the mmap even when it is free - e.g.
> arch specific code is allowed to apply an alignment.
> 
> Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> It has the same semantic as MAP_FIXED wrt. the given address request
> with a single exception that it fails with ENOMEM if the requested
> address is already covered by an existing mapping. We still do rely on
> get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> check for a conflicting vma after it returns.
> 
> [set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>  arch/tile/include/uapi/asm/mman.h    |  1 +
>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>  include/uapi/asm-generic/mman.h      |  1 +
>  mm/mmap.c                            | 11 +++++++++++
>  9 files changed, 23 insertions(+)
> 
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 3b26cc62dadb..0e5724e4b4ad 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -31,6 +31,8 @@
>  #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_ASYNC	1		/* sync memory asynchronously */
>  #define MS_SYNC		2		/* synchronous memory sync */
>  #define MS_INVALIDATE	4		/* invalidate the caches */
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index da3216007fe0..fc5e61ef9fd4 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -49,6 +49,8 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index cc9ba1d34779..c926487472fa 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -25,6 +25,8 @@
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
>  
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
> +
>  #define MS_SYNC		1		/* synchronous memory sync */
>  #define MS_ASYNC	2		/* sync memory asynchronously */
>  #define MS_INVALIDATE	4		/* invalidate the caches */
> diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> index 03c06ba7464f..d97342ca25b1 100644
> --- a/arch/powerpc/include/uapi/asm/mman.h
> +++ b/arch/powerpc/include/uapi/asm/mman.h
> @@ -28,5 +28,6 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
 

Hi Michal,

1. The powerpc change, above, has one too many zeroes. It should be 0x80000, 
not 0x800000.

2. For the one-line comments, if you phrase them like this:

/* Like MAP_FIXED, except that it doesn't unmap pre-existing mappings */

...I think that would be a little clearer.

more below:

>  #endif /* _UAPI_ASM_POWERPC_MMAN_H */
> diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
> index 9765896ecb2c..7b00477a7f9a 100644
> --- a/arch/sparc/include/uapi/asm/mman.h
> +++ b/arch/sparc/include/uapi/asm/mman.h
> @@ -23,6 +23,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  
>  #endif /* _UAPI__SPARC_MMAN_H__ */
> diff --git a/arch/tile/include/uapi/asm/mman.h b/arch/tile/include/uapi/asm/mman.h
> index 63ee13faf17d..d5d58d2dc95e 100644
> --- a/arch/tile/include/uapi/asm/mman.h
> +++ b/arch/tile/include/uapi/asm/mman.h
> @@ -29,6 +29,7 @@
>  #define MAP_DENYWRITE	0x0800		/* ETXTBSY */
>  #define MAP_EXECUTABLE	0x1000		/* mark it as an executable */
>  #define MAP_HUGETLB	0x4000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x8000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  
>  /*
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index b15b278aa314..d665bd8b7cbd 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -55,6 +55,7 @@
>  #define MAP_NONBLOCK	0x20000		/* do not block on IO */
>  #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
>  #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
>  # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
>  					 * uninitialized */
> @@ -62,6 +63,7 @@
>  # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
>  #endif
>  
> +
>  /*
>   * Flags for msync
>   */
> diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h
> index 7162cd4cca73..64c46047fbd3 100644
> --- a/include/uapi/asm-generic/mman.h
> +++ b/include/uapi/asm-generic/mman.h
> @@ -12,6 +12,7 @@
>  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
>  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
>  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> +#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
>  
>  /* Bits [26:31] are reserved, see mman-common.h for MAP_HUGETLB usage */
>  
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 680506faceae..89af0b5839a5 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  		if (!(file && path_noexec(&file->f_path)))
>  			prot |= PROT_EXEC;
>  
> +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> +	if (flags & MAP_FIXED_SAFE)
> +		flags |= MAP_FIXED;

Hooking in at this point is a nice way to solve the problem. :)

For the naming and implementation, I see a couple of things that might improve
it slightly:

a) Change MAP_FIXED_SAFE to MAP_NO_CLOBBER (as per Kees' idea), but keep the
new flag independent, by omitting the above two lines. Instead of forcing
MAP_FIXED as a result of the new flag, you could simply fail to take any action
on MAP_NO_CLOBBER *unless* MAP_FIXED is set.

This is a bit easier to explain and reason about, as compared to a flag that
auto-sets another flag. I like this approach best.

  or

b) Change MAP_FIXED_SAFE to MAP_FIXED_NO_CLOBBER (also a variation on Kees' name
idea, but a little longer, a bit uglier, and clearer), and leave the implementation
the same.

thanks,
John Hubbard

> +
>  	if (!(flags & MAP_FIXED))
>  		addr = round_hint_to_min(addr);
>  
> @@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  	if (offset_in_page(addr))
>  		return addr;
>  
> +	if (flags & MAP_FIXED_SAFE) {
> +		struct vm_area_struct *vma = find_vma(mm, addr);
> +
> +		if (vma && vma->vm_start <= addr)
> +			return -ENOMEM;
> +	}
> +
>  	if (prot == PROT_EXEC) {
>  		pkey = execute_only_pkey(mm);
>  		if (pkey < 0)
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:45     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/16/2017 04:14 AM, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>> Hi,
>> this has started as a follow up discussion [1][2] resulting in the
>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>> from the elf loader because MAP_FIXED is inherently dangerous as it
>> might silently clobber and existing underlying mapping (e.g. stack). The
>> reason for the failure is that some architectures enforce an alignment
>> for the given address hint without MAP_FIXED used (e.g. for shared or
>> file backed mappings).
>>
>> One way around this would be excluding those archs which do alignment
>> tricks from the hardening [4]. The patch is really trivial but it has
>> been objected, rightfully so, that this screams for a more generic
>> solution. We basically want a non-destructive MAP_FIXED.
>>
>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>> conflicts with an existing one. The flag is introduced as a completely
>> new flag rather than a MAP_FIXED extension because of the backward
>> compatibility. We really want a never-clobber semantic even on older
>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>> flags evaluation because we do not EINVAL on unknown flags. On those
>> kernels we would simply use the traditional hint based semantic so the
>> caller can still get a different address (which sucks) but at least not
>> silently corrupt an existing mapping. I do not see a good way around
>> that. Except we won't export expose the new semantic to the userspace at
>> all. It seems there are users who would like to have something like that
>> [5], though. Atomic address range probing in the multithreaded programs
>> sounds like an interesting thing to me as well, although I do not have
>> any specific usecase in mind.

Hi Michal,

>From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
(or whatever it ends up being named) *would* be passed through from
user space. When you say that "we won't export expose the new semantic 
to the userspace at all", do you mean that glibc won't add it? Or
is there something I'm missing, that prevents that flag from getting
from the syscall, to do_mmap()?

On the usage: there are cases in user space that could probably make
good use of a no-clobber hint to MAP_FIXED. The user space code
that surrounds HMM (speaking loosely there--it's really any user space
code that manages a unified memory address space, across devices)
often ends up using MAP_FIXED, but MAP_FIXED crams several features
into one flag: an exact address, an "atomic" switch to the new mapping,
and unmapping the old mappings. That's pretty overloaded, so being
able to split it up a bit, by removing one of those features, seems
useful.

thanks,
John Hubbard

>>
>> The second patch simply replaces MAP_FIXED use in elf loader by
>> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
>> follow. Actually real MAP_FIXED usages should be docummented properly
>> and they should be more of an exception.
>>
>> Does anybody see any fundamental reasons why this is a wrong approach?
>>
>> Diffstat says
>>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>>  arch/metag/kernel/process.c          |  6 +++++-
>>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>>  arch/tile/include/uapi/asm/mman.h    |  1 +
>>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>>  fs/binfmt_elf.c                      | 12 ++++++++----
>>  include/uapi/asm-generic/mman.h      |  1 +
>>  mm/mmap.c                            | 11 +++++++++++
>>  11 files changed, 36 insertions(+), 5 deletions(-)
>>
>> [1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
>> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
>> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
>> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
>> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:45     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, Abdul Haleem, Joel Stanley,
	Kees Cook

On 11/16/2017 04:14 AM, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>> Hi,
>> this has started as a follow up discussion [1][2] resulting in the
>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>> from the elf loader because MAP_FIXED is inherently dangerous as it
>> might silently clobber and existing underlying mapping (e.g. stack). The
>> reason for the failure is that some architectures enforce an alignment
>> for the given address hint without MAP_FIXED used (e.g. for shared or
>> file backed mappings).
>>
>> One way around this would be excluding those archs which do alignment
>> tricks from the hardening [4]. The patch is really trivial but it has
>> been objected, rightfully so, that this screams for a more generic
>> solution. We basically want a non-destructive MAP_FIXED.
>>
>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>> conflicts with an existing one. The flag is introduced as a completely
>> new flag rather than a MAP_FIXED extension because of the backward
>> compatibility. We really want a never-clobber semantic even on older
>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>> flags evaluation because we do not EINVAL on unknown flags. On those
>> kernels we would simply use the traditional hint based semantic so the
>> caller can still get a different address (which sucks) but at least not
>> silently corrupt an existing mapping. I do not see a good way around
>> that. Except we won't export expose the new semantic to the userspace at
>> all. It seems there are users who would like to have something like that
>> [5], though. Atomic address range probing in the multithreaded programs
>> sounds like an interesting thing to me as well, although I do not have
>> any specific usecase in mind.

Hi Michal,

>From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
(or whatever it ends up being named) *would* be passed through from
user space. When you say that "we won't export expose the new semantic 
to the userspace at all", do you mean that glibc won't add it? Or
is there something I'm missing, that prevents that flag from getting
from the syscall, to do_mmap()?

On the usage: there are cases in user space that could probably make
good use of a no-clobber hint to MAP_FIXED. The user space code
that surrounds HMM (speaking loosely there--it's really any user space
code that manages a unified memory address space, across devices)
often ends up using MAP_FIXED, but MAP_FIXED crams several features
into one flag: an exact address, an "atomic" switch to the new mapping,
and unmapping the old mappings. That's pretty overloaded, so being
able to split it up a bit, by removing one of those features, seems
useful.

thanks,
John Hubbard

>>
>> The second patch simply replaces MAP_FIXED use in elf loader by
>> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
>> follow. Actually real MAP_FIXED usages should be docummented properly
>> and they should be more of an exception.
>>
>> Does anybody see any fundamental reasons why this is a wrong approach?
>>
>> Diffstat says
>>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>>  arch/metag/kernel/process.c          |  6 +++++-
>>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>>  arch/tile/include/uapi/asm/mman.h    |  1 +
>>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>>  fs/binfmt_elf.c                      | 12 ++++++++----
>>  include/uapi/asm-generic/mman.h      |  1 +
>>  mm/mmap.c                            | 11 +++++++++++
>>  11 files changed, 36 insertions(+), 5 deletions(-)
>>
>> [1] http://lkml.kernel.org/r/20171107162217.382cd754-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org
>> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel-JKZ9t1WPFCv1ENwx4SLHqw@public.gmane.org
>> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org
>> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf-W0DJWXSxmBNbyGPkN3NxC2scP1bn1w/D@public.gmane.org
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:45     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api-u79uwXL29TY76Z2rM5mHXA
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, Abdul Haleem, Joel Stanley,
	Kees Cook

On 11/16/2017 04:14 AM, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>> Hi,
>> this has started as a follow up discussion [1][2] resulting in the
>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>> from the elf loader because MAP_FIXED is inherently dangerous as it
>> might silently clobber and existing underlying mapping (e.g. stack). The
>> reason for the failure is that some architectures enforce an alignment
>> for the given address hint without MAP_FIXED used (e.g. for shared or
>> file backed mappings).
>>
>> One way around this would be excluding those archs which do alignment
>> tricks from the hardening [4]. The patch is really trivial but it has
>> been objected, rightfully so, that this screams for a more generic
>> solution. We basically want a non-destructive MAP_FIXED.
>>
>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>> conflicts with an existing one. The flag is introduced as a completely
>> new flag rather than a MAP_FIXED extension because of the backward
>> compatibility. We really want a never-clobber semantic even on older
>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>> flags evaluation because we do not EINVAL on unknown flags. On those
>> kernels we would simply use the traditional hint based semantic so the
>> caller can still get a different address (which sucks) but at least not
>> silently corrupt an existing mapping. I do not see a good way around
>> that. Except we won't export expose the new semantic to the userspace at
>> all. It seems there are users who would like to have something like that
>> [5], though. Atomic address range probing in the multithreaded programs
>> sounds like an interesting thing to me as well, although I do not have
>> any specific usecase in mind.

Hi Michal,

From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
(or whatever it ends up being named) *would* be passed through from
user space. When you say that "we won't export expose the new semantic 
to the userspace at all", do you mean that glibc won't add it? Or
is there something I'm missing, that prevents that flag from getting
from the syscall, to do_mmap()?

On the usage: there are cases in user space that could probably make
good use of a no-clobber hint to MAP_FIXED. The user space code
that surrounds HMM (speaking loosely there--it's really any user space
code that manages a unified memory address space, across devices)
often ends up using MAP_FIXED, but MAP_FIXED crams several features
into one flag: an exact address, an "atomic" switch to the new mapping,
and unmapping the old mappings. That's pretty overloaded, so being
able to split it up a bit, by removing one of those features, seems
useful.

thanks,
John Hubbard

>>
>> The second patch simply replaces MAP_FIXED use in elf loader by
>> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
>> follow. Actually real MAP_FIXED usages should be docummented properly
>> and they should be more of an exception.
>>
>> Does anybody see any fundamental reasons why this is a wrong approach?
>>
>> Diffstat says
>>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>>  arch/metag/kernel/process.c          |  6 +++++-
>>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>>  arch/tile/include/uapi/asm/mman.h    |  1 +
>>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>>  fs/binfmt_elf.c                      | 12 ++++++++----
>>  include/uapi/asm-generic/mman.h      |  1 +
>>  mm/mmap.c                            | 11 +++++++++++
>>  11 files changed, 36 insertions(+), 5 deletions(-)
>>
>> [1] http://lkml.kernel.org/r/20171107162217.382cd754-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org
>> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel-JKZ9t1WPFCv1ENwx4SLHqw@public.gmane.org
>> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org
>> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org
>> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf-W0DJWXSxmBNbyGPkN3NxC2scP1bn1w/D@public.gmane.org
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo-Bw31MaZKKs0EbZ0PF+XxCw@public.gmane.org  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org"> email-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org </a>
> 

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:45     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/16/2017 04:14 AM, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>> Hi,
>> this has started as a follow up discussion [1][2] resulting in the
>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>> from the elf loader because MAP_FIXED is inherently dangerous as it
>> might silently clobber and existing underlying mapping (e.g. stack). The
>> reason for the failure is that some architectures enforce an alignment
>> for the given address hint without MAP_FIXED used (e.g. for shared or
>> file backed mappings).
>>
>> One way around this would be excluding those archs which do alignment
>> tricks from the hardening [4]. The patch is really trivial but it has
>> been objected, rightfully so, that this screams for a more generic
>> solution. We basically want a non-destructive MAP_FIXED.
>>
>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>> conflicts with an existing one. The flag is introduced as a completely
>> new flag rather than a MAP_FIXED extension because of the backward
>> compatibility. We really want a never-clobber semantic even on older
>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>> flags evaluation because we do not EINVAL on unknown flags. On those
>> kernels we would simply use the traditional hint based semantic so the
>> caller can still get a different address (which sucks) but at least not
>> silently corrupt an existing mapping. I do not see a good way around
>> that. Except we won't export expose the new semantic to the userspace at
>> all. It seems there are users who would like to have something like that
>> [5], though. Atomic address range probing in the multithreaded programs
>> sounds like an interesting thing to me as well, although I do not have
>> any specific usecase in mind.

Hi Michal,

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17  8:45     ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-17  8:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/16/2017 04:14 AM, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>> Hi,
>> this has started as a follow up discussion [1][2] resulting in the
>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>> from the elf loader because MAP_FIXED is inherently dangerous as it
>> might silently clobber and existing underlying mapping (e.g. stack). The
>> reason for the failure is that some architectures enforce an alignment
>> for the given address hint without MAP_FIXED used (e.g. for shared or
>> file backed mappings).
>>
>> One way around this would be excluding those archs which do alignment
>> tricks from the hardening [4]. The patch is really trivial but it has
>> been objected, rightfully so, that this screams for a more generic
>> solution. We basically want a non-destructive MAP_FIXED.
>>
>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>> conflicts with an existing one. The flag is introduced as a completely
>> new flag rather than a MAP_FIXED extension because of the backward
>> compatibility. We really want a never-clobber semantic even on older
>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>> flags evaluation because we do not EINVAL on unknown flags. On those
>> kernels we would simply use the traditional hint based semantic so the
>> caller can still get a different address (which sucks) but at least not
>> silently corrupt an existing mapping. I do not see a good way around
>> that. Except we won't export expose the new semantic to the userspace at
>> all. It seems there are users who would like to have something like that
>> [5], though. Atomic address range probing in the multithreaded programs
>> sounds like an interesting thing to me as well, although I do not have
>> any specific usecase in mind.

Hi Michal,

>From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
(or whatever it ends up being named) *would* be passed through from
user space. When you say that "we won't export expose the new semantic 
to the userspace at all", do you mean that glibc won't add it? Or
is there something I'm missing, that prevents that flag from getting
from the syscall, to do_mmap()?

On the usage: there are cases in user space that could probably make
good use of a no-clobber hint to MAP_FIXED. The user space code
that surrounds HMM (speaking loosely there--it's really any user space
code that manages a unified memory address space, across devices)
often ends up using MAP_FIXED, but MAP_FIXED crams several features
into one flag: an exact address, an "atomic" switch to the new mapping,
and unmapping the old mappings. That's pretty overloaded, so being
able to split it up a bit, by removing one of those features, seems
useful.

thanks,
John Hubbard

>>
>> The second patch simply replaces MAP_FIXED use in elf loader by
>> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
>> follow. Actually real MAP_FIXED usages should be docummented properly
>> and they should be more of an exception.
>>
>> Does anybody see any fundamental reasons why this is a wrong approach?
>>
>> Diffstat says
>>  arch/alpha/include/uapi/asm/mman.h   |  2 ++
>>  arch/metag/kernel/process.c          |  6 +++++-
>>  arch/mips/include/uapi/asm/mman.h    |  2 ++
>>  arch/parisc/include/uapi/asm/mman.h  |  2 ++
>>  arch/powerpc/include/uapi/asm/mman.h |  1 +
>>  arch/sparc/include/uapi/asm/mman.h   |  1 +
>>  arch/tile/include/uapi/asm/mman.h    |  1 +
>>  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
>>  fs/binfmt_elf.c                      | 12 ++++++++----
>>  include/uapi/asm-generic/mman.h      |  1 +
>>  mm/mmap.c                            | 11 +++++++++++
>>  11 files changed, 36 insertions(+), 5 deletions(-)
>>
>> [1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
>> [2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
>> [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
>> [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
>> [5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
>>
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17 19:12       ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2017-11-17 19:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michal Hocko, Linux API, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	Linux-MM, LKML, linux-arch, Michal Hocko

On Thu, Nov 16, 2017 at 04:27:36PM -0800, Kees Cook wrote:
> On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > MAP_FIXED is used quite often to enforce mapping at the particular
> > range. The main problem of this flag is, however, that it is inherently
> > dangerous because it unmaps existing mappings covered by the requested
> > range. This can cause silent memory corruptions. Some of them even with
> > serious security implications. While the current semantic might be
> > really desiderable in many cases there are others which would want to
> > enforce the given range but rather see a failure than a silent memory
> > corruption on a clashing range. Please note that there is no guarantee
> > that a given range is obeyed by the mmap even when it is free - e.g.
> > arch specific code is allowed to apply an alignment.
> >
> > Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> > It has the same semantic as MAP_FIXED wrt. the given address request
> > with a single exception that it fails with ENOMEM if the requested
> > address is already covered by an existing mapping. We still do rely on
> > get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> > check for a conflicting vma after it returns.
> 
> I like this much more than special-casing the ELF loader. It is an
> unusual property that MAP_FIXED does _two_ things, so I like having
> this split out.
> 
> Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
> MAP_FIXED, really...

Way back when, I proposed a new flag called MAP_FIXED_WEAK.  I was
dissuaded from it when userspace people said it was just as easy for
them to provide the address hint, then run fixups on their data if the
address they were assigned wasn't the one they asked for.

The real problem is that MAP_FIXED should have been called MAP_FORCE.

So ... do we really have users that want failure instead of success at
a different address?  And if so, is it really a hardship for them to
make a call to unmap on success-at-the-wrong-address?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17 19:12       ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2017-11-17 19:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michal Hocko, Linux API, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	Linux-MM, LKML, linux-arch, Michal Hocko

On Thu, Nov 16, 2017 at 04:27:36PM -0800, Kees Cook wrote:
> On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> > From: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>
> >
> > MAP_FIXED is used quite often to enforce mapping at the particular
> > range. The main problem of this flag is, however, that it is inherently
> > dangerous because it unmaps existing mappings covered by the requested
> > range. This can cause silent memory corruptions. Some of them even with
> > serious security implications. While the current semantic might be
> > really desiderable in many cases there are others which would want to
> > enforce the given range but rather see a failure than a silent memory
> > corruption on a clashing range. Please note that there is no guarantee
> > that a given range is obeyed by the mmap even when it is free - e.g.
> > arch specific code is allowed to apply an alignment.
> >
> > Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> > It has the same semantic as MAP_FIXED wrt. the given address request
> > with a single exception that it fails with ENOMEM if the requested
> > address is already covered by an existing mapping. We still do rely on
> > get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> > check for a conflicting vma after it returns.
> 
> I like this much more than special-casing the ELF loader. It is an
> unusual property that MAP_FIXED does _two_ things, so I like having
> this split out.
> 
> Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
> MAP_FIXED, really...

Way back when, I proposed a new flag called MAP_FIXED_WEAK.  I was
dissuaded from it when userspace people said it was just as easy for
them to provide the address hint, then run fixups on their data if the
address they were assigned wasn't the one they asked for.

The real problem is that MAP_FIXED should have been called MAP_FORCE.

So ... do we really have users that want failure instead of success at
a different address?  And if so, is it really a hardship for them to
make a call to unmap on success-at-the-wrong-address?

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-17 19:12       ` Matthew Wilcox
  0 siblings, 0 replies; 76+ messages in thread
From: Matthew Wilcox @ 2017-11-17 19:12 UTC (permalink / raw)
  To: Kees Cook
  Cc: Michal Hocko, Linux API, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	Linux-MM, LKML, linux-arch, Michal Hocko

On Thu, Nov 16, 2017 at 04:27:36PM -0800, Kees Cook wrote:
> On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > MAP_FIXED is used quite often to enforce mapping at the particular
> > range. The main problem of this flag is, however, that it is inherently
> > dangerous because it unmaps existing mappings covered by the requested
> > range. This can cause silent memory corruptions. Some of them even with
> > serious security implications. While the current semantic might be
> > really desiderable in many cases there are others which would want to
> > enforce the given range but rather see a failure than a silent memory
> > corruption on a clashing range. Please note that there is no guarantee
> > that a given range is obeyed by the mmap even when it is free - e.g.
> > arch specific code is allowed to apply an alignment.
> >
> > Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> > It has the same semantic as MAP_FIXED wrt. the given address request
> > with a single exception that it fails with ENOMEM if the requested
> > address is already covered by an existing mapping. We still do rely on
> > get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> > check for a conflicting vma after it returns.
> 
> I like this much more than special-casing the ELF loader. It is an
> unusual property that MAP_FIXED does _two_ things, so I like having
> this split out.
> 
> Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
> MAP_FIXED, really...

Way back when, I proposed a new flag called MAP_FIXED_WEAK.  I was
dissuaded from it when userspace people said it was just as easy for
them to provide the address hint, then run fixups on their data if the
address they were assigned wasn't the one they asked for.

The real problem is that MAP_FIXED should have been called MAP_FORCE.

So ... do we really have users that want failure instead of success at
a different address?  And if so, is it really a hardship for them to
make a call to unmap on success-at-the-wrong-address?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-17 19:12       ` Matthew Wilcox
@ 2017-11-20  8:43         ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  8:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Linux API, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	Linux-MM, LKML, linux-arch

On Fri 17-11-17 11:12:51, Matthew Wilcox wrote:
> On Thu, Nov 16, 2017 at 04:27:36PM -0800, Kees Cook wrote:
> > On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > MAP_FIXED is used quite often to enforce mapping at the particular
> > > range. The main problem of this flag is, however, that it is inherently
> > > dangerous because it unmaps existing mappings covered by the requested
> > > range. This can cause silent memory corruptions. Some of them even with
> > > serious security implications. While the current semantic might be
> > > really desiderable in many cases there are others which would want to
> > > enforce the given range but rather see a failure than a silent memory
> > > corruption on a clashing range. Please note that there is no guarantee
> > > that a given range is obeyed by the mmap even when it is free - e.g.
> > > arch specific code is allowed to apply an alignment.
> > >
> > > Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> > > It has the same semantic as MAP_FIXED wrt. the given address request
> > > with a single exception that it fails with ENOMEM if the requested
> > > address is already covered by an existing mapping. We still do rely on
> > > get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> > > check for a conflicting vma after it returns.
> > 
> > I like this much more than special-casing the ELF loader. It is an
> > unusual property that MAP_FIXED does _two_ things, so I like having
> > this split out.
> > 
> > Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
> > MAP_FIXED, really...

Unfortunatelly doing this as an extension wouldn't work due to backward
compatibility issues. See [1]
 
> Way back when, I proposed a new flag called MAP_FIXED_WEAK.  I was
> dissuaded from it when userspace people said it was just as easy for
> them to provide the address hint, then run fixups on their data if the
> address they were assigned wasn't the one they asked for.
> 
> The real problem is that MAP_FIXED should have been called MAP_FORCE.
> 
> So ... do we really have users that want failure instead of success at
> a different address?

I am not really sure but Michael has pointed out to jemalloc [2] which
could probably use it.

> And if so, is it really a hardship for them to
> make a call to unmap on success-at-the-wrong-address?

How do you do something like that safely in a multithreaded environment?
You do not have any safe way to do atomic probe of a memory range.

As I've said, I do not insist on exporting this functionality to the
userspace. I can make it an internal flag (outside of the map type) and
use it solely in the kernel but considering how MAP_FIXED is tricky I
wouldn't be surprised if the userspace can find a use for this. The main
question is, are there any downsides to do so?

[1] http://lkml.kernel.org/r/20171114092916.ho5mvwc23xnelmod@dhcp22.suse.cz
[2] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  8:43         ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  8:43 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Kees Cook, Linux API, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	Linux-MM, LKML, linux-arch

On Fri 17-11-17 11:12:51, Matthew Wilcox wrote:
> On Thu, Nov 16, 2017 at 04:27:36PM -0800, Kees Cook wrote:
> > On Thu, Nov 16, 2017 at 2:18 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > MAP_FIXED is used quite often to enforce mapping at the particular
> > > range. The main problem of this flag is, however, that it is inherently
> > > dangerous because it unmaps existing mappings covered by the requested
> > > range. This can cause silent memory corruptions. Some of them even with
> > > serious security implications. While the current semantic might be
> > > really desiderable in many cases there are others which would want to
> > > enforce the given range but rather see a failure than a silent memory
> > > corruption on a clashing range. Please note that there is no guarantee
> > > that a given range is obeyed by the mmap even when it is free - e.g.
> > > arch specific code is allowed to apply an alignment.
> > >
> > > Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
> > > It has the same semantic as MAP_FIXED wrt. the given address request
> > > with a single exception that it fails with ENOMEM if the requested
> > > address is already covered by an existing mapping. We still do rely on
> > > get_unmaped_area to handle all the arch specific MAP_FIXED treatment and
> > > check for a conflicting vma after it returns.
> > 
> > I like this much more than special-casing the ELF loader. It is an
> > unusual property that MAP_FIXED does _two_ things, so I like having
> > this split out.
> > 
> > Bikeshedding: maybe call this MAP_NO_CLOBBER? It's a modifier to
> > MAP_FIXED, really...

Unfortunatelly doing this as an extension wouldn't work due to backward
compatibility issues. See [1]
 
> Way back when, I proposed a new flag called MAP_FIXED_WEAK.  I was
> dissuaded from it when userspace people said it was just as easy for
> them to provide the address hint, then run fixups on their data if the
> address they were assigned wasn't the one they asked for.
> 
> The real problem is that MAP_FIXED should have been called MAP_FORCE.
> 
> So ... do we really have users that want failure instead of success at
> a different address?

I am not really sure but Michael has pointed out to jemalloc [2] which
could probably use it.

> And if so, is it really a hardship for them to
> make a call to unmap on success-at-the-wrong-address?

How do you do something like that safely in a multithreaded environment?
You do not have any safe way to do atomic probe of a memory range.

As I've said, I do not insist on exporting this functionality to the
userspace. I can make it an internal flag (outside of the map type) and
use it solely in the kernel but considering how MAP_FIXED is tricky I
wouldn't be surprised if the userspace can find a use for this. The main
question is, are there any downsides to do so?

[1] http://lkml.kernel.org/r/20171114092916.ho5mvwc23xnelmod@dhcp22.suse.cz
[2] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-17  7:30     ` Florian Weimer
@ 2017-11-20  8:55       ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  8:55 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On Fri 17-11-17 08:30:48, Florian Weimer wrote:
> On 11/16/2017 11:18 AM, Michal Hocko wrote:
> > +	if (flags & MAP_FIXED_SAFE) {
> > +		struct vm_area_struct *vma = find_vma(mm, addr);
> > +
> > +		if (vma && vma->vm_start <= addr)
> > +			return -ENOMEM;
> > +	}
> 
> Could you pick a different error code which cannot also be caused by a an
> unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?

Hmm, none of those are described in the man page. I am usually very
careful to not add new and potentially unexpected error codes but it is
true that a new flag should warrant a new error code. I am not sure
which one is more appropriate though. EBUSY suggests that retrying might
help which is true only if some other party unmaps the range. So EEXIST
would sound more natural.

> This would definitely help with application-based randomization of mappings,
> and there, actual ENOMEM and this error would have to be handled
> differently.

I see. Could you be more specific about the usecase you have in mind? I
would incorporate it into the patch description.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  8:55       ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  8:55 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On Fri 17-11-17 08:30:48, Florian Weimer wrote:
> On 11/16/2017 11:18 AM, Michal Hocko wrote:
> > +	if (flags & MAP_FIXED_SAFE) {
> > +		struct vm_area_struct *vma = find_vma(mm, addr);
> > +
> > +		if (vma && vma->vm_start <= addr)
> > +			return -ENOMEM;
> > +	}
> 
> Could you pick a different error code which cannot also be caused by a an
> unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?

Hmm, none of those are described in the man page. I am usually very
careful to not add new and potentially unexpected error codes but it is
true that a new flag should warrant a new error code. I am not sure
which one is more appropriate though. EBUSY suggests that retrying might
help which is true only if some other party unmaps the range. So EEXIST
would sound more natural.

> This would definitely help with application-based randomization of mappings,
> and there, actual ENOMEM and this error would have to be handled
> differently.

I see. Could you be more specific about the usecase you have in mind? I
would incorporate it into the patch description.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-17  8:37     ` John Hubbard
@ 2017-11-20  9:02       ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:02 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On Fri 17-11-17 00:37:18, John Hubbard wrote:
> On 11/16/2017 02:18 AM, Michal Hocko wrote:
[...]
> > diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> > index 03c06ba7464f..d97342ca25b1 100644
> > --- a/arch/powerpc/include/uapi/asm/mman.h
> > +++ b/arch/powerpc/include/uapi/asm/mman.h
> > @@ -28,5 +28,6 @@
> >  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
> >  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> > +#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
>  
> 
> Hi Michal,
> 
> 1. The powerpc change, above, has one too many zeroes. It should be 0x80000, 
> not 0x800000.

OK, I will fix it. It shouldn't matter much, because we only care about
non-clashing address but I agree that we should consume them from bottom
bits.

> 2. For the one-line comments, if you phrase them like this:
> 
> /* Like MAP_FIXED, except that it doesn't unmap pre-existing mappings */
> 
> ...I think that would be a little clearer.

I do not have any strong preference here.
[...]
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 680506faceae..89af0b5839a5 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  		if (!(file && path_noexec(&file->f_path)))
> >  			prot |= PROT_EXEC;
> >  
> > +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +	if (flags & MAP_FIXED_SAFE)
> > +		flags |= MAP_FIXED;
> 
> Hooking in at this point is a nice way to solve the problem. :)
> 
> For the naming and implementation, I see a couple of things that might improve
> it slightly:
> 
> a) Change MAP_FIXED_SAFE to MAP_NO_CLOBBER (as per Kees' idea), but keep the
> new flag independent, by omitting the above two lines. Instead of forcing
> MAP_FIXED as a result of the new flag, you could simply fail to take any action
> on MAP_NO_CLOBBER *unless* MAP_FIXED is set.
> 
> This is a bit easier to explain and reason about, as compared to a flag that
> auto-sets another flag. I like this approach best.

As I've exaplained in other email I do not think we can make this a
modifier.
 
>   or
> 
> b) Change MAP_FIXED_SAFE to MAP_FIXED_NO_CLOBBER (also a variation on Kees' name
> idea, but a little longer, a bit uglier, and clearer), and leave the implementation
> the same.

I do not have a _strong_ preference on the name itself. But I think that
_SAFE reflects the behavior slightly better because _NO_CLOBBER is not
very specific _when_ and _what_ we do not clobber while _SAFE is clear
on that it doesn't perform any unsafe operations.

But if the majority think that _NO_CLOBBER is better i will change it.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:02       ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:02 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On Fri 17-11-17 00:37:18, John Hubbard wrote:
> On 11/16/2017 02:18 AM, Michal Hocko wrote:
[...]
> > diff --git a/arch/powerpc/include/uapi/asm/mman.h b/arch/powerpc/include/uapi/asm/mman.h
> > index 03c06ba7464f..d97342ca25b1 100644
> > --- a/arch/powerpc/include/uapi/asm/mman.h
> > +++ b/arch/powerpc/include/uapi/asm/mman.h
> > @@ -28,5 +28,6 @@
> >  #define MAP_NONBLOCK	0x10000		/* do not block on IO */
> >  #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
> >  #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
> > +#define MAP_FIXED_SAFE	0x800000	/* MAP_FIXED which doesn't unmap underlying mapping */
>  
> 
> Hi Michal,
> 
> 1. The powerpc change, above, has one too many zeroes. It should be 0x80000, 
> not 0x800000.

OK, I will fix it. It shouldn't matter much, because we only care about
non-clashing address but I agree that we should consume them from bottom
bits.

> 2. For the one-line comments, if you phrase them like this:
> 
> /* Like MAP_FIXED, except that it doesn't unmap pre-existing mappings */
> 
> ...I think that would be a little clearer.

I do not have any strong preference here.
[...]
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 680506faceae..89af0b5839a5 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  		if (!(file && path_noexec(&file->f_path)))
> >  			prot |= PROT_EXEC;
> >  
> > +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +	if (flags & MAP_FIXED_SAFE)
> > +		flags |= MAP_FIXED;
> 
> Hooking in at this point is a nice way to solve the problem. :)
> 
> For the naming and implementation, I see a couple of things that might improve
> it slightly:
> 
> a) Change MAP_FIXED_SAFE to MAP_NO_CLOBBER (as per Kees' idea), but keep the
> new flag independent, by omitting the above two lines. Instead of forcing
> MAP_FIXED as a result of the new flag, you could simply fail to take any action
> on MAP_NO_CLOBBER *unless* MAP_FIXED is set.
> 
> This is a bit easier to explain and reason about, as compared to a flag that
> auto-sets another flag. I like this approach best.

As I've exaplained in other email I do not think we can make this a
modifier.
 
>   or
> 
> b) Change MAP_FIXED_SAFE to MAP_FIXED_NO_CLOBBER (also a variation on Kees' name
> idea, but a little longer, a bit uglier, and clearer), and leave the implementation
> the same.

I do not have a _strong_ preference on the name itself. But I think that
_SAFE reflects the behavior slightly better because _NO_CLOBBER is not
very specific _when_ and _what_ we do not clobber while _SAFE is clear
on that it doesn't perform any unsafe operations.

But if the majority think that _NO_CLOBBER is better i will change it.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
  2017-11-17  8:45     ` John Hubbard
@ 2017-11-20  9:05       ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:05 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On Fri 17-11-17 00:45:49, John Hubbard wrote:
> On 11/16/2017 04:14 AM, Michal Hocko wrote:
> > [Ups, managed to screw the subject - fix it]
> > 
> > On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> >> Hi,
> >> this has started as a follow up discussion [1][2] resulting in the
> >> runtime failure caused by hardening patch [3] which removes MAP_FIXED
> >> from the elf loader because MAP_FIXED is inherently dangerous as it
> >> might silently clobber and existing underlying mapping (e.g. stack). The
> >> reason for the failure is that some architectures enforce an alignment
> >> for the given address hint without MAP_FIXED used (e.g. for shared or
> >> file backed mappings).
> >>
> >> One way around this would be excluding those archs which do alignment
> >> tricks from the hardening [4]. The patch is really trivial but it has
> >> been objected, rightfully so, that this screams for a more generic
> >> solution. We basically want a non-destructive MAP_FIXED.
> >>
> >> The first patch introduced MAP_FIXED_SAFE which enforces the given
> >> address but unlike MAP_FIXED it fails with ENOMEM if the given range
> >> conflicts with an existing one. The flag is introduced as a completely
> >> new flag rather than a MAP_FIXED extension because of the backward
> >> compatibility. We really want a never-clobber semantic even on older
> >> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> >> flags evaluation because we do not EINVAL on unknown flags. On those
> >> kernels we would simply use the traditional hint based semantic so the
> >> caller can still get a different address (which sucks) but at least not
> >> silently corrupt an existing mapping. I do not see a good way around
> >> that. Except we won't export expose the new semantic to the userspace at
> >> all. It seems there are users who would like to have something like that
> >> [5], though. Atomic address range probing in the multithreaded programs
> >> sounds like an interesting thing to me as well, although I do not have
> >> any specific usecase in mind.
> 
> Hi Michal,
> 
> From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
> (or whatever it ends up being named) *would* be passed through from
> user space. When you say that "we won't export expose the new semantic 
> to the userspace at all", do you mean that glibc won't add it? Or
> is there something I'm missing, that prevents that flag from getting
> from the syscall, to do_mmap()?

I meant that I could make it an internal flag outside of the map_type
space. So the userspace will not be able to use it.
 
> On the usage: there are cases in user space that could probably make
> good use of a no-clobber hint to MAP_FIXED. The user space code
> that surrounds HMM (speaking loosely there--it's really any user space
> code that manages a unified memory address space, across devices)
> often ends up using MAP_FIXED, but MAP_FIXED crams several features
> into one flag: an exact address, an "atomic" switch to the new mapping,
> and unmapping the old mappings. That's pretty overloaded, so being
> able to split it up a bit, by removing one of those features, seems
> useful.

Yes, atomic address range probing sounds useful. I cannot comment on HMM
usage but if you have any more specific I would welcome any links to add
them to the changelog.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:05       ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:05 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On Fri 17-11-17 00:45:49, John Hubbard wrote:
> On 11/16/2017 04:14 AM, Michal Hocko wrote:
> > [Ups, managed to screw the subject - fix it]
> > 
> > On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> >> Hi,
> >> this has started as a follow up discussion [1][2] resulting in the
> >> runtime failure caused by hardening patch [3] which removes MAP_FIXED
> >> from the elf loader because MAP_FIXED is inherently dangerous as it
> >> might silently clobber and existing underlying mapping (e.g. stack). The
> >> reason for the failure is that some architectures enforce an alignment
> >> for the given address hint without MAP_FIXED used (e.g. for shared or
> >> file backed mappings).
> >>
> >> One way around this would be excluding those archs which do alignment
> >> tricks from the hardening [4]. The patch is really trivial but it has
> >> been objected, rightfully so, that this screams for a more generic
> >> solution. We basically want a non-destructive MAP_FIXED.
> >>
> >> The first patch introduced MAP_FIXED_SAFE which enforces the given
> >> address but unlike MAP_FIXED it fails with ENOMEM if the given range
> >> conflicts with an existing one. The flag is introduced as a completely
> >> new flag rather than a MAP_FIXED extension because of the backward
> >> compatibility. We really want a never-clobber semantic even on older
> >> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> >> flags evaluation because we do not EINVAL on unknown flags. On those
> >> kernels we would simply use the traditional hint based semantic so the
> >> caller can still get a different address (which sucks) but at least not
> >> silently corrupt an existing mapping. I do not see a good way around
> >> that. Except we won't export expose the new semantic to the userspace at
> >> all. It seems there are users who would like to have something like that
> >> [5], though. Atomic address range probing in the multithreaded programs
> >> sounds like an interesting thing to me as well, although I do not have
> >> any specific usecase in mind.
> 
> Hi Michal,
> 
> From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
> (or whatever it ends up being named) *would* be passed through from
> user space. When you say that "we won't export expose the new semantic 
> to the userspace at all", do you mean that glibc won't add it? Or
> is there something I'm missing, that prevents that flag from getting
> from the syscall, to do_mmap()?

I meant that I could make it an internal flag outside of the map_type
space. So the userspace will not be able to use it.
 
> On the usage: there are cases in user space that could probably make
> good use of a no-clobber hint to MAP_FIXED. The user space code
> that surrounds HMM (speaking loosely there--it's really any user space
> code that manages a unified memory address space, across devices)
> often ends up using MAP_FIXED, but MAP_FIXED crams several features
> into one flag: an exact address, an "atomic" switch to the new mapping,
> and unmapping the old mappings. That's pretty overloaded, so being
> able to split it up a bit, by removing one of those features, seems
> useful.

Yes, atomic address range probing sounds useful. I cannot comment on HMM
usage but if you have any more specific I would welcome any links to add
them to the changelog.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-20  8:55       ` Michal Hocko
@ 2017-11-20  9:10         ` Florian Weimer
  -1 siblings, 0 replies; 76+ messages in thread
From: Florian Weimer @ 2017-11-20  9:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On 11/20/2017 09:55 AM, Michal Hocko wrote:
> On Fri 17-11-17 08:30:48, Florian Weimer wrote:
>> On 11/16/2017 11:18 AM, Michal Hocko wrote:
>>> +	if (flags & MAP_FIXED_SAFE) {
>>> +		struct vm_area_struct *vma = find_vma(mm, addr);
>>> +
>>> +		if (vma && vma->vm_start <= addr)
>>> +			return -ENOMEM;
>>> +	}
>>
>> Could you pick a different error code which cannot also be caused by a an
>> unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
> 
> Hmm, none of those are described in the man page. I am usually very
> careful to not add new and potentially unexpected error codes but it is

I think this is a bad idea.  It leads to bizarre behavior, like open 
failing with EOVERFLOW with certain namespace configurations (which have 
nothing to do with file sizes).

Most of the manual pages are incomplete regarding error codes, and with 
seccomp filters and security modules, what error codes you actually get 
is anyone's guess.

> true that a new flag should warrant a new error code. I am not sure
> which one is more appropriate though. EBUSY suggests that retrying might
> help which is true only if some other party unmaps the range. So EEXIST
> would sound more natural.

Sure, EEXIST is completely fine.

>> This would definitely help with application-based randomization of mappings,
>> and there, actual ENOMEM and this error would have to be handled
>> differently.
> 
> I see. Could you be more specific about the usecase you have in mind? I
> would incorporate it into the patch description.

glibc ld.so currently maps DSOs without hints.  This means that the 
kernel will map right next to each other, and the offsets between them a 
completely predictable.  We would like to change that and supply a 
random address in a window of the address space.  If there is a 
conflict, we do not want the kernel to pick a non-random address. 
Instead, we would try again with a random address.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:10         ` Florian Weimer
  0 siblings, 0 replies; 76+ messages in thread
From: Florian Weimer @ 2017-11-20  9:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On 11/20/2017 09:55 AM, Michal Hocko wrote:
> On Fri 17-11-17 08:30:48, Florian Weimer wrote:
>> On 11/16/2017 11:18 AM, Michal Hocko wrote:
>>> +	if (flags & MAP_FIXED_SAFE) {
>>> +		struct vm_area_struct *vma = find_vma(mm, addr);
>>> +
>>> +		if (vma && vma->vm_start <= addr)
>>> +			return -ENOMEM;
>>> +	}
>>
>> Could you pick a different error code which cannot also be caused by a an
>> unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
> 
> Hmm, none of those are described in the man page. I am usually very
> careful to not add new and potentially unexpected error codes but it is

I think this is a bad idea.  It leads to bizarre behavior, like open 
failing with EOVERFLOW with certain namespace configurations (which have 
nothing to do with file sizes).

Most of the manual pages are incomplete regarding error codes, and with 
seccomp filters and security modules, what error codes you actually get 
is anyone's guess.

> true that a new flag should warrant a new error code. I am not sure
> which one is more appropriate though. EBUSY suggests that retrying might
> help which is true only if some other party unmaps the range. So EEXIST
> would sound more natural.

Sure, EEXIST is completely fine.

>> This would definitely help with application-based randomization of mappings,
>> and there, actual ENOMEM and this error would have to be handled
>> differently.
> 
> I see. Could you be more specific about the usecase you have in mind? I
> would incorporate it into the patch description.

glibc ld.so currently maps DSOs without hints.  This means that the 
kernel will map right next to each other, and the offsets between them a 
completely predictable.  We would like to change that and supply a 
random address in a window of the address space.  If there is a 
conflict, we do not want the kernel to pick a non-random address. 
Instead, we would try again with a random address.

Thanks,
Florian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:33           ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:33 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On Mon 20-11-17 10:10:32, Florian Weimer wrote:
> On 11/20/2017 09:55 AM, Michal Hocko wrote:
> > On Fri 17-11-17 08:30:48, Florian Weimer wrote:
> > > On 11/16/2017 11:18 AM, Michal Hocko wrote:
> > > > +	if (flags & MAP_FIXED_SAFE) {
> > > > +		struct vm_area_struct *vma = find_vma(mm, addr);
> > > > +
> > > > +		if (vma && vma->vm_start <= addr)
> > > > +			return -ENOMEM;
> > > > +	}
> > > 
> > > Could you pick a different error code which cannot also be caused by a an
> > > unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
> > 
> > Hmm, none of those are described in the man page. I am usually very
> > careful to not add new and potentially unexpected error codes but it is
> 
> I think this is a bad idea.  It leads to bizarre behavior, like open failing
> with EOVERFLOW with certain namespace configurations (which have nothing to
> do with file sizes).

Ohh, I agree but breaking userspace is, you know, no-no. And an
unexpected error codes can break things terribly.

> Most of the manual pages are incomplete regarding error codes, and with
> seccomp filters and security modules, what error codes you actually get is
> anyone's guess.
> 
> > true that a new flag should warrant a new error code. I am not sure
> > which one is more appropriate though. EBUSY suggests that retrying might
> > help which is true only if some other party unmaps the range. So EEXIST
> > would sound more natural.
> 
> Sure, EEXIST is completely fine.

OK, I will use it.
 
> > > This would definitely help with application-based randomization of mappings,
> > > and there, actual ENOMEM and this error would have to be handled
> > > differently.
> > 
> > I see. Could you be more specific about the usecase you have in mind? I
> > would incorporate it into the patch description.
> 
> glibc ld.so currently maps DSOs without hints.  This means that the kernel
> will map right next to each other, and the offsets between them a completely
> predictable.  We would like to change that and supply a random address in a
> window of the address space.  If there is a conflict, we do not want the
> kernel to pick a non-random address. Instead, we would try again with a
> random address.

This makes sense to me. Thanks, I will add it to the cover letter.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:33           ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:33 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML,
	linux-arch-u79uwXL29TY76Z2rM5mHXA

On Mon 20-11-17 10:10:32, Florian Weimer wrote:
> On 11/20/2017 09:55 AM, Michal Hocko wrote:
> > On Fri 17-11-17 08:30:48, Florian Weimer wrote:
> > > On 11/16/2017 11:18 AM, Michal Hocko wrote:
> > > > +	if (flags & MAP_FIXED_SAFE) {
> > > > +		struct vm_area_struct *vma = find_vma(mm, addr);
> > > > +
> > > > +		if (vma && vma->vm_start <= addr)
> > > > +			return -ENOMEM;
> > > > +	}
> > > 
> > > Could you pick a different error code which cannot also be caused by a an
> > > unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
> > 
> > Hmm, none of those are described in the man page. I am usually very
> > careful to not add new and potentially unexpected error codes but it is
> 
> I think this is a bad idea.  It leads to bizarre behavior, like open failing
> with EOVERFLOW with certain namespace configurations (which have nothing to
> do with file sizes).

Ohh, I agree but breaking userspace is, you know, no-no. And an
unexpected error codes can break things terribly.

> Most of the manual pages are incomplete regarding error codes, and with
> seccomp filters and security modules, what error codes you actually get is
> anyone's guess.
> 
> > true that a new flag should warrant a new error code. I am not sure
> > which one is more appropriate though. EBUSY suggests that retrying might
> > help which is true only if some other party unmaps the range. So EEXIST
> > would sound more natural.
> 
> Sure, EEXIST is completely fine.

OK, I will use it.
 
> > > This would definitely help with application-based randomization of mappings,
> > > and there, actual ENOMEM and this error would have to be handled
> > > differently.
> > 
> > I see. Could you be more specific about the usecase you have in mind? I
> > would incorporate it into the patch description.
> 
> glibc ld.so currently maps DSOs without hints.  This means that the kernel
> will map right next to each other, and the offsets between them a completely
> predictable.  We would like to change that and supply a random address in a
> window of the address space.  If there is a conflict, we do not want the
> kernel to pick a non-random address. Instead, we would try again with a
> random address.

This makes sense to me. Thanks, I will add it to the cover letter.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:33           ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-20  9:33 UTC (permalink / raw)
  To: Florian Weimer
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On Mon 20-11-17 10:10:32, Florian Weimer wrote:
> On 11/20/2017 09:55 AM, Michal Hocko wrote:
> > On Fri 17-11-17 08:30:48, Florian Weimer wrote:
> > > On 11/16/2017 11:18 AM, Michal Hocko wrote:
> > > > +	if (flags & MAP_FIXED_SAFE) {
> > > > +		struct vm_area_struct *vma = find_vma(mm, addr);
> > > > +
> > > > +		if (vma && vma->vm_start <= addr)
> > > > +			return -ENOMEM;
> > > > +	}
> > > 
> > > Could you pick a different error code which cannot also be caused by a an
> > > unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
> > 
> > Hmm, none of those are described in the man page. I am usually very
> > careful to not add new and potentially unexpected error codes but it is
> 
> I think this is a bad idea.  It leads to bizarre behavior, like open failing
> with EOVERFLOW with certain namespace configurations (which have nothing to
> do with file sizes).

Ohh, I agree but breaking userspace is, you know, no-no. And an
unexpected error codes can break things terribly.

> Most of the manual pages are incomplete regarding error codes, and with
> seccomp filters and security modules, what error codes you actually get is
> anyone's guess.
> 
> > true that a new flag should warrant a new error code. I am not sure
> > which one is more appropriate though. EBUSY suggests that retrying might
> > help which is true only if some other party unmaps the range. So EEXIST
> > would sound more natural.
> 
> Sure, EEXIST is completely fine.

OK, I will use it.
 
> > > This would definitely help with application-based randomization of mappings,
> > > and there, actual ENOMEM and this error would have to be handled
> > > differently.
> > 
> > I see. Could you be more specific about the usecase you have in mind? I
> > would incorporate it into the patch description.
> 
> glibc ld.so currently maps DSOs without hints.  This means that the kernel
> will map right next to each other, and the offsets between them a completely
> predictable.  We would like to change that and supply a random address in a
> window of the address space.  If there is a conflict, we do not want the
> kernel to pick a non-random address. Instead, we would try again with a
> random address.

This makes sense to me. Thanks, I will add it to the cover letter.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-11-20  9:33           ` Michal Hocko
@ 2017-11-20  9:45             ` Florian Weimer
  -1 siblings, 0 replies; 76+ messages in thread
From: Florian Weimer @ 2017-11-20  9:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On 11/20/2017 10:33 AM, Michal Hocko wrote:
> On Mon 20-11-17 10:10:32, Florian Weimer wrote:
>> On 11/20/2017 09:55 AM, Michal Hocko wrote:
>>> On Fri 17-11-17 08:30:48, Florian Weimer wrote:
>>>> On 11/16/2017 11:18 AM, Michal Hocko wrote:
>>>>> +	if (flags & MAP_FIXED_SAFE) {
>>>>> +		struct vm_area_struct *vma = find_vma(mm, addr);
>>>>> +
>>>>> +		if (vma && vma->vm_start <= addr)
>>>>> +			return -ENOMEM;
>>>>> +	}
>>>>
>>>> Could you pick a different error code which cannot also be caused by a an
>>>> unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
>>>
>>> Hmm, none of those are described in the man page. I am usually very
>>> careful to not add new and potentially unexpected error codes but it is
>>
>> I think this is a bad idea.  It leads to bizarre behavior, like open failing
>> with EOVERFLOW with certain namespace configurations (which have nothing to
>> do with file sizes).
> 
> Ohh, I agree but breaking userspace is, you know, no-no. And an
> unexpected error codes can break things terribly.

On the glibc side, we see a lot of changes in error codes depending on 
kernel version, build and run-time configuration.  It never occurred to 
me that you guys think the precise error code is part of the userspace 
ABI.  Personally, I even assume that failure itself can disappear at any 
time (evidence: the f* functions which accept O_PATH in their non-*at 
variants).

Thanks,
Florian

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-20  9:45             ` Florian Weimer
  0 siblings, 0 replies; 76+ messages in thread
From: Florian Weimer @ 2017-11-20  9:45 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch

On 11/20/2017 10:33 AM, Michal Hocko wrote:
> On Mon 20-11-17 10:10:32, Florian Weimer wrote:
>> On 11/20/2017 09:55 AM, Michal Hocko wrote:
>>> On Fri 17-11-17 08:30:48, Florian Weimer wrote:
>>>> On 11/16/2017 11:18 AM, Michal Hocko wrote:
>>>>> +	if (flags & MAP_FIXED_SAFE) {
>>>>> +		struct vm_area_struct *vma = find_vma(mm, addr);
>>>>> +
>>>>> +		if (vma && vma->vm_start <= addr)
>>>>> +			return -ENOMEM;
>>>>> +	}
>>>>
>>>> Could you pick a different error code which cannot also be caused by a an
>>>> unrelated, possibly temporary condition?  Maybe EBUSY or EEXIST?
>>>
>>> Hmm, none of those are described in the man page. I am usually very
>>> careful to not add new and potentially unexpected error codes but it is
>>
>> I think this is a bad idea.  It leads to bizarre behavior, like open failing
>> with EOVERFLOW with certain namespace configurations (which have nothing to
>> do with file sizes).
> 
> Ohh, I agree but breaking userspace is, you know, no-no. And an
> unexpected error codes can break things terribly.

On the glibc side, we see a lot of changes in error codes depending on 
kernel version, build and run-time configuration.  It never occurred to 
me that you guys think the precise error code is part of the userspace 
ABI.  Personally, I even assume that failure itself can disappear at any 
time (evidence: the f* functions which accept O_PATH in their non-*at 
variants).

Thanks,
Florian

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
  2017-11-20  9:05       ` Michal Hocko
  (?)
@ 2017-11-22  1:48         ` John Hubbard
  -1 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-22  1:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/20/2017 01:05 AM, Michal Hocko wrote:
> On Fri 17-11-17 00:45:49, John Hubbard wrote:
>> On 11/16/2017 04:14 AM, Michal Hocko wrote:
>>> [Ups, managed to screw the subject - fix it]
>>>
>>> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>>>> Hi,
>>>> this has started as a follow up discussion [1][2] resulting in the
>>>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>>>> from the elf loader because MAP_FIXED is inherently dangerous as it
>>>> might silently clobber and existing underlying mapping (e.g. stack). The
>>>> reason for the failure is that some architectures enforce an alignment
>>>> for the given address hint without MAP_FIXED used (e.g. for shared or
>>>> file backed mappings).
>>>>
>>>> One way around this would be excluding those archs which do alignment
>>>> tricks from the hardening [4]. The patch is really trivial but it has
>>>> been objected, rightfully so, that this screams for a more generic
>>>> solution. We basically want a non-destructive MAP_FIXED.
>>>>
>>>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>>>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>>>> conflicts with an existing one. The flag is introduced as a completely
>>>> new flag rather than a MAP_FIXED extension because of the backward
>>>> compatibility. We really want a never-clobber semantic even on older
>>>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>>>> flags evaluation because we do not EINVAL on unknown flags. On those
>>>> kernels we would simply use the traditional hint based semantic so the
>>>> caller can still get a different address (which sucks) but at least not
>>>> silently corrupt an existing mapping. I do not see a good way around
>>>> that. Except we won't export expose the new semantic to the userspace at
>>>> all. It seems there are users who would like to have something like that
>>>> [5], though. Atomic address range probing in the multithreaded programs
>>>> sounds like an interesting thing to me as well, although I do not have
>>>> any specific usecase in mind.
>>
>> Hi Michal,
>>
>> From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
>> (or whatever it ends up being named) *would* be passed through from
>> user space. When you say that "we won't export expose the new semantic 
>> to the userspace at all", do you mean that glibc won't add it? Or
>> is there something I'm missing, that prevents that flag from getting
>> from the syscall, to do_mmap()?
> 
> I meant that I could make it an internal flag outside of the map_type
> space. So the userspace will not be able to use it.
>  
>> On the usage: there are cases in user space that could probably make
>> good use of a no-clobber hint to MAP_FIXED. The user space code
>> that surrounds HMM (speaking loosely there--it's really any user space
>> code that manages a unified memory address space, across devices)
>> often ends up using MAP_FIXED, but MAP_FIXED crams several features
>> into one flag: an exact address, an "atomic" switch to the new mapping,
>> and unmapping the old mappings. That's pretty overloaded, so being
>> able to split it up a bit, by removing one of those features, seems
>> useful.
> 
> Yes, atomic address range probing sounds useful. I cannot comment on HMM
> usage but if you have any more specific I would welcome any links to add
> them to the changelog.
> 

Hi Michal,

Yes, it really is useful for user space. I'll use CUDA as an example, but I 
think anything that enforces a uniform virtual addressing scheme across CPUs
and devices, probably has to do something eerily similar. CUDA does this:

a) Searches /proc/<pid>/maps for a "suitable" region of available VA space. 
"Suitable" generally means it has to have a base address within a certain
limited range (a particular device model might have odd limitations, for 
example), it has to be large enough, and alignment has to be large enough
(again, various devices may have constraints that lead us to do this).

This is of course subject to races with other threads in the process.

Let's say it finds a region starting at va.

b) Next it does: 
    p = mmap(va, ...) 

*without* setting MAP_FIXED, of course (so va is just a hint), to attempt to
safely reserve that region. If p != va, then in most cases, this is a failure
(almost certainly due to another thread getting a mapping from that region
before we did), and so this layer now has to call munmap(), before returning
a "failure: retry" to upper layers.

    IMPROVEMENT: --> if instead, we could call this:

            p = mmap(va, ... MAP_FIXED_NO_CLOBBER ...)

        , then we could skip the munmap() call upon failure. This is a small thing, 
        but it is useful here. (Thanks to Piotr Jaroszynski and Mark Hairgrove
        for helping me get that detail exactly right, btw.)

c) After that, CUDA suballocates from p, via: 
 
     q = mmap(sub_region_start, ... MAP_FIXED ...)

Interestingly enough, "freeing" is also done via MAP_FIXED, and setting PROT_NONE
to the subregion. Anyway, I just included (c) for general interest.

I expect that as we continue working on the open source compute software stack,
this new capability will be useful there, too.

Oh, and on the naming, when I described how your implementation worked (without
naming it) to Piotr, he said, "oh, something like map-fixed-no-clobber?". So I
think my miniature sociology naming data point here can bolster the case ever so
slightly for calling it MAP_FIXED_NO_CLOBBER. haha. :)

thanks,
John Hubbard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-22  1:48         ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-22  1:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/20/2017 01:05 AM, Michal Hocko wrote:
> On Fri 17-11-17 00:45:49, John Hubbard wrote:
>> On 11/16/2017 04:14 AM, Michal Hocko wrote:
>>> [Ups, managed to screw the subject - fix it]
>>>
>>> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>>>> Hi,
>>>> this has started as a follow up discussion [1][2] resulting in the
>>>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>>>> from the elf loader because MAP_FIXED is inherently dangerous as it
>>>> might silently clobber and existing underlying mapping (e.g. stack). The
>>>> reason for the failure is that some architectures enforce an alignment
>>>> for the given address hint without MAP_FIXED used (e.g. for shared or
>>>> file backed mappings).
>>>>
>>>> One way around this would be excluding those archs which do alignment
>>>> tricks from the hardening [4]. The patch is really trivial but it has
>>>> been objected, rightfully so, that this screams for a more generic
>>>> solution. We basically want a non-destructive MAP_FIXED.
>>>>
>>>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>>>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>>>> conflicts with an existing one. The flag is introduced as a completely
>>>> new flag rather than a MAP_FIXED extension because of the backward
>>>> compatibility. We really want a never-clobber semantic even on older
>>>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>>>> flags evaluation because we do not EINVAL on unknown flags. On those
>>>> kernels we would simply use the traditional hint based semantic so the
>>>> caller can still get a different address (which sucks) but at least not
>>>> silently corrupt an existing mapping. I do not see a good way around
>>>> that. Except we won't export expose the new semantic to the userspace at
>>>> all. It seems there are users who would like to have something like that
>>>> [5], though. Atomic address range probing in the multithreaded programs
>>>> sounds like an interesting thing to me as well, although I do not have
>>>> any specific usecase in mind.
>>
>> Hi Michal,
>>
>> From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
>> (or whatever it ends up being named) *would* be passed through from
>> user space. When you say that "we won't export expose the new semantic 
>> to the userspace at all", do you mean that glibc won't add it? Or
>> is there something I'm missing, that prevents that flag from getting
>> from the syscall, to do_mmap()?
> 
> I meant that I could make it an internal flag outside of the map_type
> space. So the userspace will not be able to use it.
>  
>> On the usage: there are cases in user space that could probably make
>> good use of a no-clobber hint to MAP_FIXED. The user space code
>> that surrounds HMM (speaking loosely there--it's really any user space
>> code that manages a unified memory address space, across devices)
>> often ends up using MAP_FIXED, but MAP_FIXED crams several features
>> into one flag: an exact address, an "atomic" switch to the new mapping,
>> and unmapping the old mappings. That's pretty overloaded, so being
>> able to split it up a bit, by removing one of those features, seems
>> useful.
> 
> Yes, atomic address range probing sounds useful. I cannot comment on HMM
> usage but if you have any more specific I would welcome any links to add
> them to the changelog.
> 

Hi Michal,

Yes, it really is useful for user space. I'll use CUDA as an example, but I 
think anything that enforces a uniform virtual addressing scheme across CPUs
and devices, probably has to do something eerily similar. CUDA does this:

a) Searches /proc/<pid>/maps for a "suitable" region of available VA space. 
"Suitable" generally means it has to have a base address within a certain
limited range (a particular device model might have odd limitations, for 
example), it has to be large enough, and alignment has to be large enough
(again, various devices may have constraints that lead us to do this).

This is of course subject to races with other threads in the process.

Let's say it finds a region starting at va.

b) Next it does: 
    p = mmap(va, ...) 

*without* setting MAP_FIXED, of course (so va is just a hint), to attempt to
safely reserve that region. If p != va, then in most cases, this is a failure
(almost certainly due to another thread getting a mapping from that region
before we did), and so this layer now has to call munmap(), before returning
a "failure: retry" to upper layers.

    IMPROVEMENT: --> if instead, we could call this:

            p = mmap(va, ... MAP_FIXED_NO_CLOBBER ...)

        , then we could skip the munmap() call upon failure. This is a small thing, 
        but it is useful here. (Thanks to Piotr Jaroszynski and Mark Hairgrove
        for helping me get that detail exactly right, btw.)

c) After that, CUDA suballocates from p, via: 
 
     q = mmap(sub_region_start, ... MAP_FIXED ...)

Interestingly enough, "freeing" is also done via MAP_FIXED, and setting PROT_NONE
to the subregion. Anyway, I just included (c) for general interest.

I expect that as we continue working on the open source compute software stack,
this new capability will be useful there, too.

Oh, and on the naming, when I described how your implementation worked (without
naming it) to Piotr, he said, "oh, something like map-fixed-no-clobber?". So I
think my miniature sociology naming data point here can bolster the case ever so
slightly for calling it MAP_FIXED_NO_CLOBBER. haha. :)

thanks,
John Hubbard

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-22  1:48         ` John Hubbard
  0 siblings, 0 replies; 76+ messages in thread
From: John Hubbard @ 2017-11-22  1:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/20/2017 01:05 AM, Michal Hocko wrote:
> On Fri 17-11-17 00:45:49, John Hubbard wrote:
>> On 11/16/2017 04:14 AM, Michal Hocko wrote:
>>> [Ups, managed to screw the subject - fix it]
>>>
>>> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
>>>> Hi,
>>>> this has started as a follow up discussion [1][2] resulting in the
>>>> runtime failure caused by hardening patch [3] which removes MAP_FIXED
>>>> from the elf loader because MAP_FIXED is inherently dangerous as it
>>>> might silently clobber and existing underlying mapping (e.g. stack). The
>>>> reason for the failure is that some architectures enforce an alignment
>>>> for the given address hint without MAP_FIXED used (e.g. for shared or
>>>> file backed mappings).
>>>>
>>>> One way around this would be excluding those archs which do alignment
>>>> tricks from the hardening [4]. The patch is really trivial but it has
>>>> been objected, rightfully so, that this screams for a more generic
>>>> solution. We basically want a non-destructive MAP_FIXED.
>>>>
>>>> The first patch introduced MAP_FIXED_SAFE which enforces the given
>>>> address but unlike MAP_FIXED it fails with ENOMEM if the given range
>>>> conflicts with an existing one. The flag is introduced as a completely
>>>> new flag rather than a MAP_FIXED extension because of the backward
>>>> compatibility. We really want a never-clobber semantic even on older
>>>> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>>>> flags evaluation because we do not EINVAL on unknown flags. On those
>>>> kernels we would simply use the traditional hint based semantic so the
>>>> caller can still get a different address (which sucks) but at least not
>>>> silently corrupt an existing mapping. I do not see a good way around
>>>> that. Except we won't export expose the new semantic to the userspace at
>>>> all. It seems there are users who would like to have something like that
>>>> [5], though. Atomic address range probing in the multithreaded programs
>>>> sounds like an interesting thing to me as well, although I do not have
>>>> any specific usecase in mind.
>>
>> Hi Michal,
>>
>> From looking at the patchset, it seems to me that the new MAP_FIXED_SAFE
>> (or whatever it ends up being named) *would* be passed through from
>> user space. When you say that "we won't export expose the new semantic 
>> to the userspace at all", do you mean that glibc won't add it? Or
>> is there something I'm missing, that prevents that flag from getting
>> from the syscall, to do_mmap()?
> 
> I meant that I could make it an internal flag outside of the map_type
> space. So the userspace will not be able to use it.
>  
>> On the usage: there are cases in user space that could probably make
>> good use of a no-clobber hint to MAP_FIXED. The user space code
>> that surrounds HMM (speaking loosely there--it's really any user space
>> code that manages a unified memory address space, across devices)
>> often ends up using MAP_FIXED, but MAP_FIXED crams several features
>> into one flag: an exact address, an "atomic" switch to the new mapping,
>> and unmapping the old mappings. That's pretty overloaded, so being
>> able to split it up a bit, by removing one of those features, seems
>> useful.
> 
> Yes, atomic address range probing sounds useful. I cannot comment on HMM
> usage but if you have any more specific I would welcome any links to add
> them to the changelog.
> 

Hi Michal,

Yes, it really is useful for user space. I'll use CUDA as an example, but I 
think anything that enforces a uniform virtual addressing scheme across CPUs
and devices, probably has to do something eerily similar. CUDA does this:

a) Searches /proc/<pid>/maps for a "suitable" region of available VA space. 
"Suitable" generally means it has to have a base address within a certain
limited range (a particular device model might have odd limitations, for 
example), it has to be large enough, and alignment has to be large enough
(again, various devices may have constraints that lead us to do this).

This is of course subject to races with other threads in the process.

Let's say it finds a region starting at va.

b) Next it does: 
    p = mmap(va, ...) 

*without* setting MAP_FIXED, of course (so va is just a hint), to attempt to
safely reserve that region. If p != va, then in most cases, this is a failure
(almost certainly due to another thread getting a mapping from that region
before we did), and so this layer now has to call munmap(), before returning
a "failure: retry" to upper layers.

    IMPROVEMENT: --> if instead, we could call this:

            p = mmap(va, ... MAP_FIXED_NO_CLOBBER ...)

        , then we could skip the munmap() call upon failure. This is a small thing, 
        but it is useful here. (Thanks to Piotr Jaroszynski and Mark Hairgrove
        for helping me get that detail exactly right, btw.)

c) After that, CUDA suballocates from p, via: 
 
     q = mmap(sub_region_start, ... MAP_FIXED ...)

Interestingly enough, "freeing" is also done via MAP_FIXED, and setting PROT_NONE
to the subregion. Anyway, I just included (c) for general interest.

I expect that as we continue working on the open source compute software stack,
this new capability will be useful there, too.

Oh, and on the naming, when I described how your implementation worked (without
naming it) to Piotr, he said, "oh, something like map-fixed-no-clobber?". So I
think my miniature sociology naming data point here can bolster the case ever so
slightly for calling it MAP_FIXED_NO_CLOBBER. haha. :)

thanks,
John Hubbard

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
  2017-11-22  1:48         ` John Hubbard
@ 2017-11-22 13:12           ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-22 13:12 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On Tue 21-11-17 17:48:31, John Hubbard wrote:
[...]
> Hi Michal,
> 
> Yes, it really is useful for user space. I'll use CUDA as an example, but I 
> think anything that enforces a uniform virtual addressing scheme across CPUs
> and devices, probably has to do something eerily similar. CUDA does this:
> 
> a) Searches /proc/<pid>/maps for a "suitable" region of available VA space. 
> "Suitable" generally means it has to have a base address within a certain
> limited range (a particular device model might have odd limitations, for 
> example), it has to be large enough, and alignment has to be large enough
> (again, various devices may have constraints that lead us to do this).
> 
> This is of course subject to races with other threads in the process.
> 
> Let's say it finds a region starting at va.
> 
> b) Next it does: 
>     p = mmap(va, ...) 
> 
> *without* setting MAP_FIXED, of course (so va is just a hint), to attempt to
> safely reserve that region. If p != va, then in most cases, this is a failure
> (almost certainly due to another thread getting a mapping from that region
> before we did), and so this layer now has to call munmap(), before returning
> a "failure: retry" to upper layers.
> 
>     IMPROVEMENT: --> if instead, we could call this:
> 
>             p = mmap(va, ... MAP_FIXED_NO_CLOBBER ...)
> 
>         , then we could skip the munmap() call upon failure. This is a small thing, 
>         but it is useful here. (Thanks to Piotr Jaroszynski and Mark Hairgrove
>         for helping me get that detail exactly right, btw.)
> 
> c) After that, CUDA suballocates from p, via: 
>  
>      q = mmap(sub_region_start, ... MAP_FIXED ...)
> 
> Interestingly enough, "freeing" is also done via MAP_FIXED, and setting PROT_NONE
> to the subregion. Anyway, I just included (c) for general interest.

OK, I will add this to the changelog. This is basically the "Atomic
address range probing in the multithreaded programs" I've had in the
cover letter.

> I expect that as we continue working on the open source compute software stack,
> this new capability will be useful there, too.
> 
> Oh, and on the naming, when I described how your implementation worked (without
> naming it) to Piotr, he said, "oh, something like map-fixed-no-clobber?". So I
> think my miniature sociology naming data point here can bolster the case ever so
> slightly for calling it MAP_FIXED_NO_CLOBBER. haha. :)

I will be probably stubborn and go with a shorter name I have currently.
I am not very fond-of-very-long-names.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-22 13:12           ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-22 13:12 UTC (permalink / raw)
  To: John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On Tue 21-11-17 17:48:31, John Hubbard wrote:
[...]
> Hi Michal,
> 
> Yes, it really is useful for user space. I'll use CUDA as an example, but I 
> think anything that enforces a uniform virtual addressing scheme across CPUs
> and devices, probably has to do something eerily similar. CUDA does this:
> 
> a) Searches /proc/<pid>/maps for a "suitable" region of available VA space. 
> "Suitable" generally means it has to have a base address within a certain
> limited range (a particular device model might have odd limitations, for 
> example), it has to be large enough, and alignment has to be large enough
> (again, various devices may have constraints that lead us to do this).
> 
> This is of course subject to races with other threads in the process.
> 
> Let's say it finds a region starting at va.
> 
> b) Next it does: 
>     p = mmap(va, ...) 
> 
> *without* setting MAP_FIXED, of course (so va is just a hint), to attempt to
> safely reserve that region. If p != va, then in most cases, this is a failure
> (almost certainly due to another thread getting a mapping from that region
> before we did), and so this layer now has to call munmap(), before returning
> a "failure: retry" to upper layers.
> 
>     IMPROVEMENT: --> if instead, we could call this:
> 
>             p = mmap(va, ... MAP_FIXED_NO_CLOBBER ...)
> 
>         , then we could skip the munmap() call upon failure. This is a small thing, 
>         but it is useful here. (Thanks to Piotr Jaroszynski and Mark Hairgrove
>         for helping me get that detail exactly right, btw.)
> 
> c) After that, CUDA suballocates from p, via: 
>  
>      q = mmap(sub_region_start, ... MAP_FIXED ...)
> 
> Interestingly enough, "freeing" is also done via MAP_FIXED, and setting PROT_NONE
> to the subregion. Anyway, I just included (c) for general interest.

OK, I will add this to the changelog. This is basically the "Atomic
address range probing in the multithreaded programs" I've had in the
cover letter.

> I expect that as we continue working on the open source compute software stack,
> this new capability will be useful there, too.
> 
> Oh, and on the naming, when I described how your implementation worked (without
> naming it) to Piotr, he said, "oh, something like map-fixed-no-clobber?". So I
> think my miniature sociology naming data point here can bolster the case ever so
> slightly for calling it MAP_FIXED_NO_CLOBBER. haha. :)

I will be probably stubborn and go with a shorter name I have currently.
I am not very fond-of-very-long-names.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
  2017-11-22 13:12           ` Michal Hocko
@ 2017-11-22 13:20             ` Vlastimil Babka
  -1 siblings, 0 replies; 76+ messages in thread
From: Vlastimil Babka @ 2017-11-22 13:20 UTC (permalink / raw)
  To: Michal Hocko, John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/22/2017 02:12 PM, Michal Hocko wrote:
> I will be probably stubborn and go with a shorter name I have currently.
> I am not very fond-of-very-long-names.

The short synonym for the last word is "German"

SCNR :P

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-22 13:20             ` Vlastimil Babka
  0 siblings, 0 replies; 76+ messages in thread
From: Vlastimil Babka @ 2017-11-22 13:20 UTC (permalink / raw)
  To: Michal Hocko, John Hubbard
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

On 11/22/2017 02:12 PM, Michal Hocko wrote:
> I will be probably stubborn and go with a shorter name I have currently.
> I am not very fond-of-very-long-names.

The short synonym for the last word is "German"

SCNR :P

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
  2017-11-16 12:14   ` Michal Hocko
@ 2017-11-24  8:54     ` Michal Hocko
  -1 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-24  8:54 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

Are there any more concerns? So far the biggest one was the name. The
other which suggests a flag as a modifier has been sorted out hopefully.
Is there anymore more before we can consider this for merging? Well
except for man page update which I will prepare of course. Can we target
this to 4.16?

On Thu 16-11-17 13:14:38, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> > Hi,
> > this has started as a follow up discussion [1][2] resulting in the
> > runtime failure caused by hardening patch [3] which removes MAP_FIXED
> > from the elf loader because MAP_FIXED is inherently dangerous as it
> > might silently clobber and existing underlying mapping (e.g. stack). The
> > reason for the failure is that some architectures enforce an alignment
> > for the given address hint without MAP_FIXED used (e.g. for shared or
> > file backed mappings).
> > 
> > One way around this would be excluding those archs which do alignment
> > tricks from the hardening [4]. The patch is really trivial but it has
> > been objected, rightfully so, that this screams for a more generic
> > solution. We basically want a non-destructive MAP_FIXED.
> > 
> > The first patch introduced MAP_FIXED_SAFE which enforces the given
> > address but unlike MAP_FIXED it fails with ENOMEM if the given range
> > conflicts with an existing one. The flag is introduced as a completely
> > new flag rather than a MAP_FIXED extension because of the backward
> > compatibility. We really want a never-clobber semantic even on older
> > kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> > flags evaluation because we do not EINVAL on unknown flags. On those
> > kernels we would simply use the traditional hint based semantic so the
> > caller can still get a different address (which sucks) but at least not
> > silently corrupt an existing mapping. I do not see a good way around
> > that. Except we won't export expose the new semantic to the userspace at
> > all. It seems there are users who would like to have something like that
> > [5], though. Atomic address range probing in the multithreaded programs
> > sounds like an interesting thing to me as well, although I do not have
> > any specific usecase in mind.
> > 
> > The second patch simply replaces MAP_FIXED use in elf loader by
> > MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> > follow. Actually real MAP_FIXED usages should be docummented properly
> > and they should be more of an exception.
> > 
> > Does anybody see any fundamental reasons why this is a wrong approach?
> > 
> > Diffstat says
> >  arch/alpha/include/uapi/asm/mman.h   |  2 ++
> >  arch/metag/kernel/process.c          |  6 +++++-
> >  arch/mips/include/uapi/asm/mman.h    |  2 ++
> >  arch/parisc/include/uapi/asm/mman.h  |  2 ++
> >  arch/powerpc/include/uapi/asm/mman.h |  1 +
> >  arch/sparc/include/uapi/asm/mman.h   |  1 +
> >  arch/tile/include/uapi/asm/mman.h    |  1 +
> >  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
> >  fs/binfmt_elf.c                      | 12 ++++++++----
> >  include/uapi/asm-generic/mman.h      |  1 +
> >  mm/mmap.c                            | 11 +++++++++++
> >  11 files changed, 36 insertions(+), 5 deletions(-)
> > 
> > [1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
> > [2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
> > [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
> > [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
> > [5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
> > 
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-24  8:54     ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-24  8:54 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Abdul Haleem, Joel Stanley, Kees Cook

Are there any more concerns? So far the biggest one was the name. The
other which suggests a flag as a modifier has been sorted out hopefully.
Is there anymore more before we can consider this for merging? Well
except for man page update which I will prepare of course. Can we target
this to 4.16?

On Thu 16-11-17 13:14:38, Michal Hocko wrote:
> [Ups, managed to screw the subject - fix it]
> 
> On Thu 16-11-17 11:18:58, Michal Hocko wrote:
> > Hi,
> > this has started as a follow up discussion [1][2] resulting in the
> > runtime failure caused by hardening patch [3] which removes MAP_FIXED
> > from the elf loader because MAP_FIXED is inherently dangerous as it
> > might silently clobber and existing underlying mapping (e.g. stack). The
> > reason for the failure is that some architectures enforce an alignment
> > for the given address hint without MAP_FIXED used (e.g. for shared or
> > file backed mappings).
> > 
> > One way around this would be excluding those archs which do alignment
> > tricks from the hardening [4]. The patch is really trivial but it has
> > been objected, rightfully so, that this screams for a more generic
> > solution. We basically want a non-destructive MAP_FIXED.
> > 
> > The first patch introduced MAP_FIXED_SAFE which enforces the given
> > address but unlike MAP_FIXED it fails with ENOMEM if the given range
> > conflicts with an existing one. The flag is introduced as a completely
> > new flag rather than a MAP_FIXED extension because of the backward
> > compatibility. We really want a never-clobber semantic even on older
> > kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> > flags evaluation because we do not EINVAL on unknown flags. On those
> > kernels we would simply use the traditional hint based semantic so the
> > caller can still get a different address (which sucks) but at least not
> > silently corrupt an existing mapping. I do not see a good way around
> > that. Except we won't export expose the new semantic to the userspace at
> > all. It seems there are users who would like to have something like that
> > [5], though. Atomic address range probing in the multithreaded programs
> > sounds like an interesting thing to me as well, although I do not have
> > any specific usecase in mind.
> > 
> > The second patch simply replaces MAP_FIXED use in elf loader by
> > MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> > follow. Actually real MAP_FIXED usages should be docummented properly
> > and they should be more of an exception.
> > 
> > Does anybody see any fundamental reasons why this is a wrong approach?
> > 
> > Diffstat says
> >  arch/alpha/include/uapi/asm/mman.h   |  2 ++
> >  arch/metag/kernel/process.c          |  6 +++++-
> >  arch/mips/include/uapi/asm/mman.h    |  2 ++
> >  arch/parisc/include/uapi/asm/mman.h  |  2 ++
> >  arch/powerpc/include/uapi/asm/mman.h |  1 +
> >  arch/sparc/include/uapi/asm/mman.h   |  1 +
> >  arch/tile/include/uapi/asm/mman.h    |  1 +
> >  arch/xtensa/include/uapi/asm/mman.h  |  2 ++
> >  fs/binfmt_elf.c                      | 12 ++++++++----
> >  include/uapi/asm-generic/mman.h      |  1 +
> >  mm/mmap.c                            | 11 +++++++++++
> >  11 files changed, 36 insertions(+), 5 deletions(-)
> > 
> > [1] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
> > [2] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
> > [3] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
> > [4] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
> > [5] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au
> > 
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 
> -- 
> Michal Hocko
> SUSE Labs

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
  2017-11-24  8:54     ` Michal Hocko
@ 2017-11-27 15:51       ` Khalid Aziz
  -1 siblings, 0 replies; 76+ messages in thread
From: Khalid Aziz @ 2017-11-27 15:51 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Michael Ellerman, Andrew Morton, Russell King - ARM Linux,
	Andrea Arcangeli, linux-mm, LKML, linux-arch, Abdul Haleem,
	Joel Stanley, Kees Cook

On 11/24/2017 01:54 AM, Michal Hocko wrote:
> Are there any more concerns? So far the biggest one was the name. The
> other which suggests a flag as a modifier has been sorted out hopefully.
> Is there anymore more before we can consider this for merging? Well
> except for man page update which I will prepare of course. Can we target
> this to 4.16?

I do not have concerns with this approach. I prefer a new flag as 
opposed to a modifier and the name is ok by me.

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-11-27 15:51       ` Khalid Aziz
  0 siblings, 0 replies; 76+ messages in thread
From: Khalid Aziz @ 2017-11-27 15:51 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Michael Ellerman, Andrew Morton, Russell King - ARM Linux,
	Andrea Arcangeli, linux-mm, LKML, linux-arch, Abdul Haleem,
	Joel Stanley, Kees Cook

On 11/24/2017 01:54 AM, Michal Hocko wrote:
> Are there any more concerns? So far the biggest one was the name. The
> other which suggests a flag as a modifier has been sorted out hopefully.
> Is there anymore more before we can consider this for merging? Well
> except for man page update which I will prepare of course. Can we target
> this to 4.16?

I do not have concerns with this approach. I prefer a new flag as 
opposed to a modifier and the name is ok by me.

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-05-31  9:24               ` Michal Hocko
@ 2018-05-31 21:46                 ` Mike Kravetz
  0 siblings, 0 replies; 76+ messages in thread
From: Mike Kravetz @ 2018-05-31 21:46 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

On 05/31/2018 02:24 AM, Michal Hocko wrote:
> I am not an expert on the load linkers myself so I cannot really answer
> this question. Please note that ppc had something similar. See
> ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").
> Maybe we need to sprinkle more of those at other places?

I finally understand the issue, and it is NOT a problem with the kernel.
The issue is with old libhugetlbfs provided linker scripts, and yes,
starting with v4.17 people who run libhugetlbfs tests on x86 (at least)
will see additional failures.

I'll try to work this from the libhugetlbfs side.  In the unlikely event
that anyone knows about those linker scripts, assistance and/or feedback
would be appreciated.

Read on only if you want additional details about this failure.

The executable files which are now failing are created with the elf_i386.xB
linker script.  This script is provided for pre-2.17 versions of binutils.
binutils-2.17 came out aprox in 2007, and this script is disabled by default
if binutils-2.17 or later is used.  The only way to create executables with
this script today is by setting the HUGETLB_DEPRECATED_LINK env variable.
This is what libhugetlbfs tests do to simply continue testing the old scripts.

I previously was mistaken about which tests were causing the additional
failures.  The example I previously provided failed on v4.16 as well as
v4.17-rc kernels.  So, please ignore that information.

For an executable that runs on v4.16 and fails on v4.17-rc, here is a listing
of elf sections that the kernel will attempt to load.

Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
LOAD           0x000000 0x08048000 0x08048000 0x11c24 0x11c24 R E 0x1000
LOAD           0x011c24 0x08059c24 0x08059c24 0x10d04 0x10d04 RW  0x1000
LOAD           0x023000 0x09000000 0x09000000 0x00000 0x10048 RWE 0x1000

The first section is loaded without issue.  elf_map() will create a vma
based on the following:
map_addr ELF_PAGESTART(addr) 8048000 ELF_PAGEALIGN(size) 12000 
File_offset 0

We then attempt to load the following section with:
map_addr ELF_PAGESTART(addr) 8059000 ELF_PAGEALIGN(size) 12000
File_offset 11000

This results in,
Uhuuh, elf segment at 8059000 requested but the memory is mapped already

Note that the last page of the first section overlaps with the first page
of the second section.  Unlike the case in ad55eac74f20, the access
permissions on section 1 (RE) are different than section 2 (RW).  If we
allowed the previous MAP_FIXED behavior, we would be changing part of a
read only section to read write.  This is exactly what MAP_FIXED_NOREPLACE
was designed to prevent.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-05-31  0:51             ` Mike Kravetz
@ 2018-05-31  9:24               ` Michal Hocko
  2018-05-31 21:46                 ` Mike Kravetz
  0 siblings, 1 reply; 76+ messages in thread
From: Michal Hocko @ 2018-05-31  9:24 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

On Wed 30-05-18 17:51:15, Mike Kravetz wrote:
[...]
> [   38.931497] load_elf_binary: skipping index 0 p_vaddr = 8048034
> [   38.932321] load_elf_binary: skipping index 1 p_vaddr = 8048154
> [   38.933165] load_elf_binary: calling elf_map() index 2 bias 0 vaddr 8048000
> [   38.934087]     map_addr ELF_PAGESTART(addr) 8048000 total_size 0 ELF_PAGEALIGN(size) 2000
> [   38.935101]     eppnt->p_offset = 0
> [   38.935561]     eppnt->p_vaddr  = 8048000
> [   38.936073]     eppnt->p_paddr  = 8048000
> [   38.936897]     eppnt->p_filesz = 169c
> [   38.937493]     eppnt->p_memsz  = 169c
> [   38.938042] load_elf_binary: calling elf_map() index 3 bias 0 vaddr 804969c
> [   38.939002]     map_addr ELF_PAGESTART(addr) 8049000 total_size 0 ELF_PAGEALIGN(size) 2000
> [   38.939959]     eppnt->p_offset = 169c
> [   38.940410]     eppnt->p_vaddr  = 804969c
> [   38.940897]     eppnt->p_paddr  = 804969c
> [   38.941507]     eppnt->p_filesz = 1878
> [   38.942019]     eppnt->p_memsz  = 1878
> [   38.942516] 1123 (xB.linkhuge_nof): Uhuuh, elf segment at 8049000 requested but the memory is mapped already
> 
> It is pretty easy to see the mmap conflict.  I'm still trying to determine if
> the executable file is 'valid'.  It did not throw an error previously as
> MAP_FIXED unmapped the overlapping page.  However, this does not seem right.

Yes, it looks suspicious to say the least. How come the original content
is not needed anymore? Maybe the first section should be 0x1000 rather
than 0x169c?

I am not an expert on the load linkers myself so I cannot really answer
this question. Please note that ppc had something similar. See
ad55eac74f20 ("elf: enforce MAP_FIXED on overlaying elf segments").
Maybe we need to sprinkle more of those at other places?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-05-30 16:25           ` Michal Hocko
@ 2018-05-31  0:51             ` Mike Kravetz
  2018-05-31  9:24               ` Michal Hocko
  0 siblings, 1 reply; 76+ messages in thread
From: Mike Kravetz @ 2018-05-31  0:51 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

On 05/30/2018 09:25 AM, Michal Hocko wrote:
> Could you add a debugging data to dump the VMA which overlaps the
> requested adress and who requested that? E.g. hook into do_mmap and dump
> all requests from the linker.

Here you go.  I added a bunch of stuff as I clearly do not understand
how elf loading works.  To me, the 'sections' parsed by the kernel code
do not seem to directly align with those produced by objdump.

[   38.899260] load_elf_binary: attempting to load file ./tests/obj32/xB.linkhuge_nofd
[   38.902340]     dumping section headers
[   38.903534]     index 0 p_offset = 34
[   38.904683]     index 0 p_vaddr  = 8048034
[   38.905680]     index 0 p_paddr  = 8048034
[   38.906442]     index 0 p_filesz = 120
[   38.907110]     index 0 p_memsz  = 120
[   38.907764] 
[   38.908019]     index 1 p_offset = 154
[   38.908521]     index 1 p_vaddr  = 8048154
[   38.909081]     index 1 p_paddr  = 8048154
[   38.909496]     index 1 p_filesz = 13
[   38.909855]     index 1 p_memsz  = 13
[   38.910453] 
[   38.910731]     index 2 p_offset = 0
[   38.911317]     index 2 p_vaddr  = 8048000
[   38.911997]     index 2 p_paddr  = 8048000
[   38.912590]     index 2 p_filesz = 169c
[   38.913141]     index 2 p_memsz  = 169c
[   38.913713] 
[   38.913987]     index 3 p_offset = 169c
[   38.914518]     index 3 p_vaddr  = 804969c
[   38.915101]     index 3 p_paddr  = 804969c
[   38.915718]     index 3 p_filesz = 1878
[   38.916266]     index 3 p_memsz  = 1878
[   38.916799] 
[   38.917032]     index 4 p_offset = 3000
[   38.917537]     index 4 p_vaddr  = 9000000
[   38.918119]     index 4 p_paddr  = 9000000
[   38.918709]     index 4 p_filesz = 0
[   38.919525]     index 4 p_memsz  = 10
[   38.919993] 
[   38.920275]     index 5 p_offset = 2d88
[   38.920791]     index 5 p_vaddr  = 804ad88
[   38.921307]     index 5 p_paddr  = 804ad88
[   38.921800]     index 5 p_filesz = 18c
[   38.922288]     index 5 p_memsz  = 18c
[   38.922739] 
[   38.922973]     index 6 p_offset = 168
[   38.923431]     index 6 p_vaddr  = 8048168
[   38.923946]     index 6 p_paddr  = 8048168
[   38.924457]     index 6 p_filesz = 44
[   38.924931]     index 6 p_memsz  = 44
[   38.925414] 
[   38.925593]     index 7 p_offset = 0
[   38.926031]     index 7 p_vaddr  = 0
[   38.926510]     index 7 p_paddr  = 0
[   38.926957]     index 7 p_filesz = 0
[   38.927443]     index 7 p_memsz  = 0
[   38.927879] 
[   38.928115]     index 8 p_offset = 169c
[   38.928594]     index 8 p_vaddr  = 804969c
[   38.929091]     index 8 p_paddr  = 804969c
[   38.929646]     index 8 p_filesz = 8c
[   38.930177]     index 8 p_memsz  = 8c
[   38.930710] 
[   38.931497] load_elf_binary: skipping index 0 p_vaddr = 8048034
[   38.932321] load_elf_binary: skipping index 1 p_vaddr = 8048154
[   38.933165] load_elf_binary: calling elf_map() index 2 bias 0 vaddr 8048000
[   38.934087]     map_addr ELF_PAGESTART(addr) 8048000 total_size 0 ELF_PAGEALIGN(size) 2000
[   38.935101]     eppnt->p_offset = 0
[   38.935561]     eppnt->p_vaddr  = 8048000
[   38.936073]     eppnt->p_paddr  = 8048000
[   38.936897]     eppnt->p_filesz = 169c
[   38.937493]     eppnt->p_memsz  = 169c
[   38.938042] load_elf_binary: calling elf_map() index 3 bias 0 vaddr 804969c
[   38.939002]     map_addr ELF_PAGESTART(addr) 8049000 total_size 0 ELF_PAGEALIGN(size) 2000
[   38.939959]     eppnt->p_offset = 169c
[   38.940410]     eppnt->p_vaddr  = 804969c
[   38.940897]     eppnt->p_paddr  = 804969c
[   38.941507]     eppnt->p_filesz = 1878
[   38.942019]     eppnt->p_memsz  = 1878
[   38.942516] 1123 (xB.linkhuge_nof): Uhuuh, elf segment at 8049000 requested but the memory is mapped already

It is pretty easy to see the mmap conflict.  I'm still trying to determine if
the executable file is 'valid'.  It did not throw an error previously as
MAP_FIXED unmapped the overlapping page.  However, this does not seem right.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-05-30 15:00         ` Mike Kravetz
@ 2018-05-30 16:25           ` Michal Hocko
  2018-05-31  0:51             ` Mike Kravetz
  0 siblings, 1 reply; 76+ messages in thread
From: Michal Hocko @ 2018-05-30 16:25 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

On Wed 30-05-18 08:00:29, Mike Kravetz wrote:
> On 05/30/2018 01:02 AM, Michal Hocko wrote:
> > On Tue 29-05-18 15:21:14, Mike Kravetz wrote:
> >> Just a quick heads up.  I noticed a change in libhugetlbfs testing starting
> >> with v4.17-rc1.
> >>
> >> V4.16 libhugetlbfs test results
> >> ********** TEST SUMMARY
> >> *                      2M            
> >> *                      32-bit 64-bit 
> >> *     Total testcases:   110    113   
> >> *             Skipped:     0      0   
> >> *                PASS:   105    111   
> >> *                FAIL:     0      0   
> >> *    Killed by signal:     4      1   
> >> *   Bad configuration:     1      1   
> >> *       Expected FAIL:     0      0   
> >> *     Unexpected PASS:     0      0   
> >> *    Test not present:     0      0   
> >> * Strange test result:     0      0   
> >> **********
> >>
> >> v4.17-rc1 (and later) libhugetlbfs test results
> >> ********** TEST SUMMARY
> >> *                      2M            
> >> *                      32-bit 64-bit 
> >> *     Total testcases:   110    113   
> >> *             Skipped:     0      0   
> >> *                PASS:    98    111   
> >> *                FAIL:     0      0   
> >> *    Killed by signal:    11      1   
> >> *   Bad configuration:     1      1   
> >> *       Expected FAIL:     0      0   
> >> *     Unexpected PASS:     0      0   
> >> *    Test not present:     0      0   
> >> * Strange test result:     0      0   
> >> **********
> >>
> >> I traced the 7 additional (32-bit) killed by signal results to this
> >> commit 4ed28639519c fs, elf: drop MAP_FIXED usage from elf_map.
> >>
> >> libhugetlbfs does unusual things and even provides custom linker scripts.
> >> So, in hindsight this change in behavior does not seem too unexpected.  I
> >> JUST discovered this while running libhugetlbfs tests for an unrelated
> >> issue/change and, will do some analysis to see exactly what is happening.
> > 
> > I am definitely interested about further details. Are there any messages
> > in the kernel log?
> >
> 
> Yes, new messages associated with the failures.
> 
> [   47.570451] 1368 (xB.linkhuge_nof): Uhuuh, elf segment at 00000000a731413b requested but the memory is mapped already
> [   47.606991] 1372 (xB.linkhuge_nof): Uhuuh, elf segment at 00000000a731413b requested but the memory is mapped already
> [   47.641351] 1376 (xB.linkhuge_nof): Uhuuh, elf segment at 00000000a731413b requested but the memory is mapped already
> [   47.726138] 1384 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
> [   47.773169] 1393 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
> [   47.817788] 1402 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
> [   47.857338] 1406 (xB.linkshare): Uhuuh, elf segment at 0000000018430471 requested but the memory is mapped already
> [   47.956355] 1427 (xB.linkshare): Uhuuh, elf segment at 0000000018430471 requested but the memory is mapped already
> [   48.054894] 1448 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
> [   48.071221] 1451 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
> 
> Just curious, the addresses printed in those messages does not seem correct.
> They should be page aligned.  Correct?

I have no idea what the loader actually does here.

> I think that %p conversion in the pr_info() may doing something wrong.

Well, we are using %px and that shouldn't do any tricks to the given
address.

> Also, the new failures in question are indeed being built with custom linker
> scripts designed for use with binutils older than 2.16 (really old).  So, no
> new users should encounter this issue (I think).  It appears that this may
> only impact old applications built long ago with pre-2.16 binutils.

Could you add a debugging data to dump the VMA which overlaps the
requested adress and who requested that? E.g. hook into do_mmap and dump
all requests from the linker.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-05-30  8:02       ` Michal Hocko
@ 2018-05-30 15:00         ` Mike Kravetz
  2018-05-30 16:25           ` Michal Hocko
  0 siblings, 1 reply; 76+ messages in thread
From: Mike Kravetz @ 2018-05-30 15:00 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

On 05/30/2018 01:02 AM, Michal Hocko wrote:
> On Tue 29-05-18 15:21:14, Mike Kravetz wrote:
>> Just a quick heads up.  I noticed a change in libhugetlbfs testing starting
>> with v4.17-rc1.
>>
>> V4.16 libhugetlbfs test results
>> ********** TEST SUMMARY
>> *                      2M            
>> *                      32-bit 64-bit 
>> *     Total testcases:   110    113   
>> *             Skipped:     0      0   
>> *                PASS:   105    111   
>> *                FAIL:     0      0   
>> *    Killed by signal:     4      1   
>> *   Bad configuration:     1      1   
>> *       Expected FAIL:     0      0   
>> *     Unexpected PASS:     0      0   
>> *    Test not present:     0      0   
>> * Strange test result:     0      0   
>> **********
>>
>> v4.17-rc1 (and later) libhugetlbfs test results
>> ********** TEST SUMMARY
>> *                      2M            
>> *                      32-bit 64-bit 
>> *     Total testcases:   110    113   
>> *             Skipped:     0      0   
>> *                PASS:    98    111   
>> *                FAIL:     0      0   
>> *    Killed by signal:    11      1   
>> *   Bad configuration:     1      1   
>> *       Expected FAIL:     0      0   
>> *     Unexpected PASS:     0      0   
>> *    Test not present:     0      0   
>> * Strange test result:     0      0   
>> **********
>>
>> I traced the 7 additional (32-bit) killed by signal results to this
>> commit 4ed28639519c fs, elf: drop MAP_FIXED usage from elf_map.
>>
>> libhugetlbfs does unusual things and even provides custom linker scripts.
>> So, in hindsight this change in behavior does not seem too unexpected.  I
>> JUST discovered this while running libhugetlbfs tests for an unrelated
>> issue/change and, will do some analysis to see exactly what is happening.
> 
> I am definitely interested about further details. Are there any messages
> in the kernel log?
>

Yes, new messages associated with the failures.

[   47.570451] 1368 (xB.linkhuge_nof): Uhuuh, elf segment at 00000000a731413b requested but the memory is mapped already
[   47.606991] 1372 (xB.linkhuge_nof): Uhuuh, elf segment at 00000000a731413b requested but the memory is mapped already
[   47.641351] 1376 (xB.linkhuge_nof): Uhuuh, elf segment at 00000000a731413b requested but the memory is mapped already
[   47.726138] 1384 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
[   47.773169] 1393 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
[   47.817788] 1402 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
[   47.857338] 1406 (xB.linkshare): Uhuuh, elf segment at 0000000018430471 requested but the memory is mapped already
[   47.956355] 1427 (xB.linkshare): Uhuuh, elf segment at 0000000018430471 requested but the memory is mapped already
[   48.054894] 1448 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already
[   48.071221] 1451 (xB.linkhuge): Uhuuh, elf segment at 0000000090b9eaf6 requested but the memory is mapped already

Just curious, the addresses printed in those messages does not seem correct.
They should be page aligned.  Correct?  I think that %p conversion in the
pr_info() may doing something wrong.

Also, the new failures in question are indeed being built with custom linker
scripts designed for use with binutils older than 2.16 (really old).  So, no
new users should encounter this issue (I think).  It appears that this may
only impact old applications built long ago with pre-2.16 binutils.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-05-29 22:21     ` Mike Kravetz
@ 2018-05-30  8:02       ` Michal Hocko
  2018-05-30 15:00         ` Mike Kravetz
  0 siblings, 1 reply; 76+ messages in thread
From: Michal Hocko @ 2018-05-30  8:02 UTC (permalink / raw)
  To: Mike Kravetz; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

On Tue 29-05-18 15:21:14, Mike Kravetz wrote:
> Just a quick heads up.  I noticed a change in libhugetlbfs testing starting
> with v4.17-rc1.
> 
> V4.16 libhugetlbfs test results
> ********** TEST SUMMARY
> *                      2M            
> *                      32-bit 64-bit 
> *     Total testcases:   110    113   
> *             Skipped:     0      0   
> *                PASS:   105    111   
> *                FAIL:     0      0   
> *    Killed by signal:     4      1   
> *   Bad configuration:     1      1   
> *       Expected FAIL:     0      0   
> *     Unexpected PASS:     0      0   
> *    Test not present:     0      0   
> * Strange test result:     0      0   
> **********
> 
> v4.17-rc1 (and later) libhugetlbfs test results
> ********** TEST SUMMARY
> *                      2M            
> *                      32-bit 64-bit 
> *     Total testcases:   110    113   
> *             Skipped:     0      0   
> *                PASS:    98    111   
> *                FAIL:     0      0   
> *    Killed by signal:    11      1   
> *   Bad configuration:     1      1   
> *       Expected FAIL:     0      0   
> *     Unexpected PASS:     0      0   
> *    Test not present:     0      0   
> * Strange test result:     0      0   
> **********
> 
> I traced the 7 additional (32-bit) killed by signal results to this
> commit 4ed28639519c fs, elf: drop MAP_FIXED usage from elf_map.
> 
> libhugetlbfs does unusual things and even provides custom linker scripts.
> So, in hindsight this change in behavior does not seem too unexpected.  I
> JUST discovered this while running libhugetlbfs tests for an unrelated
> issue/change and, will do some analysis to see exactly what is happening.

I am definitely interested about further details. Are there any messages
in the kernel log?

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-11-29 17:45     ` Khalid Aziz
  (?)
@ 2018-05-29 22:21     ` Mike Kravetz
  2018-05-30  8:02       ` Michal Hocko
  -1 siblings, 1 reply; 76+ messages in thread
From: Mike Kravetz @ 2018-05-29 22:21 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Andrew Morton, linux-mm, LKML, libhugetlbfs

Just a quick heads up.  I noticed a change in libhugetlbfs testing starting
with v4.17-rc1.

V4.16 libhugetlbfs test results
********** TEST SUMMARY
*                      2M            
*                      32-bit 64-bit 
*     Total testcases:   110    113   
*             Skipped:     0      0   
*                PASS:   105    111   
*                FAIL:     0      0   
*    Killed by signal:     4      1   
*   Bad configuration:     1      1   
*       Expected FAIL:     0      0   
*     Unexpected PASS:     0      0   
*    Test not present:     0      0   
* Strange test result:     0      0   
**********

v4.17-rc1 (and later) libhugetlbfs test results
********** TEST SUMMARY
*                      2M            
*                      32-bit 64-bit 
*     Total testcases:   110    113   
*             Skipped:     0      0   
*                PASS:    98    111   
*                FAIL:     0      0   
*    Killed by signal:    11      1   
*   Bad configuration:     1      1   
*       Expected FAIL:     0      0   
*     Unexpected PASS:     0      0   
*    Test not present:     0      0   
* Strange test result:     0      0   
**********

I traced the 7 additional (32-bit) killed by signal results to this
commit 4ed28639519c fs, elf: drop MAP_FIXED usage from elf_map.

libhugetlbfs does unusual things and even provides custom linker scripts.
So, in hindsight this change in behavior does not seem too unexpected.  I
JUST discovered this while running libhugetlbfs tests for an unrelated
issue/change and, will do some analysis to see exactly what is happening.

Also, will take it upon myself to run libhugetlbfs test suite on a
regular (at least weekly) basis.
-- 
Mike Kravetz

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-04-18 11:43       ` Tetsuo Handa
@ 2018-04-18 11:55         ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2018-04-18 11:55 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: akpm, linux-mm, linux-kernel

On Wed 18-04-18 20:43:11, Tetsuo Handa wrote:
> Michal Hocko wrote:
> > > Don't complain if IS_ERR_VALUE(),
> > 
> > this is simply wrong. We do want to warn on the failure because this is
> > when the actual clash happens. We should just warn on EEXIST.
> 
> >From 25442cdd31aa5cc8522923a0153a77dfd2ebc832 Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 18 Apr 2018 20:38:15 +0900
> Subject: [PATCH] fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST
>  error.
> 
> Commit 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is
> printing spurious messages under memory pressure due to map_addr == -ENOMEM.
> 
>  9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
>  14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
>  16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already
> 
> Complain only if -EEXIST, and use %px for printing the address.

Yes this is better. But...

[...]
> -	if ((type & MAP_FIXED_NOREPLACE) && BAD_ADDR(map_addr))
> -		pr_info("%d (%s): Uhuuh, elf segment at %p requested but the memory is mapped already\n",
> -				task_pid_nr(current), current->comm,
> -				(void *)addr);
> +	if ((type & MAP_FIXED_NOREPLACE) && map_addr == -EEXIST)

... please use PTR_ERR(map_addr) == -EEXIST

then you can add 
Acked-by: Michal Hocko <mhocko@suse.com>

> +		pr_info("%d (%s): Uhuuh, elf segment at %px requested but the memory is mapped already\n",
> +			task_pid_nr(current), current->comm, (void *)addr);
>  
>  	return(map_addr);
>  }
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-04-18 11:33     ` Michal Hocko
@ 2018-04-18 11:43       ` Tetsuo Handa
  2018-04-18 11:55         ` Michal Hocko
  0 siblings, 1 reply; 76+ messages in thread
From: Tetsuo Handa @ 2018-04-18 11:43 UTC (permalink / raw)
  To: mhocko; +Cc: akpm, linux-mm, linux-kernel

Michal Hocko wrote:
> > Don't complain if IS_ERR_VALUE(),
> 
> this is simply wrong. We do want to warn on the failure because this is
> when the actual clash happens. We should just warn on EEXIST.

>From 25442cdd31aa5cc8522923a0153a77dfd2ebc832 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 18 Apr 2018 20:38:15 +0900
Subject: [PATCH] fs, elf: don't complain MAP_FIXED_NOREPLACE unless -EEXIST
 error.

Commit 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is
printing spurious messages under memory pressure due to map_addr == -ENOMEM.

 9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
 14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
 16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

Complain only if -EEXIST, and use %px for printing the address.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 fs/binfmt_elf.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 41e0418..96615d9 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -377,10 +377,9 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
-	if ((type & MAP_FIXED_NOREPLACE) && BAD_ADDR(map_addr))
-		pr_info("%d (%s): Uhuuh, elf segment at %p requested but the memory is mapped already\n",
-				task_pid_nr(current), current->comm,
-				(void *)addr);
+	if ((type & MAP_FIXED_NOREPLACE) && map_addr == -EEXIST)
+		pr_info("%d (%s): Uhuuh, elf segment at %px requested but the memory is mapped already\n",
+			task_pid_nr(current), current->comm, (void *)addr);
 
 	return(map_addr);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2018-04-18 10:51     ` Tetsuo Handa
  (?)
@ 2018-04-18 11:33     ` Michal Hocko
  2018-04-18 11:43       ` Tetsuo Handa
  -1 siblings, 1 reply; 76+ messages in thread
From: Michal Hocko @ 2018-04-18 11:33 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: Andrew Morton, linux-mm, LKML

On Wed 18-04-18 19:51:05, Tetsuo Handa wrote:
> >From 0ba20dcbbc40b703413c9a6907a77968b087811b Mon Sep 17 00:00:00 2001
> From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Date: Wed, 18 Apr 2018 15:31:48 +0900
> Subject: [PATCH] fs, elf: don't complain MAP_FIXED_NOREPLACE if mapping
>  failed.
> 
> Commit 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is
> printing spurious messages under memory pressure due to map_addr == -ENOMEM.
> 
>  9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
>  14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
>  16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

Hmm this is ENOMEM.

> Don't complain if IS_ERR_VALUE(),

this is simply wrong. We do want to warn on the failure because this is
when the actual clash happens. We should just warn on EEXIST.

> and use %px for printing the address.
> 
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Andrei Vagin <avagin@openvz.org>
> Cc: Khalid Aziz <khalid.aziz@oracle.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Cc: Joel Stanley <joel@jms.id.au>
> Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> ---
>  fs/binfmt_elf.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
> index 41e0418..559d35b 100644
> --- a/fs/binfmt_elf.c
> +++ b/fs/binfmt_elf.c
> @@ -377,10 +377,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
>  	} else
>  		map_addr = vm_mmap(filep, addr, size, prot, type, off);
>  
> -	if ((type & MAP_FIXED_NOREPLACE) && BAD_ADDR(map_addr))
> -		pr_info("%d (%s): Uhuuh, elf segment at %p requested but the memory is mapped already\n",
> -				task_pid_nr(current), current->comm,
> -				(void *)addr);
> +	if ((type & MAP_FIXED_NOREPLACE) && BAD_ADDR(map_addr) &&
> +	    !IS_ERR_VALUE(map_addr))
> +		pr_info("%d (%s): Uhuuh, elf segment at %px requested but the memory is mapped already\n",
> +			task_pid_nr(current), current->comm, (void *)addr);
>  
>  	return(map_addr);
>  }
> -- 
> 1.8.3.1
> 
> 

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-12-13  9:25   ` Michal Hocko
@ 2018-04-18 10:51     ` Tetsuo Handa
  -1 siblings, 0 replies; 76+ messages in thread
From: Tetsuo Handa @ 2018-04-18 10:51 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko; +Cc: linux-mm, LKML

>From 0ba20dcbbc40b703413c9a6907a77968b087811b Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Date: Wed, 18 Apr 2018 15:31:48 +0900
Subject: [PATCH] fs, elf: don't complain MAP_FIXED_NOREPLACE if mapping
 failed.

Commit 4ed28639519c7bad ("fs, elf: drop MAP_FIXED usage from elf_map") is
printing spurious messages under memory pressure due to map_addr == -ENOMEM.

 9794 (a.out): Uhuuh, elf segment at 00007f2e34738000(fffffffffffffff4) requested but the memory is mapped already
 14104 (a.out): Uhuuh, elf segment at 00007f34fd76c000(fffffffffffffff4) requested but the memory is mapped already
 16843 (a.out): Uhuuh, elf segment at 00007f930ecc7000(fffffffffffffff4) requested but the memory is mapped already

Don't complain if IS_ERR_VALUE(), and use %px for printing the address.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Andrei Vagin <avagin@openvz.org>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Kees Cook <keescook@chromium.org>
Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 fs/binfmt_elf.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 41e0418..559d35b 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -377,10 +377,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
-	if ((type & MAP_FIXED_NOREPLACE) && BAD_ADDR(map_addr))
-		pr_info("%d (%s): Uhuuh, elf segment at %p requested but the memory is mapped already\n",
-				task_pid_nr(current), current->comm,
-				(void *)addr);
+	if ((type & MAP_FIXED_NOREPLACE) && BAD_ADDR(map_addr) &&
+	    !IS_ERR_VALUE(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segment at %px requested but the memory is mapped already\n",
+			task_pid_nr(current), current->comm, (void *)addr);
 
 	return(map_addr);
 }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2018-04-18 10:51     ` Tetsuo Handa
  0 siblings, 0 replies; 76+ messages in thread
From: Tetsuo Handa @ 2018-04-18 10:51 UTC (permalink / raw)
  To: Andrew Morton, Michal Hocko; +Cc: linux-mm, LKML



^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-12-13  9:25 [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Michal Hocko
  2017-12-13  9:25   ` Michal Hocko
@ 2017-12-13  9:25   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-12-13  9:25 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-12-13  9:25   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-12-13  9:25 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-12-13  9:25   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-12-13  9:25 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-11-29 14:42   ` Michal Hocko
@ 2017-11-29 17:45     ` Khalid Aziz
  -1 siblings, 0 replies; 76+ messages in thread
From: Khalid Aziz @ 2017-11-29 17:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Michael Ellerman, Andrew Morton, Russell King - ARM Linux,
	Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer,
	John Hubbard, Michal Hocko, Abdul Haleem, Joel Stanley,
	Kees Cook

On 11/29/2017 07:42 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Both load_elf_interp and load_elf_binary rely on elf_map to map segments
> on a controlled address and they use MAP_FIXED to enforce that. This is
> however dangerous thing prone to silent data corruption which can be
> even exploitable. Let's take CVE-2017-1000253 as an example. At the time
> (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
> ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
> the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
> we could end up mapping over the existing stack with some luck.
> 
> The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
> fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
> further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
> revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
> stack consumption early during execve fully stopped by da029c11e6b1
> ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
> safe and any attack should be impractical. On the other hand this is
> just too subtle assumption so it can break quite easily and hard to
> spot.
> 
> I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
> fundamentally dangerous. Moreover it shouldn't be even needed. We are
> at the early process stage and so there shouldn't be unrelated mappings
> (except for stack and loader) existing so mmap for a given address
> should succeed even without MAP_FIXED. Something is terribly wrong if
> this is not the case and we should rather fail than silently corrupt the
> underlying mapping.
> 
> Address this issue by changing MAP_FIXED to the newly added
> MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
> existing mapping clashing with the requested one without clobbering it.
> 
> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Cc: Joel Stanley <joel@jms.id.au>
> Acked-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-11-29 17:45     ` Khalid Aziz
  0 siblings, 0 replies; 76+ messages in thread
From: Khalid Aziz @ 2017-11-29 17:45 UTC (permalink / raw)
  To: Michal Hocko, linux-api
  Cc: Michael Ellerman, Andrew Morton, Russell King - ARM Linux,
	Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer,
	John Hubbard, Michal Hocko, Abdul Haleem, Joel Stanley,
	Kees Cook

On 11/29/2017 07:42 AM, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Both load_elf_interp and load_elf_binary rely on elf_map to map segments
> on a controlled address and they use MAP_FIXED to enforce that. This is
> however dangerous thing prone to silent data corruption which can be
> even exploitable. Let's take CVE-2017-1000253 as an example. At the time
> (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
> ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
> the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
> we could end up mapping over the existing stack with some luck.
> 
> The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
> fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
> further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
> revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
> stack consumption early during execve fully stopped by da029c11e6b1
> ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
> safe and any attack should be impractical. On the other hand this is
> just too subtle assumption so it can break quite easily and hard to
> spot.
> 
> I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
> fundamentally dangerous. Moreover it shouldn't be even needed. We are
> at the early process stage and so there shouldn't be unrelated mappings
> (except for stack and loader) existing so mmap for a given address
> should succeed even without MAP_FIXED. Something is terribly wrong if
> this is not the case and we should rather fail than silently corrupt the
> underlying mapping.
> 
> Address this issue by changing MAP_FIXED to the newly added
> MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
> existing mapping clashing with the requested one without clobbering it.
> 
> Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
> Cc: Joel Stanley <joel@jms.id.au>
> Acked-by: Kees Cook <keescook@chromium.org>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-11-29 14:42 [PATCH " Michal Hocko
  2017-11-29 14:42   ` Michal Hocko
@ 2017-11-29 14:42   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-29 14:42 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Michal Hocko,
	Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-11-29 14:42   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-29 14:42 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Michal Hocko,
	Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 76+ messages in thread

* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
@ 2017-11-29 14:42   ` Michal Hocko
  0 siblings, 0 replies; 76+ messages in thread
From: Michal Hocko @ 2017-11-29 14:42 UTC (permalink / raw)
  To: linux-api
  Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Michal Hocko,
	Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is
however dangerous thing prone to silent data corruption which can be
even exploitable. Let's take CVE-2017-1000253 as an example. At the time
(before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
we could end up mapping over the existing stack with some luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle assumption so it can break quite easily and hard to
spot.

I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
fundamentally dangerous. Moreover it shouldn't be even needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	* total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 76+ messages in thread

end of thread, other threads:[~2018-05-31 21:46 UTC | newest]

Thread overview: 76+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-16 10:18 Michal Hocko
2017-11-16 10:18 ` Michal Hocko
2017-11-16 10:18 ` Michal Hocko
2017-11-16 10:18 ` (unknown), Michal Hocko
2017-11-16 10:18 ` [RFC PATCH 1/2] mm: introduce MAP_FIXED_SAFE Michal Hocko
2017-11-16 10:18   ` Michal Hocko
2017-11-16 10:18   ` Michal Hocko
2017-11-17  0:27   ` Kees Cook
2017-11-17  0:27     ` Kees Cook
2017-11-17 19:12     ` Matthew Wilcox
2017-11-17 19:12       ` Matthew Wilcox
2017-11-17 19:12       ` Matthew Wilcox
2017-11-20  8:43       ` Michal Hocko
2017-11-20  8:43         ` Michal Hocko
2017-11-17  7:30   ` Florian Weimer
2017-11-17  7:30     ` Florian Weimer
2017-11-20  8:55     ` Michal Hocko
2017-11-20  8:55       ` Michal Hocko
2017-11-20  9:10       ` Florian Weimer
2017-11-20  9:10         ` Florian Weimer
2017-11-20  9:33         ` Michal Hocko
2017-11-20  9:33           ` Michal Hocko
2017-11-20  9:33           ` Michal Hocko
2017-11-20  9:45           ` Florian Weimer
2017-11-20  9:45             ` Florian Weimer
2017-11-17  8:37   ` John Hubbard
2017-11-17  8:37     ` John Hubbard
2017-11-17  8:37     ` John Hubbard
2017-11-20  9:02     ` Michal Hocko
2017-11-20  9:02       ` Michal Hocko
2017-11-16 10:19 ` [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
2017-11-16 10:19   ` Michal Hocko
2017-11-16 10:19   ` Michal Hocko
2017-11-17  0:30   ` Kees Cook
2017-11-17  0:30     ` Kees Cook
2017-11-16 12:14 ` [RFC PATCH 0/2] mm: introduce MAP_FIXED_SAFE Michal Hocko
2017-11-16 12:14   ` Michal Hocko
2017-11-16 12:14   ` Michal Hocko
2017-11-17  8:45   ` John Hubbard
2017-11-17  8:45     ` John Hubbard
2017-11-17  8:45     ` John Hubbard
2017-11-17  8:45     ` John Hubbard
2017-11-17  8:45     ` John Hubbard
2017-11-20  9:05     ` Michal Hocko
2017-11-20  9:05       ` Michal Hocko
2017-11-22  1:48       ` John Hubbard
2017-11-22  1:48         ` John Hubbard
2017-11-22  1:48         ` John Hubbard
2017-11-22 13:12         ` Michal Hocko
2017-11-22 13:12           ` Michal Hocko
2017-11-22 13:20           ` Vlastimil Babka
2017-11-22 13:20             ` Vlastimil Babka
2017-11-24  8:54   ` Michal Hocko
2017-11-24  8:54     ` Michal Hocko
2017-11-27 15:51     ` Khalid Aziz
2017-11-27 15:51       ` Khalid Aziz
2017-11-29 14:42 [PATCH " Michal Hocko
2017-11-29 14:42 ` [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
2017-11-29 14:42   ` Michal Hocko
2017-11-29 14:42   ` Michal Hocko
2017-11-29 17:45   ` Khalid Aziz
2017-11-29 17:45     ` Khalid Aziz
2018-05-29 22:21     ` Mike Kravetz
2018-05-30  8:02       ` Michal Hocko
2018-05-30 15:00         ` Mike Kravetz
2018-05-30 16:25           ` Michal Hocko
2018-05-31  0:51             ` Mike Kravetz
2018-05-31  9:24               ` Michal Hocko
2018-05-31 21:46                 ` Mike Kravetz
2017-12-13  9:25 [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Michal Hocko
2017-12-13  9:25 ` [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
2017-12-13  9:25   ` Michal Hocko
2017-12-13  9:25   ` Michal Hocko
2018-04-18 10:51   ` Tetsuo Handa
2018-04-18 10:51     ` Tetsuo Handa
2018-04-18 11:33     ` Michal Hocko
2018-04-18 11:43       ` Tetsuo Handa
2018-04-18 11:55         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.