* [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
@ 2017-12-13  9:25 Michal Hocko
  2017-12-13  9:25 ` [PATCH 1/2] " Michal Hocko
                   ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Michal Hocko @ 2017-12-13  9:25 UTC
To: linux-api
Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko

Hi,
I am resending with some minor updates based on Michael's review and
ask for inclusion. There haven't been any fundamental objections to
the RFC [1] or the previous version [2]. The biggest discussion
revolved around the naming. There were many suggestions flowing
around: MAP_REQUIRED, MAP_EXACT, MAP_FIXED_NOCLOBBER, MAP_AT_ADDR,
MAP_FIXED_NOREPLACE etc...

I am afraid we can bikeshed this to death and there will still be
somebody finding yet another better name. Therefore I've decided to
stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the
MAP_FIXED prefix which should be recognized by developers and the _SAFE
suffix should also make it clear that all dangerous side effects of the
old MAP_FIXED are gone.

If somebody _really_ hates this then feel free to nack and resubmit
with a different name you can find a consensus for. I am sorry to be
stubborn here but I would rather have this merged than go over a few
more iterations changing the name just because it seems like a good
idea now. My experience tells me that chances are that the name will
turn out to be "suboptimal" anyway over time.

Some more background:
This started as a follow-up discussion [3][4] prompted by the runtime
failure caused by the hardening patch [5], which removes MAP_FIXED from
the elf loader because MAP_FIXED is inherently dangerous as it might
silently clobber an existing underlying mapping (e.g. stack). The reason
for the failure is that some architectures enforce an alignment for the
given address hint when MAP_FIXED is not used (e.g. for shared or file
backed mappings).

One way around this would be excluding those archs which do alignment
tricks from the hardening [6]. The patch is really trivial but it has
been objected, rightfully so, that this screams for a more generic
solution. We basically want a non-destructive MAP_FIXED.

The first patch introduces MAP_FIXED_SAFE which enforces the given
address but unlike MAP_FIXED it fails with EEXIST if the given range
conflicts with an existing one. The flag is introduced as a completely
new one rather than a MAP_FIXED extension because of backward
compatibility. We really want a never-clobber semantic even on older
kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
flags evaluation because we do not EINVAL on unknown flags. On those
kernels we would simply use the traditional hint based semantic so the
caller can still get a different address (which sucks) but at least not
silently corrupt an existing mapping. I do not see a good way around
that, except not exposing the new semantic to userspace at all.

It seems there are users who would like to have something like that.
Jemalloc has been mentioned by Michael Ellerman [7].

Florian Weimer has mentioned the following:
: glibc ld.so currently maps DSOs without hints.  This means that the kernel
: will map right next to each other, and the offsets between them are completely
: predictable.  We would like to change that and supply a random address in a
: window of the address space.  If there is a conflict, we do not want the
: kernel to pick a non-random address. Instead, we would try again with a
: random address.

John Hubbard has mentioned a CUDA example:
: a) Searches /proc/<pid>/maps for a "suitable" region of available
: VA space.  "Suitable" generally means it has to have a base address
: within a certain limited range (a particular device model might
: have odd limitations, for example), it has to be large enough, and
: alignment has to be large enough (again, various devices may have
: constraints that lead us to do this).
:
: This is of course subject to races with other threads in the process.
:
: Let's say it finds a region starting at va.
:
: b) Next it does:
:     p = mmap(va, ...)
:
: *without* setting MAP_FIXED, of course (so va is just a hint), to
: attempt to safely reserve that region. If p != va, then in most cases,
: this is a failure (almost certainly due to another thread getting a
: mapping from that region before we did), and so this layer now has to
: call munmap(), before returning a "failure: retry" to upper layers.
:
:     IMPROVEMENT: --> if instead, we could call this:
:
:             p = mmap(va, ... MAP_FIXED_SAFE ...)
:
:         , then we could skip the munmap() call upon failure. This
:         is a small thing, but it is useful here. (Thanks to Piotr
:         Jaroszynski and Mark Hairgrove for helping me get that detail
:         exactly right, btw.)
:
: c) After that, CUDA suballocates from p, via:
:
:      q = mmap(sub_region_start, ... MAP_FIXED ...)
:
:    Interestingly enough, "freeing" is also done via MAP_FIXED, and
:    setting PROT_NONE to the subregion. Anyway, I just included (c) for
:    general interest.

Atomic address range probing in multithreaded programs in general
sounds like an interesting thing to me.

The second patch simply replaces MAP_FIXED use in the elf loader with
MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
follow. Actually, real MAP_FIXED usages should be documented properly
and they should be more of an exception.

Diffstat says
 arch/alpha/include/uapi/asm/mman.h     |  1 +
 arch/metag/kernel/process.c            |  6 +++++-
 arch/mips/include/uapi/asm/mman.h      |  2 ++
 arch/parisc/include/uapi/asm/mman.h    |  2 ++
 arch/sparc/include/uapi/asm/mman.h     |  1 -
 arch/xtensa/include/uapi/asm/mman.h    |  2 ++
 fs/binfmt_elf.c                        | 12 ++++++++----
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/mmap.c                              | 11 +++++++++++
 9 files changed, 32 insertions(+), 6 deletions(-)

[1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko@kernel.org
[2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko@kernel.org
[3] http://lkml.kernel.org/r/20171107162217.382cd754@canb.auug.org.au
[4] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
[5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko@kernel.org
[6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y@dhcp22.suse.cz
[7] http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
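
For illustration, the userspace side of the semantic described in the
cover letter could look roughly like the sketch below. It is not part of
the series. map_exact() is a hypothetical helper name, and the fallback
flag value is the asm-generic one proposed here. Later mainline kernels
(4.17+) ship a flag with these semantics as MAP_FIXED_NOREPLACE, so the
sketch uses that definition when the libc headers provide it. The second
check covers kernels that do not know the flag and treat the address as
a plain hint.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Assumption: MAP_FIXED_SAFE is not defined by any released header.
 * Prefer MAP_FIXED_NOREPLACE when available, otherwise fall back to
 * the asm-generic value proposed in this series.
 */
#ifndef MAP_FIXED_SAFE
# ifdef MAP_FIXED_NOREPLACE
#  define MAP_FIXED_SAFE MAP_FIXED_NOREPLACE
# else
#  define MAP_FIXED_SAFE 0x80000
# endif
#endif

/*
 * Hypothetical helper: map anonymous memory exactly at addr, or fail.
 * An existing mapping is never replaced. Returns NULL on failure.
 */
static void *map_exact(void *addr, size_t len, int prot)
{
	void *p = mmap(addr, len, prot,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_SAFE, -1, 0);

	if (p == MAP_FAILED)
		return NULL;	/* errno is EEXIST when the range clashes */

	if (p != addr) {
		/* The kernel ignored the flag and used addr as a hint only. */
		munmap(p, len);
		errno = EEXIST;
		return NULL;
	}
	return p;
}
```

A caller that wants the retry behaviour Florian describes would loop
over candidate addresses and call map_exact() until it returns non-NULL.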
* [PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-12-13  9:25 [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Michal Hocko
@ 2017-12-13  9:25 ` Michal Hocko
       [not found]   ` <20171213092550.2774-2-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-13  9:25 ` [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2017-12-13  9:25 UTC
To: linux-api
Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

MAP_FIXED is used quite often to enforce mapping at a particular range.
The main problem of this flag is, however, that it is inherently
dangerous because it unmaps existing mappings covered by the requested
range. This can cause silent memory corruption, sometimes even with
serious security implications. While the current semantic might be
really desirable in many cases, there are others which would want to
enforce the given range but rather see a failure than a silent memory
corruption on a clashing range. Please note that there is no guarantee
that a given range is obeyed by mmap even when it is free - e.g. arch
specific code is allowed to apply an alignment.

Introduce a new MAP_FIXED_SAFE flag for mmap to achieve this behavior.
It has the same semantic as MAP_FIXED wrt. the given address request
with a single exception that it fails with EEXIST if the requested
address is already covered by an existing mapping. We still do rely on
get_unmapped_area to handle all the arch specific MAP_FIXED treatment
and check for a conflicting vma after it returns.

The flag is introduced as a completely new one rather than a MAP_FIXED
extension because of backward compatibility. We really want a
never-clobber semantic even on older kernels which do not recognize the
flag. Unfortunately mmap sucks wrt. flags evaluation because we do not
EINVAL on unknown flags. On those kernels we would simply use the
traditional hint based semantic so the caller can still get a different
address (which sucks) but at least not silently corrupt an existing
mapping. I do not see a good way around that.

Changes since v1
- define MAP_FIXED_SAFE in asm-generic/mman-common.h as per Michael
  Ellerman because all architectures which use this header can share
  the same value. This will leave us with only 4 arches which need
  special handling.

[fail on clashing range with EEXIST as per Florian Weimer]
[set MAP_FIXED before round_hint_to_min as per Khalid Aziz]
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/alpha/include/uapi/asm/mman.h     |  1 +
 arch/mips/include/uapi/asm/mman.h      |  2 ++
 arch/parisc/include/uapi/asm/mman.h    |  2 ++
 arch/sparc/include/uapi/asm/mman.h     |  1 -
 arch/xtensa/include/uapi/asm/mman.h    |  2 ++
 include/uapi/asm-generic/mman-common.h |  1 +
 mm/mmap.c                              | 11 +++++++++++
 7 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
index 6bf730063e3f..7287dbf1e11b 100644
--- a/arch/alpha/include/uapi/asm/mman.h
+++ b/arch/alpha/include/uapi/asm/mman.h
@@ -31,6 +31,7 @@
 #define MAP_NONBLOCK	0x40000		/* do not block on IO */
 #define MAP_STACK	0x80000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x100000	/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x200000	/* MAP_FIXED which doesn't unmap underlying mapping */
 
 #define MS_ASYNC	1		/* sync memory asynchronously */
 #define MS_SYNC		2		/* synchronous memory sync */
diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index 20c3df7a8fdd..f1e15890345c 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -50,6 +50,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 /*
  * Flags for msync
  */
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index d1af0d74a188..daf0282ac417 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -26,6 +26,8 @@
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
 
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
+
 #define MS_SYNC		1		/* synchronous memory sync */
 #define MS_ASYNC	2		/* sync memory asynchronously */
 #define MS_INVALIDATE	4		/* invalidate the caches */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index 715a2c927e79..d21bffd5d3dc 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -25,5 +25,4 @@
 #define MAP_STACK	0x20000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x40000		/* create a huge page mapping */
 
-
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index 2bfe590694fc..0daf199caa57 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -56,6 +56,7 @@
 #define MAP_NONBLOCK	0x20000		/* do not block on IO */
 #define MAP_STACK	0x40000		/* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB	0x80000		/* create a huge page mapping */
+#define MAP_FIXED_SAFE	0x100000	/* MAP_FIXED which doesn't unmap underlying mapping */
 #ifdef CONFIG_MMAP_ALLOW_UNINITIALIZED
 # define MAP_UNINITIALIZED 0x4000000	/* For anonymous mmap, memory could be
 					 * uninitialized */
@@ -63,6 +64,7 @@
 # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
 
+
 /*
  * Flags for msync
  */
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6d319c46fd90..1eca2cb10d44 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -25,6 +25,7 @@
 #else
 # define MAP_UNINITIALIZED 0x0		/* Don't support this flag */
 #endif
+#define MAP_FIXED_SAFE	0x80000		/* MAP_FIXED which doesn't unmap underlying mapping */
 
 /*
  * Flags for mlock
diff --git a/mm/mmap.c b/mm/mmap.c
index 0de87a376aaa..447223a2e469 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 		if (!(file && path_noexec(&file->f_path)))
 			prot |= PROT_EXEC;
 
+	/* force arch specific MAP_FIXED handling in get_unmapped_area */
+	if (flags & MAP_FIXED_SAFE)
+		flags |= MAP_FIXED;
+
 	if (!(flags & MAP_FIXED))
 		addr = round_hint_to_min(addr);
 
@@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
 	if (offset_in_page(addr))
 		return addr;
 
+	if (flags & MAP_FIXED_SAFE) {
+		struct vm_area_struct *vma = find_vma(mm, addr);
+
+		if (vma && vma->vm_start <= addr)
+			return -EEXIST;
+	}
+
 	if (prot == PROT_EXEC) {
 		pkey = execute_only_pkey(mm);
 		if (pkey < 0)
-- 
2.15.0
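
The conflict check added to do_mmap() can be modelled in userspace as
in the sketch below. This is a toy model, not kernel code. struct range,
find_range() and both clashes_*() helpers are hypothetical names, and
find_range() only assumes the documented find_vma() behaviour of
returning the first area that ends above the given address. The first
helper mirrors the check as posted, which looks at the start address
only. The second one is a length-aware variant shown purely for
comparison and is not what the patch does.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a VMA: the half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Model of find_vma(): return the first range whose end lies above
 * addr, or NULL. The array must be sorted and non-overlapping.
 */
static const struct range *find_range(const struct range *map, size_t n,
				      unsigned long addr)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].end > addr)
			return &map[i];
	return NULL;
}

/* The check as posted: is addr itself inside an existing range? */
static bool clashes_at_addr(const struct range *map, size_t n,
			    unsigned long addr)
{
	const struct range *r = find_range(map, n, addr);

	return r && r->start <= addr;
}

/* Hypothetical variant: does anything overlap [addr, addr + len)? */
static bool clashes_in_range(const struct range *map, size_t n,
			     unsigned long addr, unsigned long len)
{
	const struct range *r = find_range(map, n, addr);

	return r && r->start < addr + len;
}
```

With ranges [0x1000, 0x3000) and [0x8000, 0x9000), a request for
0x2000 bytes at 0x7000 is not flagged by the first helper but is flagged
by the second, because the request starts in a hole and runs into the
second range.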
[parent not found: <20171213092550.2774-2-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>]
* Re: [PATCH 1/2] mm: introduce MAP_FIXED_SAFE
       [not found]   ` <20171213092550.2774-2-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-13 12:50     ` Matthew Wilcox
  2017-12-13 13:01       ` Michal Hocko
  0 siblings, 1 reply; 44+ messages in thread
From: Matthew Wilcox @ 2017-12-13 12:50 UTC
To: Michal Hocko
Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, Khalid Aziz, Michael Ellerman,
	Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML,
	linux-arch-u79uwXL29TY76Z2rM5mHXA, Florian Weimer, John Hubbard,
	Michal Hocko

On Wed, Dec 13, 2017 at 10:25:49AM +0100, Michal Hocko wrote:
> +++ b/mm/mmap.c
> @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
>  		if (!(file && path_noexec(&file->f_path)))
>  			prot |= PROT_EXEC;
>  
> +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> +	if (flags & MAP_FIXED_SAFE)
> +		flags |= MAP_FIXED;
> +
>  	if (!(flags & MAP_FIXED))
>  		addr = round_hint_to_min(addr);
>  

We're up to 22 MAP_ flags now.  We'll run out soon.  Let's preserve half
of a flag by giving userspace the definition:

#define MAP_FIXED_SAFE	(MAP_FIXED | _MAP_NOT_HINT)

then in here:

	if ((flags & _MAP_NOT_HINT) && !(flags & MAP_FIXED))
		return -EINVAL;

Now we can use _MAP_NOT_HINT all by itself in the future to mean
something else.
* Re: [PATCH 1/2] mm: introduce MAP_FIXED_SAFE
  2017-12-13 12:50     ` Matthew Wilcox
@ 2017-12-13 13:01       ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2017-12-13 13:01 UTC
To: Matthew Wilcox
Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard

On Wed 13-12-17 04:50:53, Matthew Wilcox wrote:
> On Wed, Dec 13, 2017 at 10:25:49AM +0100, Michal Hocko wrote:
> > +++ b/mm/mmap.c
> > @@ -1342,6 +1342,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr,
> >  		if (!(file && path_noexec(&file->f_path)))
> >  			prot |= PROT_EXEC;
> >  
> > +	/* force arch specific MAP_FIXED handling in get_unmapped_area */
> > +	if (flags & MAP_FIXED_SAFE)
> > +		flags |= MAP_FIXED;
> > +
> >  	if (!(flags & MAP_FIXED))
> >  		addr = round_hint_to_min(addr);
> >  
> 
> We're up to 22 MAP_ flags now.  We'll run out soon.  Let's preserve half
> of a flag by giving userspace the definition:
> 
> #define MAP_FIXED_SAFE	(MAP_FIXED | _MAP_NOT_HINT)

I've already tried to explain why this cannot be a modifier for
MAP_FIXED. Read about the backward compatibility note...
Or do I misunderstand what you are saying here?

> then in here:
> 
> 	if ((flags & _MAP_NOT_HINT) && !(flags & MAP_FIXED))
> 		return -EINVAL;
> 
> Now we can use _MAP_NOT_HINT all by itself in the future to mean
> something else.

-- 
Michal Hocko
SUSE Labs
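
The backward compatibility point in this exchange can be made concrete
with a small model of how different kernels would interpret the two flag
encodings. This is only a sketch of the argument, not kernel code. All
MODEL_* names, the outcome enum and the three functions are hypothetical,
and the bit chosen for _MAP_NOT_HINT is made up for illustration. The one
real value is 0x10 for the generic MAP_FIXED.

```c
/* Illustrative flag bits; only MODEL_MAP_FIXED mirrors a real value. */
#define MODEL_MAP_FIXED		0x10UL		/* generic MAP_FIXED */
#define MODEL_SAFE_SEPARATE	0x80000UL	/* standalone bit, as in this series */
#define MODEL_NOT_HINT		0x100000UL	/* hypothetical _MAP_NOT_HINT bit */
#define MODEL_SAFE_COMPOSITE	(MODEL_MAP_FIXED | MODEL_NOT_HINT)

enum outcome { HINT_ONLY, CLOBBER, FAIL_IF_BUSY, REJECT_EINVAL };

/* A kernel that predates both schemes silently ignores unknown bits. */
static enum outcome old_kernel(unsigned long flags)
{
	return (flags & MODEL_MAP_FIXED) ? CLOBBER : HINT_ONLY;
}

/* New kernel, standalone flag as proposed in the patch. */
static enum outcome new_kernel_separate(unsigned long flags)
{
	if (flags & MODEL_SAFE_SEPARATE)
		return FAIL_IF_BUSY;
	return (flags & MODEL_MAP_FIXED) ? CLOBBER : HINT_ONLY;
}

/* New kernel, modifier scheme as suggested in the reply. */
static enum outcome new_kernel_composite(unsigned long flags)
{
	if (flags & MODEL_NOT_HINT)
		return (flags & MODEL_MAP_FIXED) ? FAIL_IF_BUSY : REJECT_EINVAL;
	return (flags & MODEL_MAP_FIXED) ? CLOBBER : HINT_ONLY;
}
```

On a new kernel both encodings give the fail-if-busy behaviour. The
difference shows up only on an old kernel, where the standalone bit
degrades to a hint while the composite value still contains MAP_FIXED
and therefore clobbers.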
* [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-12-13  9:25 [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Michal Hocko
  2017-12-13  9:25 ` [PATCH 1/2] " Michal Hocko
@ 2017-12-13  9:25 ` Michal Hocko
  2017-12-16  0:49   ` [2/2] " Andrei Vagin
  2017-12-13  9:31 ` [PATCH 1/2] mmap.2: document new MAP_FIXED_SAFE flag Michal Hocko
       [not found] ` <20171213092550.2774-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  3 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2017-12-13  9:25 UTC
To: linux-api
Cc: Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

From: Michal Hocko <mhocko@suse.com>

Both load_elf_interp and load_elf_binary rely on elf_map to map segments
on a controlled address and they use MAP_FIXED to enforce that. This is,
however, a dangerous thing prone to silent data corruption which can
even be exploitable. Let's take CVE-2017-1000253 as an example. At the
time (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for
PIE")) ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far
away from the stack top on 32b (legacy) memory layout (only 1GB away).
Therefore we could end up mapping over the existing stack with some
luck.

The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved much
further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
stack consumption early during execve was fully stopped by da029c11e6b1
("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
safe and any attack should be impractical. On the other hand this is
just too subtle an assumption, so it can break quite easily and be hard
to spot.

I believe that the MAP_FIXED usage in load_elf_binary (et al.) is still
fundamentally dangerous. Moreover it shouldn't even be needed. We are
at the early process stage and so there shouldn't be unrelated mappings
(except for stack and loader) existing so mmap for a given address
should succeed even without MAP_FIXED. Something is terribly wrong if
this is not the case and we should rather fail than silently corrupt the
underlying mapping.

Address this issue by changing MAP_FIXED to the newly added
MAP_FIXED_SAFE. This will mean that mmap will fail if there is an
existing mapping clashing with the requested one without clobbering it.

Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
Cc: Joel Stanley <joel@jms.id.au>
Acked-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 arch/metag/kernel/process.c |  6 +++++-
 fs/binfmt_elf.c             | 12 ++++++++----
 2 files changed, 13 insertions(+), 5 deletions(-)

diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c
index 0909834c83a7..867c8d0a5fb4 100644
--- a/arch/metag/kernel/process.c
+++ b/arch/metag/kernel/process.c
@@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	tcm_tag = tcm_lookup_tag(addr);
 
 	if (tcm_tag != TCM_INVALID_TAG)
-		type &= ~MAP_FIXED;
+		type &= ~(MAP_FIXED | MAP_FIXED_SAFE);
 
 	/*
 	 * total_size is the size of the ELF (interpreter) image.
@@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), tsk->comm, (void*)addr);
+
 	if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) {
 		struct tcm_allocation *tcm;
 		unsigned long tcm_addr;
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 73b01e474fdc..5916d45f64a7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr,
 	} else
 		map_addr = vm_mmap(filep, addr, size, prot, type, off);
 
+	if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr))
+		pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n",
+				task_pid_nr(current), current->comm, (void*)addr);
+
 	return(map_addr);
 }
 
@@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex,
 				elf_prot |= PROT_EXEC;
 			vaddr = eppnt->p_vaddr;
 			if (interp_elf_ex->e_type == ET_EXEC || load_addr_set)
-				elf_type |= MAP_FIXED;
+				elf_type |= MAP_FIXED_SAFE;
 			else if (no_base && interp_elf_ex->e_type == ET_DYN)
 				load_addr = -vaddr;
 
@@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 		 * the ET_DYN load_addr calculations, proceed normally.
 		 */
 		if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) {
-			elf_flags |= MAP_FIXED;
+			elf_flags |= MAP_FIXED_SAFE;
 		} else if (loc->elf_ex.e_type == ET_DYN) {
 			/*
 			 * This logic is run once for the first LOAD Program
@@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm)
 				load_bias = ELF_ET_DYN_BASE;
 				if (current->flags & PF_RANDOMIZE)
 					load_bias += arch_mmap_rnd();
-				elf_flags |= MAP_FIXED;
+				elf_flags |= MAP_FIXED_SAFE;
 			} else
 				load_bias = 0;
 
@@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file)
 			(eppnt->p_filesz +
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)),
 			PROT_READ | PROT_WRITE | PROT_EXEC,
-			MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE,
+			MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE,
 			(eppnt->p_offset -
 			 ELF_PAGEOFFSET(eppnt->p_vaddr)));
 	if (error != ELF_PAGESTART(eppnt->p_vaddr))
-- 
2.15.0
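
The clobbering behaviour that motivates this patch is easy to reproduce
from userspace. The following sketch is only a demonstration under the
assumption of a Linux system with anonymous private mappings, and
map_fixed_clobbers() is a hypothetical helper name. It fills a mapping
with a pattern, maps over it with MAP_FIXED, and checks that the old
contents are gone without any error having been reported.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical demo helper. Returns 1 if a MAP_FIXED request silently
 * replaced a live mapping, 0 if it did not, -1 if the setup failed.
 */
static int map_fixed_clobbers(void)
{
	size_t len = 1 << 16;
	char *victim = mmap(NULL, len, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *p;
	int clobbered;

	if (victim == MAP_FAILED)
		return -1;
	memset(victim, 0x5a, len);	/* data that is about to be lost */

	/* Same address, MAP_FIXED: the kernel unmaps victim first. */
	p = mmap(victim, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
	if (p == MAP_FAILED) {
		munmap(victim, len);
		return -1;
	}

	/* Fresh anonymous memory reads as zero, so the pattern is gone. */
	clobbered = (p == victim && victim[0] == 0);
	munmap(p, len);
	return clobbered;
}
```

The second mmap() succeeds and returns the requested address, which is
exactly why a clash in the elf loader would go unnoticed with MAP_FIXED.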
* Re: [2/2] fs, elf: drop MAP_FIXED usage from elf_map
  2017-12-13  9:25 ` [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko
@ 2017-12-16  0:49   ` Andrei Vagin
  2017-12-18  9:13     ` Michal Hocko
  0 siblings, 1 reply; 44+ messages in thread
From: Andrei Vagin @ 2017-12-16  0:49 UTC
To: Michal Hocko
Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton,
	Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
	linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
	Michal Hocko, Abdul Haleem, Joel Stanley, Kees Cook

[-- Attachment #1: Type: text/plain, Size: 6926 bytes --]

Hi Michal,

We run CRIU tests for linux-next and the 4.15.0-rc3-next-20171215 kernel
doesn't boot:

[    3.492549] Freeing unused kernel memory: 1640K
[    3.494547] Write protecting the kernel read-only data: 18432k
[    3.498781] Freeing unused kernel memory: 2016K
[    3.503330] Freeing unused kernel memory: 512K
[    3.505232] rodata_test: all tests were successful
[    3.515355] 1 (init): Uhuuh, elf segement at 00000000928fda3e requested but the memory is mapped already
[    3.519533] Starting init: /sbin/init exists but couldn't execute it (error -95)
[    3.528993] Starting init: /bin/sh exists but couldn't execute it (error -14)
[    3.532127] Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
[    3.538328] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc3-next-20171215-00001-g6d6aea478fce #11
[    3.542201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014
[    3.546081] Call Trace:
[    3.547221]  dump_stack+0x5c/0x79
[    3.548768]  ? rest_init+0x30/0xb0
[    3.550320]  panic+0xe4/0x232
[    3.551669]  ? rest_init+0xb0/0xb0
[    3.553110]  kernel_init+0xeb/0x100
[    3.554701]  ret_from_fork+0x1f/0x30
[    3.558964] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    3.564160] ---[ end Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.

If I revert this patch, it boots normally.

Thanks,
Andrei

On Wed, Dec 13, 2017 at 10:25:50AM +0100, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Both load_elf_interp and load_elf_binary rely on elf_map to map segments
> on a controlled address and they use MAP_FIXED to enforce that. This is
> however dangerous thing prone to silent data corruption which can be
> even exploitable. Let's take CVE-2017-1000253 as an example. At the time
> (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE"))
> ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from
> the stack top on 32b (legacy) memory layout (only 1GB away). Therefore
> we could end up mapping over the existing stack with some luck.
> 
> The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c:
> fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved much
> further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm:
> revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive
> stack consumption early during execve fully stopped by da029c11e6b1
> ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be
> safe and any attack should be impractical. On the other hand this is
> just too subtle assumption so it can break quite easily and hard to
> spot.
> 
> I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still
> fundamentally dangerous. Moreover it shouldn't be even needed.
We are > at the early process stage and so there shouldn't be unrelated mappings > (except for stack and loader) existing so mmap for a given address > should succeed even without MAP_FIXED. Something is terribly wrong if > this is not the case and we should rather fail than silently corrupt the > underlying mapping. > > Address this issue by changing MAP_FIXED to the newly added > MAP_FIXED_SAFE. This will mean that mmap will fail if there is an > existing mapping clashing with the requested one without clobbering it. > > Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> > Cc: Joel Stanley <joel@jms.id.au> > Acked-by: Kees Cook <keescook@chromium.org> > Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > arch/metag/kernel/process.c | 6 +++++- > fs/binfmt_elf.c | 12 ++++++++---- > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c > index 0909834c83a7..867c8d0a5fb4 100644 > --- a/arch/metag/kernel/process.c > +++ b/arch/metag/kernel/process.c > @@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr, > tcm_tag = tcm_lookup_tag(addr); > > if (tcm_tag != TCM_INVALID_TAG) > - type &= ~MAP_FIXED; > + type &= ~(MAP_FIXED | MAP_FIXED_SAFE); > > /* > * total_size is the size of the ELF (interpreter) image. 
> @@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr, > } else > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > + pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n", > + task_pid_nr(current), tsk->comm, (void*)addr); > + > if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) { > struct tcm_allocation *tcm; > unsigned long tcm_addr; > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > index 73b01e474fdc..5916d45f64a7 100644 > --- a/fs/binfmt_elf.c > +++ b/fs/binfmt_elf.c > @@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr, > } else > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > + pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n", > + task_pid_nr(current), current->comm, (void*)addr); > + > return(map_addr); > } > > @@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex, > elf_prot |= PROT_EXEC; > vaddr = eppnt->p_vaddr; > if (interp_elf_ex->e_type == ET_EXEC || load_addr_set) > - elf_type |= MAP_FIXED; > + elf_type |= MAP_FIXED_SAFE; > else if (no_base && interp_elf_ex->e_type == ET_DYN) > load_addr = -vaddr; > > @@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm) > * the ET_DYN load_addr calculations, proceed normally. 
> */ > if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) { > - elf_flags |= MAP_FIXED; > + elf_flags |= MAP_FIXED_SAFE; > } else if (loc->elf_ex.e_type == ET_DYN) { > /* > * This logic is run once for the first LOAD Program > @@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm) > load_bias = ELF_ET_DYN_BASE; > if (current->flags & PF_RANDOMIZE) > load_bias += arch_mmap_rnd(); > - elf_flags |= MAP_FIXED; > + elf_flags |= MAP_FIXED_SAFE; > } else > load_bias = 0; > > @@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file) > (eppnt->p_filesz + > ELF_PAGEOFFSET(eppnt->p_vaddr)), > PROT_READ | PROT_WRITE | PROT_EXEC, > - MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE, > + MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE, > (eppnt->p_offset - > ELF_PAGEOFFSET(eppnt->p_vaddr))); > if (error != ELF_PAGESTART(eppnt->p_vaddr)) [-- Attachment #2: config.gz --] [-- Type: application/gzip, Size: 21900 bytes --] [-- Attachment #3: dmesg.txt --] [-- Type: text/plain, Size: 24417 bytes --] [ 0.000000] Linux version 4.15.0-rc3-next-20171215-00001-g6d6aea478fce (avagin@laptop) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #11 SMP Fri Dec 15 16:39:11 PST 2017 [ 0.000000] Command line: root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' [ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 [ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 [ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 [ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format. 
[ 0.000000] e820: BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffd8fff] usable [ 0.000000] BIOS-e820: [mem 0x000000007ffd9000-0x000000007fffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] random: fast init done [ 0.000000] SMBIOS 2.8 present. [ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [ 0.000000] Hypervisor detected: KVM [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable [ 0.000000] e820: last_pfn = 0x7ffd9 max_arch_pfn = 0x400000000 [ 0.000000] MTRR default type: write-back [ 0.000000] MTRR fixed ranges enabled: [ 0.000000] 00000-9FFFF write-back [ 0.000000] A0000-BFFFF uncachable [ 0.000000] C0000-FFFFF write-protect [ 0.000000] MTRR variable ranges enabled: [ 0.000000] 0 base 0080000000 mask FF80000000 uncachable [ 0.000000] 1 disabled [ 0.000000] 2 disabled [ 0.000000] 3 disabled [ 0.000000] 4 disabled [ 0.000000] 5 disabled [ 0.000000] 6 disabled [ 0.000000] 7 disabled [ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT [ 0.000000] found SMP MP-table at [mem 0x000f6bd0-0x000f6bdf] mapped at [ (ptrval)] [ 0.000000] Base memory trampoline at [ (ptrval)] 99000 size 24576 [ 0.000000] Using GB pages for direct mapping [ 0.000000] BRK [0x2c984000, 0x2c984fff] PGTABLE [ 0.000000] BRK [0x2c985000, 0x2c985fff] PGTABLE [ 0.000000] BRK [0x2c986000, 0x2c986fff] PGTABLE [ 0.000000] BRK [0x2c987000, 0x2c987fff] PGTABLE [ 0.000000] BRK [0x2c988000, 0x2c988fff] PGTABLE [ 0.000000] BRK 
[0x2c989000, 0x2c989fff] PGTABLE [ 0.000000] ACPI: Early table checksum verification disabled [ 0.000000] ACPI: RSDP 0x00000000000F69C0 000014 (v00 BOCHS ) [ 0.000000] ACPI: RSDT 0x000000007FFE12FF 00002C (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001) [ 0.000000] ACPI: FACP 0x000000007FFE120B 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001) [ 0.000000] ACPI: DSDT 0x000000007FFE0040 0011CB (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001) [ 0.000000] ACPI: FACS 0x000000007FFE0000 000040 [ 0.000000] ACPI: APIC 0x000000007FFE127F 000080 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001) [ 0.000000] ACPI: Local APIC address 0xfee00000 [ 0.000000] No NUMA configuration found [ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffd8fff] [ 0.000000] NODE_DATA(0) allocated [mem 0x7ffc2000-0x7ffd8fff] [ 0.000000] kvm-clock: cpu 0, msr 0:7ffc0001, primary cpu clock [ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00 [ 0.000000] kvm-clock: using sched offset of 1076013277 cycles [ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff] [ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007ffd8fff] [ 0.000000] Normal empty [ 0.000000] Device empty [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff] [ 0.000000] node 0: [mem 0x0000000000100000-0x000000007ffd8fff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007ffd8fff] [ 0.000000] On node 0 totalpages: 524151 [ 0.000000] DMA zone: 64 pages used for memmap [ 0.000000] DMA zone: 21 pages reserved [ 0.000000] DMA zone: 3998 pages, LIFO batch:0 [ 0.000000] DMA32 zone: 8128 pages used for memmap [ 0.000000] DMA32 zone: 520153 pages, LIFO batch:31 [ 0.000000] Reserved but unavailable: 98 pages [ 0.000000] ACPI: PM-Timer IO Port: 0x608 [ 0.000000] ACPI: Local APIC address 0xfee00000 
[ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) [ 0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23 [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) [ 0.000000] ACPI: IRQ0 used by override. [ 0.000000] ACPI: IRQ5 used by override. [ 0.000000] ACPI: IRQ9 used by override. [ 0.000000] ACPI: IRQ10 used by override. [ 0.000000] ACPI: IRQ11 used by override. [ 0.000000] Using ACPI (MADT) for SMP configuration information [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs [ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff] [ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff] [ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff] [ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff] [ 0.000000] e820: [mem 0x80000000-0xfeffbfff] available for PCI devices [ 0.000000] Booting paravirtualized kernel on KVM [ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns [ 0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1 [ 0.000000] percpu: Embedded 44 pages/cpu @ (ptrval) s142296 r8192 d29736 u1048576 [ 0.000000] pcpu-alloc: s142296 r8192 d29736 u1048576 alloc=1*2097152 [ 0.000000] pcpu-alloc: [0] 0 1 [ 0.000000] KVM setup async PF for cpu 0 [ 0.000000] kvm-stealtime: cpu 0, msr 7fc122c0 [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 515938 [ 0.000000] Policy zone: DMA32 [ 0.000000] Kernel command line: root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 [ 0.000000] Memory: 2037056K/2096604K available (12300K kernel code, 1554K rwdata, 3584K rodata, 1640K init, 912K bss, 59548K reserved, 0K cma-reserved) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 [ 0.000000] ftrace: allocating 36554 entries in 143 pages [ 0.001000] Hierarchical RCU implementation. [ 0.001000] RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=2. [ 0.001000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2 [ 0.001000] NR_IRQS: 4352, nr_irqs: 440, preallocated irqs: 16 [ 0.001000] Offload RCU callbacks from CPUs: (none). [ 0.001000] Console: colour dummy device 80x25 [ 0.001000] console [ttyS0] enabled [ 0.001000] ACPI: Core revision 20171110 [ 0.001000] ACPI: 1 ACPI AML tables successfully acquired and loaded [ 0.001009] APIC: Switch to symmetric I/O mode setup [ 0.001571] x2apic enabled [ 0.002003] Switched APIC routing to physical x2apic. [ 0.003538] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 [ 0.004000] tsc: Detected 2496.000 MHz processor [ 0.004014] Calibrating delay loop (skipped) preset value.. 4992.00 BogoMIPS (lpj=2496000) [ 0.005014] pid_max: default: 32768 minimum: 301 [ 0.006057] Security Framework initialized [ 0.006548] Yama: becoming mindful. [ 0.007019] SELinux: Disabled at boot. 
[ 0.008206] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) [ 0.009164] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) [ 0.009816] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes) [ 0.010009] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes) [ 0.011322] mce: CPU supports 10 MCE banks [ 0.011740] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.012002] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 [ 0.012610] Freeing SMP alternatives memory: 36K [ 0.013467] TSC deadline timer enabled [ 0.013820] smpboot: CPU0: Intel Core Processor (Skylake) (family: 0x6, model: 0x5e, stepping: 0x3) [ 0.014000] Performance Events: unsupported p6 CPU model 94 no PMU driver, software events only. [ 0.014041] Hierarchical SRCU implementation. [ 0.015133] NMI watchdog: Perf event create on CPU 0 failed with -2 [ 0.015725] NMI watchdog: Perf NMI watchdog permanently disabled [ 0.016077] smp: Bringing up secondary CPUs ... [ 0.016654] x86: Booting SMP configuration: [ 0.017005] .... 
node #0, CPUs: #1 [ 0.001000] kvm-clock: cpu 1, msr 0:7ffc0041, secondary cpu clock [ 0.019051] KVM setup async PF for cpu 1 [ 0.019599] kvm-stealtime: cpu 1, msr 7fd122c0 [ 0.020009] smp: Brought up 1 node, 2 CPUs [ 0.020531] smpboot: Max logical packages: 2 [ 0.021009] smpboot: Total of 2 processors activated (9984.00 BogoMIPS) [ 0.023160] devtmpfs: initialized [ 0.023513] x86/mm: Memory block size: 128MB [ 0.024811] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns [ 0.025015] futex hash table entries: 512 (order: 3, 32768 bytes) [ 0.026185] RTC time: 0:42:06, date: 12/16/17 [ 0.026790] NET: Registered protocol family 16 [ 0.027204] audit: initializing netlink subsys (disabled) [ 0.027914] audit: type=2000 audit(1513384927.133:1): state=initialized audit_enabled=0 res=1 [ 0.028185] cpuidle: using governor menu [ 0.029118] ACPI: bus type PCI registered [ 0.029872] PCI: Using configuration type 1 for base access [ 0.034355] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages [ 0.035011] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages [ 0.036066] cryptd: max_cpu_qlen set to 1000 [ 0.036579] ACPI: Added _OSI(Module Device) [ 0.037007] ACPI: Added _OSI(Processor Device) [ 0.037426] ACPI: Added _OSI(3.0 _SCP Extensions) [ 0.037857] ACPI: Added _OSI(Processor Aggregator Device) [ 0.041356] ACPI: Interpreter enabled [ 0.041764] ACPI: (supports S0 S5) [ 0.042005] ACPI: Using IOAPIC for interrupt routing [ 0.042655] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug [ 0.043625] ACPI: Enabled 2 GPEs in block 00 to 0F [ 0.059248] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) [ 0.059953] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI] [ 0.060045] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM [ 0.061180] PCI host bridge to bus 0000:00 [ 0.061874] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] [ 0.062013] pci_bus 0000:00: 
root bus resource [io 0x0d00-0xffff window] [ 0.063016] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] [ 0.064015] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff window] [ 0.065014] pci_bus 0000:00: root bus resource [bus 00-ff] [ 0.065753] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 [ 0.066487] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 [ 0.067537] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 [ 0.071700] pci 0000:00:01.1: reg 0x20: [io 0xc100-0xc10f] [ 0.074032] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] [ 0.074908] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6] [ 0.075011] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] [ 0.076010] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376] [ 0.077121] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000 [ 0.078148] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI [ 0.079014] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB [ 0.080224] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000 [ 0.082007] pci 0000:00:03.0: reg 0x10: [io 0xc040-0xc05f] [ 0.083814] pci 0000:00:03.0: reg 0x14: [mem 0xfebc0000-0xfebc0fff] [ 0.089891] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref] [ 0.090545] pci 0000:00:05.0: [1af4:1003] type 00 class 0x078000 [ 0.092708] pci 0000:00:05.0: reg 0x10: [io 0xc060-0xc07f] [ 0.094009] pci 0000:00:05.0: reg 0x14: [mem 0xfebc1000-0xfebc1fff] [ 0.102484] pci 0000:00:06.0: [8086:2934] type 00 class 0x0c0300 [ 0.108028] pci 0000:00:06.0: reg 0x20: [io 0xc080-0xc09f] [ 0.110738] pci 0000:00:06.1: [8086:2935] type 00 class 0x0c0300 [ 0.114388] pci 0000:00:06.1: reg 0x20: [io 0xc0a0-0xc0bf] [ 0.117339] pci 0000:00:06.2: [8086:2936] type 00 class 0x0c0300 [ 0.122770] pci 0000:00:06.2: reg 0x20: [io 0xc0c0-0xc0df] [ 0.124738] pci 0000:00:06.7: [8086:293a] type 00 class 0x0c0320 [ 0.125825] pci 0000:00:06.7: reg 0x10: [mem 0xfebc2000-0xfebc2fff] [ 
0.130347] pci 0000:00:07.0: [1af4:1001] type 00 class 0x010000 [ 0.133007] pci 0000:00:07.0: reg 0x10: [io 0xc000-0xc03f] [ 0.134793] pci 0000:00:07.0: reg 0x14: [mem 0xfebc3000-0xfebc3fff] [ 0.141808] pci 0000:00:08.0: [1af4:1002] type 00 class 0x00ff00 [ 0.142914] pci 0000:00:08.0: reg 0x10: [io 0xc0e0-0xc0ff] [ 0.148977] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11) [ 0.149455] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11) [ 0.150390] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11) [ 0.151382] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11) [ 0.152380] ACPI: PCI Interrupt Link [LNKS] (IRQs *9) [ 0.154508] vgaarb: loaded [ 0.155271] SCSI subsystem initialized [ 0.155887] EDAC MC: Ver: 3.0.0 [ 0.156255] PCI: Using ACPI for IRQ routing [ 0.156566] PCI: pci_cache_line_size set to 64 bytes [ 0.157161] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff] [ 0.157914] e820: reserve RAM buffer [mem 0x7ffd9000-0x7fffffff] [ 0.158253] NetLabel: Initializing [ 0.158765] NetLabel: domain hash size = 128 [ 0.159005] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO [ 0.159775] NetLabel: unlabeled traffic allowed by default [ 0.160073] clocksource: Switched to clocksource kvm-clock [ 0.186764] VFS: Disk quotas dquot_6.6.0 [ 0.187277] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) [ 0.188251] FS-Cache: Loaded [ 0.188725] pnp: PnP ACPI init [ 0.189271] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active) [ 0.190231] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active) [ 0.191229] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active) [ 0.192096] pnp 00:03: [dma 2] [ 0.192514] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active) [ 0.193631] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active) [ 0.195416] pnp: PnP ACPI: found 5 devices [ 0.206055] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns [ 0.207127] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window] [ 0.207832] pci_bus 0000:00: resource 5 
[io 0x0d00-0xffff window] [ 0.208594] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window] [ 0.209469] pci_bus 0000:00: resource 7 [mem 0x80000000-0xfebfffff window] [ 0.210493] NET: Registered protocol family 2 [ 0.211244] tcp_listen_portaddr_hash hash table entries: 1024 (order: 2, 16384 bytes) [ 0.212283] TCP established hash table entries: 16384 (order: 5, 131072 bytes) [ 0.213285] TCP bind hash table entries: 16384 (order: 6, 262144 bytes) [ 0.214306] TCP: Hash tables configured (established 16384 bind 16384) [ 0.215065] UDP hash table entries: 1024 (order: 3, 32768 bytes) [ 0.215797] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) [ 0.217934] NET: Registered protocol family 1 [ 0.219126] RPC: Registered named UNIX socket transport module. [ 0.219676] RPC: Registered udp transport module. [ 0.220130] RPC: Registered tcp transport module. [ 0.220552] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 0.221153] pci 0000:00:00.0: Limiting direct PCI/PCI transfers [ 0.221701] pci 0000:00:01.0: PIIX3: Enabling Passive Release [ 0.222319] pci 0000:00:01.0: Activating ISA DMA hang workarounds [ 0.444214] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 [ 0.880141] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 [ 1.311493] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11 [ 1.748829] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10 [ 1.962124] PCI: CLS 0 bytes, default 64 [ 1.964749] Initialise system trusted keyrings [ 1.965289] workingset: timestamp_bits=37 max_order=19 bucket_order=0 [ 1.969600] zbud: loaded [ 1.971287] SGI XFS with security attributes, no debug enabled [ 2.106071] NET: Registered protocol family 38 [ 2.106556] Key type asymmetric registered [ 2.106931] Asymmetric key parser 'x509' registered [ 2.107514] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248) [ 2.108327] io scheduler noop registered [ 2.108813] io scheduler deadline registered [ 2.109608] io scheduler cfq registered (default) [ 
2.110258] io scheduler mq-deadline registered [ 2.110796] io scheduler kyber registered [ 2.111688] intel_idle: Please enable MWAIT in BIOS SETUP [ 2.112310] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 [ 2.113037] ACPI: Power Button [PWRF] [ 2.331642] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver [ 2.554093] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver [ 2.775938] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver [ 2.975053] tsc: Refined TSC clocksource calibration: 2495.981 MHz [ 2.975641] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa6529869, max_idle_ns: 440795218057 ns [ 3.029409] virtio-pci 0000:00:08.0: virtio_pci: leaving for legacy driver [ 3.032925] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled [ 3.056849] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A [ 3.064748] Non-volatile memory driver v1.3 [ 3.065925] ppdev: user-space parallel port driver [ 3.071816] loop: module loaded [ 3.075337] vda: vda1 vda2 vda3 [ 3.076659] Rounding down aligned max_sectors from 4294967295 to 4294967288 [ 3.077996] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) [ 3.079790] libphy: Fixed MDIO Bus: probed [ 3.080257] tun: Universal TUN/TAP device driver, 1.6 [ 3.082222] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 [ 3.083675] serio: i8042 KBD port at 0x60,0x64 irq 1 [ 3.084160] serio: i8042 AUX port at 0x60,0x64 irq 12 [ 3.084816] mousedev: PS/2 mouse device common for all mice [ 3.086603] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1 [ 3.089192] rtc_cmos 00:00: RTC can wake from S4 [ 3.090116] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0 [ 3.092829] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram [ 3.093937] IR NEC protocol handler initialized [ 3.094408] IR RC5(x/sz) protocol handler initialized [ 3.094901] IR RC6 protocol handler initialized [ 3.095510] IR JVC 
protocol handler initialized [ 3.095952] IR Sony protocol handler initialized [ 3.096399] IR SANYO protocol handler initialized [ 3.096862] IR Sharp protocol handler initialized [ 3.097342] IR MCE Keyboard/mouse protocol handler initialized [ 3.097919] IR XMP protocol handler initialized [ 3.098530] device-mapper: uevent: version 1.0.3 [ 3.099209] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com [ 3.100398] device-mapper: multipath round-robin: version 1.2.0 loaded [ 3.101883] drop_monitor: Initializing network drop monitor service [ 3.102553] Netfilter messages via NETLINK v0.30. [ 3.103090] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 3.103738] ctnetlink v0.93: registering with nfnetlink. [ 3.104494] ip_tables: (C) 2000-2006 Netfilter Core Team [ 3.105734] Initializing XFRM netlink socket [ 3.106885] NET: Registered protocol family 10 [ 3.109341] Segment Routing with IPv6 [ 3.109976] mip6: Mobile IPv6 [ 3.111987] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 3.114230] NET: Registered protocol family 17 [ 3.115047] Bridge firewalling registered [ 3.115824] Ebtables v2.0 registered [ 3.117996] 8021q: 802.1Q VLAN Support v1.8 [ 3.119429] AVX2 version of gcm_enc/dec engaged. [ 3.119886] AES CTR mode by8 optimization enabled [ 3.128818] sched_clock: Marking stable (3128714579, 0)->(3404180881, -275466302) [ 3.129945] registered taskstats version 1 [ 3.130427] Loading compiled-in X.509 certificates [ 3.163216] Loaded X.509 cert 'Build time autogenerated kernel key: 38e0adea1af8bd8a23b02436d4acf2f8c7408d23' [ 3.166359] zswap: loaded using pool lzo/zbud [ 3.167943] Key type big_key registered [ 3.168778] Magic number: 13:918:708 [ 3.169255] rtc_cmos 00:00: setting system clock to 2017-12-16 00:42:09 UTC (1513384929) [ 3.170604] md: Skipping autodetection of RAID arrays. 
(raid=autodetect will force) [ 3.171932] EXT4-fs (vda2): couldn't mount as ext3 due to feature incompatibilities [ 3.173871] EXT4-fs (vda2): couldn't mount as ext2 due to feature incompatibilities [ 3.175306] EXT4-fs (vda2): INFO: recovery required on readonly filesystem [ 3.176212] EXT4-fs (vda2): write access will be enabled during recovery [ 3.397187] EXT4-fs (vda2): orphan cleanup on readonly fs [ 3.399412] EXT4-fs (vda2): 5 orphan inodes deleted [ 3.402759] EXT4-fs (vda2): recovery complete [ 3.466647] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null) [ 3.469401] VFS: Mounted root (ext4 filesystem) readonly on device 253:2. [ 3.473719] devtmpfs: mounted [ 3.492549] Freeing unused kernel memory: 1640K [ 3.494547] Write protecting the kernel read-only data: 18432k [ 3.498781] Freeing unused kernel memory: 2016K [ 3.503330] Freeing unused kernel memory: 512K [ 3.505232] rodata_test: all tests were successful [ 3.515355] 1 (init): Uhuuh, elf segement at 00000000928fda3e requested but the memory is mapped already [ 3.519533] Starting init: /sbin/init exists but couldn't execute it (error -95) [ 3.528993] Starting init: /bin/sh exists but couldn't execute it (error -14) [ 3.532127] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. [ 3.538328] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc3-next-20171215-00001-g6d6aea478fce #11 [ 3.542201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 [ 3.546081] Call Trace: [ 3.547221] dump_stack+0x5c/0x79 [ 3.548768] ? rest_init+0x30/0xb0 [ 3.550320] panic+0xe4/0x232 [ 3.551669] ? rest_init+0xb0/0xb0 [ 3.553110] kernel_init+0xeb/0x100 [ 3.554701] ret_from_fork+0x1f/0x30 [ 3.558964] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 3.564160] ---[ end Kernel panic - not syncing: No working init found. 
Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
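For reference, the "requested but the memory is mapped already" message in the log above is printed when the ELF loader asks for an exact address with the non-clobbering flag and gets an error back. The following is only a userspace sketch of those semantics, not code from this series. It assumes the flag is available under the name MAP_FIXED_NOREPLACE (the name it was later merged under, in Linux 4.17) rather than MAP_FIXED_SAFE as proposed here, and the fallback value 0x100000 is the asm-generic one and is an assumption for other architectures. The helper names map_exact and demo_noreplace are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: asm-generic value; older libc headers may not define it. */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Map len anonymous bytes exactly at addr without replacing anything.
 * Returns 0 on success, -EEXIST if the range is already occupied,
 * or another negative errno value.
 */
static int map_exact(void *addr, size_t len)
{
	void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
		       -1, 0);

	if (p == MAP_FAILED)
		return -errno;
	if (p != addr) {
		/*
		 * A kernel that does not know the flag treats addr as a
		 * hint and may place the mapping elsewhere. Nothing was
		 * clobbered, so undo it and report a conflict.
		 */
		munmap(p, len);
		return -EEXIST;
	}
	return 0;
}

/* Returns 0 if the non-clobbering behaviour was observed. */
int demo_noreplace(void)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	char *base;

	/* Reserve two pages, then free the second to get a known hole. */
	base = mmap(NULL, 2 * len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return 1;
	memset(base, 0xAA, len);
	munmap(base + len, len);

	/* Occupied range: must be refused and left intact. */
	if (map_exact(base, len) != -EEXIST)
		return 2;
	if ((unsigned char)base[0] != 0xAA)
		return 3;

	/* Free range: should be mapped at exactly that address. */
	if (map_exact(base + len, len) != 0)
		return 4;

	munmap(base, 2 * len);
	return 0;
}
```

Calling demo_noreplace() from a test program and checking for a zero return is enough to exercise both the conflicting and the non-conflicting case. The explicit check of the returned address covers kernels that silently ignore the unknown flag.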
* Re: [2/2] fs, elf: drop MAP_FIXED usage from elf_map 2017-12-16 0:49 ` [2/2] " Andrei Vagin @ 2017-12-18 9:13 ` Michal Hocko 2017-12-18 18:12 ` Andrei Vagin 0 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2017-12-18 9:13 UTC (permalink / raw) To: Andrei Vagin Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Abdul Haleem, Joel Stanley, Kees Cook On Fri 15-12-17 16:49:28, Andrei Vagin wrote: > Hi Michal, > > We run CRIU tests for linux-next and the 4.15.0-rc3-next-20171215 kernel > doesn't boot: > > [ 3.492549] Freeing unused kernel memory: 1640K > [ 3.494547] Write protecting the kernel read-only data: 18432k > [ 3.498781] Freeing unused kernel memory: 2016K > [ 3.503330] Freeing unused kernel memory: 512K > [ 3.505232] rodata_test: all tests were successful > [ 3.515355] 1 (init): Uhuuh, elf segement at 00000000928fda3e requested but the memory is mapped already Hmm, this is interesting. What does the test actually do? Could you add some instrumentation to see what is actually mapped there?
Something like diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 0e50230ce53d..1b68ddc34043 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -372,10 +372,28 @@ static unsigned long elf_map(struct file *filep, unsigned long addr, } else map_addr = vm_mmap(filep, addr, size, prot, type, off); - if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) { + struct vm_area_struct *vma; + pr_info("%d (%s): Uhuuh, elf segment at %p requested but the memory is mapped already\n", task_pid_nr(current), current->comm, (void *)addr); + vma = find_vma(current->mm, addr); + if (vma && vma->vm_start <= addr) { + pr_info("requested [%lx, %lx] mapped [%lx, %lx] %lx ", addr, addr + total_size, + vma->vm_start, vma->vm_end, vma->vm_flags); + if (!vma->vm_file) { + pr_cont("anon\n"); + } else { + char path[512]; + char *p = file_path(vma->vm_file, path, sizeof(path)); + if (IS_ERR(p)) + p = "?"; + pr_cont("\"%s\"\n", kbasename(p)); + } + dump_stack(); + } + } return(map_addr); } > [ 3.519533] Starting init: /sbin/init exists but couldn't execute it (error -95) > [ 3.528993] Starting init: /bin/sh exists but couldn't execute it (error -14) > [ 3.532127] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. > [ 3.538328] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc3-next-20171215-00001-g6d6aea478fce #11 > [ 3.542201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > [ 3.546081] Call Trace: > [ 3.547221] dump_stack+0x5c/0x79 > [ 3.548768] ? rest_init+0x30/0xb0 > [ 3.550320] panic+0xe4/0x232 > [ 3.551669] ? rest_init+0xb0/0xb0 > [ 3.553110] kernel_init+0xeb/0x100 > [ 3.554701] ret_from_fork+0x1f/0x30 > [ 3.558964] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 3.564160] ---[ end Kernel panic - not syncing: No working init found.
Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. > > If I revert this patch, it boots normally. > > Thanks, > Andrei > > On Wed, Dec 13, 2017 at 10:25:50AM +0100, Michal Hocko wrote: > > From: Michal Hocko <mhocko@suse.com> > > > > Both load_elf_interp and load_elf_binary rely on elf_map to map segments > > on a controlled address and they use MAP_FIXED to enforce that. This is > > however a dangerous thing prone to silent data corruption which can be > > even exploitable. Let's take CVE-2017-1000253 as an example. At the time > > (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")) > > ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from > > the stack top on 32b (legacy) memory layout (only 1GB away). Therefore > > we could end up mapping over the existing stack with some luck. > > > > The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c: > > fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved much > > further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm: > > revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive > > stack consumption early during execve fully stopped by da029c11e6b1 > > ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be > > safe and any attack should be impractical. On the other hand this is > > just too subtle an assumption, so it can break quite easily and be hard to > > spot. > > > > I believe that the MAP_FIXED usage in load_elf_binary (et al.) is still > > fundamentally dangerous. Moreover it shouldn't even be needed. We are > > at the early process stage and so there shouldn't be unrelated mappings > > (except for stack and loader) existing so mmap for a given address > > should succeed even without MAP_FIXED. Something is terribly wrong if > > this is not the case and we should rather fail than silently corrupt the > > underlying mapping.
> > > > Address this issue by changing MAP_FIXED to the newly added > > MAP_FIXED_SAFE. This will mean that mmap will fail if there is an > > existing mapping clashing with the requested one without clobbering it. > > > > Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> > > Cc: Joel Stanley <joel@jms.id.au> > > Acked-by: Kees Cook <keescook@chromium.org> > > Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > --- > > arch/metag/kernel/process.c | 6 +++++- > > fs/binfmt_elf.c | 12 ++++++++---- > > 2 files changed, 13 insertions(+), 5 deletions(-) > > > > diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c > > index 0909834c83a7..867c8d0a5fb4 100644 > > --- a/arch/metag/kernel/process.c > > +++ b/arch/metag/kernel/process.c > > @@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr, > > tcm_tag = tcm_lookup_tag(addr); > > > > if (tcm_tag != TCM_INVALID_TAG) > > - type &= ~MAP_FIXED; > > + type &= ~(MAP_FIXED | MAP_FIXED_SAFE); > > > > /* > > * total_size is the size of the ELF (interpreter) image. 
> > @@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr, > > } else > > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > > > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > > + pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n", > > + task_pid_nr(current), tsk->comm, (void*)addr); > > + > > if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) { > > struct tcm_allocation *tcm; > > unsigned long tcm_addr; > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > > index 73b01e474fdc..5916d45f64a7 100644 > > --- a/fs/binfmt_elf.c > > +++ b/fs/binfmt_elf.c > > @@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr, > > } else > > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > > > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > > + pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n", > > + task_pid_nr(current), current->comm, (void*)addr); > > + > > return(map_addr); > > } > > > > @@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex, > > elf_prot |= PROT_EXEC; > > vaddr = eppnt->p_vaddr; > > if (interp_elf_ex->e_type == ET_EXEC || load_addr_set) > > - elf_type |= MAP_FIXED; > > + elf_type |= MAP_FIXED_SAFE; > > else if (no_base && interp_elf_ex->e_type == ET_DYN) > > load_addr = -vaddr; > > > > @@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm) > > * the ET_DYN load_addr calculations, proceed normally. 
> > */ > > if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) { > > - elf_flags |= MAP_FIXED; > > + elf_flags |= MAP_FIXED_SAFE; > > } else if (loc->elf_ex.e_type == ET_DYN) { > > /* > > * This logic is run once for the first LOAD Program > > @@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm) > > load_bias = ELF_ET_DYN_BASE; > > if (current->flags & PF_RANDOMIZE) > > load_bias += arch_mmap_rnd(); > > - elf_flags |= MAP_FIXED; > > + elf_flags |= MAP_FIXED_SAFE; > > } else > > load_bias = 0; > > > > @@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file) > > (eppnt->p_filesz + > > ELF_PAGEOFFSET(eppnt->p_vaddr)), > > PROT_READ | PROT_WRITE | PROT_EXEC, > > - MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE, > > + MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE, > > (eppnt->p_offset - > > ELF_PAGEOFFSET(eppnt->p_vaddr))); > > if (error != ELF_PAGESTART(eppnt->p_vaddr)) > [ 0.000000] Linux version 4.15.0-rc3-next-20171215-00001-g6d6aea478fce (avagin@laptop) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #11 SMP Fri Dec 15 16:39:11 PST 2017 > [ 0.000000] Command line: root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers' > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' > [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 > [ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 > [ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 > [ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format. 
> [ 0.000000] e820: BIOS-provided physical RAM map: > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved > [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffd8fff] usable > [ 0.000000] BIOS-e820: [mem 0x000000007ffd9000-0x000000007fffffff] reserved > [ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved > [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved > [ 0.000000] NX (Execute Disable) protection: active > [ 0.000000] random: fast init done > [ 0.000000] SMBIOS 2.8 present. > [ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > [ 0.000000] Hypervisor detected: KVM > [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved > [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable > [ 0.000000] e820: last_pfn = 0x7ffd9 max_arch_pfn = 0x400000000 > [ 0.000000] MTRR default type: write-back > [ 0.000000] MTRR fixed ranges enabled: > [ 0.000000] 00000-9FFFF write-back > [ 0.000000] A0000-BFFFF uncachable > [ 0.000000] C0000-FFFFF write-protect > [ 0.000000] MTRR variable ranges enabled: > [ 0.000000] 0 base 0080000000 mask FF80000000 uncachable > [ 0.000000] 1 disabled > [ 0.000000] 2 disabled > [ 0.000000] 3 disabled > [ 0.000000] 4 disabled > [ 0.000000] 5 disabled > [ 0.000000] 6 disabled > [ 0.000000] 7 disabled > [ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT > [ 0.000000] found SMP MP-table at [mem 0x000f6bd0-0x000f6bdf] mapped at [ (ptrval)] > [ 0.000000] Base memory trampoline at [ (ptrval)] 99000 size 24576 > [ 0.000000] Using GB pages for direct mapping > [ 0.000000] BRK [0x2c984000, 0x2c984fff] PGTABLE > [ 0.000000] BRK [0x2c985000, 0x2c985fff] PGTABLE > [ 0.000000] BRK [0x2c986000, 0x2c986fff] PGTABLE > [ 0.000000] BRK [0x2c987000, 0x2c987fff] 
PGTABLE > [ 0.000000] BRK [0x2c988000, 0x2c988fff] PGTABLE > [ 0.000000] BRK [0x2c989000, 0x2c989fff] PGTABLE > [ 0.000000] ACPI: Early table checksum verification disabled > [ 0.000000] ACPI: RSDP 0x00000000000F69C0 000014 (v00 BOCHS ) > [ 0.000000] ACPI: RSDT 0x000000007FFE12FF 00002C (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001) > [ 0.000000] ACPI: FACP 0x000000007FFE120B 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001) > [ 0.000000] ACPI: DSDT 0x000000007FFE0040 0011CB (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001) > [ 0.000000] ACPI: FACS 0x000000007FFE0000 000040 > [ 0.000000] ACPI: APIC 0x000000007FFE127F 000080 (v01 BOCHS BXPCAPIC 00000001 BXPC 00000001) > [ 0.000000] ACPI: Local APIC address 0xfee00000 > [ 0.000000] No NUMA configuration found > [ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffd8fff] > [ 0.000000] NODE_DATA(0) allocated [mem 0x7ffc2000-0x7ffd8fff] > [ 0.000000] kvm-clock: cpu 0, msr 0:7ffc0001, primary cpu clock > [ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00 > [ 0.000000] kvm-clock: using sched offset of 1076013277 cycles > [ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns > [ 0.000000] Zone ranges: > [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff] > [ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007ffd8fff] > [ 0.000000] Normal empty > [ 0.000000] Device empty > [ 0.000000] Movable zone start for each node > [ 0.000000] Early memory node ranges > [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff] > [ 0.000000] node 0: [mem 0x0000000000100000-0x000000007ffd8fff] > [ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007ffd8fff] > [ 0.000000] On node 0 totalpages: 524151 > [ 0.000000] DMA zone: 64 pages used for memmap > [ 0.000000] DMA zone: 21 pages reserved > [ 0.000000] DMA zone: 3998 pages, LIFO batch:0 > [ 0.000000] DMA32 zone: 8128 pages used for memmap > [ 0.000000] DMA32 zone: 520153 pages, LIFO 
batch:31 > [ 0.000000] Reserved but unavailable: 98 pages > [ 0.000000] ACPI: PM-Timer IO Port: 0x608 > [ 0.000000] ACPI: Local APIC address 0xfee00000 > [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) > [ 0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23 > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high level) > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) > [ 0.000000] ACPI: IRQ0 used by override. > [ 0.000000] ACPI: IRQ5 used by override. > [ 0.000000] ACPI: IRQ9 used by override. > [ 0.000000] ACPI: IRQ10 used by override. > [ 0.000000] ACPI: IRQ11 used by override. > [ 0.000000] Using ACPI (MADT) for SMP configuration information > [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs > [ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff] > [ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff] > [ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff] > [ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff] > [ 0.000000] e820: [mem 0x80000000-0xfeffbfff] available for PCI devices > [ 0.000000] Booting paravirtualized kernel on KVM > [ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns > [ 0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1 > [ 0.000000] percpu: Embedded 44 pages/cpu @ (ptrval) s142296 r8192 d29736 u1048576 > [ 0.000000] pcpu-alloc: s142296 r8192 d29736 u1048576 alloc=1*2097152 > [ 0.000000] pcpu-alloc: [0] 0 1 > [ 0.000000] KVM setup async PF for cpu 0 > [ 0.000000] kvm-stealtime: cpu 0, msr 7fc122c0 > [ 0.000000] Built 1 zonelists, mobility grouping on. 
Total pages: 515938 > [ 0.000000] Policy zone: DMA32 > [ 0.000000] Kernel command line: root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 > [ 0.000000] Memory: 2037056K/2096604K available (12300K kernel code, 1554K rwdata, 3584K rodata, 1640K init, 912K bss, 59548K reserved, 0K cma-reserved) > [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 > [ 0.000000] ftrace: allocating 36554 entries in 143 pages > [ 0.001000] Hierarchical RCU implementation. > [ 0.001000] RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=2. > [ 0.001000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2 > [ 0.001000] NR_IRQS: 4352, nr_irqs: 440, preallocated irqs: 16 > [ 0.001000] Offload RCU callbacks from CPUs: (none). > [ 0.001000] Console: colour dummy device 80x25 > [ 0.001000] console [ttyS0] enabled > [ 0.001000] ACPI: Core revision 20171110 > [ 0.001000] ACPI: 1 ACPI AML tables successfully acquired and loaded > [ 0.001009] APIC: Switch to symmetric I/O mode setup > [ 0.001571] x2apic enabled > [ 0.002003] Switched APIC routing to physical x2apic. > [ 0.003538] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 > [ 0.004000] tsc: Detected 2496.000 MHz processor > [ 0.004014] Calibrating delay loop (skipped) preset value.. 4992.00 BogoMIPS (lpj=2496000) > [ 0.005014] pid_max: default: 32768 minimum: 301 > [ 0.006057] Security Framework initialized > [ 0.006548] Yama: becoming mindful. > [ 0.007019] SELinux: Disabled at boot. 
> [ 0.008206] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) > [ 0.009164] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) > [ 0.009816] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes) > [ 0.010009] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes) > [ 0.011322] mce: CPU supports 10 MCE banks > [ 0.011740] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 > [ 0.012002] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 > [ 0.012610] Freeing SMP alternatives memory: 36K > [ 0.013467] TSC deadline timer enabled > [ 0.013820] smpboot: CPU0: Intel Core Processor (Skylake) (family: 0x6, model: 0x5e, stepping: 0x3) > [ 0.014000] Performance Events: unsupported p6 CPU model 94 no PMU driver, software events only. > [ 0.014041] Hierarchical SRCU implementation. > [ 0.015133] NMI watchdog: Perf event create on CPU 0 failed with -2 > [ 0.015725] NMI watchdog: Perf NMI watchdog permanently disabled > [ 0.016077] smp: Bringing up secondary CPUs ... > [ 0.016654] x86: Booting SMP configuration: > [ 0.017005] .... 
node #0, CPUs: #1 > [ 0.001000] kvm-clock: cpu 1, msr 0:7ffc0041, secondary cpu clock > [ 0.019051] KVM setup async PF for cpu 1 > [ 0.019599] kvm-stealtime: cpu 1, msr 7fd122c0 > [ 0.020009] smp: Brought up 1 node, 2 CPUs > [ 0.020531] smpboot: Max logical packages: 2 > [ 0.021009] smpboot: Total of 2 processors activated (9984.00 BogoMIPS) > [ 0.023160] devtmpfs: initialized > [ 0.023513] x86/mm: Memory block size: 128MB > [ 0.024811] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns > [ 0.025015] futex hash table entries: 512 (order: 3, 32768 bytes) > [ 0.026185] RTC time: 0:42:06, date: 12/16/17 > [ 0.026790] NET: Registered protocol family 16 > [ 0.027204] audit: initializing netlink subsys (disabled) > [ 0.027914] audit: type=2000 audit(1513384927.133:1): state=initialized audit_enabled=0 res=1 > [ 0.028185] cpuidle: using governor menu > [ 0.029118] ACPI: bus type PCI registered > [ 0.029872] PCI: Using configuration type 1 for base access > [ 0.034355] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages > [ 0.035011] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages > [ 0.036066] cryptd: max_cpu_qlen set to 1000 > [ 0.036579] ACPI: Added _OSI(Module Device) > [ 0.037007] ACPI: Added _OSI(Processor Device) > [ 0.037426] ACPI: Added _OSI(3.0 _SCP Extensions) > [ 0.037857] ACPI: Added _OSI(Processor Aggregator Device) > [ 0.041356] ACPI: Interpreter enabled > [ 0.041764] ACPI: (supports S0 S5) > [ 0.042005] ACPI: Using IOAPIC for interrupt routing > [ 0.042655] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug > [ 0.043625] ACPI: Enabled 2 GPEs in block 00 to 0F > [ 0.059248] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) > [ 0.059953] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI] > [ 0.060045] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM > [ 0.061180] PCI host bridge to bus 0000:00 > [ 0.061874] pci_bus 0000:00: root 
bus resource [io 0x0000-0x0cf7 window] > [ 0.062013] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] > [ 0.063016] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] > [ 0.064015] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff window] > [ 0.065014] pci_bus 0000:00: root bus resource [bus 00-ff] > [ 0.065753] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 > [ 0.066487] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 > [ 0.067537] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 > [ 0.071700] pci 0000:00:01.1: reg 0x20: [io 0xc100-0xc10f] > [ 0.074032] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] > [ 0.074908] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6] > [ 0.075011] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] > [ 0.076010] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376] > [ 0.077121] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000 > [ 0.078148] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI > [ 0.079014] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB > [ 0.080224] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000 > [ 0.082007] pci 0000:00:03.0: reg 0x10: [io 0xc040-0xc05f] > [ 0.083814] pci 0000:00:03.0: reg 0x14: [mem 0xfebc0000-0xfebc0fff] > [ 0.089891] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref] > [ 0.090545] pci 0000:00:05.0: [1af4:1003] type 00 class 0x078000 > [ 0.092708] pci 0000:00:05.0: reg 0x10: [io 0xc060-0xc07f] > [ 0.094009] pci 0000:00:05.0: reg 0x14: [mem 0xfebc1000-0xfebc1fff] > [ 0.102484] pci 0000:00:06.0: [8086:2934] type 00 class 0x0c0300 > [ 0.108028] pci 0000:00:06.0: reg 0x20: [io 0xc080-0xc09f] > [ 0.110738] pci 0000:00:06.1: [8086:2935] type 00 class 0x0c0300 > [ 0.114388] pci 0000:00:06.1: reg 0x20: [io 0xc0a0-0xc0bf] > [ 0.117339] pci 0000:00:06.2: [8086:2936] type 00 class 0x0c0300 > [ 0.122770] pci 0000:00:06.2: reg 0x20: [io 0xc0c0-0xc0df] > [ 
0.124738] pci 0000:00:06.7: [8086:293a] type 00 class 0x0c0320 > [ 0.125825] pci 0000:00:06.7: reg 0x10: [mem 0xfebc2000-0xfebc2fff] > [ 0.130347] pci 0000:00:07.0: [1af4:1001] type 00 class 0x010000 > [ 0.133007] pci 0000:00:07.0: reg 0x10: [io 0xc000-0xc03f] > [ 0.134793] pci 0000:00:07.0: reg 0x14: [mem 0xfebc3000-0xfebc3fff] > [ 0.141808] pci 0000:00:08.0: [1af4:1002] type 00 class 0x00ff00 > [ 0.142914] pci 0000:00:08.0: reg 0x10: [io 0xc0e0-0xc0ff] > [ 0.148977] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11) > [ 0.149455] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11) > [ 0.150390] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11) > [ 0.151382] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11) > [ 0.152380] ACPI: PCI Interrupt Link [LNKS] (IRQs *9) > [ 0.154508] vgaarb: loaded > [ 0.155271] SCSI subsystem initialized > [ 0.155887] EDAC MC: Ver: 3.0.0 > [ 0.156255] PCI: Using ACPI for IRQ routing > [ 0.156566] PCI: pci_cache_line_size set to 64 bytes > [ 0.157161] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff] > [ 0.157914] e820: reserve RAM buffer [mem 0x7ffd9000-0x7fffffff] > [ 0.158253] NetLabel: Initializing > [ 0.158765] NetLabel: domain hash size = 128 > [ 0.159005] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO > [ 0.159775] NetLabel: unlabeled traffic allowed by default > [ 0.160073] clocksource: Switched to clocksource kvm-clock > [ 0.186764] VFS: Disk quotas dquot_6.6.0 > [ 0.187277] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) > [ 0.188251] FS-Cache: Loaded > [ 0.188725] pnp: PnP ACPI init > [ 0.189271] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active) > [ 0.190231] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active) > [ 0.191229] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active) > [ 0.192096] pnp 00:03: [dma 2] > [ 0.192514] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active) > [ 0.193631] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active) > [ 0.195416] pnp: PnP ACPI: found 5 devices > 
[ 0.206055] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns > [ 0.207127] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window] > [ 0.207832] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window] > [ 0.208594] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window] > [ 0.209469] pci_bus 0000:00: resource 7 [mem 0x80000000-0xfebfffff window] > [ 0.210493] NET: Registered protocol family 2 > [ 0.211244] tcp_listen_portaddr_hash hash table entries: 1024 (order: 2, 16384 bytes) > [ 0.212283] TCP established hash table entries: 16384 (order: 5, 131072 bytes) > [ 0.213285] TCP bind hash table entries: 16384 (order: 6, 262144 bytes) > [ 0.214306] TCP: Hash tables configured (established 16384 bind 16384) > [ 0.215065] UDP hash table entries: 1024 (order: 3, 32768 bytes) > [ 0.215797] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) > [ 0.217934] NET: Registered protocol family 1 > [ 0.219126] RPC: Registered named UNIX socket transport module. > [ 0.219676] RPC: Registered udp transport module. > [ 0.220130] RPC: Registered tcp transport module. > [ 0.220552] RPC: Registered tcp NFSv4.1 backchannel transport module. 
> [ 0.221153] pci 0000:00:00.0: Limiting direct PCI/PCI transfers > [ 0.221701] pci 0000:00:01.0: PIIX3: Enabling Passive Release > [ 0.222319] pci 0000:00:01.0: Activating ISA DMA hang workarounds > [ 0.444214] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 > [ 0.880141] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 > [ 1.311493] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11 > [ 1.748829] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10 > [ 1.962124] PCI: CLS 0 bytes, default 64 > [ 1.964749] Initialise system trusted keyrings > [ 1.965289] workingset: timestamp_bits=37 max_order=19 bucket_order=0 > [ 1.969600] zbud: loaded > [ 1.971287] SGI XFS with security attributes, no debug enabled > [ 2.106071] NET: Registered protocol family 38 > [ 2.106556] Key type asymmetric registered > [ 2.106931] Asymmetric key parser 'x509' registered > [ 2.107514] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248) > [ 2.108327] io scheduler noop registered > [ 2.108813] io scheduler deadline registered > [ 2.109608] io scheduler cfq registered (default) > [ 2.110258] io scheduler mq-deadline registered > [ 2.110796] io scheduler kyber registered > [ 2.111688] intel_idle: Please enable MWAIT in BIOS SETUP > [ 2.112310] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 > [ 2.113037] ACPI: Power Button [PWRF] > [ 2.331642] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver > [ 2.554093] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver > [ 2.775938] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver > [ 2.975053] tsc: Refined TSC clocksource calibration: 2495.981 MHz > [ 2.975641] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa6529869, max_idle_ns: 440795218057 ns > [ 3.029409] virtio-pci 0000:00:08.0: virtio_pci: leaving for legacy driver > [ 3.032925] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled > [ 3.056849] 00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 
16550A > [ 3.064748] Non-volatile memory driver v1.3 > [ 3.065925] ppdev: user-space parallel port driver > [ 3.071816] loop: module loaded > [ 3.075337] vda: vda1 vda2 vda3 > [ 3.076659] Rounding down aligned max_sectors from 4294967295 to 4294967288 > [ 3.077996] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) > [ 3.079790] libphy: Fixed MDIO Bus: probed > [ 3.080257] tun: Universal TUN/TAP device driver, 1.6 > [ 3.082222] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 > [ 3.083675] serio: i8042 KBD port at 0x60,0x64 irq 1 > [ 3.084160] serio: i8042 AUX port at 0x60,0x64 irq 12 > [ 3.084816] mousedev: PS/2 mouse device common for all mice > [ 3.086603] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1 > [ 3.089192] rtc_cmos 00:00: RTC can wake from S4 > [ 3.090116] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0 > [ 3.092829] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram > [ 3.093937] IR NEC protocol handler initialized > [ 3.094408] IR RC5(x/sz) protocol handler initialized > [ 3.094901] IR RC6 protocol handler initialized > [ 3.095510] IR JVC protocol handler initialized > [ 3.095952] IR Sony protocol handler initialized > [ 3.096399] IR SANYO protocol handler initialized > [ 3.096862] IR Sharp protocol handler initialized > [ 3.097342] IR MCE Keyboard/mouse protocol handler initialized > [ 3.097919] IR XMP protocol handler initialized > [ 3.098530] device-mapper: uevent: version 1.0.3 > [ 3.099209] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com > [ 3.100398] device-mapper: multipath round-robin: version 1.2.0 loaded > [ 3.101883] drop_monitor: Initializing network drop monitor service > [ 3.102553] Netfilter messages via NETLINK v0.30. > [ 3.103090] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) > [ 3.103738] ctnetlink v0.93: registering with nfnetlink. 
> [ 3.104494] ip_tables: (C) 2000-2006 Netfilter Core Team > [ 3.105734] Initializing XFRM netlink socket > [ 3.106885] NET: Registered protocol family 10 > [ 3.109341] Segment Routing with IPv6 > [ 3.109976] mip6: Mobile IPv6 > [ 3.111987] ip6_tables: (C) 2000-2006 Netfilter Core Team > [ 3.114230] NET: Registered protocol family 17 > [ 3.115047] Bridge firewalling registered > [ 3.115824] Ebtables v2.0 registered > [ 3.117996] 8021q: 802.1Q VLAN Support v1.8 > [ 3.119429] AVX2 version of gcm_enc/dec engaged. > [ 3.119886] AES CTR mode by8 optimization enabled > [ 3.128818] sched_clock: Marking stable (3128714579, 0)->(3404180881, -275466302) > [ 3.129945] registered taskstats version 1 > [ 3.130427] Loading compiled-in X.509 certificates > [ 3.163216] Loaded X.509 cert 'Build time autogenerated kernel key: 38e0adea1af8bd8a23b02436d4acf2f8c7408d23' > [ 3.166359] zswap: loaded using pool lzo/zbud > [ 3.167943] Key type big_key registered > [ 3.168778] Magic number: 13:918:708 > [ 3.169255] rtc_cmos 00:00: setting system clock to 2017-12-16 00:42:09 UTC (1513384929) > [ 3.170604] md: Skipping autodetection of RAID arrays. (raid=autodetect will force) > [ 3.171932] EXT4-fs (vda2): couldn't mount as ext3 due to feature incompatibilities > [ 3.173871] EXT4-fs (vda2): couldn't mount as ext2 due to feature incompatibilities > [ 3.175306] EXT4-fs (vda2): INFO: recovery required on readonly filesystem > [ 3.176212] EXT4-fs (vda2): write access will be enabled during recovery > [ 3.397187] EXT4-fs (vda2): orphan cleanup on readonly fs > [ 3.399412] EXT4-fs (vda2): 5 orphan inodes deleted > [ 3.402759] EXT4-fs (vda2): recovery complete > [ 3.466647] EXT4-fs (vda2): mounted filesystem with ordered data mode. Opts: (null) > [ 3.469401] VFS: Mounted root (ext4 filesystem) readonly on device 253:2. 
> [ 3.473719] devtmpfs: mounted > [ 3.492549] Freeing unused kernel memory: 1640K > [ 3.494547] Write protecting the kernel read-only data: 18432k > [ 3.498781] Freeing unused kernel memory: 2016K > [ 3.503330] Freeing unused kernel memory: 512K > [ 3.505232] rodata_test: all tests were successful > [ 3.515355] 1 (init): Uhuuh, elf segement at 00000000928fda3e requested but the memory is mapped already > [ 3.519533] Starting init: /sbin/init exists but couldn't execute it (error -95) > [ 3.528993] Starting init: /bin/sh exists but couldn't execute it (error -14) > [ 3.532127] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. > [ 3.538328] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc3-next-20171215-00001-g6d6aea478fce #11 > [ 3.542201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > [ 3.546081] Call Trace: > [ 3.547221] dump_stack+0x5c/0x79 > [ 3.548768] ? rest_init+0x30/0xb0 > [ 3.550320] panic+0xe4/0x232 > [ 3.551669] ? rest_init+0xb0/0xb0 > [ 3.553110] kernel_init+0xeb/0x100 > [ 3.554701] ret_from_fork+0x1f/0x30 > [ 3.558964] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > [ 3.564160] ---[ end Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. -- Michal Hocko SUSE Labs
* Re: [2/2] fs, elf: drop MAP_FIXED usage from elf_map 2017-12-18 9:13 ` Michal Hocko @ 2017-12-18 18:12 ` Andrei Vagin 0 siblings, 0 replies; 44+ messages in thread From: Andrei Vagin @ 2017-12-18 18:12 UTC (permalink / raw) To: Michal Hocko Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Abdul Haleem, Joel Stanley, Kees Cook On Mon, Dec 18, 2017 at 10:13:02AM +0100, Michal Hocko wrote: > On Fri 15-12-17 16:49:28, Andrei Vagin wrote: > > Hi Michal, > > > > We run CRIU tests for linux-next and the 4.15.0-rc3-next-20171215 kernel > > doesn't boot: > > > > [ 3.492549] Freeing unused kernel memory: 1640K > > [ 3.494547] Write protecting the kernel read-only data: 18432k > > [ 3.498781] Freeing unused kernel memory: 2016K > > [ 3.503330] Freeing unused kernel memory: 512K > > [ 3.505232] rodata_test: all tests were successful > > [ 3.515355] 1 (init): Uhuuh, elf segement at 00000000928fda3e requested but the memory is mapped already > > Hmm, this is interesting. What does the test actually do? Could you add some > instrumentation to see what is actually mapped there? Something like
There is nothing mapped there.
It returns -95 (EOPNOTSUPP). The kernel is booted with this patch:
+ int ttype = type & ~MAP_FIXED_SAFE; if (total_size) { total_size = ELF_PAGEALIGN(total_size); - map_addr = vm_mmap(filep, addr, total_size, prot, type, off); + map_addr = vm_mmap(filep, addr, total_size, prot, ttype, off); if (!BAD_ADDR(map_addr)) vm_munmap(map_addr+size, total_size-size); } else - map_addr = vm_mmap(filep, addr, size, prot, type, off); + map_addr = vm_mmap(filep, addr, size, prot, ttype, off);
> > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > index 0e50230ce53d..1b68ddc34043 100644 > --- a/fs/binfmt_elf.c > +++ b/fs/binfmt_elf.c > @@ -372,10 +372,28 @@ static unsigned long elf_map(struct file *filep, unsigned long addr, > } else > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > - if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) { > + struct vm_area_struct *vma; > + > pr_info("%d (%s): Uhuuh, elf segment at %p requested but the memory is mapped already\n", > task_pid_nr(current), current->comm, > (void *)addr); > + vma = find_vma(current->mm, map_addr); > + if (vma && vma->vm_start < addr) { > + pr_info("requested [%lx, %lx] mapped [%lx, %lx] %lx ", addr, addr + total_size, > + vma->vm_start, vma->vm_end, vma->vm_flags); > + if (!vma->vm_file) { > + pr_cont("anon\n"); > + } else { > + char path[512]; > + char *p = file_path(vma->vm_file, path, sizeof(path)); > + if (IS_ERR(p)) > + p = "?"; > + pr_cont("\"%s\"\n", kbasename(p)); > + } > + dump_stack(); > + } > + } > > return(map_addr); > } > > > [ 3.519533] Starting init: /sbin/init exists but couldn't execute it (error -95) > > [ 3.528993] Starting init: /bin/sh exists but couldn't execute it (error -14) > > [ 3.532127] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
> > [ 3.538328] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc3-next-20171215-00001-g6d6aea478fce #11 > > [ 3.542201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > > [ 3.546081] Call Trace: > > [ 3.547221] dump_stack+0x5c/0x79 > > [ 3.548768] ? rest_init+0x30/0xb0 > > [ 3.550320] panic+0xe4/0x232 > > [ 3.551669] ? rest_init+0xb0/0xb0 > > [ 3.553110] kernel_init+0xeb/0x100 > > [ 3.554701] ret_from_fork+0x1f/0x30 > > [ 3.558964] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > [ 3.564160] ---[ end Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. > > > > If I revert this patch, it boots normally. > > > > Thanks, > > Andrei > > > > On Wed, Dec 13, 2017 at 10:25:50AM +0100, Michal Hocko wrote: > > > From: Michal Hocko <mhocko@suse.com> > > > > > > Both load_elf_interp and load_elf_binary rely on elf_map to map segments > > > on a controlled address and they use MAP_FIXED to enforce that. This is > > > however a dangerous thing prone to silent data corruption which can be > > > even exploitable. Let's take CVE-2017-1000253 as an example. At the time > > > (before eab09532d400 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")) > > > ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from > > > the stack top on 32b (legacy) memory layout (only 1GB away). Therefore > > > we could end up mapping over the existing stack with some luck.
> > > > > > The issue has been fixed since then (a87938b2e246 ("fs/binfmt_elf.c: > > > fix bug in loading of PIE binaries")), ELF_ET_DYN_BASE moved much > > > further from the stack (eab09532d400 and later by c715b72c1ba4 ("mm: > > > revert x86_64 and arm64 ELF_ET_DYN_BASE base changes")) and excessive > > > stack consumption early during execve fully stopped by da029c11e6b1 > > > ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be > > > safe and any attack should be impractical. On the other hand this is > > > just too subtle an assumption, so it can break quite easily and be hard to > > > spot. > > > > > > I believe that the MAP_FIXED usage in load_elf_binary (et al.) is still > > > fundamentally dangerous. Moreover it shouldn't even be needed. We are > > > at the early process stage and so there shouldn't be unrelated mappings > > > (except for stack and loader) existing so mmap for a given address > > > should succeed even without MAP_FIXED. Something is terribly wrong if > > > this is not the case and we should rather fail than silently corrupt the > > > underlying mapping. > > > > > > Address this issue by changing MAP_FIXED to the newly added > > > MAP_FIXED_SAFE. This will mean that mmap will fail if there is an > > > existing mapping clashing with the requested one without clobbering it.
> > > > > > Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com> > > > Cc: Joel Stanley <joel@jms.id.au> > > > Acked-by: Kees Cook <keescook@chromium.org> > > > Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> > > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > --- > > > arch/metag/kernel/process.c | 6 +++++- > > > fs/binfmt_elf.c | 12 ++++++++---- > > > 2 files changed, 13 insertions(+), 5 deletions(-) > > > > > > diff --git a/arch/metag/kernel/process.c b/arch/metag/kernel/process.c > > > index 0909834c83a7..867c8d0a5fb4 100644 > > > --- a/arch/metag/kernel/process.c > > > +++ b/arch/metag/kernel/process.c > > > @@ -399,7 +399,7 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr, > > > tcm_tag = tcm_lookup_tag(addr); > > > > > > if (tcm_tag != TCM_INVALID_TAG) > > > - type &= ~MAP_FIXED; > > > + type &= ~(MAP_FIXED | MAP_FIXED_SAFE); > > > > > > /* > > > * total_size is the size of the ELF (interpreter) image. > > > @@ -417,6 +417,10 @@ unsigned long __metag_elf_map(struct file *filep, unsigned long addr, > > > } else > > > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > > > > > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > > > + pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n", > > > + task_pid_nr(current), tsk->comm, (void*)addr); > > > + > > > if (!BAD_ADDR(map_addr) && tcm_tag != TCM_INVALID_TAG) { > > > struct tcm_allocation *tcm; > > > unsigned long tcm_addr; > > > diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c > > > index 73b01e474fdc..5916d45f64a7 100644 > > > --- a/fs/binfmt_elf.c > > > +++ b/fs/binfmt_elf.c > > > @@ -372,6 +372,10 @@ static unsigned long elf_map(struct file *filep, unsigned long addr, > > > } else > > > map_addr = vm_mmap(filep, addr, size, prot, type, off); > > > > > > + if ((type & MAP_FIXED_SAFE) && BAD_ADDR(map_addr)) > > > + pr_info("%d (%s): Uhuuh, elf segement at %p requested but the memory is mapped already\n", > > > + task_pid_nr(current), 
current->comm, (void*)addr); > > > + > > > return(map_addr); > > > } > > > > > > @@ -569,7 +573,7 @@ static unsigned long load_elf_interp(struct elfhdr *interp_elf_ex, > > > elf_prot |= PROT_EXEC; > > > vaddr = eppnt->p_vaddr; > > > if (interp_elf_ex->e_type == ET_EXEC || load_addr_set) > > > - elf_type |= MAP_FIXED; > > > + elf_type |= MAP_FIXED_SAFE; > > > else if (no_base && interp_elf_ex->e_type == ET_DYN) > > > load_addr = -vaddr; > > > > > > @@ -930,7 +934,7 @@ static int load_elf_binary(struct linux_binprm *bprm) > > > * the ET_DYN load_addr calculations, proceed normally. > > > */ > > > if (loc->elf_ex.e_type == ET_EXEC || load_addr_set) { > > > - elf_flags |= MAP_FIXED; > > > + elf_flags |= MAP_FIXED_SAFE; > > > } else if (loc->elf_ex.e_type == ET_DYN) { > > > /* > > > * This logic is run once for the first LOAD Program > > > @@ -966,7 +970,7 @@ static int load_elf_binary(struct linux_binprm *bprm) > > > load_bias = ELF_ET_DYN_BASE; > > > if (current->flags & PF_RANDOMIZE) > > > load_bias += arch_mmap_rnd(); > > > - elf_flags |= MAP_FIXED; > > > + elf_flags |= MAP_FIXED_SAFE; > > > } else > > > load_bias = 0; > > > > > > @@ -1223,7 +1227,7 @@ static int load_elf_library(struct file *file) > > > (eppnt->p_filesz + > > > ELF_PAGEOFFSET(eppnt->p_vaddr)), > > > PROT_READ | PROT_WRITE | PROT_EXEC, > > > - MAP_FIXED | MAP_PRIVATE | MAP_DENYWRITE, > > > + MAP_FIXED_SAFE | MAP_PRIVATE | MAP_DENYWRITE, > > > (eppnt->p_offset - > > > ELF_PAGEOFFSET(eppnt->p_vaddr))); > > > if (error != ELF_PAGESTART(eppnt->p_vaddr)) > > > > [ 0.000000] Linux version 4.15.0-rc3-next-20171215-00001-g6d6aea478fce (avagin@laptop) (gcc version 7.2.1 20170915 (Red Hat 7.2.1-2) (GCC)) #11 SMP Fri Dec 15 16:39:11 PST 2017 > > [ 0.000000] Command line: root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 > > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers' > > [ 0.000000] x86/fpu: Supporting XSAVE feature 
0x002: 'SSE registers' > > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers' > > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x008: 'MPX bounds registers' > > [ 0.000000] x86/fpu: Supporting XSAVE feature 0x010: 'MPX CSR' > > [ 0.000000] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256 > > [ 0.000000] x86/fpu: xstate_offset[3]: 832, xstate_sizes[3]: 64 > > [ 0.000000] x86/fpu: xstate_offset[4]: 896, xstate_sizes[4]: 64 > > [ 0.000000] x86/fpu: Enabled xstate features 0x1f, context size is 960 bytes, using 'compacted' format. > > [ 0.000000] e820: BIOS-provided physical RAM map: > > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable > > [ 0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved > > [ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved > > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007ffd8fff] usable > > [ 0.000000] BIOS-e820: [mem 0x000000007ffd9000-0x000000007fffffff] reserved > > [ 0.000000] BIOS-e820: [mem 0x00000000feffc000-0x00000000feffffff] reserved > > [ 0.000000] BIOS-e820: [mem 0x00000000fffc0000-0x00000000ffffffff] reserved > > [ 0.000000] NX (Execute Disable) protection: active > > [ 0.000000] random: fast init done > > [ 0.000000] SMBIOS 2.8 present. 
> > [ 0.000000] DMI: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > > [ 0.000000] Hypervisor detected: KVM > > [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved > > [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable > > [ 0.000000] e820: last_pfn = 0x7ffd9 max_arch_pfn = 0x400000000 > > [ 0.000000] MTRR default type: write-back > > [ 0.000000] MTRR fixed ranges enabled: > > [ 0.000000] 00000-9FFFF write-back > > [ 0.000000] A0000-BFFFF uncachable > > [ 0.000000] C0000-FFFFF write-protect > > [ 0.000000] MTRR variable ranges enabled: > > [ 0.000000] 0 base 0080000000 mask FF80000000 uncachable > > [ 0.000000] 1 disabled > > [ 0.000000] 2 disabled > > [ 0.000000] 3 disabled > > [ 0.000000] 4 disabled > > [ 0.000000] 5 disabled > > [ 0.000000] 6 disabled > > [ 0.000000] 7 disabled > > [ 0.000000] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT > > [ 0.000000] found SMP MP-table at [mem 0x000f6bd0-0x000f6bdf] mapped at [ (ptrval)] > > [ 0.000000] Base memory trampoline at [ (ptrval)] 99000 size 24576 > > [ 0.000000] Using GB pages for direct mapping > > [ 0.000000] BRK [0x2c984000, 0x2c984fff] PGTABLE > > [ 0.000000] BRK [0x2c985000, 0x2c985fff] PGTABLE > > [ 0.000000] BRK [0x2c986000, 0x2c986fff] PGTABLE > > [ 0.000000] BRK [0x2c987000, 0x2c987fff] PGTABLE > > [ 0.000000] BRK [0x2c988000, 0x2c988fff] PGTABLE > > [ 0.000000] BRK [0x2c989000, 0x2c989fff] PGTABLE > > [ 0.000000] ACPI: Early table checksum verification disabled > > [ 0.000000] ACPI: RSDP 0x00000000000F69C0 000014 (v00 BOCHS ) > > [ 0.000000] ACPI: RSDT 0x000000007FFE12FF 00002C (v01 BOCHS BXPCRSDT 00000001 BXPC 00000001) > > [ 0.000000] ACPI: FACP 0x000000007FFE120B 000074 (v01 BOCHS BXPCFACP 00000001 BXPC 00000001) > > [ 0.000000] ACPI: DSDT 0x000000007FFE0040 0011CB (v01 BOCHS BXPCDSDT 00000001 BXPC 00000001) > > [ 0.000000] ACPI: FACS 0x000000007FFE0000 000040 > > [ 0.000000] ACPI: APIC 0x000000007FFE127F 000080 (v01 BOCHS BXPCAPIC 
00000001 BXPC 00000001) > > [ 0.000000] ACPI: Local APIC address 0xfee00000 > > [ 0.000000] No NUMA configuration found > > [ 0.000000] Faking a node at [mem 0x0000000000000000-0x000000007ffd8fff] > > [ 0.000000] NODE_DATA(0) allocated [mem 0x7ffc2000-0x7ffd8fff] > > [ 0.000000] kvm-clock: cpu 0, msr 0:7ffc0001, primary cpu clock > > [ 0.000000] kvm-clock: Using msrs 4b564d01 and 4b564d00 > > [ 0.000000] kvm-clock: using sched offset of 1076013277 cycles > > [ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns > > [ 0.000000] Zone ranges: > > [ 0.000000] DMA [mem 0x0000000000001000-0x0000000000ffffff] > > [ 0.000000] DMA32 [mem 0x0000000001000000-0x000000007ffd8fff] > > [ 0.000000] Normal empty > > [ 0.000000] Device empty > > [ 0.000000] Movable zone start for each node > > [ 0.000000] Early memory node ranges > > [ 0.000000] node 0: [mem 0x0000000000001000-0x000000000009efff] > > [ 0.000000] node 0: [mem 0x0000000000100000-0x000000007ffd8fff] > > [ 0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000007ffd8fff] > > [ 0.000000] On node 0 totalpages: 524151 > > [ 0.000000] DMA zone: 64 pages used for memmap > > [ 0.000000] DMA zone: 21 pages reserved > > [ 0.000000] DMA zone: 3998 pages, LIFO batch:0 > > [ 0.000000] DMA32 zone: 8128 pages used for memmap > > [ 0.000000] DMA32 zone: 520153 pages, LIFO batch:31 > > [ 0.000000] Reserved but unavailable: 98 pages > > [ 0.000000] ACPI: PM-Timer IO Port: 0x608 > > [ 0.000000] ACPI: Local APIC address 0xfee00000 > > [ 0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1]) > > [ 0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23 > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl) > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 5 global_irq 5 high level) > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level) > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 10 global_irq 10 high 
level) > > [ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 11 global_irq 11 high level) > > [ 0.000000] ACPI: IRQ0 used by override. > > [ 0.000000] ACPI: IRQ5 used by override. > > [ 0.000000] ACPI: IRQ9 used by override. > > [ 0.000000] ACPI: IRQ10 used by override. > > [ 0.000000] ACPI: IRQ11 used by override. > > [ 0.000000] Using ACPI (MADT) for SMP configuration information > > [ 0.000000] smpboot: Allowing 2 CPUs, 0 hotplug CPUs > > [ 0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff] > > [ 0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff] > > [ 0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000effff] > > [ 0.000000] PM: Registered nosave memory: [mem 0x000f0000-0x000fffff] > > [ 0.000000] e820: [mem 0x80000000-0xfeffbfff] available for PCI devices > > [ 0.000000] Booting paravirtualized kernel on KVM > > [ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1910969940391419 ns > > [ 0.000000] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64 nr_cpu_ids:2 nr_node_ids:1 > > [ 0.000000] percpu: Embedded 44 pages/cpu @ (ptrval) s142296 r8192 d29736 u1048576 > > [ 0.000000] pcpu-alloc: s142296 r8192 d29736 u1048576 alloc=1*2097152 > > [ 0.000000] pcpu-alloc: [0] 0 1 > > [ 0.000000] KVM setup async PF for cpu 0 > > [ 0.000000] kvm-stealtime: cpu 0, msr 7fc122c0 > > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 515938 > > [ 0.000000] Policy zone: DMA32 > > [ 0.000000] Kernel command line: root=/dev/vda2 ro debug console=ttyS0,115200 LANG=en_US.UTF-8 slub_debug=FZP raid=noautodetect selinux=0 > > [ 0.000000] Memory: 2037056K/2096604K available (12300K kernel code, 1554K rwdata, 3584K rodata, 1640K init, 912K bss, 59548K reserved, 0K cma-reserved) > > [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1 > > [ 0.000000] ftrace: allocating 36554 entries in 143 pages > > [ 0.001000] Hierarchical RCU implementation. 
> > [ 0.001000] RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=2. > > [ 0.001000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=2 > > [ 0.001000] NR_IRQS: 4352, nr_irqs: 440, preallocated irqs: 16 > > [ 0.001000] Offload RCU callbacks from CPUs: (none). > > [ 0.001000] Console: colour dummy device 80x25 > > [ 0.001000] console [ttyS0] enabled > > [ 0.001000] ACPI: Core revision 20171110 > > [ 0.001000] ACPI: 1 ACPI AML tables successfully acquired and loaded > > [ 0.001009] APIC: Switch to symmetric I/O mode setup > > [ 0.001571] x2apic enabled > > [ 0.002003] Switched APIC routing to physical x2apic. > > [ 0.003538] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 > > [ 0.004000] tsc: Detected 2496.000 MHz processor > > [ 0.004014] Calibrating delay loop (skipped) preset value.. 4992.00 BogoMIPS (lpj=2496000) > > [ 0.005014] pid_max: default: 32768 minimum: 301 > > [ 0.006057] Security Framework initialized > > [ 0.006548] Yama: becoming mindful. > > [ 0.007019] SELinux: Disabled at boot. > > [ 0.008206] Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes) > > [ 0.009164] Inode-cache hash table entries: 131072 (order: 8, 1048576 bytes) > > [ 0.009816] Mount-cache hash table entries: 4096 (order: 3, 32768 bytes) > > [ 0.010009] Mountpoint-cache hash table entries: 4096 (order: 3, 32768 bytes) > > [ 0.011322] mce: CPU supports 10 MCE banks > > [ 0.011740] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 > > [ 0.012002] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0 > > [ 0.012610] Freeing SMP alternatives memory: 36K > > [ 0.013467] TSC deadline timer enabled > > [ 0.013820] smpboot: CPU0: Intel Core Processor (Skylake) (family: 0x6, model: 0x5e, stepping: 0x3) > > [ 0.014000] Performance Events: unsupported p6 CPU model 94 no PMU driver, software events only. > > [ 0.014041] Hierarchical SRCU implementation. 
> > [ 0.015133] NMI watchdog: Perf event create on CPU 0 failed with -2 > > [ 0.015725] NMI watchdog: Perf NMI watchdog permanently disabled > > [ 0.016077] smp: Bringing up secondary CPUs ... > > [ 0.016654] x86: Booting SMP configuration: > > [ 0.017005] .... node #0, CPUs: #1 > > [ 0.001000] kvm-clock: cpu 1, msr 0:7ffc0041, secondary cpu clock > > [ 0.019051] KVM setup async PF for cpu 1 > > [ 0.019599] kvm-stealtime: cpu 1, msr 7fd122c0 > > [ 0.020009] smp: Brought up 1 node, 2 CPUs > > [ 0.020531] smpboot: Max logical packages: 2 > > [ 0.021009] smpboot: Total of 2 processors activated (9984.00 BogoMIPS) > > [ 0.023160] devtmpfs: initialized > > [ 0.023513] x86/mm: Memory block size: 128MB > > [ 0.024811] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275000 ns > > [ 0.025015] futex hash table entries: 512 (order: 3, 32768 bytes) > > [ 0.026185] RTC time: 0:42:06, date: 12/16/17 > > [ 0.026790] NET: Registered protocol family 16 > > [ 0.027204] audit: initializing netlink subsys (disabled) > > [ 0.027914] audit: type=2000 audit(1513384927.133:1): state=initialized audit_enabled=0 res=1 > > [ 0.028185] cpuidle: using governor menu > > [ 0.029118] ACPI: bus type PCI registered > > [ 0.029872] PCI: Using configuration type 1 for base access > > [ 0.034355] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages > > [ 0.035011] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages > > [ 0.036066] cryptd: max_cpu_qlen set to 1000 > > [ 0.036579] ACPI: Added _OSI(Module Device) > > [ 0.037007] ACPI: Added _OSI(Processor Device) > > [ 0.037426] ACPI: Added _OSI(3.0 _SCP Extensions) > > [ 0.037857] ACPI: Added _OSI(Processor Aggregator Device) > > [ 0.041356] ACPI: Interpreter enabled > > [ 0.041764] ACPI: (supports S0 S5) > > [ 0.042005] ACPI: Using IOAPIC for interrupt routing > > [ 0.042655] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug > > [ 0.043625] ACPI: Enabled 2 
GPEs in block 00 to 0F > > [ 0.059248] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff]) > > [ 0.059953] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI] > > [ 0.060045] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM > > [ 0.061180] PCI host bridge to bus 0000:00 > > [ 0.061874] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7 window] > > [ 0.062013] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] > > [ 0.063016] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window] > > [ 0.064015] pci_bus 0000:00: root bus resource [mem 0x80000000-0xfebfffff window] > > [ 0.065014] pci_bus 0000:00: root bus resource [bus 00-ff] > > [ 0.065753] pci 0000:00:00.0: [8086:1237] type 00 class 0x060000 > > [ 0.066487] pci 0000:00:01.0: [8086:7000] type 00 class 0x060100 > > [ 0.067537] pci 0000:00:01.1: [8086:7010] type 00 class 0x010180 > > [ 0.071700] pci 0000:00:01.1: reg 0x20: [io 0xc100-0xc10f] > > [ 0.074032] pci 0000:00:01.1: legacy IDE quirk: reg 0x10: [io 0x01f0-0x01f7] > > [ 0.074908] pci 0000:00:01.1: legacy IDE quirk: reg 0x14: [io 0x03f6] > > [ 0.075011] pci 0000:00:01.1: legacy IDE quirk: reg 0x18: [io 0x0170-0x0177] > > [ 0.076010] pci 0000:00:01.1: legacy IDE quirk: reg 0x1c: [io 0x0376] > > [ 0.077121] pci 0000:00:01.3: [8086:7113] type 00 class 0x068000 > > [ 0.078148] pci 0000:00:01.3: quirk: [io 0x0600-0x063f] claimed by PIIX4 ACPI > > [ 0.079014] pci 0000:00:01.3: quirk: [io 0x0700-0x070f] claimed by PIIX4 SMB > > [ 0.080224] pci 0000:00:03.0: [1af4:1000] type 00 class 0x020000 > > [ 0.082007] pci 0000:00:03.0: reg 0x10: [io 0xc040-0xc05f] > > [ 0.083814] pci 0000:00:03.0: reg 0x14: [mem 0xfebc0000-0xfebc0fff] > > [ 0.089891] pci 0000:00:03.0: reg 0x30: [mem 0xfeb80000-0xfebbffff pref] > > [ 0.090545] pci 0000:00:05.0: [1af4:1003] type 00 class 0x078000 > > [ 0.092708] pci 0000:00:05.0: reg 0x10: [io 0xc060-0xc07f] > > [ 0.094009] pci 0000:00:05.0: reg 0x14: [mem 0xfebc1000-0xfebc1fff] > > [ 0.102484] 
pci 0000:00:06.0: [8086:2934] type 00 class 0x0c0300 > > [ 0.108028] pci 0000:00:06.0: reg 0x20: [io 0xc080-0xc09f] > > [ 0.110738] pci 0000:00:06.1: [8086:2935] type 00 class 0x0c0300 > > [ 0.114388] pci 0000:00:06.1: reg 0x20: [io 0xc0a0-0xc0bf] > > [ 0.117339] pci 0000:00:06.2: [8086:2936] type 00 class 0x0c0300 > > [ 0.122770] pci 0000:00:06.2: reg 0x20: [io 0xc0c0-0xc0df] > > [ 0.124738] pci 0000:00:06.7: [8086:293a] type 00 class 0x0c0320 > > [ 0.125825] pci 0000:00:06.7: reg 0x10: [mem 0xfebc2000-0xfebc2fff] > > [ 0.130347] pci 0000:00:07.0: [1af4:1001] type 00 class 0x010000 > > [ 0.133007] pci 0000:00:07.0: reg 0x10: [io 0xc000-0xc03f] > > [ 0.134793] pci 0000:00:07.0: reg 0x14: [mem 0xfebc3000-0xfebc3fff] > > [ 0.141808] pci 0000:00:08.0: [1af4:1002] type 00 class 0x00ff00 > > [ 0.142914] pci 0000:00:08.0: reg 0x10: [io 0xc0e0-0xc0ff] > > [ 0.148977] ACPI: PCI Interrupt Link [LNKA] (IRQs 5 *10 11) > > [ 0.149455] ACPI: PCI Interrupt Link [LNKB] (IRQs 5 *10 11) > > [ 0.150390] ACPI: PCI Interrupt Link [LNKC] (IRQs 5 10 *11) > > [ 0.151382] ACPI: PCI Interrupt Link [LNKD] (IRQs 5 10 *11) > > [ 0.152380] ACPI: PCI Interrupt Link [LNKS] (IRQs *9) > > [ 0.154508] vgaarb: loaded > > [ 0.155271] SCSI subsystem initialized > > [ 0.155887] EDAC MC: Ver: 3.0.0 > > [ 0.156255] PCI: Using ACPI for IRQ routing > > [ 0.156566] PCI: pci_cache_line_size set to 64 bytes > > [ 0.157161] e820: reserve RAM buffer [mem 0x0009fc00-0x0009ffff] > > [ 0.157914] e820: reserve RAM buffer [mem 0x7ffd9000-0x7fffffff] > > [ 0.158253] NetLabel: Initializing > > [ 0.158765] NetLabel: domain hash size = 128 > > [ 0.159005] NetLabel: protocols = UNLABELED CIPSOv4 CALIPSO > > [ 0.159775] NetLabel: unlabeled traffic allowed by default > > [ 0.160073] clocksource: Switched to clocksource kvm-clock > > [ 0.186764] VFS: Disk quotas dquot_6.6.0 > > [ 0.187277] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes) > > [ 0.188251] FS-Cache: Loaded > > [ 0.188725] pnp: PnP ACPI init > > [ 
0.189271] pnp 00:00: Plug and Play ACPI device, IDs PNP0b00 (active) > > [ 0.190231] pnp 00:01: Plug and Play ACPI device, IDs PNP0303 (active) > > [ 0.191229] pnp 00:02: Plug and Play ACPI device, IDs PNP0f13 (active) > > [ 0.192096] pnp 00:03: [dma 2] > > [ 0.192514] pnp 00:03: Plug and Play ACPI device, IDs PNP0700 (active) > > [ 0.193631] pnp 00:04: Plug and Play ACPI device, IDs PNP0501 (active) > > [ 0.195416] pnp: PnP ACPI: found 5 devices > > [ 0.206055] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns > > [ 0.207127] pci_bus 0000:00: resource 4 [io 0x0000-0x0cf7 window] > > [ 0.207832] pci_bus 0000:00: resource 5 [io 0x0d00-0xffff window] > > [ 0.208594] pci_bus 0000:00: resource 6 [mem 0x000a0000-0x000bffff window] > > [ 0.209469] pci_bus 0000:00: resource 7 [mem 0x80000000-0xfebfffff window] > > [ 0.210493] NET: Registered protocol family 2 > > [ 0.211244] tcp_listen_portaddr_hash hash table entries: 1024 (order: 2, 16384 bytes) > > [ 0.212283] TCP established hash table entries: 16384 (order: 5, 131072 bytes) > > [ 0.213285] TCP bind hash table entries: 16384 (order: 6, 262144 bytes) > > [ 0.214306] TCP: Hash tables configured (established 16384 bind 16384) > > [ 0.215065] UDP hash table entries: 1024 (order: 3, 32768 bytes) > > [ 0.215797] UDP-Lite hash table entries: 1024 (order: 3, 32768 bytes) > > [ 0.217934] NET: Registered protocol family 1 > > [ 0.219126] RPC: Registered named UNIX socket transport module. > > [ 0.219676] RPC: Registered udp transport module. > > [ 0.220130] RPC: Registered tcp transport module. > > [ 0.220552] RPC: Registered tcp NFSv4.1 backchannel transport module. 
> > [ 0.221153] pci 0000:00:00.0: Limiting direct PCI/PCI transfers > > [ 0.221701] pci 0000:00:01.0: PIIX3: Enabling Passive Release > > [ 0.222319] pci 0000:00:01.0: Activating ISA DMA hang workarounds > > [ 0.444214] ACPI: PCI Interrupt Link [LNKB] enabled at IRQ 10 > > [ 0.880141] ACPI: PCI Interrupt Link [LNKC] enabled at IRQ 11 > > [ 1.311493] ACPI: PCI Interrupt Link [LNKD] enabled at IRQ 11 > > [ 1.748829] ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 10 > > [ 1.962124] PCI: CLS 0 bytes, default 64 > > [ 1.964749] Initialise system trusted keyrings > > [ 1.965289] workingset: timestamp_bits=37 max_order=19 bucket_order=0 > > [ 1.969600] zbud: loaded > > [ 1.971287] SGI XFS with security attributes, no debug enabled > > [ 2.106071] NET: Registered protocol family 38 > > [ 2.106556] Key type asymmetric registered > > [ 2.106931] Asymmetric key parser 'x509' registered > > [ 2.107514] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 248) > > [ 2.108327] io scheduler noop registered > > [ 2.108813] io scheduler deadline registered > > [ 2.109608] io scheduler cfq registered (default) > > [ 2.110258] io scheduler mq-deadline registered > > [ 2.110796] io scheduler kyber registered > > [ 2.111688] intel_idle: Please enable MWAIT in BIOS SETUP > > [ 2.112310] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0 > > [ 2.113037] ACPI: Power Button [PWRF] > > [ 2.331642] virtio-pci 0000:00:03.0: virtio_pci: leaving for legacy driver > > [ 2.554093] virtio-pci 0000:00:05.0: virtio_pci: leaving for legacy driver > > [ 2.775938] virtio-pci 0000:00:07.0: virtio_pci: leaving for legacy driver > > [ 2.975053] tsc: Refined TSC clocksource calibration: 2495.981 MHz > > [ 2.975641] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x23fa6529869, max_idle_ns: 440795218057 ns > > [ 3.029409] virtio-pci 0000:00:08.0: virtio_pci: leaving for legacy driver > > [ 3.032925] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled > > [ 3.056849] 
00:04: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A > > [ 3.064748] Non-volatile memory driver v1.3 > > [ 3.065925] ppdev: user-space parallel port driver > > [ 3.071816] loop: module loaded > > [ 3.075337] vda: vda1 vda2 vda3 > > [ 3.076659] Rounding down aligned max_sectors from 4294967295 to 4294967288 > > [ 3.077996] Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) > > [ 3.079790] libphy: Fixed MDIO Bus: probed > > [ 3.080257] tun: Universal TUN/TAP device driver, 1.6 > > [ 3.082222] i8042: PNP: PS/2 Controller [PNP0303:KBD,PNP0f13:MOU] at 0x60,0x64 irq 1,12 > > [ 3.083675] serio: i8042 KBD port at 0x60,0x64 irq 1 > > [ 3.084160] serio: i8042 AUX port at 0x60,0x64 irq 12 > > [ 3.084816] mousedev: PS/2 mouse device common for all mice > > [ 3.086603] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1 > > [ 3.089192] rtc_cmos 00:00: RTC can wake from S4 > > [ 3.090116] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0 > > [ 3.092829] rtc_cmos 00:00: alarms up to one day, y3k, 114 bytes nvram > > [ 3.093937] IR NEC protocol handler initialized > > [ 3.094408] IR RC5(x/sz) protocol handler initialized > > [ 3.094901] IR RC6 protocol handler initialized > > [ 3.095510] IR JVC protocol handler initialized > > [ 3.095952] IR Sony protocol handler initialized > > [ 3.096399] IR SANYO protocol handler initialized > > [ 3.096862] IR Sharp protocol handler initialized > > [ 3.097342] IR MCE Keyboard/mouse protocol handler initialized > > [ 3.097919] IR XMP protocol handler initialized > > [ 3.098530] device-mapper: uevent: version 1.0.3 > > [ 3.099209] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com > > [ 3.100398] device-mapper: multipath round-robin: version 1.2.0 loaded > > [ 3.101883] drop_monitor: Initializing network drop monitor service > > [ 3.102553] Netfilter messages via NETLINK v0.30. 
> > [ 3.103090] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) > > [ 3.103738] ctnetlink v0.93: registering with nfnetlink. > > [ 3.104494] ip_tables: (C) 2000-2006 Netfilter Core Team > > [ 3.105734] Initializing XFRM netlink socket > > [ 3.106885] NET: Registered protocol family 10 > > [ 3.109341] Segment Routing with IPv6 > > [ 3.109976] mip6: Mobile IPv6 > > [ 3.111987] ip6_tables: (C) 2000-2006 Netfilter Core Team > > [ 3.114230] NET: Registered protocol family 17 > > [ 3.115047] Bridge firewalling registered > > [ 3.115824] Ebtables v2.0 registered > > [ 3.117996] 8021q: 802.1Q VLAN Support v1.8 > > [ 3.119429] AVX2 version of gcm_enc/dec engaged. > > [ 3.119886] AES CTR mode by8 optimization enabled > > [ 3.128818] sched_clock: Marking stable (3128714579, 0)->(3404180881, -275466302) > > [ 3.129945] registered taskstats version 1 > > [ 3.130427] Loading compiled-in X.509 certificates > > [ 3.163216] Loaded X.509 cert 'Build time autogenerated kernel key: 38e0adea1af8bd8a23b02436d4acf2f8c7408d23' > > [ 3.166359] zswap: loaded using pool lzo/zbud > > [ 3.167943] Key type big_key registered > > [ 3.168778] Magic number: 13:918:708 > > [ 3.169255] rtc_cmos 00:00: setting system clock to 2017-12-16 00:42:09 UTC (1513384929) > > [ 3.170604] md: Skipping autodetection of RAID arrays. (raid=autodetect will force) > > [ 3.171932] EXT4-fs (vda2): couldn't mount as ext3 due to feature incompatibilities > > [ 3.173871] EXT4-fs (vda2): couldn't mount as ext2 due to feature incompatibilities > > [ 3.175306] EXT4-fs (vda2): INFO: recovery required on readonly filesystem > > [ 3.176212] EXT4-fs (vda2): write access will be enabled during recovery > > [ 3.397187] EXT4-fs (vda2): orphan cleanup on readonly fs > > [ 3.399412] EXT4-fs (vda2): 5 orphan inodes deleted > > [ 3.402759] EXT4-fs (vda2): recovery complete > > [ 3.466647] EXT4-fs (vda2): mounted filesystem with ordered data mode. 
Opts: (null) > > [ 3.469401] VFS: Mounted root (ext4 filesystem) readonly on device 253:2. > > [ 3.473719] devtmpfs: mounted > > [ 3.492549] Freeing unused kernel memory: 1640K > > [ 3.494547] Write protecting the kernel read-only data: 18432k > > [ 3.498781] Freeing unused kernel memory: 2016K > > [ 3.503330] Freeing unused kernel memory: 512K > > [ 3.505232] rodata_test: all tests were successful > > [ 3.515355] 1 (init): Uhuuh, elf segement at 00000000928fda3e requested but the memory is mapped already > > [ 3.519533] Starting init: /sbin/init exists but couldn't execute it (error -95) > > [ 3.528993] Starting init: /bin/sh exists but couldn't execute it (error -14) > > [ 3.532127] Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. > > [ 3.538328] CPU: 0 PID: 1 Comm: init Not tainted 4.15.0-rc3-next-20171215-00001-g6d6aea478fce #11 > > [ 3.542201] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1.fc26 04/01/2014 > > [ 3.546081] Call Trace: > > [ 3.547221] dump_stack+0x5c/0x79 > > [ 3.548768] ? rest_init+0x30/0xb0 > > [ 3.550320] panic+0xe4/0x232 > > [ 3.551669] ? rest_init+0xb0/0xb0 > > [ 3.553110] kernel_init+0xeb/0x100 > > [ 3.554701] ret_from_fork+0x1f/0x30 > > [ 3.558964] Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > [ 3.564160] ---[ end Kernel panic - not syncing: No working init found. Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst for guidance. > > > -- > Michal Hocko > SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH 1/2] mmap.2: document new MAP_FIXED_SAFE flag 2017-12-13 9:25 [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Michal Hocko 2017-12-13 9:25 ` [PATCH 1/2] " Michal Hocko 2017-12-13 9:25 ` [PATCH 2/2] fs, elf: drop MAP_FIXED usage from elf_map Michal Hocko @ 2017-12-13 9:31 ` Michal Hocko 2017-12-13 9:31 ` [PATCH 2/2] mmap.2: MAP_FIXED updated documentation Michal Hocko [not found] ` <20171213092550.2774-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> 3 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2017-12-13 9:31 UTC (permalink / raw) To: Michael Kerrisk Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Michal Hocko From: Michal Hocko <mhocko@suse.com> 4.16+ kernels offer a new MAP_FIXED_SAFE flag which allows the caller to atomically probe for a given address range. [wording heavily updated by John Hubbard <jhubbard@nvidia.com>] Signed-off-by: Michal Hocko <mhocko@suse.com> --- man2/mmap.2 | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/man2/mmap.2 b/man2/mmap.2 index a5a8eb47a263..02d391697ce6 100644 --- a/man2/mmap.2 +++ b/man2/mmap.2 @@ -227,6 +227,22 @@ in mind that the exact layout of a process' memory map is allowed to change significantly between kernel versions, C library versions, and operating system releases. .TP +.BR MAP_FIXED_SAFE " (since Linux 4.16)" +Similar to MAP_FIXED with respect to the +.I +addr +enforcement, but different in that MAP_FIXED_SAFE never clobbers a pre-existing +mapped range. If the requested range would collide with an existing +mapping, then this call fails with +.B EEXIST. +This flag can therefore be used as a way to atomically (with respect to other +threads) attempt to map an address range: one thread will succeed; all others +will report failure.
Please note that older kernels which do not recognize this +flag will typically (upon detecting a collision with a pre-existing mapping) +fall back to a "non-MAP_FIXED" type of behavior: they will return an address that +is different than the requested one. Therefore, backward-compatible software +should check the returned address against the requested address. +.TP .B MAP_GROWSDOWN This flag is used for stacks. It indicates to the kernel virtual memory system that the mapping @@ -451,6 +467,12 @@ is not a valid file descriptor (and .B MAP_ANONYMOUS was not set). .TP +.B EEXIST +range covered by +.IR addr , +.IR length +is clashing with an existing mapping. +.TP .B EINVAL We don't like .IR addr , -- 2.15.0 ^ permalink raw reply related [flat|nested] 44+ messages in thread
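As a rough illustration of the compatibility note in the patch above, the following C sketch wraps the "map exactly here or fail" pattern, including the returned-address check recommended for older kernels. It is only a sketch under stated assumptions: the flag was eventually merged upstream under the name MAP_FIXED_NOREPLACE (Linux 4.17) rather than MAP_FIXED_SAFE as proposed in this series, so the code uses that name; map_exact() is a hypothetical helper, and the fallback #define carries the generic value, which some architectures override, so the libc header definition should be preferred when it exists.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Assumption: generic value of the flag as merged upstream. Some
 * architectures use a different value; this fallback is only used
 * when the libc headers do not define the flag at all.
 */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Hypothetical helper: map `len` anonymous bytes at exactly `addr`
 * without replacing an existing mapping. Returns `addr` on success,
 * MAP_FAILED otherwise (with errno set).
 */
static void *map_exact(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                   -1, 0);

    if (p == MAP_FAILED)
        return MAP_FAILED;  /* e.g. EEXIST: the range is already in use */

    if (p != addr) {
        /*
         * A kernel that does not know the flag treated `addr` as a
         * plain hint and picked another address. Undo that mapping
         * and report the collision ourselves.
         */
        munmap(p, len);
        errno = EEXIST;
        return MAP_FAILED;
    }
    return p;
}
```

A caller would typically retry with another candidate address when map_exact() fails with EEXIST, which is the retry loop Florian Weimer describes for ld.so in the cover letter.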
* [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 9:31 ` [PATCH 1/2] mmap.2: document new MAP_FIXED_SAFE flag Michal Hocko @ 2017-12-13 9:31 ` Michal Hocko 2017-12-13 12:55 ` Pavel Machek 2017-12-14 2:52 ` Jann Horn 0 siblings, 2 replies; 44+ messages in thread From: Michal Hocko @ 2017-12-13 9:31 UTC (permalink / raw) To: Michael Kerrisk Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis, Pavel Machek, Michal Hocko From: John Hubbard <jhubbard@nvidia.com> -- Expand the documentation to discuss the hazards in enough detail to allow avoiding them. -- Mention the upcoming MAP_FIXED_SAFE flag. -- Enhance the alignment requirement slightly. CC: Michael Ellerman <mpe@ellerman.id.au> CC: Jann Horn <jannh@google.com> CC: Matthew Wilcox <willy@infradead.org> CC: Michal Hocko <mhocko@kernel.org> CC: Mike Rapoport <rppt@linux.vnet.ibm.com> CC: Cyril Hrubis <chrubis@suse.cz> CC: Pavel Machek <pavel@ucw.cz> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> Signed-off-by: Michal Hocko <mhocko@suse.com> --- man2/mmap.2 | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/man2/mmap.2 b/man2/mmap.2 index 02d391697ce6..cb8789daec2d 100644 --- a/man2/mmap.2 +++ b/man2/mmap.2 @@ -212,8 +212,9 @@ Don't interpret .I addr as a hint: place the mapping at exactly that address. .I addr -must be a multiple of the page size. -If the memory region specified by +must be suitably aligned: for most architectures a multiple of page +size is sufficient; however, some architectures may impose additional +restrictions. 
If the memory region specified by .I addr and .I len @@ -226,6 +227,33 @@ Software that aspires to be portable should use this option with care, keeping in mind that the exact layout of a process' memory map is allowed to change significantly between kernel versions, C library versions, and operating system releases. +.IP +Furthermore, this option is extremely hazardous (when used on its own), because +it forcibly removes pre-existing mappings, making it easy for a multi-threaded +process to corrupt its own address space. +.IP +For example, thread A looks through +.I /proc/<pid>/maps +and locates an available +address range, while thread B simultaneously acquires part or all of that same +address range. Thread A then calls mmap(MAP_FIXED), effectively overwriting +the mapping that thread B created. +.IP +Thread B need not create a mapping directly; simply making a library call +that, internally, uses +.I dlopen(3) +to load some other shared library, will +suffice. The dlopen(3) call will map the library into the process's address +space. Furthermore, almost any library call may be implemented using this +technique. +Examples include brk(2), malloc(3), pthread_create(3), and the PAM libraries +(http://www.linux-pam.org). +.IP +Newer kernels +(Linux 4.16 and later) have a +.B MAP_FIXED_SAFE +option that avoids the corruption problem; if available, MAP_FIXED_SAFE +should be preferred over MAP_FIXED. .TP .BR MAP_FIXED_SAFE " (since Linux 4.16)" Similar to MAP_FIXED with respect to the -- 2.15.0 ^ permalink raw reply related [flat|nested] 44+ messages in thread
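To make the hazard described in this patch concrete, here is a small C sketch showing that plain MAP_FIXED silently replaces whatever is already mapped at the target address. It is deliberately single-threaded so the outcome is deterministic; in the race from the man page text the first mapping would belong to thread B. The name clobber_demo() is hypothetical and the 4096-byte length is an arbitrary choice.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical demo: create a mapping, store a value in it, then map
 * over the same address with MAP_FIXED. Returns the byte read back
 * from the start of the range afterwards, or -1 if a mapping failed.
 */
static int clobber_demo(void)
{
    size_t len = 4096;
    unsigned char *victim = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (victim == MAP_FAILED)
        return -1;

    victim[0] = 42;  /* stands in for data owned by "thread B" */

    /* "Thread A" believes the range is free and forces a mapping there. */
    unsigned char *again = mmap(victim, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED,
                                -1, 0);
    if (again == MAP_FAILED) {
        munmap(victim, len);
        return -1;
    }

    /* The old pages are gone; the new anonymous mapping is zero-filled. */
    int seen = again[0];
    munmap(again, len);
    return seen;
}
```

The second mmap() call reports success and gives no indication that the earlier contents were discarded, which is the silent corruption the added man page paragraphs warn about.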
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 9:31 ` [PATCH 2/2] mmap.2: MAP_FIXED updated documentation Michal Hocko @ 2017-12-13 12:55 ` Pavel Machek 2017-12-13 13:03 ` Cyril Hrubis 2017-12-13 13:04 ` Michal Hocko 2017-12-14 2:52 ` Jann Horn 1 sibling, 2 replies; 44+ messages in thread From: Pavel Machek @ 2017-12-13 12:55 UTC (permalink / raw) To: Michal Hocko Cc: Michael Kerrisk, linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis, Michal Hocko [-- Attachment #1: Type: text/plain, Size: 539 bytes --] On Wed 2017-12-13 10:31:10, Michal Hocko wrote: > From: John Hubbard <jhubbard@nvidia.com> > > -- Expand the documentation to discuss the hazards in > enough detail to allow avoiding them. > > -- Mention the upcoming MAP_FIXED_SAFE flag. Pretty much everyone agreed MAP_FIXED_SAFE was a bad name. MAP_FIXED_NOREPLACE (IIRC) was the best replacement. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 12:55 ` Pavel Machek @ 2017-12-13 13:03 ` Cyril Hrubis 2017-12-13 13:04 ` Michal Hocko 1 sibling, 0 replies; 44+ messages in thread From: Cyril Hrubis @ 2017-12-13 13:03 UTC (permalink / raw) To: Pavel Machek Cc: Michal Hocko, Michael Kerrisk, linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Michal Hocko Hi! > Pretty much everyone agreed MAP_FIXED_SAFE was a bad > name. MAP_FIXED_NOREPLACE (IIRC) was the best replacement. For what it's worth I do agree here. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 12:55 ` Pavel Machek 2017-12-13 13:03 ` Cyril Hrubis @ 2017-12-13 13:04 ` Michal Hocko 2017-12-13 13:09 ` Pavel Machek 1 sibling, 1 reply; 44+ messages in thread From: Michal Hocko @ 2017-12-13 13:04 UTC (permalink / raw) To: Pavel Machek Cc: Michael Kerrisk, linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis On Wed 13-12-17 13:55:40, Pavel Machek wrote: > On Wed 2017-12-13 10:31:10, Michal Hocko wrote: > > From: John Hubbard <jhubbard@nvidia.com> > > > > -- Expand the documentation to discuss the hazards in > > enough detail to allow avoiding them. > > > > -- Mention the upcoming MAP_FIXED_SAFE flag. > > Pretty much everyone agreed MAP_FIXED_SAFE was a bad > name. MAP_FIXED_NOREPLACE (IIRC) was the best replacement. Please read http://lkml.kernel.org/r/20171213092550.2774-1-mhocko@kernel.org -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 13:04 ` Michal Hocko @ 2017-12-13 13:09 ` Pavel Machek 2017-12-13 13:16 ` Michal Hocko 0 siblings, 1 reply; 44+ messages in thread From: Pavel Machek @ 2017-12-13 13:09 UTC (permalink / raw) To: Michal Hocko Cc: Michael Kerrisk, linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis [-- Attachment #1: Type: text/plain, Size: 851 bytes --] On Wed 2017-12-13 14:04:58, Michal Hocko wrote: > On Wed 13-12-17 13:55:40, Pavel Machek wrote: > > On Wed 2017-12-13 10:31:10, Michal Hocko wrote: > > > From: John Hubbard <jhubbard@nvidia.com> > > > > > > -- Expand the documentation to discuss the hazards in > > > enough detail to allow avoiding them. > > > > > > -- Mention the upcoming MAP_FIXED_SAFE flag. > > > > Pretty much everyone agreed MAP_FIXED_SAFE was a bad > > name. MAP_FIXED_NOREPLACE (IIRC) was the best replacement. > > Please read http://lkml.kernel.org/r/20171213092550.2774-1-mhocko@kernel.org Please fix your patches according to the feedback... NACCKED-by: Pavel Machek <pavel@ucw.cz> Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 13:09 ` Pavel Machek @ 2017-12-13 13:16 ` Michal Hocko [not found] ` <20171213131640.GJ25185-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> 0 siblings, 1 reply; 44+ messages in thread From: Michal Hocko @ 2017-12-13 13:16 UTC (permalink / raw) To: Pavel Machek Cc: Michael Kerrisk, linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis On Wed 13-12-17 14:09:00, Pavel Machek wrote: > On Wed 2017-12-13 14:04:58, Michal Hocko wrote: > > On Wed 13-12-17 13:55:40, Pavel Machek wrote: > > > On Wed 2017-12-13 10:31:10, Michal Hocko wrote: > > > > From: John Hubbard <jhubbard@nvidia.com> > > > > > > > > -- Expand the documentation to discuss the hazards in > > > > enough detail to allow avoiding them. > > > > > > > > -- Mention the upcoming MAP_FIXED_SAFE flag. > > > > > > Pretty map everyone agreed MAP_FIXED_SAFE was a bad > > > name. MAP_FIXED_NOREPLACE (IIRC) was best replacement. > > > > Please read http://lkml.kernel.org/r/20171213092550.2774-1-mhocko@kernel.org > > Please fix your patches according to the feedback... > > NACCKED-by: Pavel Machek <pavel@ucw.cz> Good luck pursuing this further then. I am not going to spend time on naming bikesheds. I have more pressing stuff to work on. -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <20171213131640.GJ25185-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org>]
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation [not found] ` <20171213131640.GJ25185-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org> @ 2017-12-13 13:21 ` Pavel Machek 2017-12-13 13:35 ` Michal Hocko 2017-12-13 14:40 ` Cyril Hrubis 0 siblings, 2 replies; 44+ messages in thread From: Pavel Machek @ 2017-12-13 13:21 UTC (permalink / raw) To: Michal Hocko Cc: Michael Kerrisk, linux-api-u79uwXL29TY76Z2rM5mHXA, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML, linux-arch-u79uwXL29TY76Z2rM5mHXA, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis [-- Attachment #1: Type: text/plain, Size: 1516 bytes --] On Wed 2017-12-13 14:16:40, Michal Hocko wrote: > On Wed 13-12-17 14:09:00, Pavel Machek wrote: > > On Wed 2017-12-13 14:04:58, Michal Hocko wrote: > > > On Wed 13-12-17 13:55:40, Pavel Machek wrote: > > > > On Wed 2017-12-13 10:31:10, Michal Hocko wrote: > > > > > From: John Hubbard <jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> > > > > > > > > > > -- Expand the documentation to discuss the hazards in > > > > > enough detail to allow avoiding them. > > > > > > > > > > -- Mention the upcoming MAP_FIXED_SAFE flag. > > > > > > > > Pretty map everyone agreed MAP_FIXED_SAFE was a bad > > > > name. MAP_FIXED_NOREPLACE (IIRC) was best replacement. > > > > > > Please read http://lkml.kernel.org/r/20171213092550.2774-1-mhocko@kernel.org > > > > Please fix your patches according to the feedback... > > > > NACCKED-by: Pavel Machek <pavel-+ZI9xUNit7I@public.gmane.org> > > Good luck pursuing this further then. I am not going to spend time on > naming bikeheds. I have more pressing stuff to work on. You selected stupid name for a flag. Everyone and their dog agrees with that. There's even consensus on better name (and everyone agrees it is better than .._SAFE). Of course, we could have debate if it is NOREPLACE or NOREMOVE or ... and that would be bikeshed. 
This was just poor naming on your part. Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 13:21 ` Pavel Machek @ 2017-12-13 13:35 ` Michal Hocko 2017-12-13 14:40 ` Cyril Hrubis 1 sibling, 0 replies; 44+ messages in thread From: Michal Hocko @ 2017-12-13 13:35 UTC (permalink / raw) To: Pavel Machek Cc: Michael Kerrisk, linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport, Cyril Hrubis On Wed 13-12-17 14:21:05, Pavel Machek wrote: > On Wed 2017-12-13 14:16:40, Michal Hocko wrote: > > On Wed 13-12-17 14:09:00, Pavel Machek wrote: > > > On Wed 2017-12-13 14:04:58, Michal Hocko wrote: > > > > On Wed 13-12-17 13:55:40, Pavel Machek wrote: > > > > > On Wed 2017-12-13 10:31:10, Michal Hocko wrote: > > > > > > From: John Hubbard <jhubbard@nvidia.com> > > > > > > > > > > > > -- Expand the documentation to discuss the hazards in > > > > > > enough detail to allow avoiding them. > > > > > > > > > > > > -- Mention the upcoming MAP_FIXED_SAFE flag. > > > > > > > > > > Pretty map everyone agreed MAP_FIXED_SAFE was a bad > > > > > name. MAP_FIXED_NOREPLACE (IIRC) was best replacement. > > > > > > > > Please read http://lkml.kernel.org/r/20171213092550.2774-1-mhocko@kernel.org > > > > > > Please fix your patches according to the feedback... > > > > > > NACCKED-by: Pavel Machek <pavel@ucw.cz> > > > > Good luck pursuing this further then. I am not going to spend time on > > naming bikeheds. I have more pressing stuff to work on. > > You selected stupid name for a flag. Everyone and their dog agrees > with that. Not sure about your dog but mine says that a flag which fixes an _unsafe_ aspect of MAP_FIXED can be called MAP_FIXED_SAFE just fine. Anyway, I am not going to argue about this further. I've implemented the code, gathered use cases and fortified an in-kernel user which already led to a security issue in the past. I consider my part done here.
I do not agree that MAP_FIXED_NOREPLACE would be so much better to respin and then deal with what about this MAP_$FOO. If there are really strong feelings about this then feel free to take these patches, do s@MAP_FIXED_SAFE@MAP_$FOO@ and try to upstream them yourself. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 13:21 ` Pavel Machek 2017-12-13 13:35 ` Michal Hocko @ 2017-12-13 14:40 ` Cyril Hrubis 2017-12-13 23:19 ` Kees Cook 1 sibling, 1 reply; 44+ messages in thread From: Cyril Hrubis @ 2017-12-13 14:40 UTC (permalink / raw) To: Pavel Machek Cc: Michal Hocko, Michael Kerrisk, linux-api-u79uwXL29TY76Z2rM5mHXA, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML, linux-arch-u79uwXL29TY76Z2rM5mHXA, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport Hi! > You selected stupid name for a flag. Everyone and their dog agrees > with that. There's even consensus on better name (and everyone agrees > it is better than .._SAFE). Of course, we could have debate if it is > NOREPLACE or NOREMOVE or ... and that would be bikeshed. This was just > poor naming on your part. Well while everybody agrees that the name is so bad that basically anything else would be better, there does not seem to be consensus on which one to pick. I do understand that this is frustrating and fruitless. So what do we do now, roll a dice to choose new name? Or do we ask BDFL[1] to choose the name? [1] https://en.wikipedia.org/wiki/Benevolent_dictator_for_life -- Cyril Hrubis chrubis-AlSwsSmVLrQ@public.gmane.org ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 14:40 ` Cyril Hrubis @ 2017-12-13 23:19 ` Kees Cook [not found] ` <CAGXu5jLqE6cUxk-Girx6PG7upEzz8jmu1OH_3LVC26iJc2vTxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-12-18 19:12 ` Michael Kerrisk (man-pages) 0 siblings, 2 replies; 44+ messages in thread From: Kees Cook @ 2017-12-13 23:19 UTC (permalink / raw) To: Cyril Hrubis Cc: Pavel Machek, Michal Hocko, Michael Kerrisk, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport On Wed, Dec 13, 2017 at 6:40 AM, Cyril Hrubis <chrubis@suse.cz> wrote: > Hi! >> You selected stupid name for a flag. Everyone and their dog agrees >> with that. There's even consensus on better name (and everyone agrees >> it is better than .._SAFE). Of course, we could have debate if it is >> NOREPLACE or NOREMOVE or ... and that would be bikeshed. This was just >> poor naming on your part. > > Well while everybody agrees that the name is so bad that basically > anything else would be better, there does not seem to be consensus on > which one to pick. I do understand that this frustrating and fruitless. Based on the earlier threads where I tried to end the bikeshedding, it seemed like MAP_FIXED_NOREPLACE was the least bad option. > So what do we do now, roll a dice to choose new name? > > Or do we ask BFDL[1] to choose the name? I'd like to hear feedback from Michael Kerrisk, as he's had to deal with these kinds of choices in the past. I'm fine to ask Linus too. I just want to get past the name since the feature is quite valuable. And if Michal doesn't want to touch this patch any more, I'm happy to do the search/replace/resend. :P -Kees -- Kees Cook Pixel Security -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . 
^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <CAGXu5jLqE6cUxk-Girx6PG7upEzz8jmu1OH_3LVC26iJc2vTxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation [not found] ` <CAGXu5jLqE6cUxk-Girx6PG7upEzz8jmu1OH_3LVC26iJc2vTxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-12-14 7:07 ` Michal Hocko 0 siblings, 0 replies; 44+ messages in thread From: Michal Hocko @ 2017-12-14 7:07 UTC (permalink / raw) To: Kees Cook Cc: Cyril Hrubis, Pavel Machek, Michael Kerrisk, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport On Wed 13-12-17 15:19:00, Kees Cook wrote: > On Wed, Dec 13, 2017 at 6:40 AM, Cyril Hrubis <chrubis-AlSwsSmVLrQ@public.gmane.org> wrote: > > Hi! > >> You selected stupid name for a flag. Everyone and their dog agrees > >> with that. There's even consensus on better name (and everyone agrees > >> it is better than .._SAFE). Of course, we could have debate if it is > >> NOREPLACE or NOREMOVE or ... and that would be bikeshed. This was just > >> poor naming on your part. > > > > Well while everybody agrees that the name is so bad that basically > > anything else would be better, there does not seem to be consensus on > > which one to pick. I do understand that this frustrating and fruitless. > > Based on the earlier threads where I tried to end the bikeshedding, it > seemed like MAP_FIXED_NOREPLACE was the least bad option. > > > So what do we do now, roll a dice to choose new name? > > > > Or do we ask BFDL[1] to choose the name? > > I'd like to hear feedback from Michael Kerrisk, as he's had to deal > with these kinds of choices in the past. I'm fine to ask Linus too. I > just want to get past the name since the feature is quite valuable. > > And if Michal doesn't want to touch this patch any more, I'm happy to > do the search/replace/resend. :P I think Andrew can do the s@MAP_FIXED_SAFE@MAP_$FOO@ when adding the patch to the mmotm tree. 
The reason why I refuse to repost is that a) functionality doesn't really need a further rework (at least not based on the review feedback) and b) I do not really see any large consensus here. People claim to like this or that more but nobody (except for you, Kees) was willing to put their name under their preference in the form of an Acked-by. And that worries me, because generating "better" names sounds too easy to allow forward progress. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 23:19 ` Kees Cook [not found] ` <CAGXu5jLqE6cUxk-Girx6PG7upEzz8jmu1OH_3LVC26iJc2vTxQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-12-18 19:12 ` Michael Kerrisk (man-pages) 2017-12-18 20:19 ` Kees Cook 1 sibling, 1 reply; 44+ messages in thread From: Michael Kerrisk (man-pages) @ 2017-12-18 19:12 UTC (permalink / raw) To: Kees Cook, Cyril Hrubis Cc: mtk.manpages, Pavel Machek, Michal Hocko, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport Hello Kees, I'm late to the party, and only just caught up with the fuss :-). On 12/14/2017 12:19 AM, Kees Cook wrote: > On Wed, Dec 13, 2017 at 6:40 AM, Cyril Hrubis <chrubis@suse.cz> wrote: >> Hi! >>> You selected stupid name for a flag. Everyone and their dog agrees >>> with that. There's even consensus on better name (and everyone agrees >>> it is better than .._SAFE). Of course, we could have debate if it is >>> NOREPLACE or NOREMOVE or ... and that would be bikeshed. This was just >>> poor naming on your part. >> >> Well while everybody agrees that the name is so bad that basically >> anything else would be better, there does not seem to be consensus on >> which one to pick. I do understand that this frustrating and fruitless. > > Based on the earlier threads where I tried to end the bikeshedding, it > seemed like MAP_FIXED_NOREPLACE was the least bad option. > >> So what do we do now, roll a dice to choose new name? >> >> Or do we ask BFDL[1] to choose the name? > > I'd like to hear feedback from Michael Kerrisk, as he's had to deal > with these kinds of choices in the past. I'm fine to ask Linus too. I > just want to get past the name since the feature is quite valuable. > > And if Michal doesn't want to touch this patch any more, I'm happy to > do the search/replace/resend. 
:P Something with the prefix MAP_FIXED_ seems to me obviously desirable, both to suggest that the function is similar, and also for easy grepping of the source code to look for instances of both. MAP_FIXED_SAFE didn't really bother me as a name, but MAP_FIXED_NOREPLACE (or MAP_FIXED_NOCLOBBER) seem slightly more descriptive of what the flag actually does, so a little better. Cheers, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-18 19:12 ` Michael Kerrisk (man-pages) @ 2017-12-18 20:19 ` Kees Cook 2017-12-18 20:33 ` Matthew Wilcox [not found] ` <CAGXu5jJ289R9koVoHmxcvUWr6XHSZR2p0qq3WtpNyN-iNSvrNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 2 replies; 44+ messages in thread From: Kees Cook @ 2017-12-18 20:19 UTC (permalink / raw) To: Michael Kerrisk (man-pages), Andrew Morton Cc: Cyril Hrubis, Pavel Machek, Michal Hocko, Linux API, Khalid Aziz, Michael Ellerman, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport On Mon, Dec 18, 2017 at 11:12 AM, Michael Kerrisk (man-pages) <mtk.manpages@gmail.com> wrote: > Hello Kees, > > I'm late to the party, and only just caught up with the fuss :-). No worries! > On 12/14/2017 12:19 AM, Kees Cook wrote: >> On Wed, Dec 13, 2017 at 6:40 AM, Cyril Hrubis <chrubis@suse.cz> wrote: >>> Hi! >>>> You selected stupid name for a flag. Everyone and their dog agrees >>>> with that. There's even consensus on better name (and everyone agrees >>>> it is better than .._SAFE). Of course, we could have debate if it is >>>> NOREPLACE or NOREMOVE or ... and that would be bikeshed. This was just >>>> poor naming on your part. >>> >>> Well while everybody agrees that the name is so bad that basically >>> anything else would be better, there does not seem to be consensus on >>> which one to pick. I do understand that this frustrating and fruitless. >> >> Based on the earlier threads where I tried to end the bikeshedding, it >> seemed like MAP_FIXED_NOREPLACE was the least bad option. >> >>> So what do we do now, roll a dice to choose new name? >>> >>> Or do we ask BFDL[1] to choose the name? >> >> I'd like to hear feedback from Michael Kerrisk, as he's had to deal >> with these kinds of choices in the past. I'm fine to ask Linus too. I >> just want to get past the name since the feature is quite valuable. 
>> >> And if Michal doesn't want to touch this patch any more, I'm happy to >> do the search/replace/resend. :P > > Something with the prefix MAP_FIXED_ seems to me obviously desirable, > both to suggest that the function is similar, and also for easy > grepping of the source code to look for instances of both. > MAP_FIXED_SAFE didn't really bother me as a name, but > MAP_FIXED_NOREPLACE (or MAP_FIXED_NOCLOBBER) seem slightly more > descriptive of what the flag actually does, so a little better. Great, thanks! Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? -Kees -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-18 20:19 ` Kees Cook @ 2017-12-18 20:33 ` Matthew Wilcox [not found] ` <CAGXu5jJ289R9koVoHmxcvUWr6XHSZR2p0qq3WtpNyN-iNSvrNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 1 sibling, 0 replies; 44+ messages in thread From: Matthew Wilcox @ 2017-12-18 20:33 UTC (permalink / raw) To: Kees Cook Cc: Michael Kerrisk (man-pages), Andrew Morton, Cyril Hrubis, Pavel Machek, Michal Hocko, Linux API, Khalid Aziz, Michael Ellerman, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Jann Horn, Mike Rapoport On Mon, Dec 18, 2017 at 12:19:21PM -0800, Kees Cook wrote: > Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? +1 ^ permalink raw reply [flat|nested] 44+ messages in thread
[parent not found: <CAGXu5jJ289R9koVoHmxcvUWr6XHSZR2p0qq3WtpNyN-iNSvrNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation [not found] ` <CAGXu5jJ289R9koVoHmxcvUWr6XHSZR2p0qq3WtpNyN-iNSvrNQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-12-21 12:38 ` Michael Ellerman 2017-12-21 14:59 ` known bad patch in -mm tree was " Pavel Machek 2017-12-21 22:24 ` Andrew Morton 0 siblings, 2 replies; 44+ messages in thread From: Michael Ellerman @ 2017-12-21 12:38 UTC (permalink / raw) To: Kees Cook, Michael Kerrisk (man-pages), Andrew Morton Cc: Cyril Hrubis, Pavel Machek, Michal Hocko, Linux API, Khalid Aziz, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> writes: > On Mon, Dec 18, 2017 at 11:12 AM, Michael Kerrisk (man-pages) > <mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote: >> Hello Kees, >> >> I'm late to the party, and only just caught up with the fuss :-). > > No worries! > >> On 12/14/2017 12:19 AM, Kees Cook wrote: >>> On Wed, Dec 13, 2017 at 6:40 AM, Cyril Hrubis <chrubis-AlSwsSmVLrQ@public.gmane.org> wrote: >>>> Hi! >>>>> You selected stupid name for a flag. Everyone and their dog agrees >>>>> with that. There's even consensus on better name (and everyone agrees >>>>> it is better than .._SAFE). Of course, we could have debate if it is >>>>> NOREPLACE or NOREMOVE or ... and that would be bikeshed. This was just >>>>> poor naming on your part. >>>> >>>> Well while everybody agrees that the name is so bad that basically >>>> anything else would be better, there does not seem to be consensus on >>>> which one to pick. I do understand that this frustrating and fruitless. >>> >>> Based on the earlier threads where I tried to end the bikeshedding, it >>> seemed like MAP_FIXED_NOREPLACE was the least bad option. >>> >>>> So what do we do now, roll a dice to choose new name? >>>> >>>> Or do we ask BFDL[1] to choose the name? 
>>> >>> I'd like to hear feedback from Michael Kerrisk, as he's had to deal >>> with these kinds of choices in the past. I'm fine to ask Linus too. I >>> just want to get past the name since the feature is quite valuable. >>> >>> And if Michal doesn't want to touch this patch any more, I'm happy to >>> do the search/replace/resend. :P >> >> Something with the prefix MAP_FIXED_ seems to me obviously desirable, >> both to suggest that the function is similar, and also for easy >> grepping of the source code to look for instances of both. >> MAP_FIXED_SAFE didn't really bother me as a name, but >> MAP_FIXED_NOREPLACE (or MAP_FIXED_NOCLOBBER) seem slightly more >> descriptive of what the flag actually does, so a little better. > > Great, thanks! > > Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? This seems to have not happened. Presumably Andrew just missed the mail in the flood. And will probably miss this one too ... :) cheers ^ permalink raw reply [flat|nested] 44+ messages in thread
* known bad patch in -mm tree was Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-21 12:38 ` Michael Ellerman @ 2017-12-21 14:59 ` Pavel Machek 2017-12-21 15:08 ` Michal Hocko 2017-12-21 22:24 ` Andrew Morton 1 sibling, 1 reply; 44+ messages in thread From: Pavel Machek @ 2017-12-21 14:59 UTC (permalink / raw) To: Michael Ellerman, vojtech, jikos Cc: Kees Cook, Michael Kerrisk (man-pages), Andrew Morton, Cyril Hrubis, Michal Hocko, Linux API, Khalid Aziz, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport [-- Attachment #1: Type: text/plain, Size: 1227 bytes --] Hi! > >>> And if Michal doesn't want to touch this patch any more, I'm happy to > >>> do the search/replace/resend. :P > >> > >> Something with the prefix MAP_FIXED_ seems to me obviously desirable, > >> both to suggest that the function is similar, and also for easy > >> grepping of the source code to look for instances of both. > >> MAP_FIXED_SAFE didn't really bother me as a name, but > >> MAP_FIXED_NOREPLACE (or MAP_FIXED_NOCLOBBER) seem slightly more > >> descriptive of what the flag actually does, so a little better. > > > > Great, thanks! > > > > Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? > > This seems to have not happened. Presumably Andrew just missed the mail > in the flood. And will probably miss this one too ... :) Nice way to mess up kernel development, Michal. Thank you! :-(. Andrew, everyone and their dog agrees MAP_FIXED_SAFE is stupid name, but Michal decided to just go ahead, ignoring feedback... Can you either s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g or drop the patches? 
Thanks, Pavel -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html [-- Attachment #2: Digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: known bad patch in -mm tree was Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-21 14:59 ` known bad patch in -mm tree was " Pavel Machek @ 2017-12-21 15:08 ` Michal Hocko 0 siblings, 0 replies; 44+ messages in thread From: Michal Hocko @ 2017-12-21 15:08 UTC (permalink / raw) To: Pavel Machek Cc: Michael Ellerman, vojtech, jikos, Kees Cook, Michael Kerrisk (man-pages), Andrew Morton, Cyril Hrubis, Linux API, Khalid Aziz, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport On Thu 21-12-17 15:59:07, Pavel Machek wrote: > Hi! > > > >>> And if Michal doesn't want to touch this patch any more, I'm happy to > > >>> do the search/replace/resend. :P > > >> > > >> Something with the prefix MAP_FIXED_ seems to me obviously desirable, > > >> both to suggest that the function is similar, and also for easy > > >> grepping of the source code to look for instances of both. > > >> MAP_FIXED_SAFE didn't really bother me as a name, but > > >> MAP_FIXED_NOREPLACE (or MAP_FIXED_NOCLOBBER) seem slightly more > > >> descriptive of what the flag actually does, so a little better. > > > > > > Great, thanks! > > > > > > Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? > > > > This seems to have not happened. Presumably Andrew just missed the mail > > in the flood. And will probably miss this one too ... :) > > Nice way to mess up kernel development, Michal. Thank you! :-(. Thank you for your valuable feedback! Maybe you have noticed that I haven't enforced the patch and let others decide the final name (either by resubmitting patches or a simple replace in mmotm tree). Or maybe you haven't because you are so busy bikeshedding that you can hardly see anything else. > Andrew, everyone and their dog agrees MAP_FIXED_SAFE is stupid name, > but Michal decided to just go ahead, ignoring feedback...
> > Can you either s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g or drop the patches? You have surely saved the world today and I hardly find words to thank you (and your dog of course). Thanks! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-21 12:38 ` Michael Ellerman 2017-12-21 14:59 ` known bad patch in -mm tree was " Pavel Machek @ 2017-12-21 22:24 ` Andrew Morton 2017-12-22 0:06 ` Michael Ellerman 1 sibling, 1 reply; 44+ messages in thread From: Andrew Morton @ 2017-12-21 22:24 UTC (permalink / raw) To: Michael Ellerman Cc: Kees Cook, Michael Kerrisk (man-pages), Cyril Hrubis, Pavel Machek, Michal Hocko, Linux API, Khalid Aziz, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport On Thu, 21 Dec 2017 23:38:37 +1100 Michael Ellerman <mpe@ellerman.id.au> wrote: > > Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? Done. ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-21 22:24 ` Andrew Morton @ 2017-12-22 0:06 ` Michael Ellerman 0 siblings, 0 replies; 44+ messages in thread From: Michael Ellerman @ 2017-12-22 0:06 UTC (permalink / raw) To: Andrew Morton Cc: Kees Cook, Michael Kerrisk (man-pages), Cyril Hrubis, Pavel Machek, Michal Hocko, Linux API, Khalid Aziz, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Jann Horn, Mike Rapoport Andrew Morton <akpm@linux-foundation.org> writes: > On Thu, 21 Dec 2017 23:38:37 +1100 Michael Ellerman <mpe@ellerman.id.au> wrote: > >> > Andrew, can you s/MAP_FIXED_SAFE/MAP_FIXED_NOREPLACE/g in the series? > > Done. Thanks. I sent an ack at some point, here's another if you like: Acked-by: Michael Ellerman <mpe@ellerman.id.au> There's also a couple of stray whitespace changes in the version in linux-next, and some inconsistent whitespace between the various mman.h changes. Patch below to fix them up if you haven't already. 
cheers

diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
index bb9ccb5ff3ed..5e362db59780 100644
--- a/arch/mips/include/uapi/asm/mman.h
+++ b/arch/mips/include/uapi/asm/mman.h
@@ -50,7 +50,6 @@
 #define MAP_NONBLOCK 0x20000 /* do not block on IO */
 #define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB 0x80000 /* create a huge page mapping */
-
 #define MAP_FIXED_SAFE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */
 /*
diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
index dedc09ead4cb..a0702506d7c6 100644
--- a/arch/parisc/include/uapi/asm/mman.h
+++ b/arch/parisc/include/uapi/asm/mman.h
@@ -26,7 +26,6 @@
 #define MAP_NONBLOCK 0x20000 /* do not block on IO */
 #define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB 0x80000 /* create a huge page mapping */
-
 #define MAP_FIXED_SAFE 0x100000 /* MAP_FIXED which doesn't unmap underlying mapping */
 #define MS_SYNC 1 /* synchronous memory sync */
diff --git a/arch/sparc/include/uapi/asm/mman.h b/arch/sparc/include/uapi/asm/mman.h
index d21bffd5d3dc..715a2c927e79 100644
--- a/arch/sparc/include/uapi/asm/mman.h
+++ b/arch/sparc/include/uapi/asm/mman.h
@@ -25,4 +25,5 @@
 #define MAP_STACK 0x20000 /* give out an address that is best suited for process/thread stacks */
 #define MAP_HUGETLB 0x40000 /* create a huge page mapping */
+
 #endif /* _UAPI__SPARC_MMAN_H__ */
diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
index da73b6d5dbcd..52f4d21923b3 100644
--- a/arch/xtensa/include/uapi/asm/mman.h
+++ b/arch/xtensa/include/uapi/asm/mman.h
@@ -65,7 +65,6 @@
 # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */
 #endif
-
 /*
  * Flags for msync
  */

^ permalink raw reply related [flat|nested] 44+ messages in thread
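For readers following the rename, here is a minimal sketch of how userspace might use the flag under the name the thread settled on, MAP_FIXED_NOREPLACE. The helper name map_exactly_at() is hypothetical, and the fallback value 0x100000 is an assumption taken from the mman.h hunks quoted in this thread (some architectures may use a different value). Because kernels that do not recognize the flag treat the address as a plain hint, the sketch also checks the returned address and undoes a mapping that landed elsewhere.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: older libc headers may lack the flag. 0x100000 matches the
 * mman.h hunks quoted in this thread; it is not valid for every arch. */
#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000
#endif

/*
 * Hypothetical helper: map `len` bytes of anonymous memory exactly at
 * `addr` without replacing anything that is already mapped there.
 *
 * Returns `addr` on success and MAP_FAILED on failure. A kernel that
 * knows the flag fails with EEXIST when the range is occupied. A kernel
 * that ignores the flag treats `addr` as a hint and may place the
 * mapping elsewhere; that mapping is removed again and the situation is
 * reported as EEXIST as well.
 */
void *map_exactly_at(void *addr, size_t len)
{
    void *p = mmap(addr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                   -1, 0);

    if (p != MAP_FAILED && p != addr) {
        /* Hint semantics on an older kernel: not the address we wanted. */
        munmap(p, len);
        errno = EEXIST;
        return MAP_FAILED;
    }
    return p;
}
```

Calling map_exactly_at() on an address that is already mapped should fail and leave the existing mapping and its contents untouched, which is the never-clobber property the cover letter asks for.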
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-13 9:31 ` [PATCH 2/2] mmap.2: MAP_FIXED updated documentation Michal Hocko 2017-12-13 12:55 ` Pavel Machek @ 2017-12-14 2:52 ` Jann Horn [not found] ` <CAG48ez0JZ3PVW3vgSXDmDijS+a_5bSX9qNuyggnsB6JTSkKngA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-12-14 23:06 ` John Hubbard 1 sibling, 2 replies; 44+ messages in thread From: Jann Horn @ 2017-12-14 2:52 UTC (permalink / raw) To: Michal Hocko Cc: Michael Kerrisk, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Mike Rapoport, Cyril Hrubis, Pavel Machek, Michal Hocko On Wed, Dec 13, 2017 at 10:31 AM, Michal Hocko <mhocko@kernel.org> wrote: > From: John Hubbard <jhubbard@nvidia.com> > > -- Expand the documentation to discuss the hazards in > enough detail to allow avoiding them. > > -- Mention the upcoming MAP_FIXED_SAFE flag. > > -- Enhance the alignment requirement slightly. > > CC: Michael Ellerman <mpe@ellerman.id.au> > CC: Jann Horn <jannh@google.com> > CC: Matthew Wilcox <willy@infradead.org> > CC: Michal Hocko <mhocko@kernel.org> > CC: Mike Rapoport <rppt@linux.vnet.ibm.com> > CC: Cyril Hrubis <chrubis@suse.cz> > CC: Pavel Machek <pavel@ucw.cz> > Acked-by: Michal Hocko <mhocko@suse.com> > Signed-off-by: John Hubbard <jhubbard@nvidia.com> > Signed-off-by: Michal Hocko <mhocko@suse.com> > --- > man2/mmap.2 | 32 ++++++++++++++++++++++++++++++-- > 1 file changed, 30 insertions(+), 2 deletions(-) > > diff --git a/man2/mmap.2 b/man2/mmap.2 > index 02d391697ce6..cb8789daec2d 100644 > --- a/man2/mmap.2 > +++ b/man2/mmap.2 [...] > @@ -226,6 +227,33 @@ Software that aspires to be portable should use this option with care, keeping > in mind that the exact layout of a process' memory map is allowed to change > significantly between kernel versions, C library versions, and operating system > releases. 
> +.IP
> +Furthermore, this option is extremely hazardous (when used on its own), because
> +it forcibly removes pre-existing mappings, making it easy for a multi-threaded
> +process to corrupt its own address space.

I think this is worded unfortunately. It is dangerous if used
incorrectly, and it's a good tool when used correctly.

[...]
> +Thread B need not create a mapping directly; simply making a library call
> +that, internally, uses
> +.I dlopen(3)
> +to load some other shared library, will
> +suffice. The dlopen(3) call will map the library into the process's address
> +space. Furthermore, almost any library call may be implemented using this
> +technique.
> +Examples include brk(2), malloc(3), pthread_create(3), and the PAM libraries
> +(http://www.linux-pam.org).

This is awkward. This first mentions dlopen(), which is a very niche
case, and then just very casually mentions the much bigger
problem that tons of library functions can allocate memory through
malloc(), causing mmap() calls, sometimes without that even being
a documented property of the function.

> +.IP
> +Newer kernels
> +(Linux 4.16 and later) have a
> +.B MAP_FIXED_SAFE
> +option that avoids the corruption problem; if available, MAP_FIXED_SAFE
> +should be preferred over MAP_FIXED.

This is bad advice.
MAP_FIXED is completely safe if you use it on an address range you've allocated, and it is used in this way by core system libraries to place multiple VMAs in virtually contiguous memory, for example: ld.so (from glibc) uses it to load dynamic libraries: $ strace -e trace=open,mmap,close /usr/bin/id 2>&1 >/dev/null | head -n20 mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f35811c0000 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 mmap(NULL, 161237, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3581198000 close(3) = 0 open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3 mmap(NULL, 2259664, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3580d78000 mmap(0x7f3580f9c000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f3580f9c000 mmap(0x7f3580f9e000, 6864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3580f9e000 close(3) = 0 open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 mmap(NULL, 3795360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f35809d9000 mmap(0x7f3580d6e000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f3580d6e000 mmap(0x7f3580d74000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3580d74000 close(3) = 0 [...] As a comment in dl-map-segments.h in glibc explains: /* This is a position-independent shared object. We can let the kernel map it anywhere it likes, but we must have space for all the segments in their specified positions relative to the first. So we map the first segment without MAP_FIXED, but with its extent increased to cover all the segments. Then we remove access from excess portion, and there is known sufficient space there to remap from the later segments. And AFAIK anything that allocates thread stacks uses MAP_FIXED to create the guard page at the bottom. 

MAP_FIXED is a better solution for these use cases than MAP_FIXED_SAFE,
or whatever it ends up being called. Please remove this advice or, better,
clarify what MAP_FIXED should be used for (creation of virtually contiguous
VMAs) and what MAP_FIXED_SAFE should be used for (attempting to
allocate memory at a fixed address for some reason, with a failure instead of
the normal fallback to using a different address).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
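The pattern described in the message above (obtain a range from the kernel once, then use MAP_FIXED only inside that range) could look roughly like the following sketch. The helper names reserve_range and commit_pages are hypothetical and exist only for illustration; this is not glibc's code, just a minimal example of the same idea using anonymous memory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve `npages` pages of address space with no
 * access rights. The kernel chooses the address, so nothing is clobbered.
 * Returns NULL on failure.
 */
void *reserve_range(size_t npages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *base = mmap(NULL, npages * page, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return base == MAP_FAILED ? NULL : base;
}

/*
 * Hypothetical helper: replace pages [first, first + count) of a
 * reservation with readable/writable anonymous memory. MAP_FIXED is safe
 * here because it can only replace pages inside a range we already own.
 * Returns NULL on failure.
 */
void *commit_pages(void *base, size_t npages, size_t first, size_t count)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* Refuse to touch anything outside the reservation. */
    assert(count <= npages && first <= npages - count);

    void *want = (char *)base + first * page;
    void *p = mmap(want, count * page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would reserve once, commit sub-ranges as needed, and finally munmap() the whole reservation. Pages that are never committed stay PROT_NONE, which is also how a guard page at the bottom of a stack can be obtained.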
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation [not found] ` <CAG48ez0JZ3PVW3vgSXDmDijS+a_5bSX9qNuyggnsB6JTSkKngA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-12-14 5:28 ` John Hubbard 0 siblings, 0 replies; 44+ messages in thread From: John Hubbard @ 2017-12-14 5:28 UTC (permalink / raw) To: Jann Horn, Michal Hocko Cc: Michael Kerrisk, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML, linux-arch, Florian Weimer, Matthew Wilcox, Mike Rapoport, Cyril Hrubis, Pavel Machek, Michal Hocko On 12/13/2017 06:52 PM, Jann Horn wrote: > On Wed, Dec 13, 2017 at 10:31 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: >> From: John Hubbard <jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> >> >> -- Expand the documentation to discuss the hazards in >> enough detail to allow avoiding them. >> >> -- Mention the upcoming MAP_FIXED_SAFE flag. >> >> -- Enhance the alignment requirement slightly. >> >> CC: Michael Ellerman <mpe-Gsx/Oe8HsFggBc27wqDAHg@public.gmane.org> >> CC: Jann Horn <jannh-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> >> CC: Matthew Wilcox <willy-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> >> CC: Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> >> CC: Mike Rapoport <rppt-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org> >> CC: Cyril Hrubis <chrubis-AlSwsSmVLrQ@public.gmane.org> >> CC: Pavel Machek <pavel-+ZI9xUNit7I@public.gmane.org> >> Acked-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org> >> Signed-off-by: John Hubbard <jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> >> Signed-off-by: Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org> >> --- >> man2/mmap.2 | 32 ++++++++++++++++++++++++++++++-- >> 1 file changed, 30 insertions(+), 2 deletions(-) >> >> diff --git a/man2/mmap.2 b/man2/mmap.2 >> index 02d391697ce6..cb8789daec2d 100644 >> --- a/man2/mmap.2 >> +++ b/man2/mmap.2 > [...] 
>> @@ -226,6 +227,33 @@ Software that aspires to be portable should use this option with care, keeping >> in mind that the exact layout of a process' memory map is allowed to change >> significantly between kernel versions, C library versions, and operating system >> releases. >> +.IP >> +Furthermore, this option is extremely hazardous (when used on its own), because >> +it forcibly removes pre-existing mappings, making it easy for a multi-threaded >> +process to corrupt its own address space. > > I think this is worded unfortunately. It is dangerous if used > incorrectly, and it's a good tool when used correctly. > Hi Jann, Hey, thanks for reviewing this again. I think I can accomodate all of your requests, without contradicting other reviewers' earlier feedback...approximately. :) I'll have a go at rewording this, and addressing your additional comments below, tomorrow afternoon, so please look for an updated version later that day. thanks, -- John Hubbard NVIDIA > [...] >> +Thread B need not create a mapping directly; simply making a library call >> +that, internally, uses >> +.I dlopen(3) >> +to load some other shared library, will >> +suffice. The dlopen(3) call will map the library into the process's address >> +space. Furthermore, almost any library call may be implemented using this >> +technique. >> +Examples include brk(2), malloc(3), pthread_create(3), and the PAM libraries >> +(http://www.linux-pam.org). > > This is arkward. This first mentions dlopen(), which is a very niche > case, and then just very casually mentions the much bigger > problem that tons of library functions can allocate memory through > malloc(), causing mmap() calls, sometimes without that even being > a documented property of the function. > >> +.IP >> +Newer kernels >> +(Linux 4.16 and later) have a >> +.B MAP_FIXED_SAFE >> +option that avoids the corruption problem; if available, MAP_FIXED_SAFE >> +should be preferred over MAP_FIXED. > > This is bad advice. 
MAP_FIXED is completely safe if you use it on an address > range you've allocated, and it is used in this way by core system libraries to > place multiple VMAs in virtually contiguous memory, for example: > > ld.so (from glibc) uses it to load dynamic libraries: > > $ strace -e trace=open,mmap,close /usr/bin/id 2>&1 >/dev/null | head -n20 > mmap(NULL, 12288, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, > 0) = 0x7f35811c0000 > open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 > mmap(NULL, 161237, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f3581198000 > close(3) = 0 > open("/lib/x86_64-linux-gnu/libselinux.so.1", O_RDONLY|O_CLOEXEC) = 3 > mmap(NULL, 2259664, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > 0) = 0x7f3580d78000 > mmap(0x7f3580f9c000, 8192, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x24000) = 0x7f3580f9c000 > mmap(0x7f3580f9e000, 6864, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3580f9e000 > close(3) = 0 > open("/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 > mmap(NULL, 3795360, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, > 0) = 0x7f35809d9000 > mmap(0x7f3580d6e000, 24576, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x195000) = 0x7f3580d6e000 > mmap(0x7f3580d74000, 14752, PROT_READ|PROT_WRITE, > MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3580d74000 > close(3) = 0 > [...] > > As a comment in dl-map-segments.h in glibc explains: > /* This is a position-independent shared object. We can let the > kernel map it anywhere it likes, but we must have space for all > the segments in their specified positions relative to the first. > So we map the first segment without MAP_FIXED, but with its > extent increased to cover all the segments. Then we remove > access from excess portion, and there is known sufficient space > there to remap from the later segments. 
> > > And AFAIK anything that allocates thread stacks uses MAP_FIXED to > create the guard page at the bottom. > > > MAP_FIXED is a better solution for these usecases than MAP_FIXED_SAFE, > or whatever it ends up being called. Please remove this advice or, better, > clarify what MAP_FIXED should be used for (creation of virtually contiguous > VMAs) and what MAP_FIXED_SAFE should be used for (attempting to > allocate memory at a fixed address for some reason, with a failure instead of > the normal fallback to using a different address). > ^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation 2017-12-14 2:52 ` Jann Horn [not found] ` <CAG48ez0JZ3PVW3vgSXDmDijS+a_5bSX9qNuyggnsB6JTSkKngA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-12-14 23:06 ` John Hubbard [not found] ` <b4fb7b3a-e53e-bf87-53c5-186751a14f4e-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> 1 sibling, 1 reply; 44+ messages in thread From: John Hubbard @ 2017-12-14 23:06 UTC (permalink / raw) To: Jann Horn, Michal Hocko Cc: Michael Kerrisk, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, Matthew Wilcox, Mike Rapoport, Cyril Hrubis, Pavel Machek, Michal Hocko On 12/13/2017 06:52 PM, Jann Horn wrote: > On Wed, Dec 13, 2017 at 10:31 AM, Michal Hocko <mhocko@kernel.org> wrote: >> From: John Hubbard <jhubbard@nvidia.com> [...] >> +.IP >> +Furthermore, this option is extremely hazardous (when used on its own), because >> +it forcibly removes pre-existing mappings, making it easy for a multi-threaded >> +process to corrupt its own address space. > > I think this is worded unfortunately. It is dangerous if used > incorrectly, and it's a good tool when used correctly. > > [...] >> +Thread B need not create a mapping directly; simply making a library call >> +that, internally, uses >> +.I dlopen(3) >> +to load some other shared library, will >> +suffice. The dlopen(3) call will map the library into the process's address >> +space. Furthermore, almost any library call may be implemented using this >> +technique. >> +Examples include brk(2), malloc(3), pthread_create(3), and the PAM libraries >> +(http://www.linux-pam.org). > > This is arkward. This first mentions dlopen(), which is a very niche > case, and then just very casually mentions the much bigger > problem that tons of library functions can allocate memory through > malloc(), causing mmap() calls, sometimes without that even being > a documented property of the function. 
>

Hi Jann,

Here is some proposed new wording, to address your two comments above. What do
you think of this:

    NOTE: this option can be hazardous (when used on its own), because it
    forcibly removes pre-existing mappings, making it easy for a
    multi-threaded process to corrupt its own address space. For example,
    thread A looks through /proc/<pid>/maps and locates an available address
    range, while thread B simultaneously acquires part or all of that same
    address range. Thread A then calls mmap(MAP_FIXED), effectively
    overwriting the mapping that thread B created.

    Thread B need not create a mapping directly; simply making a library call
    whose implementation calls malloc(3), mmap(), or dlopen(3) will suffice,
    because those calls all create new mappings.

>> +.IP
>> +Newer kernels
>> +(Linux 4.16 and later) have a
>> +.B MAP_FIXED_SAFE
>> +option that avoids the corruption problem; if available, MAP_FIXED_SAFE
>> +should be preferred over MAP_FIXED.
>
> This is bad advice. MAP_FIXED is completely safe if you use it on an address
> range you've allocated, and it is used in this way by core system libraries to
> place multiple VMAs in virtually contiguous memory, for example:
[...]
> MAP_FIXED is a better solution for these usecases than MAP_FIXED_SAFE,
> or whatever it ends up being called. Please remove this advice or, better,
> clarify what MAP_FIXED should be used for (creation of virtually contiguous
> VMAs) and what MAP_FIXED_SAFE should be used for (attempting to
> allocate memory at a fixed address for some reason, with a failure instead of
> the normal fallback to using a different address).
>

Rather than risk another back-and-forth with Michal (who doesn't want any advice
on how to use this safely, in the man page), I've simply removed this advice
entirely.

thanks,
--
John Hubbard
NVIDIA
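The hazard in the proposed wording can be reproduced without any threads. The sketch below is only an illustration (clobber_demo is a hypothetical name): it creates a mapping the way "thread B" would, then maps over it with MAP_FIXED the way "thread A" would, and shows that no error is reported while the old contents are gone.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical demo: map one anonymous page, write a marker into it, then
 * map a second anonymous page at the same address with MAP_FIXED.
 * Returns the byte read back from that address afterwards, or -1 on error.
 */
int clobber_demo(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    /* "Thread B": an ordinary mapping, placed wherever the kernel likes. */
    unsigned char *victim = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (victim == MAP_FAILED)
        return -1;
    victim[0] = 0xAB;

    /* "Thread A": MAP_FIXED at the same address; the call succeeds. */
    unsigned char *p = mmap(victim, page, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED) {
        munmap(victim, page);
        return -1;
    }
    assert(p == victim);    /* MAP_FIXED returns exactly the requested address */

    int seen = p[0];        /* the marker is gone: fresh anonymous memory is zero */
    munmap(p, page);
    return seen;
}
```

The point of the example is that the second mmap() gives the caller no indication that an existing mapping was replaced.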
* Re: [PATCH 2/2] mmap.2: MAP_FIXED updated documentation [not found] ` <b4fb7b3a-e53e-bf87-53c5-186751a14f4e-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> @ 2017-12-14 23:10 ` Jann Horn 0 siblings, 0 replies; 44+ messages in thread From: Jann Horn @ 2017-12-14 23:10 UTC (permalink / raw) To: John Hubbard Cc: Michal Hocko, Michael Kerrisk, Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm-Bw31MaZKKs3YtjvyW6yDsg, LKML, linux-arch, Florian Weimer, Matthew Wilcox, Mike Rapoport, Cyril Hrubis, Pavel Machek, Michal Hocko On Fri, Dec 15, 2017 at 12:06 AM, John Hubbard <jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> wrote: > On 12/13/2017 06:52 PM, Jann Horn wrote: >> On Wed, Dec 13, 2017 at 10:31 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: >>> From: John Hubbard <jhubbard-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org> > [...] >>> +.IP >>> +Furthermore, this option is extremely hazardous (when used on its own), because >>> +it forcibly removes pre-existing mappings, making it easy for a multi-threaded >>> +process to corrupt its own address space. >> >> I think this is worded unfortunately. It is dangerous if used >> incorrectly, and it's a good tool when used correctly. >> >> [...] >>> +Thread B need not create a mapping directly; simply making a library call >>> +that, internally, uses >>> +.I dlopen(3) >>> +to load some other shared library, will >>> +suffice. The dlopen(3) call will map the library into the process's address >>> +space. Furthermore, almost any library call may be implemented using this >>> +technique. >>> +Examples include brk(2), malloc(3), pthread_create(3), and the PAM libraries >>> +(http://www.linux-pam.org). >> >> This is arkward. 
This first mentions dlopen(), which is a very niche >> case, and then just very casually mentions the much bigger >> problem that tons of library functions can allocate memory through >> malloc(), causing mmap() calls, sometimes without that even being >> a documented property of the function. >> > > Hi Jann, > > Here is some proposed new wording, to address your two comments above. What do > you think of this: > > NOTE: this option can be hazardous (when used on its own), because it > forcibly removes pre-existing mappings, making it easy for a multi- > threaded process to corrupt its own address space. For example, thread A > looks through /proc/<pid>/maps and locates an available address range, > while thread B simultaneously acquires part or all of that same address > range. Thread A then calls mmap(MAP_FIXED), effectively overwriting the > mapping that thread B created. > > Thread B need not create a mapping directly; simply making a library call > whose implementation calls malloc(3), mmap(), or dlopen(3) will suffice, > because those calls all create new mappings. Thanks! That sounds better to me. >>> +.IP >>> +Newer kernels >>> +(Linux 4.16 and later) have a >>> +.B MAP_FIXED_SAFE >>> +option that avoids the corruption problem; if available, MAP_FIXED_SAFE >>> +should be preferred over MAP_FIXED. >> >> This is bad advice. MAP_FIXED is completely safe if you use it on an address >> range you've allocated, and it is used in this way by core system libraries to >> place multiple VMAs in virtually contiguous memory, for example: > [...] >> MAP_FIXED is a better solution for these usecases than MAP_FIXED_SAFE, >> or whatever it ends up being called. Please remove this advice or, better, >> clarify what MAP_FIXED should be used for (creation of virtually contiguous >> VMAs) and what MAP_FIXED_SAFE should be used for (attempting to >> allocate memory at a fixed address for some reason, with a failure instead of >> the normal fallback to using a different address). 
>>
>
> Rather than risk another back-and-forth with Michal (who doesn't want any advice
> on how to use this safely, in the man page), I've simply removed this advice
> entirely.

Makes sense.
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  [not found] ` <20171213092550.2774-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
@ 2017-12-13 12:25 ` Matthew Wilcox
  2017-12-13 12:34 ` Michal Hocko
  2017-12-13 17:13 ` Kees Cook
  2017-12-14 0:32 ` Andrew Morton
  2 siblings, 1 reply; 44+ messages in thread
From: Matthew Wilcox @ 2017-12-13 12:25 UTC
To: Michal Hocko
Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko

On Wed, Dec 13, 2017 at 10:25:48AM +0100, Michal Hocko wrote:
> I am afraid we can bikeshed this to death and there will still be
> somebody finding yet another better name. Therefore I've decided to
> stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the
> MAP_FIXED prefix which should be recognized by developers and _SAFE
> suffix should also be clear that all dangerous side effects of the old
> MAP_FIXED are gone.

I liked basically every other name suggested more than MAP_FIXED_SAFE.
"Safe against what?" was an important question.

MAP_AT_ADDR was the best suggestion I saw that wasn't one of mine. Of
my suggestions, I liked MAP_STATIC the best.
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-13 12:25 ` [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Matthew Wilcox
@ 2017-12-13 12:34 ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2017-12-13 12:34 UTC
To: Matthew Wilcox
Cc: linux-api, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard, Abdul Haleem, Joel Stanley, Kees Cook

On Wed 13-12-17 04:25:33, Matthew Wilcox wrote:
> On Wed, Dec 13, 2017 at 10:25:48AM +0100, Michal Hocko wrote:
> > I am afraid we can bikeshed this to death and there will still be
> > somebody finding yet another better name. Therefore I've decided to
> > stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the
> > MAP_FIXED prefix which should be recognized by developers and _SAFE
> > suffix should also be clear that all dangerous side effects of the old
> > MAP_FIXED are gone.
>
> I liked basically every other name suggested more than MAP_FIXED_SAFE.
> "Safe against what?" was an important question.
>
> MAP_AT_ADDR was the best suggestion I saw that wasn't one of mine. Of
> my suggestions, I liked MAP_STATIC the best.

The question is whether you care enough to pursue this further
yourself. Because, as I've said, I do not want to spend another round
discussing the name. The flag is documented and I believe that the name
has some merit. Disagreeing on naming is the easiest pitfall to block
otherwise useful functionality from being merged. And I am pretty sure
there will always be somebody objecting...
--
Michal Hocko
SUSE Labs
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE [not found] ` <20171213092550.2774-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> 2017-12-13 12:25 ` [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Matthew Wilcox @ 2017-12-13 17:13 ` Kees Cook 2017-12-15 9:02 ` Michael Ellerman 2017-12-14 0:32 ` Andrew Morton 2 siblings, 1 reply; 44+ messages in thread From: Kees Cook @ 2017-12-13 17:13 UTC (permalink / raw) To: Michal Hocko Cc: Linux API, Khalid Aziz, Michael Ellerman, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Abdul Haleem, Joel Stanley, Michal Hocko On Wed, Dec 13, 2017 at 1:25 AM, Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > > Hi, > I am resending with some minor updates based on Michael's review and > ask for inclusion. There haven't been any fundamental objections for > the RFC [1] nor the previous version [2]. The biggest discussion > revolved around the naming. There were many suggestions flowing > around MAP_REQUIRED, MAP_EXACT, MAP_FIXED_NOCLOBBER, MAP_AT_ADDR, > MAP_FIXED_NOREPLACE etc... With this named MAP_FIXED_NOREPLACE (the best consensus we've got on a name), please consider this series: Acked-by: Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> -Kees > > I am afraid we can bikeshed this to death and there will still be > somebody finding yet another better name. Therefore I've decided to > stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the > MAP_FIXED prefix which should be recognized by developers and _SAFE > suffix should also be clear that all dangerous side effects of the old > MAP_FIXED are gone. > > If somebody _really_ hates this then feel free to nack and resubmit > with a different name you can find a consensus for. I am sorry to be > stubborn here but I would rather have this merged than go over few more > iterations changing the name just because it seems like a good idea > now. 
My experience tells me that chances are that the name will turn out > to be "suboptimal" anyway over time. > > Some more background: > This has started as a follow up discussion [3][4] resulting in the > runtime failure caused by hardening patch [5] which removes MAP_FIXED > from the elf loader because MAP_FIXED is inherently dangerous as it > might silently clobber an existing underlying mapping (e.g. stack). The > reason for the failure is that some architectures enforce an alignment > for the given address hint without MAP_FIXED used (e.g. for shared or > file backed mappings). > > One way around this would be excluding those archs which do alignment > tricks from the hardening [6]. The patch is really trivial but it has > been objected, rightfully so, that this screams for a more generic > solution. We basically want a non-destructive MAP_FIXED. > > The first patch introduced MAP_FIXED_SAFE which enforces the given > address but unlike MAP_FIXED it fails with EEXIST if the given range > conflicts with an existing one. The flag is introduced as a completely > new one rather than a MAP_FIXED extension because of the backward > compatibility. We really want a never-clobber semantic even on older > kernels which do not recognize the flag. Unfortunately mmap sucks wrt. > flags evaluation because we do not EINVAL on unknown flags. On those > kernels we would simply use the traditional hint based semantic so the > caller can still get a different address (which sucks) but at least not > silently corrupt an existing mapping. I do not see a good way around > that. Except we won't export expose the new semantic to the userspace at > all. > > It seems there are users who would like to have something like that. > Jemalloc has been mentioned by Michael Ellerman [7] > > Florian Weimer has mentioned the following: > : glibc ld.so currently maps DSOs without hints. 
This means that the kernel > : will map right next to each other, and the offsets between them a completely > : predictable. We would like to change that and supply a random address in a > : window of the address space. If there is a conflict, we do not want the > : kernel to pick a non-random address. Instead, we would try again with a > : random address. > > John Hubbard has mentioned CUDA example > : a) Searches /proc/<pid>/maps for a "suitable" region of available > : VA space. "Suitable" generally means it has to have a base address > : within a certain limited range (a particular device model might > : have odd limitations, for example), it has to be large enough, and > : alignment has to be large enough (again, various devices may have > : constraints that lead us to do this). > : > : This is of course subject to races with other threads in the process. > : > : Let's say it finds a region starting at va. > : > : b) Next it does: > : p = mmap(va, ...) > : > : *without* setting MAP_FIXED, of course (so va is just a hint), to > : attempt to safely reserve that region. If p != va, then in most cases, > : this is a failure (almost certainly due to another thread getting a > : mapping from that region before we did), and so this layer now has to > : call munmap(), before returning a "failure: retry" to upper layers. > : > : IMPROVEMENT: --> if instead, we could call this: > : > : p = mmap(va, ... MAP_FIXED_SAFE ...) > : > : , then we could skip the munmap() call upon failure. This > : is a small thing, but it is useful here. (Thanks to Piotr > : Jaroszynski and Mark Hairgrove for helping me get that detail > : exactly right, btw.) > : > : c) After that, CUDA suballocates from p, via: > : > : q = mmap(sub_region_start, ... MAP_FIXED ...) > : > : Interestingly enough, "freeing" is also done via MAP_FIXED, and > : setting PROT_NONE to the subregion. Anyway, I just included (c) for > : general interest. 
> > Atomic address range probing in the multithreaded programs in general > sounds like an interesting thing to me. > > The second patch simply replaces MAP_FIXED use in elf loader by > MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should > follow. Actually real MAP_FIXED usages should be docummented properly > and they should be more of an exception. > > Diffstat says > arch/alpha/include/uapi/asm/mman.h | 1 + > arch/metag/kernel/process.c | 6 +++++- > arch/mips/include/uapi/asm/mman.h | 2 ++ > arch/parisc/include/uapi/asm/mman.h | 2 ++ > arch/sparc/include/uapi/asm/mman.h | 1 - > arch/xtensa/include/uapi/asm/mman.h | 2 ++ > fs/binfmt_elf.c | 12 ++++++++---- > include/uapi/asm-generic/mman-common.h | 1 + > mm/mmap.c | 11 +++++++++++ > 9 files changed, 32 insertions(+), 6 deletions(-) > > [1] http://lkml.kernel.org/r/20171116101900.13621-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org > [2] http://lkml.kernel.org/r/20171129144219.22867-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org > [3] http://lkml.kernel.org/r/20171107162217.382cd754-3FnU+UHB4dNDw9hX6IcOSA@public.gmane.org > [4] http://lkml.kernel.org/r/1510048229.12079.7.camel-JKZ9t1WPFCv1ENwx4SLHqw@public.gmane.org > [5] http://lkml.kernel.org/r/20171023082608.6167-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org > [6] http://lkml.kernel.org/r/20171113094203.aofz2e7kueitk55y-2MMpYkNvuYDjFM9bn6wA6Q@public.gmane.org > [7] http://lkml.kernel.org/r/87efp1w7vy.fsf-W0DJWXSxmBNbyGPkN3NxC2scP1bn1w/D@public.gmane.org > -- Kees Cook Pixel Security ^ permalink raw reply [flat|nested] 44+ messages in thread
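Step (b) of the CUDA example quoted above, together with the backward-compatibility caveat from the cover letter, can be sketched as follows. This is only an illustration under stated assumptions: try_map_exactly_at is a hypothetical helper, and the sketch assumes the libc headers may expose the non-clobbering flag as MAP_FIXED_NOREPLACE (one of the names suggested in this thread) rather than as MAP_FIXED_SAFE. Where the macro is missing, or the running kernel ignores the flag, the address is treated as a plain hint, so the returned address still has to be compared against the requested one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to map `len` anonymous bytes exactly at `va`
 * without ever replacing an existing mapping.
 * Returns `va` on success and NULL if the range could not be obtained.
 */
void *try_map_exactly_at(void *va, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;

    /* An exact placement only makes sense for a page-aligned address. */
    assert(((uintptr_t)va & (page - 1)) == 0);

#ifdef MAP_FIXED_NOREPLACE
    /* Assumption: the headers provide the flag under this name. */
    flags |= MAP_FIXED_NOREPLACE;
#endif

    void *p = mmap(va, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (p == MAP_FAILED)
        return NULL;        /* e.g. EEXIST when the kernel honours the flag */

    if (p != va) {
        /* Hint semantics: the kernel picked another address; undo and report. */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

An upper layer would call this in a retry loop with a new candidate address after each NULL result. The munmap() branch is the step the quoted text hopes to skip once the kernel itself refuses conflicting requests.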
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-13 17:13 ` Kees Cook
@ 2017-12-15 9:02 ` Michael Ellerman
  0 siblings, 0 replies; 44+ messages in thread
From: Michael Ellerman @ 2017-12-15 9:02 UTC
To: Kees Cook, Michal Hocko
Cc: Linux API, Khalid Aziz, Andrew Morton, Russell King - ARM Linux, Andrea Arcangeli, Linux-MM, LKML, linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox, Abdul Haleem, Joel Stanley, Michal Hocko

Kees Cook <keescook@chromium.org> writes:

> On Wed, Dec 13, 2017 at 1:25 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>
>> Hi,
>> I am resending with some minor updates based on Michael's review and
>> ask for inclusion. There haven't been any fundamental objections for
>> the RFC [1] nor the previous version [2]. The biggest discussion
>> revolved around the naming. There were many suggestions flowing
>> around MAP_REQUIRED, MAP_EXACT, MAP_FIXED_NOCLOBBER, MAP_AT_ADDR,
>> MAP_FIXED_NOREPLACE etc...
>
> With this named MAP_FIXED_NOREPLACE (the best consensus we've got on a
> name), please consider this series:
>
> Acked-by: Kees Cook <keescook@chromium.org>

I don't feel like I'm actually qualified to ack the mm and binfmt
changes, but everything *looks* correct to me, and you've fixed the flag
numbering such that it can go in mman-common.h as I suggested.

So if the name was MAP_FIXED_NOREPLACE I would also be happy with it.

Acked-by: Michael Ellerman <mpe@ellerman.id.au>

I can resubmit with the name changed if that's what it takes.

cheers
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
       [not found] ` <20171213092550.2774-1-mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
  2017-12-13 12:25   ` [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE Matthew Wilcox
  2017-12-13 17:13   ` Kees Cook
@ 2017-12-14  0:32   ` Andrew Morton
  2017-12-14  1:35     ` David Goldblatt
  2017-12-14 12:44     ` Edward Napierala
  2 siblings, 2 replies; 44+ messages in thread
From: Andrew Morton @ 2017-12-14  0:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-api, Khalid Aziz, Michael Ellerman, Russell King - ARM Linux,
      Andrea Arcangeli, linux-mm, LKML, linux-arch, Florian Weimer,
      John Hubbard, Matthew Wilcox, Abdul Haleem, Joel Stanley, Kees Cook,
      Michal Hocko, jasone, davidtgoldblatt, trasz

On Wed, 13 Dec 2017 10:25:48 +0100 Michal Hocko <mhocko@kernel.org> wrote:

>
> Hi,
> I am resending with some minor updates based on Michael's review and
> ask for inclusion. There haven't been any fundamental objections for
> the RFC [1] nor the previous version [2]. The biggest discussion
> revolved around the naming. There were many suggestions flowing
> around MAP_REQUIRED, MAP_EXACT, MAP_FIXED_NOCLOBBER, MAP_AT_ADDR,
> MAP_FIXED_NOREPLACE etc...

I like MAP_FIXED_CAREFUL :)

> I am afraid we can bikeshed this to death and there will still be
> somebody finding yet another better name. Therefore I've decided to
> stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the
> MAP_FIXED prefix which should be recognized by developers and _SAFE
> suffix should also be clear that all dangerous side effects of the old
> MAP_FIXED are gone.
>
> If somebody _really_ hates this then feel free to nack and resubmit
> with a different name you can find a consensus for. I am sorry to be
> stubborn here but I would rather have this merged than go over few more
> iterations changing the name just because it seems like a good idea
> now. My experience tells me that chances are that the name will turn out
> to be "suboptimal" anyway over time.
>
> Some more background:
> This has started as a follow up discussion [3][4] resulting in the
> runtime failure caused by hardening patch [5] which removes MAP_FIXED
> from the elf loader because MAP_FIXED is inherently dangerous as it
> might silently clobber an existing underlying mapping (e.g. stack). The
> reason for the failure is that some architectures enforce an alignment
> for the given address hint without MAP_FIXED used (e.g. for shared or
> file backed mappings).
>
> One way around this would be excluding those archs which do alignment
> tricks from the hardening [6]. The patch is really trivial but it has
> been objected, rightfully so, that this screams for a more generic
> solution. We basically want a non-destructive MAP_FIXED.
>
> The first patch introduced MAP_FIXED_SAFE which enforces the given
> address but unlike MAP_FIXED it fails with EEXIST if the given range
> conflicts with an existing one. The flag is introduced as a completely
> new one rather than a MAP_FIXED extension because of the backward
> compatibility. We really want a never-clobber semantic even on older
> kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> flags evaluation because we do not EINVAL on unknown flags. On those
> kernels we would simply use the traditional hint based semantic so the
> caller can still get a different address (which sucks) but at least not
> silently corrupt an existing mapping. I do not see a good way around
> that. Except we won't export expose the new semantic to the userspace at
> all.
>
> It seems there are users who would like to have something like that.
> Jemalloc has been mentioned by Michael Ellerman [7]

http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au

It would be useful to get feedback from jemalloc developers (please).
I'll add some cc's.

> Florian Weimer has mentioned the following:
> : glibc ld.so currently maps DSOs without hints.  This means that the kernel
> : will map right next to each other, and the offsets between them a completely
> : predictable.  We would like to change that and supply a random address in a
> : window of the address space.  If there is a conflict, we do not want the
> : kernel to pick a non-random address. Instead, we would try again with a
> : random address.
>
> John Hubbard has mentioned CUDA example
> : a) Searches /proc/<pid>/maps for a "suitable" region of available
> : VA space.  "Suitable" generally means it has to have a base address
> : within a certain limited range (a particular device model might
> : have odd limitations, for example), it has to be large enough, and
> : alignment has to be large enough (again, various devices may have
> : constraints that lead us to do this).
> :
> : This is of course subject to races with other threads in the process.
> :
> : Let's say it finds a region starting at va.
> :
> : b) Next it does:
> :     p = mmap(va, ...)
> :
> : *without* setting MAP_FIXED, of course (so va is just a hint), to
> : attempt to safely reserve that region. If p != va, then in most cases,
> : this is a failure (almost certainly due to another thread getting a
> : mapping from that region before we did), and so this layer now has to
> : call munmap(), before returning a "failure: retry" to upper layers.
> :
> :     IMPROVEMENT: --> if instead, we could call this:
> :
> :             p = mmap(va, ... MAP_FIXED_SAFE ...)
> :
> :         , then we could skip the munmap() call upon failure. This
> :         is a small thing, but it is useful here. (Thanks to Piotr
> :         Jaroszynski and Mark Hairgrove for helping me get that detail
> :         exactly right, btw.)
> :
> : c) After that, CUDA suballocates from p, via:
> :
> :      q = mmap(sub_region_start, ... MAP_FIXED ...)
> :
> : Interestingly enough, "freeing" is also done via MAP_FIXED, and
> : setting PROT_NONE to the subregion. Anyway, I just included (c) for
> : general interest.
>
> Atomic address range probing in the multithreaded programs in general
> sounds like an interesting thing to me.
>
> The second patch simply replaces MAP_FIXED use in elf loader by
> MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> follow. Actually real MAP_FIXED usages should be docummented properly
> and they should be more of an exception.
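The hint-based reservation in step (b) of the quoted CUDA example can be sketched in C. This is only a minimal illustration under stated assumptions, not NVIDIA's code: `reserve_at_hint()` is a hypothetical helper name, and the PROT_NONE anonymous private mapping is an assumed choice for "reserve without using". It shows the extra munmap() that the proposed flag would make unnecessary.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Try to reserve [va, va + len) while passing va only as a hint
 * (no MAP_FIXED), as in step (b) of the quoted example.
 *
 * Returns va on success. Returns NULL if mmap() failed or if the
 * kernel placed the mapping somewhere else; in the latter case the
 * misplaced mapping is unmapped first so the caller can simply retry.
 *
 * reserve_at_hint() is a hypothetical name used for illustration.
 */
static void *reserve_at_hint(void *va, size_t len)
{
    assert(va != NULL && len > 0);

    void *p = mmap(va, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    if (p != va) {
        /* The range was (partly) taken: undo and report "retry". */
        munmap(p, len);
        return NULL;
    }
    return p;
}
```

A caller would loop over candidate addresses found in /proc/<pid>/maps and treat a NULL return as "failure: retry".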
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-14  0:32   ` Andrew Morton
@ 2017-12-14  1:35     ` David Goldblatt
  2017-12-14  1:42       ` David Goldblatt
  2017-12-14 12:44     ` Edward Napierala
  1 sibling, 1 reply; 44+ messages in thread
From: David Goldblatt @ 2017-12-14  1:35 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, linux-api, Khalid Aziz, Michael Ellerman,
      Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
      linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
      Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko, trasz,
      Jason Evans

[-- Attachment #1: Type: text/plain, Size: 6278 bytes --]

(+cc the jemalloc jasone; -cc,+bcc the Google jasone).

The only time we would want MAP_FIXED (or rather, a non-broken variant) is
in the middle of trying to expand an allocation in place; "atomic address
range probing in the multithreaded programs" describes our use case pretty
well. That's in a pathway that usually fails; it's pretty far down on our
kernel mmap enhancements wish-list.

On Wed, Dec 13, 2017 at 4:32 PM, Andrew Morton <akpm@linux-foundation.org>
wrote:

> On Wed, 13 Dec 2017 10:25:48 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>
> >
> > Hi,
> > I am resending with some minor updates based on Michael's review and
> > ask for inclusion. There haven't been any fundamental objections for
> > the RFC [1] nor the previous version [2]. The biggest discussion
> > revolved around the naming. There were many suggestions flowing
> > around MAP_REQUIRED, MAP_EXACT, MAP_FIXED_NOCLOBBER, MAP_AT_ADDR,
> > MAP_FIXED_NOREPLACE etc...
>
> I like MAP_FIXED_CAREFUL :)
>
> > I am afraid we can bikeshed this to death and there will still be
> > somebody finding yet another better name. Therefore I've decided to
> > stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the
> > MAP_FIXED prefix which should be recognized by developers and _SAFE
> > suffix should also be clear that all dangerous side effects of the old
> > MAP_FIXED are gone.
> >
> > If somebody _really_ hates this then feel free to nack and resubmit
> > with a different name you can find a consensus for. I am sorry to be
> > stubborn here but I would rather have this merged than go over few more
> > iterations changing the name just because it seems like a good idea
> > now. My experience tells me that chances are that the name will turn out
> > to be "suboptimal" anyway over time.
> >
> > Some more background:
> > This has started as a follow up discussion [3][4] resulting in the
> > runtime failure caused by hardening patch [5] which removes MAP_FIXED
> > from the elf loader because MAP_FIXED is inherently dangerous as it
> > might silently clobber an existing underlying mapping (e.g. stack). The
> > reason for the failure is that some architectures enforce an alignment
> > for the given address hint without MAP_FIXED used (e.g. for shared or
> > file backed mappings).
> >
> > One way around this would be excluding those archs which do alignment
> > tricks from the hardening [6]. The patch is really trivial but it has
> > been objected, rightfully so, that this screams for a more generic
> > solution. We basically want a non-destructive MAP_FIXED.
> >
> > The first patch introduced MAP_FIXED_SAFE which enforces the given
> > address but unlike MAP_FIXED it fails with EEXIST if the given range
> > conflicts with an existing one. The flag is introduced as a completely
> > new one rather than a MAP_FIXED extension because of the backward
> > compatibility. We really want a never-clobber semantic even on older
> > kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
> > flags evaluation because we do not EINVAL on unknown flags. On those
> > kernels we would simply use the traditional hint based semantic so the
> > caller can still get a different address (which sucks) but at least not
> > silently corrupt an existing mapping. I do not see a good way around
> > that. Except we won't export expose the new semantic to the userspace at
> > all.
> >
> > It seems there are users who would like to have something like that.
> > Jemalloc has been mentioned by Michael Ellerman [7]
>
> http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au.
>
> It would be useful to get feedback from jemalloc developers (please).
> I'll add some cc's.
>
>
> > Florian Weimer has mentioned the following:
> > : glibc ld.so currently maps DSOs without hints.  This means that the
> kernel
> > : will map right next to each other, and the offsets between them a
> completely
> > : predictable.  We would like to change that and supply a random address
> in a
> > : window of the address space.  If there is a conflict, we do not want
> the
> > : kernel to pick a non-random address. Instead, we would try again with a
> > : random address.
> >
> > John Hubbard has mentioned CUDA example
> > : a) Searches /proc/<pid>/maps for a "suitable" region of available
> > : VA space.  "Suitable" generally means it has to have a base address
> > : within a certain limited range (a particular device model might
> > : have odd limitations, for example), it has to be large enough, and
> > : alignment has to be large enough (again, various devices may have
> > : constraints that lead us to do this).
> > :
> > : This is of course subject to races with other threads in the process.
> > :
> > : Let's say it finds a region starting at va.
> > :
> > : b) Next it does:
> > :     p = mmap(va, ...)
> > :
> > : *without* setting MAP_FIXED, of course (so va is just a hint), to
> > : attempt to safely reserve that region. If p != va, then in most cases,
> > : this is a failure (almost certainly due to another thread getting a
> > : mapping from that region before we did), and so this layer now has to
> > : call munmap(), before returning a "failure: retry" to upper layers.
> > :
> > :     IMPROVEMENT: --> if instead, we could call this:
> > :
> > :             p = mmap(va, ... MAP_FIXED_SAFE ...)
> > :
> > :         , then we could skip the munmap() call upon failure. This
> > :         is a small thing, but it is useful here. (Thanks to Piotr
> > :         Jaroszynski and Mark Hairgrove for helping me get that detail
> > :         exactly right, btw.)
> > :
> > : c) After that, CUDA suballocates from p, via:
> > :
> > :      q = mmap(sub_region_start, ... MAP_FIXED ...)
> > :
> > : Interestingly enough, "freeing" is also done via MAP_FIXED, and
> > : setting PROT_NONE to the subregion. Anyway, I just included (c) for
> > : general interest.
> >
> > Atomic address range probing in the multithreaded programs in general
> > sounds like an interesting thing to me.
> >
> > The second patch simply replaces MAP_FIXED use in elf loader by
> > MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
> > follow. Actually real MAP_FIXED usages should be docummented properly
> > and they should be more of an exception.
> >

[-- Attachment #2: Type: text/html, Size: 7614 bytes --]
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-14  1:35     ` David Goldblatt
@ 2017-12-14  1:42       ` David Goldblatt
  0 siblings, 0 replies; 44+ messages in thread
From: David Goldblatt @ 2017-12-14  1:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, linux-api, Khalid Aziz, Michael Ellerman,
      Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
      linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
      Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko, trasz,
      Jason Evans

(+cc the jemalloc jasone; -cc,+bcc the Google jasone).

The only time we would want MAP_FIXED (or rather, a non-broken
variant) is in the middle of trying to expand an allocation in place;
"atomic address range probing in the multithreaded programs" describes
our use case pretty well. That's in a pathway that usually fails; it's
pretty far down on our kernel mmap enhancements wish-list.

(Sorry if you get this twice, an html reply bounced).

On Wed, Dec 13, 2017 at 5:35 PM, David Goldblatt
<davidtgoldblatt@gmail.com> wrote:
> (+cc the jemalloc jasone; -cc,+bcc the Google jasone).
>
> The only time we would want MAP_FIXED (or rather, a non-broken variant) is
> in the middle of trying to expand an allocation in place; "atomic address
> range probing in the multithreaded programs" describes our use case pretty
> well. That's in a pathway that usually fails; it's pretty far down on our
> kernel mmap enhancements wish-list.
>
> On Wed, Dec 13, 2017 at 4:32 PM, Andrew Morton <akpm@linux-foundation.org>
> wrote:
>>
>> On Wed, 13 Dec 2017 10:25:48 +0100 Michal Hocko <mhocko@kernel.org> wrote:
>>
>> >
>> > Hi,
>> > I am resending with some minor updates based on Michael's review and
>> > ask for inclusion. There haven't been any fundamental objections for
>> > the RFC [1] nor the previous version [2]. The biggest discussion
>> > revolved around the naming. There were many suggestions flowing
>> > around MAP_REQUIRED, MAP_EXACT, MAP_FIXED_NOCLOBBER, MAP_AT_ADDR,
>> > MAP_FIXED_NOREPLACE etc...
>>
>> I like MAP_FIXED_CAREFUL :)
>>
>> > I am afraid we can bikeshed this to death and there will still be
>> > somebody finding yet another better name. Therefore I've decided to
>> > stick with my original MAP_FIXED_SAFE. Why? Well, because it keeps the
>> > MAP_FIXED prefix which should be recognized by developers and _SAFE
>> > suffix should also be clear that all dangerous side effects of the old
>> > MAP_FIXED are gone.
>> >
>> > If somebody _really_ hates this then feel free to nack and resubmit
>> > with a different name you can find a consensus for. I am sorry to be
>> > stubborn here but I would rather have this merged than go over few more
>> > iterations changing the name just because it seems like a good idea
>> > now. My experience tells me that chances are that the name will turn out
>> > to be "suboptimal" anyway over time.
>> >
>> > Some more background:
>> > This has started as a follow up discussion [3][4] resulting in the
>> > runtime failure caused by hardening patch [5] which removes MAP_FIXED
>> > from the elf loader because MAP_FIXED is inherently dangerous as it
>> > might silently clobber an existing underlying mapping (e.g. stack). The
>> > reason for the failure is that some architectures enforce an alignment
>> > for the given address hint without MAP_FIXED used (e.g. for shared or
>> > file backed mappings).
>> >
>> > One way around this would be excluding those archs which do alignment
>> > tricks from the hardening [6]. The patch is really trivial but it has
>> > been objected, rightfully so, that this screams for a more generic
>> > solution. We basically want a non-destructive MAP_FIXED.
>> >
>> > The first patch introduced MAP_FIXED_SAFE which enforces the given
>> > address but unlike MAP_FIXED it fails with EEXIST if the given range
>> > conflicts with an existing one. The flag is introduced as a completely
>> > new one rather than a MAP_FIXED extension because of the backward
>> > compatibility. We really want a never-clobber semantic even on older
>> > kernels which do not recognize the flag. Unfortunately mmap sucks wrt.
>> > flags evaluation because we do not EINVAL on unknown flags. On those
>> > kernels we would simply use the traditional hint based semantic so the
>> > caller can still get a different address (which sucks) but at least not
>> > silently corrupt an existing mapping. I do not see a good way around
>> > that. Except we won't export expose the new semantic to the userspace at
>> > all.
>> >
>> > It seems there are users who would like to have something like that.
>> > Jemalloc has been mentioned by Michael Ellerman [7]
>>
>> http://lkml.kernel.org/r/87efp1w7vy.fsf@concordia.ellerman.id.au.
>>
>> It would be useful to get feedback from jemalloc developers (please).
>> I'll add some cc's.
>>
>>
>> > Florian Weimer has mentioned the following:
>> > : glibc ld.so currently maps DSOs without hints.  This means that the
>> > kernel
>> > : will map right next to each other, and the offsets between them a
>> > completely
>> > : predictable.  We would like to change that and supply a random address
>> > in a
>> > : window of the address space.  If there is a conflict, we do not want
>> > the
>> > : kernel to pick a non-random address. Instead, we would try again with
>> > a
>> > : random address.
>> >
>> > John Hubbard has mentioned CUDA example
>> > : a) Searches /proc/<pid>/maps for a "suitable" region of available
>> > : VA space.  "Suitable" generally means it has to have a base address
>> > : within a certain limited range (a particular device model might
>> > : have odd limitations, for example), it has to be large enough, and
>> > : alignment has to be large enough (again, various devices may have
>> > : constraints that lead us to do this).
>> > :
>> > : This is of course subject to races with other threads in the process.
>> > :
>> > : Let's say it finds a region starting at va.
>> > :
>> > : b) Next it does:
>> > :     p = mmap(va, ...)
>> > :
>> > : *without* setting MAP_FIXED, of course (so va is just a hint), to
>> > : attempt to safely reserve that region. If p != va, then in most cases,
>> > : this is a failure (almost certainly due to another thread getting a
>> > : mapping from that region before we did), and so this layer now has to
>> > : call munmap(), before returning a "failure: retry" to upper layers.
>> > :
>> > :     IMPROVEMENT: --> if instead, we could call this:
>> > :
>> > :             p = mmap(va, ... MAP_FIXED_SAFE ...)
>> > :
>> > :         , then we could skip the munmap() call upon failure. This
>> > :         is a small thing, but it is useful here. (Thanks to Piotr
>> > :         Jaroszynski and Mark Hairgrove for helping me get that detail
>> > :         exactly right, btw.)
>> > :
>> > : c) After that, CUDA suballocates from p, via:
>> > :
>> > :      q = mmap(sub_region_start, ... MAP_FIXED ...)
>> > :
>> > : Interestingly enough, "freeing" is also done via MAP_FIXED, and
>> > : setting PROT_NONE to the subregion. Anyway, I just included (c) for
>> > : general interest.
>> >
>> > Atomic address range probing in the multithreaded programs in general
>> > sounds like an interesting thing to me.
>> >
>> > The second patch simply replaces MAP_FIXED use in elf loader by
>> > MAP_FIXED_SAFE. I believe other places which rely on MAP_FIXED should
>> > follow. Actually real MAP_FIXED usages should be docummented properly
>> > and they should be more of an exception.
>> >
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-14  0:32   ` Andrew Morton
  2017-12-14  1:35     ` David Goldblatt
@ 2017-12-14 12:44     ` Edward Napierala
  2017-12-14 13:15       ` Michal Hocko
  1 sibling, 1 reply; 44+ messages in thread
From: Edward Napierala @ 2017-12-14 12:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Michal Hocko, linux-api, Khalid Aziz, Michael Ellerman,
      Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
      linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
      Abdul Haleem, Joel Stanley, Kees Cook, Michal Hocko, jasone,
      davidtgoldblatt

[-- Attachment #1: Type: text/plain, Size: 827 bytes --]

Regarding the name - how about adopting MAP_EXCL?  It was introduced in
FreeBSD, and seems to do exactly this; quoting mmap(2):

     MAP_FIXED    Do not permit the system to select a different address
                  than the one specified.  If the specified address
                  cannot be used, mmap() will fail.  If MAP_FIXED is
                  specified, addr must be a multiple of the page size.
                  If MAP_EXCL is not specified, a successful MAP_FIXED
                  request replaces any previous mappings for the
                  process' pages in the range from addr to addr + len.
                  In contrast, if MAP_EXCL is specified, the request
                  will fail if a mapping already exists within the
                  range.

[-- Attachment #2: Type: text/html, Size: 1120 bytes --]
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-14 12:44     ` Edward Napierala
@ 2017-12-14 13:15       ` Michal Hocko
  2017-12-14 14:54         ` Edward Napierala
  0 siblings, 1 reply; 44+ messages in thread
From: Michal Hocko @ 2017-12-14 13:15 UTC (permalink / raw)
  To: Edward Napierala
  Cc: Andrew Morton, linux-api, Khalid Aziz, Michael Ellerman,
      Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
      linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
      Abdul Haleem, Joel Stanley, Kees Cook, jasone, davidtgoldblatt

On Thu 14-12-17 12:44:17, Edward Napierala wrote:
> Regarding the name - how about adopting MAP_EXCL?  It was introduced in
> FreeBSD,
> and seems to do exactly this; quoting mmap(2):
>
>      MAP_FIXED    Do not permit the system to select a different address
>                   than the one specified.  If the specified address
>                   cannot be used, mmap() will fail.  If MAP_FIXED is
>                   specified, addr must be a multiple of the page size.
>                   If MAP_EXCL is not specified, a successful MAP_FIXED
>                   request replaces any previous mappings for the
>                   process' pages in the range from addr to addr + len.
>                   In contrast, if MAP_EXCL is specified, the request
>                   will fail if a mapping already exists within the
>                   range.

I am not familiar with the FreeBSD implementation but from the above it
looks like MAP_EXCL is a MAP_FIXED modifier, which is not how we are
going to implement it in Linux due to the reasons mentioned in this cover
letter. Using the same name would be more confusing than helpful, I am
afraid.

--
Michal Hocko
SUSE Labs
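The backward-compatibility argument for a standalone flag (rather than a MAP_FIXED modifier) can be illustrated with a short C sketch. It assumes the flag is visible to userspace under the name MAP_FIXED_NOREPLACE, one of the names proposed in this thread and the one current Linux headers define; where the headers lack it, the macro is defined as 0 here, which is an assumption made purely so the sketch degrades to the plain hint semantic. `map_exact_noclobber()` is a hypothetical helper name. Because MAP_FIXED is never passed, a kernel that does not know the flag can at worst place the mapping elsewhere; it cannot replace an existing one.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
/* Assumption: headers without the flag; 0 means "plain address hint". */
#define MAP_FIXED_NOREPLACE 0
#endif

/*
 * Map exactly [va, va + len) or fail, never replacing an existing mapping.
 * Returns va on success, NULL with errno set on failure.
 * map_exact_noclobber() is a hypothetical name used for illustration.
 */
static void *map_exact_noclobber(void *va, size_t len)
{
    assert(va != NULL && len > 0);

    void *p = mmap(va, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;    /* a kernel that knows the flag reports EEXIST on conflict */

    if (p != va) {
        /*
         * The flag was ignored (or defined away) and va was treated as a
         * hint. Nothing was clobbered; drop the stray mapping and report
         * the conflict the same way a newer kernel would.
         */
        munmap(p, len);
        errno = EEXIST;
        return NULL;
    }
    return p;
}
```

With a MAP_FIXED | modifier design the same call on an older kernel would silently behave as plain MAP_FIXED, which is the case the cover letter wants to rule out.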
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-14 13:15       ` Michal Hocko
@ 2017-12-14 14:54         ` Edward Napierala
  2017-12-19 12:40           ` David Laight
  0 siblings, 1 reply; 44+ messages in thread
From: Edward Napierala @ 2017-12-14 14:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-api, Khalid Aziz, Michael Ellerman,
      Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
      linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
      Abdul Haleem, Joel Stanley, Kees Cook, jasone, davidtgoldblatt

On 1214T1415, Michal Hocko wrote:
> On Thu 14-12-17 12:44:17, Edward Napierala wrote:
> > Regarding the name - how about adopting MAP_EXCL?  It was introduced in
> > FreeBSD,
> > and seems to do exactly this; quoting mmap(2):
> >
> >      MAP_FIXED    Do not permit the system to select a different address
> >                   than the one specified.  If the specified address
> >                   cannot be used, mmap() will fail.  If MAP_FIXED is
> >                   specified, addr must be a multiple of the page size.
> >                   If MAP_EXCL is not specified, a successful MAP_FIXED
> >                   request replaces any previous mappings for the
> >                   process' pages in the range from addr to addr + len.
> >                   In contrast, if MAP_EXCL is specified, the request
> >                   will fail if a mapping already exists within the
> >                   range.
>
> I am not familiar with the FreeBSD implementation but from the above it
> looks like MAP_EXCL is a MAP_FIXED modifier which is not how we are
> going to implement it in linux due to reasons mentioned in this cover
> letter. Using the same name would be more confusing than helpful I am
> afraid.

Sorry, missed that.  Indeed, reusing a name with different semantics
would be a bad idea.
* RE: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-14 14:54         ` Edward Napierala
@ 2017-12-19 12:40           ` David Laight
  2017-12-19 12:46             ` Michal Hocko
  0 siblings, 1 reply; 44+ messages in thread
From: David Laight @ 2017-12-19 12:40 UTC (permalink / raw)
  To: 'Edward Napierala', Michal Hocko
  Cc: Andrew Morton, linux-api, Khalid Aziz, Michael Ellerman,
      Russell King - ARM Linux, Andrea Arcangeli, linux-mm, LKML,
      linux-arch, Florian Weimer, John Hubbard, Matthew Wilcox,
      Abdul Haleem, Joel Stanley, Kees Cook, jasone, davidtgoldblatt

From: Edward Napierala
> Sent: 14 December 2017 14:55
>
> On 1214T1415, Michal Hocko wrote:
> > On Thu 14-12-17 12:44:17, Edward Napierala wrote:
> > > Regarding the name - how about adopting MAP_EXCL?  It was introduced in
> > > FreeBSD,
> > > and seems to do exactly this; quoting mmap(2):
> > >
> > >      MAP_FIXED    Do not permit the system to select a different address
> > >                   than the one specified.  If the specified address
> > >                   cannot be used, mmap() will fail.  If MAP_FIXED is
> > >                   specified, addr must be a multiple of the page size.
> > >                   If MAP_EXCL is not specified, a successful MAP_FIXED
> > >                   request replaces any previous mappings for the
> > >                   process' pages in the range from addr to addr + len.
> > >                   In contrast, if MAP_EXCL is specified, the request
> > >                   will fail if a mapping already exists within the
> > >                   range.
> >
> > I am not familiar with the FreeBSD implementation but from the above it
> > looks like MAP_EXCL is a MAP_FIXED modifier which is not how we are
> > going to implement it in linux due to reasons mentioned in this cover
> > letter. Using the same name would be more confusing than helpful I am
> > afraid.
>
> Sorry, missed that.  Indeed, reusing a name with a different semantics
> would be a bad idea.

I don't remember any discussion about using MAP_FIXED | MAP_EXCL?

Why not match the prior art?

	David
* Re: [PATCH v2 0/2] mm: introduce MAP_FIXED_SAFE
  2017-12-19 12:40           ` David Laight
@ 2017-12-19 12:46             ` Michal Hocko
  0 siblings, 0 replies; 44+ messages in thread
From: Michal Hocko @ 2017-12-19 12:46 UTC (permalink / raw)
  To: David Laight
  Cc: 'Edward Napierala', Andrew Morton, linux-api, Khalid Aziz,
      Michael Ellerman, Russell King - ARM Linux, Andrea Arcangeli,
      linux-mm, LKML, linux-arch, Florian Weimer, John Hubbard,
      Matthew Wilcox, Abdul Haleem, Joel Stanley, Kees Cook, jasone,
      davidtgoldblatt

On Tue 19-12-17 12:40:16, David Laight wrote:
> From: Edward Napierala
> > Sent: 14 December 2017 14:55
> >
> > On 1214T1415, Michal Hocko wrote:
> > > On Thu 14-12-17 12:44:17, Edward Napierala wrote:
> > > > Regarding the name - how about adopting MAP_EXCL?  It was introduced in
> > > > FreeBSD,
> > > > and seems to do exactly this; quoting mmap(2):
> > > >
> > > >      MAP_FIXED    Do not permit the system to select a different address
> > > >                   than the one specified.  If the specified address
> > > >                   cannot be used, mmap() will fail.  If MAP_FIXED is
> > > >                   specified, addr must be a multiple of the page size.
> > > >                   If MAP_EXCL is not specified, a successful MAP_FIXED
> > > >                   request replaces any previous mappings for the
> > > >                   process' pages in the range from addr to addr + len.
> > > >                   In contrast, if MAP_EXCL is specified, the request
> > > >                   will fail if a mapping already exists within the
> > > >                   range.
> > >
> > > I am not familiar with the FreeBSD implementation but from the above it
> > > looks like MAP_EXCL is a MAP_FIXED modifier which is not how we are
> > > going to implement it in linux due to reasons mentioned in this cover
> > > letter. Using the same name would be more confusing than helpful I am
> > > afraid.
> >
> > Sorry, missed that.  Indeed, reusing a name with a different semantics
> > would be a bad idea.
>
> I don't remember any discussion about using MAP_FIXED | MAP_EXCL ?
>
> Why not match the prior art??

See the cover letter, which explains why an extension of MAP_FIXED is
not backward compatible.

--
Michal Hocko
SUSE Labs