* linux-next: Tree for Nov 7 [not found] ` <CACPK8Xe5uUKEytkRiszdX511b_cYTD-z3X=ZsMcNJ-NOYnXfuQ@mail.gmail.com> @ 2017-11-13 9:20 ` Michal Hocko 2017-11-13 9:34 ` Russell King - ARM Linux ` (2 more replies) 0 siblings, 3 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-13 9:20 UTC (permalink / raw) To: linux-arm-kernel [Cc arm and ppc maintainers] Thanks a lot for testing! On Sun 12-11-17 11:38:02, Joel Stanley wrote: > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > Hi Joel, > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > [...] > >> > There are a lot of messages on the way up that look like this: > >> > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > >> > memory is mapped already > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > >> > memory is mapped already > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > >> > memory is mapped already > >> > > >> > And then trying to run userspace looks like this: > >> > >> Could you please run with debugging patch posted > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57 at dhcp22.suse.cz > > > > Did you have chance to test with this debugging patch, please? > > Lots of this: > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) This smells like the problem I've expected that mmap with hint doesn't respect the hint even though there is no clashing mapping. The above basically says that we didn't map at 0xd9000 but it has placed it at 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new mapping. find_vma returns the closest vma (with addr < vm_end) for the given address 0xd9000 so this address cannot be mapped by any other vma. Now that I am looking at arm's arch_get_unmapped_area it does perform aligning for shared vmas. We do not do that for MAP_FIXED. Powepc, reported earlier [1] seems to suffer from the similar problem. slice_get_unmapped_area alignes to slices, whatever that means. I can see two possible ways around that. Either we explicitly request non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or simply opt out from the MAP_FIXED protection via ifdefs. The first option sounds more generic to me but also more tricky to not introduce other user visible effects. The later is quite straightforward. What do you think about the following on top of the previous patch? It is rather terse and disables the MAP_FIXED protection for arm comletely because I couldn't find a way to make it conditional on CACHEID_VIPT_ALIASING. But this can be always handled later. I find the protection for other archtectures useful enough to have this working for most architectures now and handle others specially. [1] http://lkml.kernel.org/r/1510048229.12079.7.camel at abdul.in.ibm.com --- diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 61a0cb15067e..018d041a30e6 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -99,6 +99,7 @@ config ARM select PERF_USE_VMALLOC select RTC_LIB select SYS_SUPPORTS_APM_EMULATION + select ARCH_ALIGNED_MMAPS # Above selects are sorted alphabetically; please add new ones # according to that. Thanks. help diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 2f629e0551e9..156f69c09c7f 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -368,6 +368,7 @@ config PPC_MM_SLICES bool default y if PPC_STD_MMU_64 default n + select ARCH_ALIGNED_MMAPS config PPC_HAVE_PMU_SUPPORT bool diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index a22718de42db..d23eb89f31c0 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -345,13 +345,19 @@ static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr, unsigned long size, int prot, int type, unsigned long off) { unsigned long map_addr; + unsigned long map_type = type; /* * If caller requests the mapping@a specific place, make sure we fail * rather than potentially clobber an existing mapping which can have - * security consequences (e.g. smash over the stack area). + * security consequences (e.g. smash over the stack area). Be careful + * about architectures which do not respect the address hint due to + * aligning restrictions for !fixed mappings. */ - map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off); + if (!IS_ENABLED(ARCH_ALIGNED_MMAPS)) + map_type &= ~MAP_FIXED; + + map_addr = vm_mmap(filep, addr, size, prot, map_type, off); if (BAD_ADDR(map_addr)) return map_addr; -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 9:20 ` linux-next: Tree for Nov 7 Michal Hocko @ 2017-11-13 9:34 ` Russell King - ARM Linux 2017-11-13 9:42 ` Michal Hocko 2017-11-13 14:11 ` Michal Hocko 2 siblings, 0 replies; 21+ messages in thread From: Russell King - ARM Linux @ 2017-11-13 9:34 UTC (permalink / raw) To: linux-arm-kernel On Mon, Nov 13, 2017 at 10:20:06AM +0100, Michal Hocko wrote: > [Cc arm and ppc maintainers] > > Thanks a lot for testing! > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > Hi Joel, > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57 at dhcp22.suse.cz > > > > > > Did you have chance to test with this debugging patch, please? > > > > Lots of this: > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > This smells like the problem I've expected that mmap with hint doesn't > respect the hint even though there is no clashing mapping. The above > basically says that we didn't map at 0xd9000 but it has placed it at > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > mapping. find_vma returns the closest vma (with addr < vm_end) for the > given address 0xd9000 so this address cannot be mapped by any other vma. > > Now that I am looking at arm's arch_get_unmapped_area it does perform > aligning for shared vmas. We do not do that for MAP_FIXED. Powepc, > reported earlier [1] seems to suffer from the similar problem. > slice_get_unmapped_area alignes to slices, whatever that means. > > I can see two possible ways around that. Either we explicitly request > non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or > simply opt out from the MAP_FIXED protection via ifdefs. The first > option sounds more generic to me but also more tricky to not introduce > other user visible effects. The later is quite straightforward. What do > you think about the following on top of the previous patch? > > It is rather terse and disables the MAP_FIXED protection for arm > comletely because I couldn't find a way to make it conditional on > CACHEID_VIPT_ALIASING. But this can be always handled later. I find the > protection for other archtectures useful enough to have this working for > most architectures now and handle others specially. Can someone provide the background information for this please? -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 9:20 ` linux-next: Tree for Nov 7 Michal Hocko 2017-11-13 9:34 ` Russell King - ARM Linux @ 2017-11-13 9:42 ` Michal Hocko 2017-11-13 11:34 ` Michael Ellerman 2017-11-13 14:11 ` Michal Hocko 2 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2017-11-13 9:42 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 10:20:06, Michal Hocko wrote: > [Cc arm and ppc maintainers] Hmm, it turned out to be a problem on other architectures as well. CCing more maintainers. For your reference, we are talking about http://lkml.kernel.org/r/20171023082608.6167-1-mhocko at kernel.org which has broken architectures which do apply aligning on the mmap address hint without MAP_FIXED applied. See below my proposed way around this issue because I belive that the above patch is quite valuable on its own to be dropped for all archs. > Thanks a lot for testing! > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > Hi Joel, > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57 at dhcp22.suse.cz > > > > > > Did you have chance to test with this debugging patch, please? > > > > Lots of this: > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > This smells like the problem I've expected that mmap with hint doesn't > respect the hint even though there is no clashing mapping. The above > basically says that we didn't map at 0xd9000 but it has placed it at > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > mapping. find_vma returns the closest vma (with addr < vm_end) for the > given address 0xd9000 so this address cannot be mapped by any other vma. > > Now that I am looking at arm's arch_get_unmapped_area it does perform > aligning for shared vmas. We do not do that for MAP_FIXED. Powepc, > reported earlier [1] seems to suffer from the similar problem. > slice_get_unmapped_area alignes to slices, whatever that means. > > I can see two possible ways around that. Either we explicitly request > non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or > simply opt out from the MAP_FIXED protection via ifdefs. The first > option sounds more generic to me but also more tricky to not introduce > other user visible effects. The later is quite straightforward. What do > you think about the following on top of the previous patch? > > It is rather terse and disables the MAP_FIXED protection for arm > comletely because I couldn't find a way to make it conditional on > CACHEID_VIPT_ALIASING. But this can be always handled later. I find the > protection for other archtectures useful enough to have this working for > most architectures now and handle others specially. > > [1] http://lkml.kernel.org/r/1510048229.12079.7.camel at abdul.in.ibm.com > --- diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 61a0cb15067e..018d041a30e6 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -99,6 +99,7 @@ config ARM select PERF_USE_VMALLOC select RTC_LIB select SYS_SUPPORTS_APM_EMULATION + select ARCH_ALIGNED_MMAPS # Above selects are sorted alphabetically; please add new ones # according to that. Thanks. help diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 48d91d5be4e9..eca59d27e9f1 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -72,6 +72,7 @@ config MIPS select RTC_LIB if !MACH_LOONGSON64 select SYSCTL_EXCEPTION_TRACE select VIRT_TO_BUS + select ARCH_ALIGNED_MMAPS menu "Machine selection" diff --git a/arch/parisc/Kconfig b/arch/parisc/Kconfig index 22f27ec8c117..8376d16e0a4a 100644 --- a/arch/parisc/Kconfig +++ b/arch/parisc/Kconfig @@ -40,6 +40,7 @@ config PARISC select GENERIC_CLOCKEVENTS select ARCH_NO_COHERENT_DMA_MMAP select CPU_NO_EFFICIENT_FFS + select ARCH_ALIGNED_MMAPS help The PA-RISC microprocessor is designed by Hewlett-Packard and used diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index 2f629e0551e9..156f69c09c7f 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -368,6 +368,7 @@ config PPC_MM_SLICES bool default y if PPC_STD_MMU_64 default n + select ARCH_ALIGNED_MMAPS config PPC_HAVE_PMU_SUPPORT bool diff --git a/arch/sh/Kconfig b/arch/sh/Kconfig index 640a85925060..ac1d4637a728 100644 --- a/arch/sh/Kconfig +++ b/arch/sh/Kconfig @@ -49,6 +49,7 @@ config SUPERH select HAVE_ARCH_AUDITSYSCALL select HAVE_FUTEX_CMPXCHG if FUTEX select HAVE_NMI + select ARCH_ALIGNED_MMAPS help The SuperH is a RISC processor targeted for use in embedded systems and consumer electronics; it was also used in the Sega Dreamcast diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig index 0be3828752e5..c265dcda3d28 100644 --- a/arch/sparc/Kconfig +++ b/arch/sparc/Kconfig @@ -45,6 +45,7 @@ config SPARC select CPU_NO_EFFICIENT_FFS select LOCKDEP_SMALL if LOCKDEP select ARCH_WANT_RELAX_ORDER + select ARCH_ALIGNED_MMAPS if SPARC64 config SPARC32 def_bool !64BIT diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index 7ad6d77b2f22..a5cf535034d1 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -30,6 +30,7 @@ config XTENSA select NO_BOOTMEM select PERF_USE_VMALLOC select VIRT_TO_BUS + select ARCH_ALIGNED_MMAPS if MMU help Xtensa processors are 32-bit RISC machines designed by Tensilica primarily for embedded systems. These processors are both diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index a22718de42db..d23eb89f31c0 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -345,13 +345,19 @@ static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr, unsigned long size, int prot, int type, unsigned long off) { unsigned long map_addr; + unsigned long map_type = type; /* * If caller requests the mapping at a specific place, make sure we fail * rather than potentially clobber an existing mapping which can have - * security consequences (e.g. smash over the stack area). + * security consequences (e.g. smash over the stack area). Be careful + * about architectures which do not respect the address hint due to + * aligning restrictions for !fixed mappings. */ - map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off); + if (!IS_ENABLED(ARCH_ALIGNED_MMAPS)) + map_type &= ~MAP_FIXED; + + map_addr = vm_mmap(filep, addr, size, prot, map_type, off); if (BAD_ADDR(map_addr)) return map_addr; -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 9:42 ` Michal Hocko @ 2017-11-13 11:34 ` Michael Ellerman 2017-11-13 12:00 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Michael Ellerman @ 2017-11-13 11:34 UTC (permalink / raw) To: linux-arm-kernel Hi Michal, Michal Hocko <mhocko@kernel.org> writes: > On Mon 13-11-17 10:20:06, Michal Hocko wrote: >> [Cc arm and ppc maintainers] > > Hmm, it turned out to be a problem on other architectures as well. > CCing more maintainers. For your reference, we are talking about > http://lkml.kernel.org/r/20171023082608.6167-1-mhocko at kernel.org > which has broken architectures which do apply aligning on the mmap > address hint without MAP_FIXED applied. See below my proposed way > around this issue because I belive that the above patch is quite > valuable on its own to be dropped for all archs. I don't really like your solution sorry :) The fact that you've had to patch seven arches seems like a red flag. I think this is a generic problem with MAP_FIXED, which I've heard userspace folks complain about in the past. Currently MAP_FIXED does two things: 1. makes addr not a hint but the required address 2. blasts any existing mapping You want 1) but not 2). So the right solution IMHO would be to add a new mmap flag to request that behaviour, ie. a fixed address but iff there is nothing already mapped there. I don't know the mm code well enough to know if that's hard for some reason, but it *seems* like it should be doable. cheers ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 11:34 ` Michael Ellerman @ 2017-11-13 12:00 ` Michal Hocko 2017-11-13 15:16 ` Michal Hocko 2017-11-14 8:54 ` Michael Ellerman 0 siblings, 2 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-13 12:00 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 22:34:50, Michael Ellerman wrote: > Hi Michal, > > Michal Hocko <mhocko@kernel.org> writes: > > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > >> [Cc arm and ppc maintainers] > > > > Hmm, it turned out to be a problem on other architectures as well. > > CCing more maintainers. For your reference, we are talking about > > http://lkml.kernel.org/r/20171023082608.6167-1-mhocko at kernel.org > > which has broken architectures which do apply aligning on the mmap > > address hint without MAP_FIXED applied. See below my proposed way > > around this issue because I belive that the above patch is quite > > valuable on its own to be dropped for all archs. > > I don't really like your solution sorry :) The fact that you've had to > patch seven arches seems like a red flag. > > I think this is a generic problem with MAP_FIXED, which I've heard > userspace folks complain about in the past. The thing is that we canno change MAP_FIXED behavior as it is carved in stone > Currently MAP_FIXED does two things: > 1. makes addr not a hint but the required address > 2. blasts any existing mapping > > You want 1) but not 2). + fail if there is a clashing range > So the right solution IMHO would be to add a new mmap flag to request > that behaviour, ie. a fixed address but iff there is nothing already > mapped there. > > I don't know the mm code well enough to know if that's hard for some > reason, but it *seems* like it should be doable. Yes, I have mentioned that in the previous email but the amount of code would be even larger. Basically every arch which reimplements arch_get_unmapped_area would have to special case new MAP_FIXED flag to do vma lookup. So this was the most simple solution I could come up with. If there was a general interest for MAP_FIXED_SAFE then we can introduce it later of course. I would just like the hardening merged sooner rather than later. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 12:00 ` Michal Hocko @ 2017-11-13 15:16 ` Michal Hocko 2017-11-13 15:48 ` Russell King - ARM Linux ` (2 more replies) 2017-11-14 8:54 ` Michael Ellerman 1 sibling, 3 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-13 15:16 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 13:00:57, Michal Hocko wrote: [...] > Yes, I have mentioned that in the previous email but the amount of code > would be even larger. Basically every arch which reimplements > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > do vma lookup. It turned out that this might be much more easier than I thought after all. It seems we can really handle that in the common code. This would mean that we are exposing a new functionality to the userspace though. Myabe this would be useful on its own though. Just a quick draft (not even compile tested) whether this makes sense in general. I would be worried about unexpected behavior when somebody set other bit without a good reason and we might fail with ENOMEM for such a call now. Elf loader would then use MAP_FIXED_SAFE rather than MAP_FIXED. --- diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 3b26cc62dadb..d021c21f9b01 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -31,6 +31,9 @@ #define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x100000 /* create a huge page mapping */ +#define MAP_KEEP_MAPPING 0x2000000 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_SYNC 2 /* synchronous memory sync */ #define MS_INVALIDATE 4 /* invalidate the caches */ diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index da3216007fe0..51e3885fbfc1 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -49,6 +49,9 @@ #define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x80000 /* create a huge page mapping */ +#define MAP_KEEP_MAPPING 0x2000000 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + /* * Flags for msync */ diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index cc9ba1d34779..5a4381484fc5 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -25,6 +25,9 @@ #define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x80000 /* create a huge page mapping */ +#define MAP_KEEP_MAPPING 0x2000000 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + #define MS_SYNC 1 /* synchronous memory sync */ #define MS_ASYNC 2 /* sync memory asynchronously */ #define MS_INVALIDATE 4 /* invalidate the caches */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index b15b278aa314..5df8a81524da 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -62,6 +62,9 @@ # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ #endif +#define MAP_KEEP_MAPPING 0x2000000 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + /* * Flags for msync */ diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 203268f9231e..22442846f5c8 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -25,6 +25,9 @@ # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ #endif +#define MAP_KEEP_MAPPING 0x2000000 +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ + /* * Flags for mlock */ diff --git a/mm/mmap.c b/mm/mmap.c index 680506faceae..e53b6b15a8d9 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1365,6 +1365,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (offset_in_page(addr)) return addr; + if ((flags & MAP_FIXED_SAFE) == MAP_FIXED_SAFE) { + struct vm_area_struct *vma = find_vma(mm, addr); + + if (vma && vma->vm_start <= addr) + return -ENOMEM; + } + if (prot == PROT_EXEC) { pkey = execute_only_pkey(mm); if (pkey < 0) -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 15:16 ` Michal Hocko @ 2017-11-13 15:48 ` Russell King - ARM Linux 2017-11-13 15:59 ` Michal Hocko 2017-11-13 15:49 ` Michal Hocko 2017-11-14 9:02 ` Michael Ellerman 2 siblings, 1 reply; 21+ messages in thread From: Russell King - ARM Linux @ 2017-11-13 15:48 UTC (permalink / raw) To: linux-arm-kernel On Mon, Nov 13, 2017 at 04:16:41PM +0100, Michal Hocko wrote: > On Mon 13-11-17 13:00:57, Michal Hocko wrote: > [...] > > Yes, I have mentioned that in the previous email but the amount of code > > would be even larger. Basically every arch which reimplements > > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > > do vma lookup. > > It turned out that this might be much more easier than I thought after > all. It seems we can really handle that in the common code. This would > mean that we are exposing a new functionality to the userspace though. > Myabe this would be useful on its own though. Just a quick draft (not > even compile tested) whether this makes sense in general. I would be > worried about unexpected behavior when somebody set other bit without a > good reason and we might fail with ENOMEM for such a call now. > > Elf loader would then use MAP_FIXED_SAFE rather than MAP_FIXED. > --- > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h > index 3b26cc62dadb..d021c21f9b01 100644 > --- a/arch/alpha/include/uapi/asm/mman.h > +++ b/arch/alpha/include/uapi/asm/mman.h > @@ -31,6 +31,9 @@ > #define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */ > #define MAP_HUGETLB 0x100000 /* create a huge page mapping */ > > +#define MAP_KEEP_MAPPING 0x2000000 > +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ A few things... 1. Does this need to be exposed to userland? 2. Can it end up in include/uapi/asm-generic/mman*.h ? 3. The definition of MAP_FIXED_SAFE should really have parens around it. > @@ -1365,6 +1365,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > if (offset_in_page(addr)) > return addr; > > + if ((flags & MAP_FIXED_SAFE) == MAP_FIXED_SAFE) { I'm surprised this doesn't warn - since this effectively expands to: flags & MAP_FIXED | MAP_KEEP_MAPPING hence why MAP_FIXED_SAFE needs parens. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 15:48 ` Russell King - ARM Linux @ 2017-11-13 15:59 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-13 15:59 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 15:48:13, Russell King - ARM Linux wrote: > On Mon, Nov 13, 2017 at 04:16:41PM +0100, Michal Hocko wrote: > > On Mon 13-11-17 13:00:57, Michal Hocko wrote: > > [...] > > > Yes, I have mentioned that in the previous email but the amount of code > > > would be even larger. Basically every arch which reimplements > > > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > > > do vma lookup. > > > > It turned out that this might be much more easier than I thought after > > all. It seems we can really handle that in the common code. This would > > mean that we are exposing a new functionality to the userspace though. > > Myabe this would be useful on its own though. Just a quick draft (not > > even compile tested) whether this makes sense in general. I would be > > worried about unexpected behavior when somebody set other bit without a > > good reason and we might fail with ENOMEM for such a call now. > > > > Elf loader would then use MAP_FIXED_SAFE rather than MAP_FIXED. > > --- > > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h > > index 3b26cc62dadb..d021c21f9b01 100644 > > --- a/arch/alpha/include/uapi/asm/mman.h > > +++ b/arch/alpha/include/uapi/asm/mman.h > > @@ -31,6 +31,9 @@ > > #define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */ > > #define MAP_HUGETLB 0x100000 /* create a huge page mapping */ > > > > +#define MAP_KEEP_MAPPING 0x2000000 > > +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ > > A few things... > > 1. Does this need to be exposed to userland? As I've written in another email, exposing the flag this way would be really dangerous wrt. backward compatibility. So we would either need some translation or make it a flag on its own and touch the arch specific code which I really wanted to prevent from. Whether this is something useful for the userspace is a separate question which I should bring up to linux-api for a wider audience to discuss. So I guess this goes down to whether we want/need something like MAP_FIXED_SAFE or opt out the specific hardening code for arches that cannot make unaligned mappings for the requested address. > 2. Can it end up in include/uapi/asm-generic/mman*.h ? > 3. The definition of MAP_FIXED_SAFE should really have parens around it. Of course. I thought I did... > > @@ -1365,6 +1365,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > > if (offset_in_page(addr)) > > return addr; > > > > + if ((flags & MAP_FIXED_SAFE) == MAP_FIXED_SAFE) { > > I'm surprised this doesn't warn - since this effectively expands to: > > flags & MAP_FIXED | MAP_KEEP_MAPPING > > hence why MAP_FIXED_SAFE needs parens. It sure does. Thanks! -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 15:16 ` Michal Hocko 2017-11-13 15:48 ` Russell King - ARM Linux @ 2017-11-13 15:49 ` Michal Hocko 2017-11-13 16:06 ` Michal Hocko 2017-11-14 9:02 ` Michael Ellerman 2 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2017-11-13 15:49 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 16:16:41, Michal Hocko wrote: > On Mon 13-11-17 13:00:57, Michal Hocko wrote: > [...] > > Yes, I have mentioned that in the previous email but the amount of code > > would be even larger. Basically every arch which reimplements > > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > > do vma lookup. > > It turned out that this might be much more easier than I thought after > all. It seems we can really handle that in the common code. This would > mean that we are exposing a new functionality to the userspace though. > Myabe this would be useful on its own though. Just a quick draft (not > even compile tested) whether this makes sense in general. I would be > worried about unexpected behavior when somebody set other bit without a > good reason and we might fail with ENOMEM for such a call now. Hmm, the bigger problem would be the backward compatibility actually. We would get silent corruptions which is exactly what the flag is trying fix. mmap flags handling really sucks. So I guess we would have to make the flag internal only :/ -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 15:49 ` Michal Hocko @ 2017-11-13 16:06 ` Michal Hocko 2017-11-13 16:35 ` Khalid Aziz 2017-11-14 9:18 ` Michael Ellerman 0 siblings, 2 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-13 16:06 UTC (permalink / raw) To: linux-arm-kernel [Sorry for spamming, this one is the last attempt hopefully] On Mon 13-11-17 16:49:39, Michal Hocko wrote: > On Mon 13-11-17 16:16:41, Michal Hocko wrote: > > On Mon 13-11-17 13:00:57, Michal Hocko wrote: > > [...] > > > Yes, I have mentioned that in the previous email but the amount of code > > > would be even larger. Basically every arch which reimplements > > > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > > > do vma lookup. > > > > It turned out that this might be much more easier than I thought after > > all. It seems we can really handle that in the common code. This would > > mean that we are exposing a new functionality to the userspace though. > > Myabe this would be useful on its own though. Just a quick draft (not > > even compile tested) whether this makes sense in general. I would be > > worried about unexpected behavior when somebody set other bit without a > > good reason and we might fail with ENOMEM for such a call now. > > Hmm, the bigger problem would be the backward compatibility actually. We > would get silent corruptions which is exactly what the flag is trying > fix. mmap flags handling really sucks. So I guess we would have to make > the flag internal only :/ OK, so this one should take care of the backward compatibility while still not touching the arch code --- commit 39ff9bf8597e79a032da0954aea1f0d77d137765 Author: Michal Hocko <mhocko@suse.com> Date: Mon Nov 13 17:06:24 2017 +0100 mm: introduce MAP_FIXED_SAFE MAP_FIXED is used quite often but it is inherently dangerous because it unmaps an existing mapping covered by the requested range. While this might be might be really desidered behavior in many cases there are others which would rather see a failure than a silent memory corruption. Introduce a new MAP_FIXED_SAFE flag for mmap to achive this behavior. It is a MAP_FIXED extension with a single exception that it fails with ENOMEM if the requested address is already covered by an existing mapping. We still do rely on get_unmaped_area to handle all the arch specific MAP_FIXED treatment and check for a conflicting vma after it returns. Signed-off-by: Michal Hocko <mhocko@suse.com> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 3b26cc62dadb..767bcb8a4c28 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -31,6 +31,8 @@ #define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x100000 /* create a huge page mapping */ +#define MAP_FIXED_SAFE 0x2000000 /* MAP_FIXED which doesn't unmap underlying mapping */ + #define MS_ASYNC 1 /* sync memory asynchronously */ #define MS_SYNC 2 /* synchronous memory sync */ #define MS_INVALIDATE 4 /* invalidate the caches */ diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index da3216007fe0..c2311eb7219b 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -49,6 +49,8 @@ #define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x80000 /* create a huge page mapping */ +#define MAP_FIXED_SAFE 0x2000000 /* MAP_FIXED which doesn't unmap underlying mapping */ + /* * Flags for msync */ diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index cc9ba1d34779..b06fd830bc6f 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -25,6 +25,8 @@ #define MAP_STACK 0x40000 /* give out an address that is best suited for process/thread stacks */ #define MAP_HUGETLB 0x80000 /* create a huge page mapping */ +#define MAP_FIXED_SAFE 0x2000000 /* MAP_FIXED which doesn't unmap underlying mapping */ + #define MS_SYNC 1 /* synchronous memory sync */ #define MS_ASYNC 2 /* sync memory asynchronously */ #define MS_INVALIDATE 4 /* invalidate the caches */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index b15b278aa314..f4b291bca764 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -62,6 +62,8 @@ # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ #endif +#define MAP_FIXED_SAFE 0x2000000 /* MAP_FIXED which doesn't unmap underlying mapping */ + /* * Flags for msync */ diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 203268f9231e..03c518777f83 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -25,6 +25,8 @@ # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ #endif +#define MAP_FIXED_SAFE 0x2000000 /* MAP_FIXED which doesn't unmap underlying mapping */ + /* * Flags for mlock */ diff --git a/mm/mmap.c b/mm/mmap.c index 680506faceae..aad8d37f0205 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (mm->map_count > sysctl_max_map_count) return -ENOMEM; + /* force arch specific MAP_FIXED handling in get_unmapped_area */ + if (flags & MAP_FIXED_SAFE) + flags |= MAP_FIXED; + /* Obtain the address to map to. we verify (or select) it and ensure * that it represents a valid section of the address space. */ @@ -1365,6 +1369,13 @@ unsigned long do_mmap(struct file *file, unsigned long addr, if (offset_in_page(addr)) return addr; + if (flags & MAP_FIXED_SAFE) { + struct vm_area_struct *vma = find_vma(mm, addr); + + if (vma && vma->vm_start <= addr) + return -ENOMEM; + } + if (prot == PROT_EXEC) { pkey = execute_only_pkey(mm); if (pkey < 0) -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 16:06 ` Michal Hocko @ 2017-11-13 16:35 ` Khalid Aziz 2017-11-14 7:07 ` Michal Hocko 2017-11-14 9:18 ` Michael Ellerman 1 sibling, 1 reply; 21+ messages in thread From: Khalid Aziz @ 2017-11-13 16:35 UTC (permalink / raw) To: linux-arm-kernel On 11/13/2017 09:06 AM, Michal Hocko wrote: > OK, so this one should take care of the backward compatibility while > still not touching the arch code > --- > commit 39ff9bf8597e79a032da0954aea1f0d77d137765 > Author: Michal Hocko <mhocko@suse.com> > Date: Mon Nov 13 17:06:24 2017 +0100 > > mm: introduce MAP_FIXED_SAFE > > MAP_FIXED is used quite often but it is inherently dangerous because it > unmaps an existing mapping covered by the requested range. While this > might be might be really desidered behavior in many cases there are > others which would rather see a failure than a silent memory corruption. > Introduce a new MAP_FIXED_SAFE flag for mmap to achive this behavior. > It is a MAP_FIXED extension with a single exception that it fails with > ENOMEM if the requested address is already covered by an existing > mapping. We still do rely on get_unmaped_area to handle all the arch > specific MAP_FIXED treatment and check for a conflicting vma after it > returns. > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > ...... deleted ....... > diff --git a/mm/mmap.c b/mm/mmap.c > index 680506faceae..aad8d37f0205 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > if (mm->map_count > sysctl_max_map_count) > return -ENOMEM; > > + /* force arch specific MAP_FIXED handling in get_unmapped_area */ > + if (flags & MAP_FIXED_SAFE) > + flags |= MAP_FIXED; > + > /* Obtain the address to map to. we verify (or select) it and ensure > * that it represents a valid section of the address space. > */ Do you need to move this code above: if (!(flags & MAP_FIXED)) addr = round_hint_to_min(addr); /* Careful about overflows.. */ len = PAGE_ALIGN(len); if (!len) return -ENOMEM; Not doing that might mean the hint address will end up being rounded for MAP_FIXED_SAFE which would change the behavior from MAP_FIXED. -- Khalid ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 16:35 ` Khalid Aziz @ 2017-11-14 7:07 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-14 7:07 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 09:35:22, Khalid Aziz wrote: > On 11/13/2017 09:06 AM, Michal Hocko wrote: > > OK, so this one should take care of the backward compatibility while > > still not touching the arch code > > --- > > commit 39ff9bf8597e79a032da0954aea1f0d77d137765 > > Author: Michal Hocko <mhocko@suse.com> > > Date: Mon Nov 13 17:06:24 2017 +0100 > > > > mm: introduce MAP_FIXED_SAFE > > MAP_FIXED is used quite often but it is inherently dangerous because it > > unmaps an existing mapping covered by the requested range. While this > > might be might be really desidered behavior in many cases there are > > others which would rather see a failure than a silent memory corruption. > > Introduce a new MAP_FIXED_SAFE flag for mmap to achive this behavior. > > It is a MAP_FIXED extension with a single exception that it fails with > > ENOMEM if the requested address is already covered by an existing > > mapping. We still do rely on get_unmaped_area to handle all the arch > > specific MAP_FIXED treatment and check for a conflicting vma after it > > returns. > > Signed-off-by: Michal Hocko <mhocko@suse.com> > > > > ...... deleted ....... > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 680506faceae..aad8d37f0205 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1358,6 +1358,10 @@ unsigned long do_mmap(struct file *file, unsigned long addr, > > if (mm->map_count > sysctl_max_map_count) > > return -ENOMEM; > > + /* force arch specific MAP_FIXED handling in get_unmapped_area */ > > + if (flags & MAP_FIXED_SAFE) > > + flags |= MAP_FIXED; > > + > > /* Obtain the address to map to. we verify (or select) it and ensure > > * that it represents a valid section of the address space. > > */ > > Do you need to move this code above: > > if (!(flags & MAP_FIXED)) > addr = round_hint_to_min(addr); > > /* Careful about overflows.. */ > len = PAGE_ALIGN(len); > if (!len) > return -ENOMEM; > > Not doing that might mean the hint address will end up being rounded for > MAP_FIXED_SAFE which would change the behavior from MAP_FIXED. Yes, I will move it. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 16:06 ` Michal Hocko 2017-11-13 16:35 ` Khalid Aziz @ 2017-11-14 9:18 ` Michael Ellerman 2017-11-14 9:29 ` Michal Hocko 1 sibling, 1 reply; 21+ messages in thread From: Michael Ellerman @ 2017-11-14 9:18 UTC (permalink / raw) To: linux-arm-kernel Michal Hocko <mhocko@kernel.org> writes: > [Sorry for spamming, this one is the last attempt hopefully] > > On Mon 13-11-17 16:49:39, Michal Hocko wrote: >> On Mon 13-11-17 16:16:41, Michal Hocko wrote: >> > On Mon 13-11-17 13:00:57, Michal Hocko wrote: >> > [...] >> > > Yes, I have mentioned that in the previous email but the amount of code >> > > would be even larger. Basically every arch which reimplements >> > > arch_get_unmapped_area would have to special case new MAP_FIXED flag to >> > > do vma lookup. >> > >> > It turned out that this might be much more easier than I thought after >> > all. It seems we can really handle that in the common code. This would >> > mean that we are exposing a new functionality to the userspace though. >> > Myabe this would be useful on its own though. Just a quick draft (not >> > even compile tested) whether this makes sense in general. I would be >> > worried about unexpected behavior when somebody set other bit without a >> > good reason and we might fail with ENOMEM for such a call now. >> >> Hmm, the bigger problem would be the backward compatibility actually. We >> would get silent corruptions which is exactly what the flag is trying >> fix. mmap flags handling really sucks. So I guess we would have to make >> the flag internal only :/ > > OK, so this one should take care of the backward compatibility while > still not touching the arch code I'm not sure I understand your worries about backward compatibility? If we add a new mmap flag which is currently unused then what is the problem? Are you worried about user code that accidentally passes that flag already? > diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h > index 203268f9231e..03c518777f83 100644 > --- a/include/uapi/asm-generic/mman-common.h > +++ b/include/uapi/asm-generic/mman-common.h > @@ -25,6 +25,8 @@ > # define MAP_UNINITIALIZED 0x0 /* Don't support this flag */ > #endif > > +#define MAP_FIXED_SAFE 0x2000000 /* MAP_FIXED which doesn't unmap underlying mapping */ > + As I said in my other mail I think this should be a modifier to MAP_FIXED. That way all the existing code that checks for MAP_FIXED (in the kernel) works exactly as it currently does - like the check Khalid pointed out. And I think MAP_NO_CLOBBER would be a better name. cheers ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-14 9:18 ` Michael Ellerman @ 2017-11-14 9:29 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-14 9:29 UTC (permalink / raw) To: linux-arm-kernel On Tue 14-11-17 20:18:04, Michael Ellerman wrote: > Michal Hocko <mhocko@kernel.org> writes: > > > [Sorry for spamming, this one is the last attempt hopefully] > > > > On Mon 13-11-17 16:49:39, Michal Hocko wrote: > >> On Mon 13-11-17 16:16:41, Michal Hocko wrote: > >> > On Mon 13-11-17 13:00:57, Michal Hocko wrote: > >> > [...] > >> > > Yes, I have mentioned that in the previous email but the amount of code > >> > > would be even larger. Basically every arch which reimplements > >> > > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > >> > > do vma lookup. > >> > > >> > It turned out that this might be much more easier than I thought after > >> > all. It seems we can really handle that in the common code. This would > >> > mean that we are exposing a new functionality to the userspace though. > >> > Myabe this would be useful on its own though. Just a quick draft (not > >> > even compile tested) whether this makes sense in general. I would be > >> > worried about unexpected behavior when somebody set other bit without a > >> > good reason and we might fail with ENOMEM for such a call now. > >> > >> Hmm, the bigger problem would be the backward compatibility actually. We > >> would get silent corruptions which is exactly what the flag is trying > >> fix. mmap flags handling really sucks. So I guess we would have to make > >> the flag internal only :/ > > > > OK, so this one should take care of the backward compatibility while > > still not touching the arch code > > I'm not sure I understand your worries about backward compatibility? Just imagine you are running an application which uses the new flag combination on an older kernel. You will get no warning, yet you have no way to check that you have actually clobbered an existing mapping because MAP_FIXED will be used the old way. > If we add a new mmap flag which is currently unused then what is the > problem? Are you worried about user code that accidentally passes that > flag already? If we add a completely new flag, like in this patch, then the code using the flag will not clobber an existing mapping on older kernels which do not recognize it (we will simply fall back to the default hint based implementation). You might not get the mapping you asked for which sucks but that is not fixable AFAICS. You can at least do mapped_addr = mmap(addr, ... MAP_FIXED_SAFE...); assert(mapped_addr == addr); So I do not think we can go with the modifier unfortunatelly. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 15:16 ` Michal Hocko 2017-11-13 15:48 ` Russell King - ARM Linux 2017-11-13 15:49 ` Michal Hocko @ 2017-11-14 9:02 ` Michael Ellerman 2 siblings, 0 replies; 21+ messages in thread From: Michael Ellerman @ 2017-11-14 9:02 UTC (permalink / raw) To: linux-arm-kernel Michal Hocko <mhocko@kernel.org> writes: > On Mon 13-11-17 13:00:57, Michal Hocko wrote: > [...] >> Yes, I have mentioned that in the previous email but the amount of code >> would be even larger. Basically every arch which reimplements >> arch_get_unmapped_area would have to special case new MAP_FIXED flag to >> do vma lookup. > > It turned out that this might be much more easier than I thought after > all. It seems we can really handle that in the common code. Ah nice. I should have read this before replying to your previous mail. > This would mean that we are exposing a new functionality to the userspace though. > Myabe this would be useful on its own though. Yes I think it would. At least jemalloc seems like it could use it: https://github.com/jemalloc/jemalloc/blob/9f455e2786685b443201c33119765c8093461174/src/pages.c#L65 And I have memories of some JIT code I read once which did a loop of mmap()s or something to try and get allocations below 4GB or some other limit - but I can't remember now what it was. > Just a quick draft (not > even compile tested) whether this makes sense in general. I would be > worried about unexpected behavior when somebody set other bit without a > good reason and we might fail with ENOMEM for such a call now. > > Elf loader would then use MAP_FIXED_SAFE rather than MAP_FIXED. > --- > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h > index 3b26cc62dadb..d021c21f9b01 100644 > --- a/arch/alpha/include/uapi/asm/mman.h > +++ b/arch/alpha/include/uapi/asm/mman.h > @@ -31,6 +31,9 @@ > #define MAP_STACK 0x80000 /* give out an address that is best suited for process/thread stacks */ > #define MAP_HUGETLB 0x100000 /* create a huge page mapping */ > > +#define MAP_KEEP_MAPPING 0x2000000 > +#define MAP_FIXED_SAFE MAP_FIXED|MAP_KEEP_MAPPING /* enforce MAP_FIXED without clobbering an existing mapping */ So bike-shedding a bit, but I think "SAFE" is too vague a name. Perhaps MAP_NO_CLOBBER - which has the single semantic of "do not clobber any existing mappings". It would be a flag on its own, so you could pass it with or without MAP_FIXED, but it would only change the behaviour when MAP_FIXED is specified also. cheers ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 12:00 ` Michal Hocko 2017-11-13 15:16 ` Michal Hocko @ 2017-11-14 8:54 ` Michael Ellerman 2017-11-14 9:04 ` Michal Hocko 1 sibling, 1 reply; 21+ messages in thread From: Michael Ellerman @ 2017-11-14 8:54 UTC (permalink / raw) To: linux-arm-kernel Michal Hocko <mhocko@kernel.org> writes: > On Mon 13-11-17 22:34:50, Michael Ellerman wrote: >> Hi Michal, >> >> Michal Hocko <mhocko@kernel.org> writes: >> > On Mon 13-11-17 10:20:06, Michal Hocko wrote: >> >> [Cc arm and ppc maintainers] >> > >> > Hmm, it turned out to be a problem on other architectures as well. >> > CCing more maintainers. For your reference, we are talking about >> > http://lkml.kernel.org/r/20171023082608.6167-1-mhocko at kernel.org >> > which has broken architectures which do apply aligning on the mmap >> > address hint without MAP_FIXED applied. See below my proposed way >> > around this issue because I belive that the above patch is quite >> > valuable on its own to be dropped for all archs. >> >> I don't really like your solution sorry :) The fact that you've had to >> patch seven arches seems like a red flag. >> >> I think this is a generic problem with MAP_FIXED, which I've heard >> userspace folks complain about in the past. > > The thing is that we canno change MAP_FIXED behavior as it is carved in > stone Yes obviously. I didn't mean to imply we would change MAP_FIXED, rather we would add a new flag with the new semantics. >> Currently MAP_FIXED does two things: >> 1. makes addr not a hint but the required address >> 2. blasts any existing mapping >> >> You want 1) but not 2). > > + fail if there is a clashing range Yep. I thought that was implied :) >> So the right solution IMHO would be to add a new mmap flag to request >> that behaviour, ie. a fixed address but iff there is nothing already >> mapped there. >> >> I don't know the mm code well enough to know if that's hard for some >> reason, but it *seems* like it should be doable. > > Yes, I have mentioned that in the previous email but the amount of code > would be even larger. Basically every arch which reimplements > arch_get_unmapped_area would have to special case new MAP_FIXED flag to > do vma lookup. I'd have to look, but my memory of the arch code is that it doesn't deal with the vma so it wouldn't need any change. > So this was the most simple solution I could come up > with. If there was a general interest for MAP_FIXED_SAFE then we can > introduce it later of course. I would just like the hardening merged > sooner rather than later. Sure. But in the scheme of things one more kernel release is not that big a deal to get it right. Given that the simple approach of dropping MAP_FIXED turns out to not be simple at all. cheers ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-14 8:54 ` Michael Ellerman @ 2017-11-14 9:04 ` Michal Hocko 2017-11-14 14:52 ` Khalid Aziz 0 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2017-11-14 9:04 UTC (permalink / raw) To: linux-arm-kernel On Tue 14-11-17 19:54:59, Michael Ellerman wrote: > Michal Hocko <mhocko@kernel.org> writes: [...] > > So this was the most simple solution I could come up > > with. If there was a general interest for MAP_FIXED_SAFE then we can > > introduce it later of course. I would just like the hardening merged > > sooner rather than later. > > Sure. But in the scheme of things one more kernel release is not that > big a deal to get it right. Given that the simple approach of dropping > MAP_FIXED turns out to not be simple at all. Well, my idea was to push this hardening to older kernels because those were more vulnerable for the PIE base vs. stack placement and stack controllable size from userspace etc... Anyway, as per [1] it seems that the MAP_FIXED_SAFE doesn't look terrible from the backporting POV. If there is a general consensus that this is the preferred way to go, I will post the patch as an RFC to linux-api [1] http://lkml.kernel.org/r/20171113160637.jhekbdyfpccme3be at dhcp22.suse.cz -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-14 9:04 ` Michal Hocko @ 2017-11-14 14:52 ` Khalid Aziz 0 siblings, 0 replies; 21+ messages in thread From: Khalid Aziz @ 2017-11-14 14:52 UTC (permalink / raw) To: linux-arm-kernel On Tue, 2017-11-14 at 10:04 +0100, Michal Hocko wrote: > If there is a general consensus that this is the preferred way to go, > I > will post the patch as an RFC to linux-api > > [1] http://lkml.kernel.org/r/20171113160637.jhekbdyfpccme3be at dhcp22.s > use.cz I prefer the new flag. It is cleaner and avoids unintended breakage for existing flag. -- Khalid ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 9:20 ` linux-next: Tree for Nov 7 Michal Hocko 2017-11-13 9:34 ` Russell King - ARM Linux 2017-11-13 9:42 ` Michal Hocko @ 2017-11-13 14:11 ` Michal Hocko 2017-11-13 15:09 ` Russell King - ARM Linux 2 siblings, 1 reply; 21+ messages in thread From: Michal Hocko @ 2017-11-13 14:11 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 10:20:06, Michal Hocko wrote: > [Cc arm and ppc maintainers] > > Thanks a lot for testing! > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > Hi Joel, > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > [...] > > >> > There are a lot of messages on the way up that look like this: > > >> > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > >> > memory is mapped already > > >> > > > >> > And then trying to run userspace looks like this: > > >> > > >> Could you please run with debugging patch posted > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57 at dhcp22.suse.cz > > > > > > Did you have chance to test with this debugging patch, please? > > > > Lots of this: > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > This smells like the problem I've expected that mmap with hint doesn't > respect the hint even though there is no clashing mapping. The above > basically says that we didn't map at 0xd9000 but it has placed it at > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > mapping. find_vma returns the closest vma (with addr < vm_end) for the > given address 0xd9000 so this address cannot be mapped by any other vma. > > Now that I am looking at arm's arch_get_unmapped_area it does perform > aligning for shared vmas. Sorry for confusion here. These are not shared mappings as pointed out by Russell in a private email. I got confused by the above flags which I have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags obviously so the bit 0 is VM_READ. Sorry about the confusion. The real reason we are doing the alignment is that we do a file mapping /* * We only need to do colour alignment if either the I or D * caches alias. */ if (aliasing) do_align = filp || (flags & MAP_SHARED); I am not really familiar with this architecture to understand why do we need aliasing for file mappings, though. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 14:11 ` Michal Hocko @ 2017-11-13 15:09 ` Russell King - ARM Linux 2017-11-13 15:31 ` Michal Hocko 0 siblings, 1 reply; 21+ messages in thread From: Russell King - ARM Linux @ 2017-11-13 15:09 UTC (permalink / raw) To: linux-arm-kernel On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote: > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > > [Cc arm and ppc maintainers] > > > > Thanks a lot for testing! > > > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > > Hi Joel, > > > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > > [...] > > > >> > There are a lot of messages on the way up that look like this: > > > >> > > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > > >> > memory is mapped already > > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > > >> > memory is mapped already > > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > > >> > memory is mapped already > > > >> > > > > >> > And then trying to run userspace looks like this: > > > >> > > > >> Could you please run with debugging patch posted > > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57 at dhcp22.suse.cz > > > > > > > > Did you have chance to test with this debugging patch, please? > > > > > > Lots of this: > > > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > > > This smells like the problem I've expected that mmap with hint doesn't > > respect the hint even though there is no clashing mapping. The above > > basically says that we didn't map at 0xd9000 but it has placed it at > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > > mapping. find_vma returns the closest vma (with addr < vm_end) for the > > given address 0xd9000 so this address cannot be mapped by any other vma. > > > > Now that I am looking at arm's arch_get_unmapped_area it does perform > > aligning for shared vmas. > > Sorry for confusion here. These are not shared mappings as pointed out > by Russell in a private email. I got confused by the above flags which I > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags > obviously so the bit 0 is VM_READ. Sorry about the confusion. The real > reason we are doing the alignment is that we do a file mapping > /* > * We only need to do colour alignment if either the I or D > * caches alias. > */ > if (aliasing) > do_align = filp || (flags & MAP_SHARED); > > I am not really familiar with this architecture to understand why do we > need aliasing for file mappings, though. I think it's there so that flush_dcache_page() works - possibly get_user_pages() being used on a private mapping of page cache pages, but that's guessing. I'm afraid I don't remember all the details, this is code from around 15 years ago, and I'd be very nervous about changing it now without fully understanding the issues. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up ^ permalink raw reply [flat|nested] 21+ messages in thread
* linux-next: Tree for Nov 7 2017-11-13 15:09 ` Russell King - ARM Linux @ 2017-11-13 15:31 ` Michal Hocko 0 siblings, 0 replies; 21+ messages in thread From: Michal Hocko @ 2017-11-13 15:31 UTC (permalink / raw) To: linux-arm-kernel On Mon 13-11-17 15:09:09, Russell King - ARM Linux wrote: > On Mon, Nov 13, 2017 at 03:11:40PM +0100, Michal Hocko wrote: > > On Mon 13-11-17 10:20:06, Michal Hocko wrote: > > > [Cc arm and ppc maintainers] > > > > > > Thanks a lot for testing! > > > > > > On Sun 12-11-17 11:38:02, Joel Stanley wrote: > > > > On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko <mhocko@kernel.org> wrote: > > > > > Hi Joel, > > > > > > > > > > On Wed 08-11-17 15:20:50, Michal Hocko wrote: > > > > > [...] > > > > >> > There are a lot of messages on the way up that look like this: > > > > >> > > > > > >> > [ 2.527460] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > [ 2.540160] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > [ 2.546153] Uhuuh, elf segement at 000d9000 requested but the > > > > >> > memory is mapped already > > > > >> > > > > > >> > And then trying to run userspace looks like this: > > > > >> > > > > >> Could you please run with debugging patch posted > > > > >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57 at dhcp22.suse.cz > > > > > > > > > > Did you have chance to test with this debugging patch, please? > > > > > > > > Lots of this: > > > > > > > > [ 1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000 > > > > [ 1.177555] Clashing vma [dd000, de000] flags:100873 name:(null) > > > > > > This smells like the problem I've expected that mmap with hint doesn't > > > respect the hint even though there is no clashing mapping. The above > > > basically says that we didn't map at 0xd9000 but it has placed it at > > > 0xdd000. The nearest (clashing) vma is at 0xdd000 so this is our new > > > mapping. find_vma returns the closest vma (with addr < vm_end) for the > > > given address 0xd9000 so this address cannot be mapped by any other vma. > > > > > > Now that I am looking at arm's arch_get_unmapped_area it does perform > > > aligning for shared vmas. > > > > Sorry for confusion here. These are not shared mappings as pointed out > > by Russell in a private email. I got confused by the above flags which I > > have misinterpreted as bit 0 set => MAP_SHARED. These are vm_flags > > obviously so the bit 0 is VM_READ. Sorry about the confusion. The real > > reason we are doing the alignment is that we do a file mapping > > /* > > * We only need to do colour alignment if either the I or D > > * caches alias. > > */ > > if (aliasing) > > do_align = filp || (flags & MAP_SHARED); > > > > I am not really familiar with this architecture to understand why do we > > need aliasing for file mappings, though. > > I think it's there so that flush_dcache_page() works - possibly > get_user_pages() being used on a private mapping of page cache pages, > but that's guessing. I fail to see how the mixure of MAP_FIXED and regular mapping of the same file work then, but as I've said I really do not understand this code. > I'm afraid I don't remember all the details, this is code from around > 15 years ago, and I'd be very nervous about changing it now without > fully understanding the issues. Ohh, absolutely! I didn't dare to touch this code and that's why I took the easy way and simply opt-out from the harding for all those archs that are basically sharing this pattern. But after a closer look it seems that we can really introduce MAP_FIXED_SAFE that would keep the arch mmap code intact yet we would get the hardening for all archs. It would allow also allow a safer MAP_FIXED semantic for userspace. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2017-11-14 14:52 UTC | newest] Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <20171107162217.382cd754@canb.auug.org.au> [not found] ` <CACPK8Xfd4nqkf=Lk3n6+TNHAAi327r0dkUfGypZ3TpR0LqfS4Q@mail.gmail.com> [not found] ` <20171108142050.7w3yliulxjeco3b7@dhcp22.suse.cz> [not found] ` <20171110123054.5pnefm3mczsfv7bz@dhcp22.suse.cz> [not found] ` <CACPK8Xe5uUKEytkRiszdX511b_cYTD-z3X=ZsMcNJ-NOYnXfuQ@mail.gmail.com> 2017-11-13 9:20 ` linux-next: Tree for Nov 7 Michal Hocko 2017-11-13 9:34 ` Russell King - ARM Linux 2017-11-13 9:42 ` Michal Hocko 2017-11-13 11:34 ` Michael Ellerman 2017-11-13 12:00 ` Michal Hocko 2017-11-13 15:16 ` Michal Hocko 2017-11-13 15:48 ` Russell King - ARM Linux 2017-11-13 15:59 ` Michal Hocko 2017-11-13 15:49 ` Michal Hocko 2017-11-13 16:06 ` Michal Hocko 2017-11-13 16:35 ` Khalid Aziz 2017-11-14 7:07 ` Michal Hocko 2017-11-14 9:18 ` Michael Ellerman 2017-11-14 9:29 ` Michal Hocko 2017-11-14 9:02 ` Michael Ellerman 2017-11-14 8:54 ` Michael Ellerman 2017-11-14 9:04 ` Michal Hocko 2017-11-14 14:52 ` Khalid Aziz 2017-11-13 14:11 ` Michal Hocko 2017-11-13 15:09 ` Russell King - ARM Linux 2017-11-13 15:31 ` Michal Hocko
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).