From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1751401AbdKMJUO (ORCPT ); Mon, 13 Nov 2017 04:20:14 -0500
Received: from mx2.suse.de ([195.135.220.15]:34120 "EHLO mx2.suse.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751128AbdKMJUK (ORCPT ); Mon, 13 Nov 2017 04:20:10 -0500
Date: Mon, 13 Nov 2017 10:20:06 +0100
From: Michal Hocko
To: Joel Stanley
Cc: Stephen Rothwell, Andrew Morton, Linux-Next Mailing List,
	Linux Kernel Mailing List, Russell King,
	linux-arm-kernel@lists.infradead.org, Benjamin Herrenschmidt,
	Michael Ellerman, Abdul Haleem, linuxppc-dev@lists.ozlabs.org
Subject: Re: linux-next: Tree for Nov 7
Message-ID: <20171113092006.cjw2njjukt6limvb@dhcp22.suse.cz>
References: <20171107162217.382cd754@canb.auug.org.au>
	<20171108142050.7w3yliulxjeco3b7@dhcp22.suse.cz>
	<20171110123054.5pnefm3mczsfv7bz@dhcp22.suse.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: NeoMutt/20170609 (1.8.3)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

[Cc arm and ppc maintainers]

Thanks a lot for testing!

On Sun 12-11-17 11:38:02, Joel Stanley wrote:
> On Fri, Nov 10, 2017 at 11:00 PM, Michal Hocko wrote:
> > Hi Joel,
> >
> > On Wed 08-11-17 15:20:50, Michal Hocko wrote:
> > [...]
> >> > There are a lot of messages on the way up that look like this:
> >> >
> >> > [    2.527460] Uhuuh, elf segement at 000d9000 requested but the
> >> > memory is mapped already
> >> > [    2.540160] Uhuuh, elf segement at 000d9000 requested but the
> >> > memory is mapped already
> >> > [    2.546153] Uhuuh, elf segement at 000d9000 requested but the
> >> > memory is mapped already
> >> >
> >> > And then trying to run userspace looks like this:
> >>
> >> Could you please run with debugging patch posted
> >> http://lkml.kernel.org/r/20171107102854.vylrtaodla63kc57@dhcp22.suse.cz
> >
> > Did you have chance to test with this debugging patch, please?
>
> Lots of this:
>
> [    1.177266] Uhuuh, elf segement at 000d9000 requested but the memory is mapped already, got 000dd000
> [    1.177555] Clashing vma [dd000, de000] flags:100873 name:(null)

This smells like the problem I expected: mmap with a hint doesn't
respect the hint even though there is no clashing mapping. The above
basically says that we didn't map at 0xd9000 but the mapping has been
placed at 0xdd000 instead. The nearest (clashing) vma is at 0xdd000, so
this is our new mapping. find_vma returns the closest vma (with addr <
vm_end) for the given address 0xd9000, so this address cannot be mapped
by any other vma.

Now that I am looking at arm's arch_get_unmapped_area, it does perform
aligning for shared vmas. We do not do that for MAP_FIXED. Powerpc,
reported earlier [1], seems to suffer from a similar problem:
slice_get_unmapped_area aligns to slices, whatever that means.

I can see two possible ways around that. Either we explicitly request
non-aligned mappings via a special MAP_$FOO (e.g. MAP_FIXED_SAFE) or
simply opt out from the MAP_FIXED protection via ifdefs. The first
option sounds more generic to me, but it is also more tricky to do
without introducing other user visible effects. The latter is quite
straightforward. What do you think about the following on top of the
previous patch?
It is rather terse and disables the MAP_FIXED protection for arm
completely because I couldn't find a way to make it conditional on
CACHEID_VIPT_ALIASING. But this can always be handled later. I find the
protection useful enough to have it working for most architectures now
and handle the others specially.

[1] http://lkml.kernel.org/r/1510048229.12079.7.camel@abdul.in.ibm.com
---
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 61a0cb15067e..018d041a30e6 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -99,6 +99,7 @@ config ARM
 	select PERF_USE_VMALLOC
 	select RTC_LIB
 	select SYS_SUPPORTS_APM_EMULATION
+	select ARCH_ALIGNED_MMAPS
 	# Above selects are sorted alphabetically; please add new ones
 	# according to that.  Thanks.
 	help
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index 2f629e0551e9..156f69c09c7f 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -368,6 +368,7 @@ config PPC_MM_SLICES
 	bool
 	default y if PPC_STD_MMU_64
 	default n
+	select ARCH_ALIGNED_MMAPS
 
 config PPC_HAVE_PMU_SUPPORT
 	bool
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index a22718de42db..d23eb89f31c0 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -345,13 +345,19 @@ static unsigned long elf_vm_mmap(struct file *filep, unsigned long addr,
 		unsigned long size, int prot, int type, unsigned long off)
 {
 	unsigned long map_addr;
+	unsigned long map_type = type;
 
 	/*
 	 * If caller requests the mapping at a specific place, make sure we fail
 	 * rather than potentially clobber an existing mapping which can have
-	 * security consequences (e.g. smash over the stack area).
+	 * security consequences (e.g. smash over the stack area). Be careful
+	 * about architectures which do not respect the address hint due to
+	 * aligning restrictions for !fixed mappings.
 	 */
-	map_addr = vm_mmap(filep, addr, size, prot, type & ~MAP_FIXED, off);
+	if (!IS_ENABLED(CONFIG_ARCH_ALIGNED_MMAPS))
+		map_type &= ~MAP_FIXED;
+
+	map_addr = vm_mmap(filep, addr, size, prot, map_type, off);
 	if (BAD_ADDR(map_addr))
 		return map_addr;
 
-- 
Michal Hocko
SUSE Labs
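For anyone who wants to poke at the hint behaviour from user space, the
following is a minimal sketch of the idea elf_vm_mmap relies on in the
message above: request the address as a plain hint (no MAP_FIXED) and
treat any other placement as a failure. It is not part of the patch.
The helper name map_exactly_at is hypothetical, and the sketch assumes
a Linux system and uses an anonymous private mapping rather than a
file-backed ELF segment, so it will not reproduce the shared-mapping
alignment seen on arm.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (illustration only): ask for an anonymous private
 * mapping at `hint` without MAP_FIXED, so an existing mapping is never
 * clobbered. Returns the mapping if the kernel honoured the hint, or
 * NULL if mmap failed or the mapping was placed somewhere else.
 */
static void *map_exactly_at(void *hint, size_t len)
{
	void *got = mmap(hint, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (got == MAP_FAILED)
		return NULL;

	if (got != hint) {
		/* Hint not honoured: undo the mapping and report failure. */
		munmap(got, len);
		return NULL;
	}

	return got;
}
```

Calling the helper with an address that is already mapped returns NULL,
which is the clash case. What the report above shows is that the same
"got a different address" result can also come from an architecture
aligning the hint, without any clashing mapping at all.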