Date: Tue, 14 Apr 2020 22:13:44 +1000
From: Nicholas Piggin
Subject: Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings
To: Christoph Hellwig
Cc: Borislav Petkov, Catalin Marinas, "H. Peter Anvin",
    linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    linuxppc-dev@lists.ozlabs.org, Ingo Molnar, Thomas Gleixner,
    Will Deacon, x86@kernel.org
Sender: owner-linux-mm@kvack.org
References: <20200413125303.423864-1-npiggin@gmail.com>
    <20200413125303.423864-5-npiggin@gmail.com>
    <20200414072316.GA5503@infradead.org>
In-Reply-To: <20200414072316.GA5503@infradead.org>
Message-ID: <1586864403.0qfilei2ft.astroid@bobo.none>

Excerpts from Christoph Hellwig's message of April 14, 2020 5:23 pm:
> On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
>> have vmalloc attempt to allocate PMD-sized pages first, before falling back
>> to small pages. Allocations which use something other than PAGE_KERNEL
>> protections are not permitted to use huge pages yet, not all callers expect
>> this (e.g., module allocations vs strict module rwx).
>>
>> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
>> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.
>>
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation. It can also cause greater NUMA unbalance on hashdist
>> allocations.
>>
>> There may be other callers that expect small pages under vmalloc but use
>> PAGE_KERNEL, I'm not sure if it's feasible to catch them all. An
>> alternative would be a new function or flag which enables large mappings,
>> and use that in callers.
>
> Why do we even use vmalloc in this case rather than just doing a huge
> page allocation?

Which case? Usually the answer would be because you don't want to use
contiguous physical memory and/or you don't want to use the linear
mapping.

> What callers are you interested in?

The dentry and inode caches for this test, obviously.

Lots of other things could possibly benefit though: other system
hashes like networking, and a lot of other vmalloc callers that might
benefit right away. Some others could use some work to batch up
allocation sizes to benefit.

Thanks,
Nick