linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicholas Piggin <npiggin@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Ding Tianhong <dingtianhong@huawei.com>,
	linux-mm@kvack.org
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>,
	Christoph Hellwig <hch@infradead.org>,
	Jonathan Cameron <Jonathan.Cameron@Huawei.com>,
	linux-arch@vger.kernel.org, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org,
	Rick Edgecombe <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings
Date: Tue, 26 Jan 2021 19:47:22 +1000	[thread overview]
Message-ID: <1611653945.t3oot63nwn.astroid@bobo.none> (raw)
In-Reply-To: <0f360e6e-6d34-19ce-6c76-a17a5f4f7fc3@huawei.com>

Excerpts from Ding Tianhong's message of January 26, 2021 4:59 pm:
> On 2021/1/26 12:45, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>> 
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>> 
>> Architectures must ensure that any arch specific vmalloc allocations
>> that require PAGE_SIZE mappings (e.g., module allocations vs strict
>> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
>> 
>> When hugepage vmalloc mappings are enabled in the next patch, this
>> reduces TLB misses by nearly 30x on a `git diff` workload on a 2-node
>> POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
>> 
>> This can result in more internal fragmentation and memory overhead for a
>> given allocation, an option nohugevmalloc is added to disable at boot.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  arch/Kconfig            |  11 ++
>>  include/linux/vmalloc.h |  21 ++++
>>  mm/page_alloc.c         |   5 +-
>>  mm/vmalloc.c            | 215 +++++++++++++++++++++++++++++++---------
>>  4 files changed, 205 insertions(+), 47 deletions(-)
>> 
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 24862d15f3a3..eef170e0c9b8 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>  config HAVE_ARCH_HUGE_VMAP
>>  	bool
>>  
>> +#
>> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
>> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
>> +#  can be used to prohibit arch-specific allocations from using hugepages to
>> +#  help with this (e.g., modules may require it).
>> +#
>> +config HAVE_ARCH_HUGE_VMALLOC
>> +	depends on HAVE_ARCH_HUGE_VMAP
>> +	bool
>> +
>>  config ARCH_WANT_HUGE_PMD_SHARE
>>  	bool
>>  
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 99ea72d547dc..93270adf5db5 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -25,6 +25,7 @@ struct notifier_block;		/* in notifier.h */
>>  #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>>  #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
>>  #define VM_MAP_PUT_PAGES	0x00000100	/* put pages and free array in vfree */
>> +#define VM_NO_HUGE_VMAP		0x00000200	/* force PAGE_SIZE pte mapping */
>> 
>>  /*
>>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
>> @@ -59,6 +60,9 @@ struct vm_struct {
>>  	unsigned long		size;
>>  	unsigned long		flags;
>>  	struct page		**pages;
>> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
>> +	unsigned int		page_order;
>> +#endif
>>  	unsigned int		nr_pages;
>>  	phys_addr_t		phys_addr;
>>  	const void		*caller;
> Hi Nicholas:
> 
> Give a suggestion :)
> 
> The page order was only used to indicate the huge page flag for vm area, and only valid when
> size bigger than PMD_SIZE, so can we use the vm flgas to instead of that, just like define the
> new flag named VM_HUGEPAGE, it would not break the vm struct, and it is easier for me to backport the serious
> patches to our own branches. (Base on the lts version).

Hmm, it might be possible. I'm not sure if 1GB vmallocs will be used any 
time soon (or maybe they will for edge case configurations? It would be 
trivial to add support for).

The other concern I have is that Christophe IIRC was asking about 
implementing a mapping for PPC which used TLB mappings that were 
different than kernel page table tree size. Although I guess we could 
deal with that when it comes.

I like the flexibility of page_order though. How hard would it be for 
you to do the backport with VM_HUGEPAGE yourself?

I should also say, thanks for all the review and testing from the Huawei 
team. Do you have an x86 patch?

Thanks,
Nick

  reply	other threads:[~2021-01-26 11:29 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-26  4:44 [PATCH v11 00/13] huge vmalloc mappings Nicholas Piggin
2021-01-26  4:44 ` [PATCH v11 01/13] mm/vmalloc: fix HUGE_VMAP regression by enabling huge pages in vmalloc_to_page Nicholas Piggin
2021-01-26  6:40   ` Miaohe Lin
2021-01-28  3:13   ` Ding Tianhong
2021-02-02 10:22     ` Nicholas Piggin
2021-01-26  4:44 ` [PATCH v11 02/13] mm: apply_to_pte_range warn and fail if a large pte is encountered Nicholas Piggin
2021-01-26  6:49   ` Miaohe Lin
2021-01-26  4:45 ` [PATCH v11 03/13] mm/vmalloc: rename vmap_*_range vmap_pages_*_range Nicholas Piggin
2021-01-27  2:10   ` Miaohe Lin
2021-01-26  4:45 ` [PATCH v11 04/13] mm/ioremap: rename ioremap_*_range to vmap_*_range Nicholas Piggin
2021-01-26  6:40   ` Christoph Hellwig
2021-01-28  2:38   ` Miaohe Lin
2021-01-26  4:45 ` [PATCH v11 05/13] mm: HUGE_VMAP arch support cleanup Nicholas Piggin
2021-01-26  6:07   ` Ding Tianhong
2021-01-26 13:26   ` kernel test robot
2021-01-27  5:26   ` kernel test robot
2021-01-26  4:45 ` [PATCH v11 06/13] powerpc: inline huge vmap supported functions Nicholas Piggin
2021-01-26  4:45 ` [PATCH v11 07/13] arm64: " Nicholas Piggin
2021-01-26  4:45 ` [PATCH v11 08/13] x86: " Nicholas Piggin
2021-01-26  4:45 ` [PATCH v11 09/13] mm/vmalloc: provide fallback arch huge vmap support functions Nicholas Piggin
2021-01-26  4:45 ` [PATCH v11 10/13] mm: Move vmap_range from mm/ioremap.c to mm/vmalloc.c Nicholas Piggin
2021-01-26  4:45 ` [PATCH v11 11/13] mm/vmalloc: add vmap_range_noflush variant Nicholas Piggin
2021-01-26  4:45 ` [PATCH v11 12/13] mm/vmalloc: Hugepage vmalloc mappings Nicholas Piggin
2021-01-26  6:59   ` Ding Tianhong
2021-01-26  9:47     ` Nicholas Piggin [this message]
2021-01-26 11:48       ` Ding Tianhong
2021-01-26  4:45 ` [PATCH v11 13/13] powerpc/64s/radix: Enable huge " Nicholas Piggin
2021-01-27 10:26   ` Michael Ellerman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1611653945.t3oot63nwn.astroid@bobo.none \
    --to=npiggin@gmail.com \
    --cc=Jonathan.Cameron@Huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=dingtianhong@huawei.com \
    --cc=hch@infradead.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=rick.p.edgecombe@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).