From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<linux-arch@vger.kernel.org>, <linuxppc-dev@lists.ozlabs.org>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	<x86@kernel.org>, "H. Peter Anvin" <hpa@zytor.com>,
	"Zefan Li" <lizefan@huawei.com>
Subject: Re: [PATCH v3 8/8] mm/vmalloc: Hugepage vmalloc mappings
Date: Wed, 12 Aug 2020 13:25:24 +0100
Message-ID: <20200812132524.000067a6@Huawei.com>
In-Reply-To: <20200810022732.1150009-9-npiggin@gmail.com>

On Mon, 10 Aug 2020 12:27:32 +1000
Nicholas Piggin <npiggin@gmail.com> wrote:

> On platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmaps,
> vmalloc will attempt to allocate PMD-sized pages first, before falling
> back to small pages.
> 
> Allocations which use something other than PAGE_KERNEL protections are
> not permitted to use huge pages yet, because not all callers expect this
> (e.g., module allocations vs strict module rwx).
> 
> This reduces TLB misses by nearly 30x on a `git diff` workload on a
> 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation, so a nohugevmap option is added to disable it at boot.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Hi Nicholas,

Busy afternoon, but in the meantime here is a possible point of interest inline.


...

> @@ -2701,22 +2760,45 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  			pgprot_t prot, unsigned long vm_flags, int node,
>  			const void *caller)
>  {
> -	struct vm_struct *area;
> +	struct vm_struct *area = NULL;
>  	void *addr;
>  	unsigned long real_size = size;
> +	unsigned long real_align = align;
> +	unsigned int shift = PAGE_SHIFT;
>  
>  	size = PAGE_ALIGN(size);
>  	if (!size || (size >> PAGE_SHIFT) > totalram_pages())
>  		goto fail;
>  
> -	area = __get_vm_area_node(real_size, align, VM_ALLOC | VM_UNINITIALIZED |
> +	if (vmap_allow_huge && (pgprot_val(prot) == pgprot_val(PAGE_KERNEL))) {
> +		unsigned long size_per_node;
> +
> +		/*
> +		 * Try huge pages. Only try for PAGE_KERNEL allocations,
> +		 * others like modules don't yet expect huge pages in
> +		 * their allocations due to apply_to_page_range not
> +		 * supporting them.
> +		 */
> +
> +		size_per_node = size;
> +		if (node == NUMA_NO_NODE)
> +			size_per_node /= num_online_nodes();
> +		if (size_per_node >= PMD_SIZE)
> +			shift = PMD_SHIFT;
> +	}
> +
> +again:
> +	align = max(real_align, 1UL << shift);
> +	size = ALIGN(real_size, align);

So my suspicion is that the issue on arm64 is related to this.
In the relevant call path, align is 32K whilst the size is 16K.

Previously, I don't think we forced size to be a multiple of align.

I think this results in nr_pages being double what it was before.

> +
> +	area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED |
>  				vm_flags, start, end, node, gfp_mask, caller);
>  	if (!area)
>  		goto fail;
>  
> -	addr = __vmalloc_area_node(area, gfp_mask, prot, node);
> +	addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
>  	if (!addr)
> -		return NULL;
> +		goto fail;
>  
>  	/*
>  	 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
> @@ -2730,8 +2812,16 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
>  	return addr;
>  
>  fail:
> -	warn_alloc(gfp_mask, NULL,
> +	if (shift > PAGE_SHIFT) {
> +		shift = PAGE_SHIFT;
> +		goto again;
> +	}
> +
> +	if (!area) {
> +		/* Warn for area allocation, page allocations already warn */
> +		warn_alloc(gfp_mask, NULL,
>  			  "vmalloc: allocation failure: %lu bytes", real_size);
> +	}
>  	return NULL;
>  }
>  


