From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: <linux-arch@vger.kernel.org>, Zefan Li <lizefan@huawei.com>,
	"H. Peter Anvin" <hpa@zytor.com>, Will Deacon <will@kernel.org>,
	<x86@kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-mm@kvack.org>, Ingo Molnar <mingo@redhat.com>,
	Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	<linuxppc-dev@lists.ozlabs.org>,
	<linux-arm-kernel@lists.infradead.org>, <linuxarm@huawei.com>
Subject: Re: [PATCH v3 8/8] mm/vmalloc: Hugepage vmalloc mappings
Date: Wed, 12 Aug 2020 17:18:33 +0100	[thread overview]
Message-ID: <20200812171833.00001570@Huawei.com> (raw)
In-Reply-To: <20200812132524.000067a6@Huawei.com>

On Wed, 12 Aug 2020 13:25:24 +0100
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Mon, 10 Aug 2020 12:27:32 +1000
> Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> > On platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmaps,
> > vmalloc will attempt to allocate PMD-sized pages first, before falling
> > back to small pages.
> > 
> > Allocations which use something other than PAGE_KERNEL protections are
> > not permitted to use huge pages yet, because not all callers expect this (e.g.,
> > module allocations vs strict module rwx).
> > 
> > This reduces TLB misses by nearly 30x on a `git diff` workload on a
> > 2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%.
> > 
> > This can result in more internal fragmentation and memory overhead for a
> > given allocation, so a nohugevmap boot option is added to disable it.
> > 
> > Signed-off-by: Nicholas Piggin <npiggin@gmail.com>  
> Hi Nicholas,
> 
> Busy afternoon, but a possible point of interest inline in the meantime.
> 

I did manage to get back to this.

The issue, I think, is that ARM64 defines THREAD_ALIGN with CONFIG_VMAP_STACK
to be 2 * THREAD_SIZE.  There is a comment in arch/arm64/include/asm/memory.h
that this is to allow cheap checking of stack overflow.

A quick grep suggests ARM64 is the only architecture to do this...
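
For reference, a minimal sketch of the definition I mean (paraphrased from
arch/arm64/include/asm/memory.h, so treat the exact form as approximate),
plus the arithmetic that then falls out of the ALIGN() change in the hunk
quoted below:

	/* arch/arm64/include/asm/memory.h (approximate) */
	#ifdef CONFIG_VMAP_STACK
	/*
	 * Aligning the stack to twice its size makes the overflow check a
	 * cheap test of the stack pointer against the alignment mask.
	 */
	#define THREAD_ALIGN	(2 * THREAD_SIZE)
	#else
	#define THREAD_ALIGN	THREAD_SIZE
	#endif

	/*
	 * With THREAD_SIZE = 16K and THREAD_ALIGN = 32K, the new
	 *	size = ALIGN(real_size, align);
	 * rounds a 16K stack allocation up to 32K, so nr_pages doubles even
	 * though the stack itself has not grown.
	 */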

Jonathan



> 
> ...
> 
> > @@ -2701,22 +2760,45 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >  			pgprot_t prot, unsigned long vm_flags, int node,
> >  			const void *caller)
> >  {
> > -	struct vm_struct *area;
> > +	struct vm_struct *area = NULL;
> >  	void *addr;
> >  	unsigned long real_size = size;
> > +	unsigned long real_align = align;
> > +	unsigned int shift = PAGE_SHIFT;
> >  
> >  	size = PAGE_ALIGN(size);
> >  	if (!size || (size >> PAGE_SHIFT) > totalram_pages())
> >  		goto fail;
> >  
> > -	area = __get_vm_area_node(real_size, align, VM_ALLOC | VM_UNINITIALIZED |
> > +	if (vmap_allow_huge && (pgprot_val(prot) == pgprot_val(PAGE_KERNEL))) {
> > +		unsigned long size_per_node;
> > +
> > +		/*
> > +		 * Try huge pages. Only try for PAGE_KERNEL allocations,
> > +		 * others like modules don't yet expect huge pages in
> > +		 * their allocations due to apply_to_page_range not
> > +		 * supporting them.
> > +		 */
> > +
> > +		size_per_node = size;
> > +		if (node == NUMA_NO_NODE)
> > +			size_per_node /= num_online_nodes();
> > +		if (size_per_node >= PMD_SIZE)
> > +			shift = PMD_SHIFT;
> > +	}
> > +
> > +again:
> > +	align = max(real_align, 1UL << shift);
> > +	size = ALIGN(real_size, align);  
> 
> So my suspicion is that the issue on arm64 is related to this.
> In the relevant call path, align is 32K whilst the size is 16K.
> 
> Previously, I don't think we forced size to be a multiple of align.
> 
> I think this results in nr_pages being double what it was before.
> 
> 
> > +
> > +	area = __get_vm_area_node(size, align, VM_ALLOC | VM_UNINITIALIZED |
> >  				vm_flags, start, end, node, gfp_mask, caller);
> >  	if (!area)
> >  		goto fail;
> >  
> > -	addr = __vmalloc_area_node(area, gfp_mask, prot, node);
> > +	addr = __vmalloc_area_node(area, gfp_mask, prot, shift, node);
> >  	if (!addr)
> > -		return NULL;
> > +		goto fail;
> >  
> >  	/*
> >  	 * In this function, newly allocated vm_struct has VM_UNINITIALIZED
> > @@ -2730,8 +2812,16 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
> >  	return addr;
> >  
> >  fail:
> > -	warn_alloc(gfp_mask, NULL,
> > +	if (shift > PAGE_SHIFT) {
> > +		shift = PAGE_SHIFT;
> > +		goto again;
> > +	}
> > +
> > +	if (!area) {
> > +		/* Warn for area allocation, page allocations already warn */
> > +		warn_alloc(gfp_mask, NULL,
> >  			  "vmalloc: allocation failure: %lu bytes", real_size);
> > +	}
> >  	return NULL;
> >  }
> >    
> 
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



  reply	other threads:[~2020-08-12 16:20 UTC|newest]

Thread overview: 49+ messages
2020-08-10  2:27 [PATCH v3 0/8] huge vmalloc mappings Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 1/8] mm/vmalloc: fix vmalloc_to_page for huge vmap mappings Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 2/8] mm: apply_to_pte_range warn and fail if a large pte is encountered Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 3/8] mm/vmalloc: rename vmap_*_range vmap_pages_*_range Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 4/8] lib/ioremap: rename ioremap_*_range to vmap_*_range Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 5/8] mm: HUGE_VMAP arch support cleanup Nicholas Piggin
2020-08-10  3:58   ` kernel test robot
2020-08-10  2:27 ` [PATCH v3 6/8] mm: Move vmap_range from lib/ioremap.c to mm/vmalloc.c Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 7/8] mm/vmalloc: add vmap_range_noflush variant Nicholas Piggin
2020-08-10  2:27 ` [PATCH v3 8/8] mm/vmalloc: Hugepage vmalloc mappings Nicholas Piggin
2020-08-12 12:25   ` Jonathan Cameron
2020-08-12 16:18     ` Jonathan Cameron [this message]
2020-08-11 16:32 ` [PATCH v3 0/8] huge vmalloc mappings Jonathan Cameron
2020-08-12  1:07   ` Zefan Li
2020-08-12  8:11     ` Nicholas Piggin
