From: Nicholas Piggin <npiggin@gmail.com>
To: Will Deacon <will@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>,
	Catalin Marinas <catalin.marinas@arm.com>,
	"H. Peter Anvin" <hpa@zytor.com>,
	linux-arch@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linuxppc-dev@lists.ozlabs.org, Ingo Molnar <mingo@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	x86@kernel.org
Subject: Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings
Date: Thu, 16 Apr 2020 12:38:00 +1000
Message-ID: <1587003993.x84ylh11b2.astroid@bobo.none>
In-Reply-To: <20200415104755.GD12621@willie-the-truck>

Excerpts from Will Deacon's message of April 15, 2020 8:47 pm:
> Hi Nick,
> 
> On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
>> For platforms that define HAVE_ARCH_HUGE_VMAP and support PMD vmap mappings,
>> have vmalloc attempt to allocate PMD-sized pages first, before falling back
>> to small pages. Allocations which use something other than PAGE_KERNEL
>> protections are not permitted to use huge pages yet, as not all callers expect
>> this (e.g., module allocations vs strict module rwx).
>> 
>> This gives a 6x reduction in dTLB misses for a `git diff` (of linux), from
>> 45600 to 6500 and a 2.2% reduction in cycles on a 2-node POWER9.
> 
> I wonder if it's worth extending vmap() to handle higher order pages in
> a similar way? That might be helpful for tracing PMUs such as Arm SPE,
> where the CPU streams tracing data out to a virtually addressed buffer
> (see rb_alloc_aux_page()).

Yeah, it becomes pretty trivial to do that with VM_HUGE_PAGES after
this patch. I have something to do it, but no callers are ready yet;
if you have an easy one we can add it.
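
Just to sketch what I mean (none of this is in the series yet, the
variable names here are made up, and teaching the mapping side to cope
with higher-order entries in the pages[] array is the part that still
needs work), the AUX buffer side could in principle stay as simple as:

	/*
	 * Hypothetical caller: pages[] holds the higher-order pages that
	 * rb_alloc_aux_page() already allocates.  VM_HUGE_PAGES only says
	 * the mapping *may* use PMD leaves; if the pages or the chosen
	 * virtual address aren't PMD-aligned it falls back to PTEs.
	 */
	addr = vmap(pages, nr_pages, VM_MAP | VM_HUGE_PAGES, PAGE_KERNEL);
	if (!addr)
		return NULL;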

>> This can result in more internal fragmentation and memory overhead for a
>> given allocation. It can also cause greater NUMA imbalance on hashdist
>> allocations.
>> 
>> There may be other callers that expect small pages under vmalloc but use
>> PAGE_KERNEL; I'm not sure if it's feasible to catch them all. An
>> alternative would be a new function or flag which enables large mappings,
>> and use that in callers.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  include/linux/vmalloc.h |   2 +
>>  mm/vmalloc.c            | 135 +++++++++++++++++++++++++++++-----------
>>  2 files changed, 102 insertions(+), 35 deletions(-)
>> 
>> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
>> index 291313a7e663..853b82eac192 100644
>> --- a/include/linux/vmalloc.h
>> +++ b/include/linux/vmalloc.h
>> @@ -24,6 +24,7 @@ struct notifier_block;		/* in notifier.h */
>>  #define VM_UNINITIALIZED	0x00000020	/* vm_struct is not fully initialized */
>>  #define VM_NO_GUARD		0x00000040      /* don't add guard page */
>>  #define VM_KASAN		0x00000080      /* has allocated kasan shadow memory */
>> +#define VM_HUGE_PAGES		0x00000100	/* may use huge pages */
> 
> Please can you add a check for this in the arm64 change_memory_common()
> code? Other architectures might need something similar, but we need to
> forbid changing memory attributes for portions of the huge page.

Yeah, good idea, I can look at adding some more checks.
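
The vm_struct flags are already looked up there, so for arm64 it could
be something like extending the existing sanity check in
change_memory_common() (sketch from memory, untested, and stricter
than strictly necessary since it rejects the whole area rather than
just partial ranges):

	area = find_vm_area((void *)addr);
	if (!area ||
	    end > (unsigned long)area->addr + area->size ||
	    !(area->flags & VM_ALLOC) ||
	    (area->flags & VM_HUGE_PAGES))	/* don't split huge leaves */
		return -EINVAL;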

> 
> In general, I'm a bit wary of software table walkers tripping over this.
> For example, I don't think apply_to_existing_page_range() can handle
> huge mappings at all, but the one user (KASAN) only ever uses page mappings
> so it's ok there.

Right, I have something to warn for apply_to_page_range() (and I'm
looking at adding support for bigger pages). It doesn't even have a
test and warn for huge mappings at the moment, which isn't good
practice IMO, so we should add one even without huge vmap.
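
Something like this in the apply_to_pmd_range() walk, say (sketch only;
pmd_leaf() is the new generic helper if I have the name right, and the
right level(s) and return value are up for discussion):

	/*
	 * Warn and bail out rather than treating a huge leaf as a page
	 * table page (or letting pmd_none_or_clear_bad() clear it as
	 * "bad").
	 */
	if (WARN_ON_ONCE(pmd_leaf(*pmd)))
		return -EINVAL;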

> 
>> @@ -2325,9 +2356,11 @@ static struct vm_struct *__get_vm_area_node(unsigned long size,
>>  	if (unlikely(!size))
>>  		return NULL;
>>  
>> -	if (flags & VM_IOREMAP)
>> -		align = 1ul << clamp_t(int, get_count_order_long(size),
>> -				       PAGE_SHIFT, IOREMAP_MAX_ORDER);
>> +	if (flags & VM_IOREMAP) {
>> +		align = max(align,
>> +			    1ul << clamp_t(int, get_count_order_long(size),
>> +					   PAGE_SHIFT, IOREMAP_MAX_ORDER));
>> +	}
> 
> 
> I don't follow this part. Please could you explain why you're potentially
> aligning above IOREMAP_MAX_ORDER? It doesn't seem to follow from the rest
> of the patch.

Trying to remember. If the caller asks for a particular alignment, we
shouldn't reduce it. I should put that change in another patch.
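
To make it concrete (assuming 4kB pages and the default
IOREMAP_MAX_ORDER): a VM_IOREMAP caller passing size = 64kB and
align = 2MB used to get align overwritten with the size-derived 64kB,
whereas with max() the caller's 2MB is kept, and the size-derived part
is still clamped to IOREMAP_MAX_ORDER as before.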

Thanks,
Nick
