* [RFC PATCH] mm: align anon mmap for THP
@ 2019-01-11 20:10 Mike Kravetz
  2019-01-11 21:55 ` Kirill A. Shutemov
  0 siblings, 1 reply; 14+ messages in thread
From: Mike Kravetz @ 2019-01-11 20:10 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Hugh Dickins, Kirill A . Shutemov, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton,
	Mike Kravetz

At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
to get an address returned by mmap() suitably aligned for THP.  It seems
that if mmap is asked for a mapping length of at least the huge page
size, it should align the returned address to huge page size.

THP alignment has already been added for DAX, shm and tmpfs.  However,
simple anon mappings do not take THP alignment into account.

I could not determine if this was ever considered or discussed in the past.

There is a maze of arch specific and independent get_unmapped_area
routines.  The patch below just modifies the common vm_unmapped_area
routine.  It may be too simplistic, but I wanted to throw out some
code while asking if something like this has ever been considered.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 include/linux/huge_mm.h |  6 ++++++
 include/linux/mm.h      |  3 +++
 mm/mmap.c               | 11 +++++++++++
 3 files changed, 20 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 4663ee96cf59..dbff7ea7d2e7 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -117,6 +117,10 @@ static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 	return false;
 }
 
+#define thp_enabled_globally()						\
+	(transparent_hugepage_flags &					\
+	 ((1<<TRANSPARENT_HUGEPAGE_FLAG) |				\
+	  (1<<TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)))
 #define transparent_hugepage_use_zero_page()				\
 	(transparent_hugepage_flags &					\
 	 (1<<TRANSPARENT_HUGEPAGE_USE_ZERO_PAGE_FLAG))
@@ -262,6 +266,8 @@ static inline bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 	return false;
 }
 
+#define thp_enabled_globally() false
+
 static inline void prep_transhuge_page(struct page *page) {}
 
 #define transparent_hugepage_flags 0UL
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5411de93a363..131b0be0bbeb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2358,6 +2358,7 @@ struct vm_unmapped_area_info {
 	unsigned long align_offset;
 };
 
+extern void thp_vma_unmapped_align(struct vm_unmapped_area_info *info);
 extern unsigned long unmapped_area(struct vm_unmapped_area_info *info);
 extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
 
@@ -2373,6 +2374,8 @@ extern unsigned long unmapped_area_topdown(struct vm_unmapped_area_info *info);
 static inline unsigned long
 vm_unmapped_area(struct vm_unmapped_area_info *info)
 {
+	thp_vma_unmapped_align(info);
+
 	if (info->flags & VM_UNMAPPED_AREA_TOPDOWN)
 		return unmapped_area_topdown(info);
 	else
diff --git a/mm/mmap.c b/mm/mmap.c
index 6c04292e16a7..f9c111394052 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1864,6 +1864,17 @@ unsigned long mmap_region(struct file *file, unsigned long addr,
 	return error;
 }
 
+void thp_vma_unmapped_align(struct vm_unmapped_area_info *info)
+{
+	if (!thp_enabled_globally())
+		return;
+
+	if (info->align_mask || info->length < HPAGE_PMD_SIZE)
+		return;
+
+	info->align_mask = PAGE_MASK & (HPAGE_PMD_SIZE - 1);
+}
+
 unsigned long unmapped_area(struct vm_unmapped_area_info *info)
 {
 	/*
-- 
2.17.2
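
As a quick sanity check of the align_mask expression used in
thp_vma_unmapped_align() above, the arithmetic can be reproduced in
userspace.  This is only a sketch: the SK_* constants and sk_* functions are
hypothetical names, and the values assume x86-64 with 4 KiB base pages and
2 MiB PMD huge pages.

```c
#include <assert.h>

/*
 * Illustrative values for x86-64 with 4 KiB base pages and 2 MiB PMD huge
 * pages.  The SK_* names are stand-ins for this sketch, not kernel macros.
 */
#define SK_PAGE_SIZE      4096UL
#define SK_PAGE_MASK      (~(SK_PAGE_SIZE - 1))
#define SK_HPAGE_PMD_SIZE (2UL * 1024 * 1024)

/* The same expression the patch assigns to info->align_mask. */
unsigned long sk_thp_align_mask(void)
{
	/* The mask trick only works for a power-of-two huge page size. */
	assert((SK_HPAGE_PMD_SIZE & (SK_HPAGE_PMD_SIZE - 1)) == 0);
	return SK_PAGE_MASK & (SK_HPAGE_PMD_SIZE - 1);
}

/*
 * With align_offset == 0, a page-aligned candidate address passes when
 * none of the masked bits are set, i.e. when it is a multiple of the
 * huge page size.
 */
int sk_addr_is_aligned(unsigned long addr)
{
	return (addr & sk_thp_align_mask()) == 0;
}
```

Under those assumptions the mask covers bits 12 through 20, so only
addresses that are multiples of 2 MiB satisfy it.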



* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-11 20:10 [RFC PATCH] mm: align anon mmap for THP Mike Kravetz
@ 2019-01-11 21:55 ` Kirill A. Shutemov
  2019-01-11 23:28   ` Mike Kravetz
  0 siblings, 1 reply; 14+ messages in thread
From: Kirill A. Shutemov @ 2019-01-11 21:55 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
> to get an address returned by mmap() suitably aligned for THP.  It seems
> that if mmap is asking for a mapping length greater than huge page
> size, it should align the returned address to huge page size.
> 
> THP alignment has already been added for DAX, shm and tmpfs.  However,
> simple anon mappings does not take THP alignment into account.

In the general case, when no hint address is provided, all anonymous memory
requests tend to clump into a single bigger VMA, which gives you a
better chance of getting THP even if a single allocation is too small.
This patch will *reduce* that effect, and I guess the net result will be
negative.

The patch also effectively reduces the bits available for ASLR and increases
address space fragmentation (it increases the number of VMAs and therefore
page fault cost).

I think any change in this direction has to be way more data driven.

-- 
 Kirill A. Shutemov


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-11 21:55 ` Kirill A. Shutemov
@ 2019-01-11 23:28   ` Mike Kravetz
  2019-01-14 13:50     ` Kirill A. Shutemov
  2019-01-14 15:35     ` Steven Sistare
  0 siblings, 2 replies; 14+ messages in thread
From: Mike Kravetz @ 2019-01-11 23:28 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>> to get an address returned by mmap() suitably aligned for THP.  It seems
>> that if mmap is asking for a mapping length greater than huge page
>> size, it should align the returned address to huge page size.
>>
>> THP alignment has already been added for DAX, shm and tmpfs.  However,
>> simple anon mappings does not take THP alignment into account.
> 
> In general case, when no hint address provided, all anonymous memory
> requests have tendency to clamp into a single bigger VMA and get you
> better chance having THP, even if a single allocation is too small.
> This patch will *reduce* the effect and I guess the net result will be
> net negative.

Ah!  I forgot about combining like mappings into a single vma.  Increasing
alignment could/would prevent this.

> The patch also effectively reduces bit available for ASLR and increases
> address space fragmentation (increases number of VMA and therefore page
> fault cost).
> 
> I think any change in this direction has to be way more data driven.

Ok, I just wanted to ask the question.  I've seen application code use the
trick of mmap()ing a sufficiently large area and then unmapping the excess
to get the desired alignment.  Was wondering if there was something we
could do to help.

Thanks
-- 
Mike Kravetz


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-11 23:28   ` Mike Kravetz
@ 2019-01-14 13:50     ` Kirill A. Shutemov
  2019-01-14 16:29       ` Harrosh, Boaz
  2019-01-14 15:35     ` Steven Sistare
  1 sibling, 1 reply; 14+ messages in thread
From: Kirill A. Shutemov @ 2019-01-14 13:50 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Hugh Dickins,
	Michal Hocko, Dan Williams, Matthew Wilcox, Toshi Kani,
	Boaz Harrosh, Andrew Morton

On Fri, Jan 11, 2019 at 03:28:37PM -0800, Mike Kravetz wrote:
> Ok, I just wanted to ask the question.  I've seen application code doing
> the 'mmap sufficiently large area' then unmap to get desired alignment
> trick.  Was wondering if there was something we could do to help.

An application may want an aligned allocation for different reasons.
It should be okay for userspace to ask for size + (alignment - PAGE_SIZE)
and then round up the address to get the alignment. We basically do the
same on the kernel side.

For THP, I believe, the kernel already does The Right Thing™ for most users.
A user may still want to get a specific range as THP (to avoid false sharing
or something). But still I believe userspace has all the required tools to
get it right.
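
The over-allocate-and-round-up approach described above could look roughly
like the following in userspace.  This is a minimal sketch, not code from
any existing project: sk_mmap_aligned() and its parameters are hypothetical
names, and the unused head and tail of the reservation are simply left
mapped.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: reserve len + (align - page size) bytes of anonymous
 * memory and return the first address inside the reservation that is a
 * multiple of align.  align must be a power of two and at least the page
 * size.  *base and *total describe the whole reservation so the caller can
 * munmap() it later; nothing is unmapped here.
 */
void *sk_mmap_aligned(size_t len, size_t align, void **base, size_t *total)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t span;
	uintptr_t aligned;
	void *p;

	assert(base != NULL && total != NULL);
	if (len == 0 || align < page || (align & (align - 1)) != 0)
		return NULL;

	span = len + (align - page);
	p = mmap(NULL, span, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Round the page-aligned start up to the next multiple of align. */
	aligned = ((uintptr_t)p + align - 1) & ~((uintptr_t)align - 1);

	*base = p;
	*total = span;
	return (void *)aligned;
}
```

Because mmap() returns a page-aligned address, the rounded-up pointer is at
most align - PAGE_SIZE bytes past the start, so len bytes always fit inside
the reservation.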

-- 
 Kirill A. Shutemov


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-11 23:28   ` Mike Kravetz
  2019-01-14 13:50     ` Kirill A. Shutemov
@ 2019-01-14 15:35     ` Steven Sistare
  2019-01-14 16:40       ` Harrosh, Boaz
  2019-01-14 18:54       ` Mike Kravetz
  1 sibling, 2 replies; 14+ messages in thread
From: Steven Sistare @ 2019-01-14 15:35 UTC (permalink / raw)
  To: Mike Kravetz, Kirill A. Shutemov, linux_lkml_grp
  Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On 1/11/2019 6:28 PM, Mike Kravetz wrote:
> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>> to get an address returned by mmap() suitably aligned for THP.  It seems
>>> that if mmap is asking for a mapping length greater than huge page
>>> size, it should align the returned address to huge page size.

A better heuristic would be to return an aligned address if the length
is a multiple of the huge page size.  The gap (if any) between the end of
the previous VMA and the start of this VMA would be filled by subsequent
smaller mmap requests.  The new behavior would need to become part of the
mmap interface definition so apps can rely on it and omit their hoop-jumping
code.

Personally I would like to see a new MAP_ALIGN flag and treat the addr
argument as the alignment (like Solaris), but I am told that adding flags
is problematic because old kernels accept undefined flag bits from userland
without complaint, so their behavior would change.

- Steve

>>> THP alignment has already been added for DAX, shm and tmpfs.  However,
>>> simple anon mappings does not take THP alignment into account.
>>
>> In general case, when no hint address provided, all anonymous memory
>> requests have tendency to clamp into a single bigger VMA and get you
>> better chance having THP, even if a single allocation is too small.
>> This patch will *reduce* the effect and I guess the net result will be
>> net negative.
> 
> Ah!  I forgot about combining like mappings into a single vma.  Increasing
> alignment could/would prevent this.
> 
>> The patch also effectively reduces bit available for ASLR and increases
>> address space fragmentation (increases number of VMA and therefore page
>> fault cost).
>>
>> I think any change in this direction has to be way more data driven.
> 
> Ok, I just wanted to ask the question.  I've seen application code doing
> the 'mmap sufficiently large area' then unmap to get desired alignment
> trick.  Was wondering if there was something we could do to help.
> 
> Thanks
> 


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 13:50     ` Kirill A. Shutemov
@ 2019-01-14 16:29       ` Harrosh, Boaz
  2019-01-14 16:40         ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Harrosh, Boaz @ 2019-01-14 16:29 UTC (permalink / raw)
  To: Kirill A. Shutemov, Mike Kravetz
  Cc: Kirill A. Shutemov, linux-mm, linux-kernel, Hugh Dickins,
	Michal Hocko, Dan Williams, Matthew Wilcox, Toshi Kani,
	Andrew Morton

 Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Fri, Jan 11, 2019 at 03:28:37PM -0800, Mike Kravetz wrote:
>> Ok, I just wanted to ask the question.  I've seen application code doing
>> the 'mmap sufficiently large area' then unmap to get desired alignment
>> trick.  Was wondering if there was something we could do to help.
>
> Application may want to get aligned allocation for different reasons.
> It should be okay for userspace to ask for size + (alignment - PAGE_SIZE)
> and then round up the address to get the alignment. We basically do the
> same on kernel side.
>

This is what we do and will need to keep doing for old kernels.
But it is a pity that those holes cannot be reused for small maps, and most
importantly that we cannot have "mapping holes" around the mapping that catch
memory overruns.

> For THP, I believe, kernel already does The Right Thing™ for most users.
> User still may want to get specific range as THP (to avoid false sharing or
> something).

I'm an OK Kernel programmer.  But I was not able to create a HugePage mapping
against /dev/shm/ in a reliable way. I think it only worked on Fedora 28/29
but not on any other distro/version. (MMAP_HUGE)

We run with our own compiled kernel on various distros. THP is configured
in, but mmap against /dev/shm/ never gives me huge pages. Does it only
work with anonymous mmap? (I think it is mount dependent, which is not
under the application's control.)

Just a rant. One day I will figure this out. Meanwhile I do this ugly
user-mode alignment of the pointers, and try to sleep at night ...

> But still I believe userspace has all required tools to get it
> right.
>

I still wish that if I ask for an mmap size aligned on 2M that I would automatically
get a 2M pointer. I don't see how the system can benefit from having both ends
of the VMA cross Huge page boundary.

> --
> Kirill A. Shutemov

Thanks
Boaz


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 16:29       ` Harrosh, Boaz
@ 2019-01-14 16:40         ` Michal Hocko
  2019-01-14 16:54           ` Harrosh, Boaz
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2019-01-14 16:40 UTC (permalink / raw)
  To: Harrosh, Boaz
  Cc: Kirill A. Shutemov, Mike Kravetz, Kirill A. Shutemov, linux-mm,
	linux-kernel, Hugh Dickins, Dan Williams, Matthew Wilcox,
	Toshi Kani, Andrew Morton

On Mon 14-01-19 16:29:29, Harrosh, Boaz wrote:
>  Kirill A. Shutemov <kirill@shutemov.name> wrote:
> > On Fri, Jan 11, 2019 at 03:28:37PM -0800, Mike Kravetz wrote:
> >> Ok, I just wanted to ask the question.  I've seen application code doing
> >> the 'mmap sufficiently large area' then unmap to get desired alignment
> >> trick.  Was wondering if there was something we could do to help.
> >
> > Application may want to get aligned allocation for different reasons.
> > It should be okay for userspace to ask for size + (alignment - PAGE_SIZE)
> > and then round up the address to get the alignment. We basically do the
> > same on kernel side.
> >
> 
> This is what we do and will need to keep doing for old Kernels.
> But it is a pity that those holes can not be reused for small maps, and most important
> that we cannot have "mapping holes" around the mapping that catch memory
> overruns

What prevents you from mapping a larger area and then mapping MAP_FIXED,
PROT_NONE over it to get the protection?
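
One way to read that suggestion is sketched below: map a larger anonymous
area, then place MAP_FIXED, PROT_NONE mappings over its head and tail so
that an overrun in either direction faults.  The helper name
sk_mmap_guarded() and its parameters are hypothetical, and this is only one
possible arrangement of the guard regions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map len usable bytes with an inaccessible guard
 * region of guard bytes on each side.  len and guard must be non-zero
 * multiples of the page size.  *base and *total describe the whole area
 * (guards included) for a later munmap().
 */
void *sk_mmap_guarded(size_t len, size_t guard, void **base, size_t *total)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t span = len + 2 * guard;
	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
	char *p;

	assert(base != NULL && total != NULL);
	if (len == 0 || guard == 0 || len % page || guard % page)
		return NULL;

	/* Step 1: map the larger area, readable and writable. */
	p = mmap(NULL, span, PROT_READ | PROT_WRITE, flags, -1, 0);
	if (p == MAP_FAILED)
		return NULL;

	/* Step 2: replace the head and tail with PROT_NONE mappings. */
	if (mmap(p, guard, PROT_NONE, flags | MAP_FIXED, -1, 0) == MAP_FAILED ||
	    mmap(p + guard + len, guard, PROT_NONE, flags | MAP_FIXED,
		 -1, 0) == MAP_FAILED) {
		munmap(p, span);
		return NULL;
	}

	*base = p;
	*total = span;
	return p + guard;
}
```

Note that each guard becomes its own VMA, which is the kind of extra VMA
cost discussed elsewhere in this thread.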
 
> > For THP, I believe, kernel already does The Right Thing™ for most users.
> > User still may want to get specific range as THP (to avoid false sharing or
> > something).
> 
> I'm an OK Kernel programmer.  But I was not able to create a HugePage mapping
> against /dev/shm/ in a reliable way. I think it only worked on Fedora 28/29
> but not on any other distro/version. (MMAP_HUGE)

Are you mixing up hugetlb and THP?

> We run with our own compiled Kernel on various distros, THP is configured
> in but mmap against /dev/shm/ never gives me Huge pages. Does it only
> work with anonymous mmap ? (I think it is mount dependent which is not
> in the application control)

If you are talking about THP then you have to enable huge pages for the
mapping AFAIR.
-- 
Michal Hocko
SUSE Labs


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 15:35     ` Steven Sistare
@ 2019-01-14 16:40       ` Harrosh, Boaz
  2019-01-14 18:54       ` Mike Kravetz
  1 sibling, 0 replies; 14+ messages in thread
From: Harrosh, Boaz @ 2019-01-14 16:40 UTC (permalink / raw)
  To: Steven Sistare, Mike Kravetz, Kirill A. Shutemov, linux_lkml_grp
  Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Andrew Morton

Sistare <steven.sistare@oracle.com> wrote:
> 
> A better heuristic would be to return an aligned address if the length
> is a multiple of the huge page size.  The gap (if any) between the end of
> the previous VMA and the start of this VMA would be filled by subsequent
> smaller mmap requests.  The new behavior would need to become part of the
> mmap interface definition so apps can rely on it and omit their hoop-jumping
> code.
> 

Yes that was my original request

> Personally I would like to see a new MAP_ALIGN flag and treat the addr
> argument as the alignment (like Solaris), 

Yes I would like that. So app can know when to do the old thing ...

> but I am told that adding flags
> is problematic because old kernels accept undefined flag bits from userland
> without complaint, so their behavior would change.
> 

There has been a mechanism in place, since 4.14 I think or even before, for
adding new MAP_XXX flags. It works by combining the MAP_SHARED and MAP_PRIVATE
flags together with the new set of flags. If new flags are present, the
combination is allowed and means that some new flag is being requested.
Otherwise, and in old kernels, the combination is not allowed by POSIX and
fails.

Cheers
Boaz

> - Steve


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 16:40         ` Michal Hocko
@ 2019-01-14 16:54           ` Harrosh, Boaz
  2019-01-14 18:02             ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Harrosh, Boaz @ 2019-01-14 16:54 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Kirill A. Shutemov, Mike Kravetz, Kirill A. Shutemov, linux-mm,
	linux-kernel, Hugh Dickins, Dan Williams, Matthew Wilcox,
	Toshi Kani, Andrew Morton

Michal Hocko <mhocko@kernel.org> wrote:

<>
> What does prevent you from mapping a larger area and MAP_FIXED,
> PROT_NONE over it to get the protection?

Yes Thanks I will try. That's good.

>> > For THP, I believe, kernel already does The Right Thing™ for most users.
>> > User still may want to get specific range as THP (to avoid false sharing or
>> > something).
>>
>> I'm an OK Kernel programmer.  But I was not able to create a HugePage mapping
>> against /dev/shm/ in a reliable way. I think it only worked on Fedora 28/29
>> but not on any other distro/version. (MMAP_HUGE)
>
> Are you mixing hugetlb rather than THP?

Probably. I was looking for the easiest way to get my mmap based memory allocations
to be 2M based instead of 4k, to get better IO characteristics across the kernel.
But I kept getting the 4k pointers. (Can't really remember all the things I tried.)

>> We run with our own compiled Kernel on various distros, THP is configured
>> in but mmap against /dev/shm/ never gives me Huge pages. Does it only
>> work with anonymous mmap ? (I think it is mount dependent which is not
>> in the application control)
>
> If you are talking about THP then you have to enable huge pages for the
> mapping AFAIR.

This is exactly what I was looking to achieve but was not able to do. Most probably
a stupid omission on my part, but it just shows that there is no trivial,
straight out-of-the-man-page way to do it.  (Would love a code snippet if you ever wrote one?)

> --
> Michal Hocko
> SUSE Labs

Thanks man
Boaz


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 16:54           ` Harrosh, Boaz
@ 2019-01-14 18:02             ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2019-01-14 18:02 UTC (permalink / raw)
  To: Harrosh, Boaz
  Cc: Kirill A. Shutemov, Mike Kravetz, Kirill A. Shutemov, linux-mm,
	linux-kernel, Hugh Dickins, Dan Williams, Matthew Wilcox,
	Toshi Kani, Andrew Morton

On Mon 14-01-19 16:54:02, Harrosh, Boaz wrote:
> Michal Hocko <mhocko@kernel.org> wrote:
[...]
> >> We run with our own compiled Kernel on various distros, THP is configured
> >> in but mmap against /dev/shm/ never gives me Huge pages. Does it only
> >> work with anonymous mmap ? (I think it is mount dependent which is not
> >> in the application control)
> >
> > If you are talking about THP then you have to enable huge pages for the
> > mapping AFAIR.
> 
> This is exactly what I was looking to achieve but was not able to do. Most probably
> a stupid omission on my part, but just to show that it is not that trivial and straight
> out-of-the-man-page way to do it.  (Would love a code snippet if you ever wrote one?)

Have you tried
mount -t tmpfs -o huge=always none $MNT_POINT ?

It is true that the man pages are silent about this, but at least
Documentation/admin-guide/mm/transhuge.rst has some information. Time to send
a patch to the man pages, I would say.
-- 
Michal Hocko
SUSE Labs


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 15:35     ` Steven Sistare
  2019-01-14 16:40       ` Harrosh, Boaz
@ 2019-01-14 18:54       ` Mike Kravetz
  2019-01-14 19:26         ` Steven Sistare
  2019-01-15  8:24         ` Kirill A. Shutemov
  1 sibling, 2 replies; 14+ messages in thread
From: Mike Kravetz @ 2019-01-14 18:54 UTC (permalink / raw)
  To: Steven Sistare, Kirill A. Shutemov, linux_lkml_grp
  Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On 1/14/19 7:35 AM, Steven Sistare wrote:
> On 1/11/2019 6:28 PM, Mike Kravetz wrote:
>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>>> to get an address returned by mmap() suitably aligned for THP.  It seems
>>>> that if mmap is asking for a mapping length greater than huge page
>>>> size, it should align the returned address to huge page size.
> 
> A better heuristic would be to return an aligned address if the length
> is a multiple of the huge page size.  The gap (if any) between the end of
> the previous VMA and the start of this VMA would be filled by subsequent
> smaller mmap requests.  The new behavior would need to become part of the
> mmap interface definition so apps can rely on it and omit their hoop-jumping
> code.

Yes, the heuristic really should be 'length is a multiple of the huge page
size'.  As you mention, this would still leave gaps.  I need to look closer
but this may not be any worse than the trick of mapping an area with rounded
up length and then unmapping pages at the beginning.

When I sent this out, the thought in the back of my mind was that this doesn't
really matter unless there is some type of alignment guarantee.  Otherwise,
user space code needs to continue employing its own code to check/force alignment.
Making matters somewhat worse is that I do not believe there is a C interface to
query huge page size.  I thought there was discussion about adding one, but I
cannot find it.
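
In the absence of a dedicated C interface, one workaround is to read the
THP size from sysfs.  The sketch below assumes the kernel exposes
/sys/kernel/mm/transparent_hugepage/hpage_pmd_size (described in
Documentation/admin-guide/mm/transhuge.rst); sk_thp_pmd_size() is a
hypothetical helper name, and it reports 0 when the file is missing or
unreadable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: return the PMD-level THP size in bytes as reported
 * by sysfs, or 0 if the file cannot be opened or parsed (for example on a
 * kernel built without THP).
 */
size_t sk_thp_pmd_size(void)
{
	const char *path = "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size";
	unsigned long val = 0;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return 0;
	if (fscanf(f, "%lu", &val) != 1)
		val = 0;
	fclose(f);
	return (size_t)val;
}
```

A caller would typically fall back to its own default, or skip THP
alignment entirely, when this returns 0.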

> Personally I would like to see a new MAP_ALIGN flag and treat the addr
> argument as the alignment (like Solaris), but I am told that adding flags
> is problematic because old kernels accept undefined flag bits from userland
> without complaint, so their behavior would change.

Well, a flag would clearly define desired behavior.

As others have mentioned, there are mechanisms in place that allow user
space code to get the alignment it wants.  However, it is at the expense of
an additional system call or two.  Perhaps the question is, "Is it worth
defining new behavior to eliminate this overhead?".

One other thing to consider is that at mmap time, we likely do not know if
the vma will/can use THP.  We would know if system wide THP configuration
is set to never or always.  However, I 'think' the default for most distros
is madvise.  Therefore, it is not until a subsequent madvise call that we
know THP will be employed.  If the application code will need to make this
separate madvise call, then perhaps it is not too much to expect that it
take explicit action to optimally align the mapping.
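
To make the two steps above concrete, here is a rough userspace sketch:
one helper extracts the active mode from a line in the format of
/sys/kernel/mm/transparent_hugepage/enabled (for example
"always [madvise] never"), and another issues the madvise() call.  Both
sk_* names are hypothetical, and reading the sysfs file itself is left to
the caller.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: copy the bracketed token out of a line such as
 * "always [madvise] never" into out.  Returns 0 on success, or -1 if the
 * line has no bracketed token or the token does not fit in outlen bytes.
 */
int sk_thp_mode(const char *line, char *out, size_t outlen)
{
	const char *open, *close;
	size_t n;

	assert(line != NULL && out != NULL);
	open = strchr(line, '[');
	close = open ? strchr(open, ']') : NULL;
	if (open == NULL || close == NULL)
		return -1;

	n = (size_t)(close - open - 1);
	if (n + 1 > outlen)
		return -1;
	memcpy(out, open + 1, n);
	out[n] = '\0';
	return 0;
}

/*
 * Ask for THP on [addr, addr + len); addr must be page aligned.  Returns
 * what madvise() returns: 0 on success, -1 with errno set on failure
 * (for example EINVAL on a kernel built without THP).
 */
int sk_request_thp(void *addr, size_t len)
{
	return madvise(addr, len, MADV_HUGEPAGE);
}
```

An application would usually call sk_request_thp() only when the mode is
"madvise"; in "always" mode the call is redundant and in "never" mode it
has no effect.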

-- 
Mike Kravetz


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 18:54       ` Mike Kravetz
@ 2019-01-14 19:26         ` Steven Sistare
  2019-01-15  8:24         ` Kirill A. Shutemov
  1 sibling, 0 replies; 14+ messages in thread
From: Steven Sistare @ 2019-01-14 19:26 UTC (permalink / raw)
  To: Mike Kravetz, Kirill A. Shutemov, linux_lkml_grp
  Cc: linux-mm, linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On 1/14/2019 1:54 PM, Mike Kravetz wrote:
> On 1/14/19 7:35 AM, Steven Sistare wrote:
>> On 1/11/2019 6:28 PM, Mike Kravetz wrote:
>>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>>>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>>>> to get an address returned by mmap() suitably aligned for THP.  It seems
>>>>> that if mmap is asking for a mapping length greater than huge page
>>>>> size, it should align the returned address to huge page size.
>>
>> A better heuristic would be to return an aligned address if the length
>> is a multiple of the huge page size.  The gap (if any) between the end of
>> the previous VMA and the start of this VMA would be filled by subsequent
>> smaller mmap requests.  The new behavior would need to become part of the
>> mmap interface definition so apps can rely on it and omit their hoop-jumping
>> code.
> 
> Yes, the heuristic really should be 'length is a multiple of the huge page
> size'.  As you mention, this would still leave gaps.  I need to look closer
> but this may not be any worse than the trick of mapping an area with rounded
> up length and then unmapping pages at the beginning.
> 
> When I sent this out, the thought in the back of my mind was that this doesn't
> really matter unless there is some type of alignment guarantee.  Otherwise,
> user space code needs continue employing their code to check/force alignment.
> Making matters somewhat worse is that I do not believe there is C interface to
> query huge page size.  I thought there was discussion about adding one, but I
> can not find it.

Right. Solaris provides getpagesizes().

>> Personally I would like to see a new MAP_ALIGN flag and treat the addr
>> argument as the alignment (like Solaris), but I am told that adding flags
>> is problematic because old kernels accept undefined flag bits from userland
>> without complaint, so their behavior would change.
> 
> Well, a flag would clearly define desired behavior.
> 
> As others have been mentioned, there are mechanisms in place that allow user
> space code to get the alignment it wants.  However, it is at the expense of
> an additional system call or two.  Perhaps the question is, "Is it worth
> defining new behavior to eliminate this overhead?".
> 
> One other thing to consider is that at mmap time, we likely do not know if
> the vma will/can use THP.  We would know if system wide THP configuration
> is set to never or always.  However, I 'think' the default for most distros
> is madvise.  Therefore, it is not until a subsequent madvise call that we
> know THP will be employed.  If the application code will need to make this
> separate madvise call, then perhaps it is not too much to expect that it
> take explicit action to optimally align the mapping.

True.  It is annoying to write the extra code, but the power user will do it.

The heuristic alignment would primarily benefit applications that are not as
carefully optimized.

- Steve



* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-14 18:54       ` Mike Kravetz
  2019-01-14 19:26         ` Steven Sistare
@ 2019-01-15  8:24         ` Kirill A. Shutemov
  2019-01-15 18:08           ` Mike Kravetz
  1 sibling, 1 reply; 14+ messages in thread
From: Kirill A. Shutemov @ 2019-01-15  8:24 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Steven Sistare, Kirill A. Shutemov, linux_lkml_grp, linux-mm,
	linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On Mon, Jan 14, 2019 at 10:54:45AM -0800, Mike Kravetz wrote:
> On 1/14/19 7:35 AM, Steven Sistare wrote:
> > On 1/11/2019 6:28 PM, Mike Kravetz wrote:
> >> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
> >>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
> >>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
> >>>> to get an address returned by mmap() suitably aligned for THP.  It seems
> >>>> that if mmap is asking for a mapping length greater than huge page
> >>>> size, it should align the returned address to huge page size.
> > 
> > A better heuristic would be to return an aligned address if the length
> > is a multiple of the huge page size.  The gap (if any) between the end of
> > the previous VMA and the start of this VMA would be filled by subsequent
> > smaller mmap requests.  The new behavior would need to become part of the
> > mmap interface definition so apps can rely on it and omit their hoop-jumping
> > code.
> 
> Yes, the heuristic really should be 'length is a multiple of the huge page
> size'.  As you mention, this would still leave gaps.  I need to look closer
> but this may not be any worse than the trick of mapping an area with rounded
> up length and then unmapping pages at the beginning.

The question is why it is any better. Virtual address space is generally
cheap; an additional VMA may be more significant due to find_vma() overhead.

And you don't *need* to unmap anything. Just use the aligned pointer.

> 
> When I sent this out, the thought in the back of my mind was that this doesn't
> really matter unless there is some type of alignment guarantee.  Otherwise,
> user space code needs continue employing their code to check/force alignment.
> Making matters somewhat worse is that I do not believe there is C interface to
> query huge page size.  I thought there was discussion about adding one, but I
> can not find it.

We have posix_memalign(3).
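
For reference, a minimal sketch of using posix_memalign(3) for a huge-page
aligned buffer follows.  The wrapper name sk_alloc_aligned() is
hypothetical.  Note that the caller still has to supply the alignment
itself, so this covers getting an aligned pointer rather than discovering
the huge page size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: allocate len bytes aligned to align, which must be
 * a power of two and a multiple of sizeof(void *).  Returns NULL on
 * failure.  The result is released with free().
 */
void *sk_alloc_aligned(size_t len, size_t align)
{
	void *p = NULL;

	/* posix_memalign() returns 0 on success, an errno value otherwise. */
	if (posix_memalign(&p, align, len) != 0)
		return NULL;
	return p;
}
```

For example, sk_alloc_aligned(4UL << 20, 2UL << 20) requests a 4 MiB buffer
on a 2 MiB boundary, assuming 2 MiB is the huge page size on the system.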


-- 
 Kirill A. Shutemov


* Re: [RFC PATCH] mm: align anon mmap for THP
  2019-01-15  8:24         ` Kirill A. Shutemov
@ 2019-01-15 18:08           ` Mike Kravetz
  0 siblings, 0 replies; 14+ messages in thread
From: Mike Kravetz @ 2019-01-15 18:08 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Steven Sistare, Kirill A. Shutemov, linux_lkml_grp, linux-mm,
	linux-kernel, Hugh Dickins, Michal Hocko, Dan Williams,
	Matthew Wilcox, Toshi Kani, Boaz Harrosh, Andrew Morton

On 1/15/19 12:24 AM, Kirill A. Shutemov wrote:
> On Mon, Jan 14, 2019 at 10:54:45AM -0800, Mike Kravetz wrote:
>> On 1/14/19 7:35 AM, Steven Sistare wrote:
>>> On 1/11/2019 6:28 PM, Mike Kravetz wrote:
>>>> On 1/11/19 1:55 PM, Kirill A. Shutemov wrote:
>>>>> On Fri, Jan 11, 2019 at 08:10:03PM +0000, Mike Kravetz wrote:
>>>>>> At LPC last year, Boaz Harrosh asked why he had to 'jump through hoops'
>>>>>> to get an address returned by mmap() suitably aligned for THP.  It seems
>>>>>> that if mmap is asking for a mapping length greater than huge page
>>>>>> size, it should align the returned address to huge page size.
>>>
>>> A better heuristic would be to return an aligned address if the length
>>> is a multiple of the huge page size.  The gap (if any) between the end of
>>> the previous VMA and the start of this VMA would be filled by subsequent
>>> smaller mmap requests.  The new behavior would need to become part of the
>>> mmap interface definition so apps can rely on it and omit their hoop-jumping
>>> code.
>>
>> Yes, the heuristic really should be 'length is a multiple of the huge page
>> size'.  As you mention, this would still leave gaps.  I need to look closer
>> but this may not be any worse than the trick of mapping an area with rounded
>> up length and then unmapping pages at the beginning.
> 
> The question why is it any better. Virtual address space is generally
> cheap, additional VMA maybe more significant due to find_vma() overhead.
> 
> And you don't *need* to unmap anything. Just use aligned pointer.

You are correct, it is not any better.

I know you do not need to unmap anything.  However, I believe people are
writing code which does this today.  For example, qemu's qemu_ram_mmap()
utility routine does this, but it may have other reasons for creating
the gap.

Thanks for all of the feedback.  I do not think there is anything we can
or should do in this area.  As Steve said, 'power users' who want to get
optimal THP usage will write the code to make that happen.
-- 
Mike Kravetz

