linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack
@ 2023-12-21  6:59 Yang Shi
  2023-12-21  6:59 ` [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE Yang Shi
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Yang Shi @ 2023-12-21  6:59 UTC (permalink / raw)
  To: oliver.sang, riel, fengwei.yin, willy, cl, ying.huang, akpm
  Cc: shy828301, linux-kernel, linux-mm

From: Yang Shi <yang@os.amperecomputing.com>

We avoid allocating THP for temporary stack, even tnough
khugepaged_enter_vma() is called for stack VMAs, it actualy returns
false.  So no need to call it in the first place at all.

Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 mm/mmap.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
index b78e83d351d2..2ff79b1d1564 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
 		}
 	}
 	anon_vma_unlock_write(vma->anon_vma);
-	khugepaged_enter_vma(vma, vma->vm_flags);
 	mas_destroy(&mas);
 	validate_mm(mm);
 	return error;
@@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
 		}
 	}
 	anon_vma_unlock_write(vma->anon_vma);
-	khugepaged_enter_vma(vma, vma->vm_flags);
 	mas_destroy(&mas);
 	validate_mm(mm);
 	return error;
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2023-12-21  6:59 [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yang Shi
@ 2023-12-21  6:59 ` Yang Shi
  2024-01-10  1:36   ` Yin Fengwei
  2024-01-31  7:53   ` Florian Weimer
  2024-01-10  1:35 ` [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yin Fengwei
  2024-01-15  5:50 ` Huang, Ying
  2 siblings, 2 replies; 13+ messages in thread
From: Yang Shi @ 2023-12-21  6:59 UTC (permalink / raw)
  To: oliver.sang, riel, fengwei.yin, willy, cl, ying.huang, akpm
  Cc: shy828301, linux-kernel, linux-mm

From: Yang Shi <yang@os.amperecomputing.com>

The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
boundaries") incured regression for stress-ng pthread benchmark [1].
It is because THP get allocated to pthread's stack area much more possible
than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.

The MAP_STACK flag is used to mark the stack area, but it is a no-op on
Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
THP for such stack area.

With this change the stack area looks like:

fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
Size:               8192 kB
KernelPageSize:        4 kB
MMUPageSize:           4 kB
Rss:                  12 kB
Pss:                  12 kB
Pss_Dirty:            12 kB
Shared_Clean:          0 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:        12 kB
Referenced:           12 kB
Anonymous:            12 kB
KSM:                   0 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
FilePmdMapped:         0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:                0 kB
THPeligible:           0
VmFlags: rd wr mr mw me ac nh

The "nh" flag is set.

[1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/

Reported-by: kernel test robot <oliver.sang@intel.com>
Tested-by: Oliver Sang <oliver.sang@intel.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
---
 include/linux/mman.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/mman.h b/include/linux/mman.h
index 40d94411d492..dc7048824be8 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
 	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
 	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
 	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
+	       _calc_vm_trans(flags, MAP_STACK,	     VM_NOHUGEPAGE) |
 	       arch_calc_vm_flag_bits(flags);
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack
  2023-12-21  6:59 [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yang Shi
  2023-12-21  6:59 ` [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE Yang Shi
@ 2024-01-10  1:35 ` Yin Fengwei
  2024-01-15  5:50 ` Huang, Ying
  2 siblings, 0 replies; 13+ messages in thread
From: Yin Fengwei @ 2024-01-10  1:35 UTC (permalink / raw)
  To: Yang Shi, oliver.sang, riel, willy, cl, ying.huang, akpm
  Cc: linux-kernel, linux-mm



On 2023/12/21 14:59, Yang Shi wrote:
> From: Yang Shi <yang@os.amperecomputing.com>
> 
> We avoid allocating THP for temporary stack, even tnough
> khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> false.  So no need to call it in the first place at all.
> 
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

> ---
>   mm/mmap.c | 2 --
>   1 file changed, 2 deletions(-)
> 
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b78e83d351d2..2ff79b1d1564 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>   		}
>   	}
>   	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>   	mas_destroy(&mas);
>   	validate_mm(mm);
>   	return error;
> @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>   		}
>   	}
>   	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>   	mas_destroy(&mas);
>   	validate_mm(mm);
>   	return error;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2023-12-21  6:59 ` [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE Yang Shi
@ 2024-01-10  1:36   ` Yin Fengwei
  2024-01-16 19:22     ` Zach O'Keefe
  2024-01-31  7:53   ` Florian Weimer
  1 sibling, 1 reply; 13+ messages in thread
From: Yin Fengwei @ 2024-01-10  1:36 UTC (permalink / raw)
  To: Yang Shi, oliver.sang, riel, willy, cl, ying.huang, akpm
  Cc: linux-kernel, linux-mm



On 2023/12/21 14:59, Yang Shi wrote:
> From: Yang Shi <yang@os.amperecomputing.com>
> 
> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") incured regression for stress-ng pthread benchmark [1].
> It is because THP get allocated to pthread's stack area much more possible
> than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> 
> The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> THP for such stack area.
> 
> With this change the stack area looks like:
> 
> fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> Size:               8192 kB
> KernelPageSize:        4 kB
> MMUPageSize:           4 kB
> Rss:                  12 kB
> Pss:                  12 kB
> Pss_Dirty:            12 kB
> Shared_Clean:          0 kB
> Shared_Dirty:          0 kB
> Private_Clean:         0 kB
> Private_Dirty:        12 kB
> Referenced:           12 kB
> Anonymous:            12 kB
> KSM:                   0 kB
> LazyFree:              0 kB
> AnonHugePages:         0 kB
> ShmemPmdMapped:        0 kB
> FilePmdMapped:         0 kB
> Shared_Hugetlb:        0 kB
> Private_Hugetlb:       0 kB
> Swap:                  0 kB
> SwapPss:               0 kB
> Locked:                0 kB
> THPeligible:           0
> VmFlags: rd wr mr mw me ac nh
> 
> The "nh" flag is set.
> 
> [1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/
> 
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Tested-by: Oliver Sang <oliver.sang@intel.com>
> Cc: Yin Fengwei <fengwei.yin@intel.com>
> Cc: Rik van Riel <riel@surriel.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Huang, Ying <ying.huang@intel.com>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>

Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>

> ---
>   include/linux/mman.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 40d94411d492..dc7048824be8 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
>   	return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
>   	       _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
>   	       _calc_vm_trans(flags, MAP_SYNC,	     VM_SYNC      ) |
> +	       _calc_vm_trans(flags, MAP_STACK,	     VM_NOHUGEPAGE) |
>   	       arch_calc_vm_flag_bits(flags);
>   }
>   

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack
  2023-12-21  6:59 [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yang Shi
  2023-12-21  6:59 ` [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE Yang Shi
  2024-01-10  1:35 ` [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yin Fengwei
@ 2024-01-15  5:50 ` Huang, Ying
  2024-01-16 21:39   ` Yang Shi
  2 siblings, 1 reply; 13+ messages in thread
From: Huang, Ying @ 2024-01-15  5:50 UTC (permalink / raw)
  To: Yang Shi
  Cc: oliver.sang, riel, fengwei.yin, willy, cl, akpm, linux-kernel, linux-mm

Yang Shi <shy828301@gmail.com> writes:

> From: Yang Shi <yang@os.amperecomputing.com>
>
> We avoid allocating THP for temporary stack, even tnough
                                                    ~~~~~~
                                                    though?

--
Best Regards,
Huang, Ying

> khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> false.  So no need to call it in the first place at all.
>
> Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> ---
>  mm/mmap.c | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b78e83d351d2..2ff79b1d1564 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	mas_destroy(&mas);
>  	validate_mm(mm);
>  	return error;
> @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
>  		}
>  	}
>  	anon_vma_unlock_write(vma->anon_vma);
> -	khugepaged_enter_vma(vma, vma->vm_flags);
>  	mas_destroy(&mas);
>  	validate_mm(mm);
>  	return error;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2024-01-10  1:36   ` Yin Fengwei
@ 2024-01-16 19:22     ` Zach O'Keefe
  2024-01-16 20:57       ` Yang Shi
  0 siblings, 1 reply; 13+ messages in thread
From: Zach O'Keefe @ 2024-01-16 19:22 UTC (permalink / raw)
  To: Yin Fengwei
  Cc: Yang Shi, oliver.sang, riel, willy, cl, ying.huang, akpm,
	linux-kernel, linux-mm

Thanks Yang,

Should this be marked for stable? Given how easily it is for pthreads
to allocate hugepages w/o this change, it can easily cause memory
bloat on larger systems and/or users with high thread counts. I don't
think that will be welcomed, and seems odd that just 6.7 should suffer
this.

Thanks,
Zach

On Tue, Jan 9, 2024 at 5:36 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
>
>
>
> On 2023/12/21 14:59, Yang Shi wrote:
> > From: Yang Shi <yang@os.amperecomputing.com>
> >
> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > boundaries") incured regression for stress-ng pthread benchmark [1].
> > It is because THP get allocated to pthread's stack area much more possible
> > than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> >
> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> > Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> > THP for such stack area.
> >
> > With this change the stack area looks like:
> >
> > fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> > Size:               8192 kB
> > KernelPageSize:        4 kB
> > MMUPageSize:           4 kB
> > Rss:                  12 kB
> > Pss:                  12 kB
> > Pss_Dirty:            12 kB
> > Shared_Clean:          0 kB
> > Shared_Dirty:          0 kB
> > Private_Clean:         0 kB
> > Private_Dirty:        12 kB
> > Referenced:           12 kB
> > Anonymous:            12 kB
> > KSM:                   0 kB
> > LazyFree:              0 kB
> > AnonHugePages:         0 kB
> > ShmemPmdMapped:        0 kB
> > FilePmdMapped:         0 kB
> > Shared_Hugetlb:        0 kB
> > Private_Hugetlb:       0 kB
> > Swap:                  0 kB
> > SwapPss:               0 kB
> > Locked:                0 kB
> > THPeligible:           0
> > VmFlags: rd wr mr mw me ac nh
> >
> > The "nh" flag is set.
> >
> > [1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/
> >
> > Reported-by: kernel test robot <oliver.sang@intel.com>
> > Tested-by: Oliver Sang <oliver.sang@intel.com>
> > Cc: Yin Fengwei <fengwei.yin@intel.com>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: Christopher Lameter <cl@linux.com>
> > Cc: Huang, Ying <ying.huang@intel.com>
> > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
>
> Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
>
> > ---
> >   include/linux/mman.h | 1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > index 40d94411d492..dc7048824be8 100644
> > --- a/include/linux/mman.h
> > +++ b/include/linux/mman.h
> > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> >       return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> >              _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> >              _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > +            _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> >              arch_calc_vm_flag_bits(flags);
> >   }
> >
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2024-01-16 19:22     ` Zach O'Keefe
@ 2024-01-16 20:57       ` Yang Shi
  2024-01-16 21:31         ` Andrew Morton
  0 siblings, 1 reply; 13+ messages in thread
From: Yang Shi @ 2024-01-16 20:57 UTC (permalink / raw)
  To: Zach O'Keefe
  Cc: Yin Fengwei, oliver.sang, riel, willy, cl, ying.huang, akpm,
	linux-kernel, linux-mm

On Tue, Jan 16, 2024 at 11:22 AM Zach O'Keefe <zokeefe@google.com> wrote:
>
> Thanks Yang,
>
> Should this be marked for stable? Given how easily it is for pthreads
> to allocate hugepages w/o this change, it can easily cause memory
> bloat on larger systems and/or users with high thread counts. I don't
> think that will be welcomed, and seems odd that just 6.7 should suffer
> this.

Thanks for the suggestion, fine to me.

>
> Thanks,
> Zach
>
> On Tue, Jan 9, 2024 at 5:36 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >
> >
> >
> > On 2023/12/21 14:59, Yang Shi wrote:
> > > From: Yang Shi <yang@os.amperecomputing.com>
> > >
> > > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > > boundaries") incured regression for stress-ng pthread benchmark [1].
> > > It is because THP get allocated to pthread's stack area much more possible
> > > than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> > > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> > >
> > > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> > > Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> > > THP for such stack area.
> > >
> > > With this change the stack area looks like:
> > >
> > > fffd18e10000-fffd19610000 rw-p 00000000 00:00 0
> > > Size:               8192 kB
> > > KernelPageSize:        4 kB
> > > MMUPageSize:           4 kB
> > > Rss:                  12 kB
> > > Pss:                  12 kB
> > > Pss_Dirty:            12 kB
> > > Shared_Clean:          0 kB
> > > Shared_Dirty:          0 kB
> > > Private_Clean:         0 kB
> > > Private_Dirty:        12 kB
> > > Referenced:           12 kB
> > > Anonymous:            12 kB
> > > KSM:                   0 kB
> > > LazyFree:              0 kB
> > > AnonHugePages:         0 kB
> > > ShmemPmdMapped:        0 kB
> > > FilePmdMapped:         0 kB
> > > Shared_Hugetlb:        0 kB
> > > Private_Hugetlb:       0 kB
> > > Swap:                  0 kB
> > > SwapPss:               0 kB
> > > Locked:                0 kB
> > > THPeligible:           0
> > > VmFlags: rd wr mr mw me ac nh
> > >
> > > The "nh" flag is set.
> > >
> > > [1] https://lore.kernel.org/linux-mm/202312192310.56367035-oliver.sang@intel.com/
> > >
> > > Reported-by: kernel test robot <oliver.sang@intel.com>
> > > Tested-by: Oliver Sang <oliver.sang@intel.com>
> > > Cc: Yin Fengwei <fengwei.yin@intel.com>
> > > Cc: Rik van Riel <riel@surriel.com>
> > > Cc: Matthew Wilcox <willy@infradead.org>
> > > Cc: Christopher Lameter <cl@linux.com>
> > > Cc: Huang, Ying <ying.huang@intel.com>
> > > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> >
> > Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
> >
> > > ---
> > >   include/linux/mman.h | 1 +
> > >   1 file changed, 1 insertion(+)
> > >
> > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > index 40d94411d492..dc7048824be8 100644
> > > --- a/include/linux/mman.h
> > > +++ b/include/linux/mman.h
> > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > >       return _calc_vm_trans(flags, MAP_GROWSDOWN,  VM_GROWSDOWN ) |
> > >              _calc_vm_trans(flags, MAP_LOCKED,     VM_LOCKED    ) |
> > >              _calc_vm_trans(flags, MAP_SYNC,       VM_SYNC      ) |
> > > +            _calc_vm_trans(flags, MAP_STACK,      VM_NOHUGEPAGE) |
> > >              arch_calc_vm_flag_bits(flags);
> > >   }
> > >
> >

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2024-01-16 20:57       ` Yang Shi
@ 2024-01-16 21:31         ` Andrew Morton
  0 siblings, 0 replies; 13+ messages in thread
From: Andrew Morton @ 2024-01-16 21:31 UTC (permalink / raw)
  To: Yang Shi
  Cc: Zach O'Keefe, Yin Fengwei, oliver.sang, riel, willy, cl,
	ying.huang, linux-kernel, linux-mm

On Tue, 16 Jan 2024 12:57:41 -0800 Yang Shi <shy828301@gmail.com> wrote:

> > Should this be marked for stable? Given how easily it is for pthreads
> > to allocate hugepages w/o this change, it can easily cause memory
> > bloat on larger systems and/or users with high thread counts. I don't
> > think that will be welcomed, and seems odd that just 6.7 should suffer
> > this.
> 
> Thanks for the suggestion, fine to me.
>

Thanks, added, along with

Fixes: efa7df3e3bb5 ("mm: align larger anonymous mappings on THP boundaries")


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack
  2024-01-15  5:50 ` Huang, Ying
@ 2024-01-16 21:39   ` Yang Shi
  0 siblings, 0 replies; 13+ messages in thread
From: Yang Shi @ 2024-01-16 21:39 UTC (permalink / raw)
  To: Huang, Ying
  Cc: oliver.sang, riel, fengwei.yin, willy, cl, akpm, linux-kernel, linux-mm

On Sun, Jan 14, 2024 at 9:52 PM Huang, Ying <ying.huang@intel.com> wrote:
>
> Yang Shi <shy828301@gmail.com> writes:
>
> > From: Yang Shi <yang@os.amperecomputing.com>
> >
> > We avoid allocating THP for temporary stack, even tnough
>                                                     ~~~~~~
>                                                     though?

Yeah, it is a typo. Thanks for noticing this.

>
> --
> Best Regards,
> Huang, Ying
>
> > khugepaged_enter_vma() is called for stack VMAs, it actualy returns
> > false.  So no need to call it in the first place at all.
> >
> > Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
> > ---
> >  mm/mmap.c | 2 --
> >  1 file changed, 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index b78e83d351d2..2ff79b1d1564 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2046,7 +2046,6 @@ static int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> >               }
> >       }
> >       anon_vma_unlock_write(vma->anon_vma);
> > -     khugepaged_enter_vma(vma, vma->vm_flags);
> >       mas_destroy(&mas);
> >       validate_mm(mm);
> >       return error;
> > @@ -2140,7 +2139,6 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> >               }
> >       }
> >       anon_vma_unlock_write(vma->anon_vma);
> > -     khugepaged_enter_vma(vma, vma->vm_flags);
> >       mas_destroy(&mas);
> >       validate_mm(mm);
> >       return error;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2023-12-21  6:59 ` [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE Yang Shi
  2024-01-10  1:36   ` Yin Fengwei
@ 2024-01-31  7:53   ` Florian Weimer
  2024-01-31 18:46     ` Yang Shi
  1 sibling, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2024-01-31  7:53 UTC (permalink / raw)
  To: Yang Shi
  Cc: oliver.sang, riel, fengwei.yin, willy, cl, ying.huang, akpm,
	linux-kernel, linux-mm

* Yang Shi:

> From: Yang Shi <yang@os.amperecomputing.com>
>
> The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> boundaries") incured regression for stress-ng pthread benchmark [1].
> It is because THP get allocated to pthread's stack area much more possible
> than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>
> The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> THP for such stack area.

Doesn't this introduce a regression in the other direction, where
workloads expect to use a hugepage TLB entry for the stack?

It's seems an odd approach to fixing the stress-ng regression.  Isn't it
very much coding to the benchmark?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2024-01-31  7:53   ` Florian Weimer
@ 2024-01-31 18:46     ` Yang Shi
  2024-02-01 15:34       ` Florian Weimer
  0 siblings, 1 reply; 13+ messages in thread
From: Yang Shi @ 2024-01-31 18:46 UTC (permalink / raw)
  To: Florian Weimer
  Cc: oliver.sang, riel, fengwei.yin, willy, cl, ying.huang, akpm,
	linux-kernel, linux-mm

On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Yang Shi:
>
> > From: Yang Shi <yang@os.amperecomputing.com>
> >
> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> > boundaries") incured regression for stress-ng pthread benchmark [1].
> > It is because THP get allocated to pthread's stack area much more possible
> > than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> >
> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> > Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> > THP for such stack area.
>
> Doesn't this introduce a regression in the other direction, where
> workloads expect to use a hugepage TLB entry for the stack?

Maybe, it is theoretically possible. But AFAICT, the real life
workloads performance usually gets hurt if THP is used for stack.
Willy has an example:

https://lore.kernel.org/linux-mm/ZYPDwCcAjX+r+g6s@casper.infradead.org/#t

And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
have been applied before, this patch just extends this to MAP_STACK.

>
> It's seems an odd approach to fixing the stress-ng regression.  Isn't it
> very much coding to the benchmark?
>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2024-01-31 18:46     ` Yang Shi
@ 2024-02-01 15:34       ` Florian Weimer
  2024-02-01 19:00         ` Yang Shi
  0 siblings, 1 reply; 13+ messages in thread
From: Florian Weimer @ 2024-02-01 15:34 UTC (permalink / raw)
  To: Yang Shi
  Cc: oliver.sang, riel, fengwei.yin, willy, cl, ying.huang, akpm,
	linux-kernel, linux-mm

* Yang Shi:

> On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * Yang Shi:
>>
>> > From: Yang Shi <yang@os.amperecomputing.com>
>> >
>> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
>> > boundaries") incured regression for stress-ng pthread benchmark [1].
>> > It is because THP get allocated to pthread's stack area much more possible
>> > than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
>> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
>> >
>> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
>> > Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
>> > THP for such stack area.
>>
>> Doesn't this introduce a regression in the other direction, where
>> workloads expect to use a hugepage TLB entry for the stack?
>
> Maybe, it is theoretically possible. But AFAICT, the real life
> workloads performance usually gets hurt if THP is used for stack.
> Willy has an example:
>
> https://lore.kernel.org/linux-mm/ZYPDwCcAjX+r+g6s@casper.infradead.org/#t
>
> And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
> have been applied before, this patch just extends this to MAP_STACK.

If it's *always* beneficial then we should help it along in glibc as
well.  We've started to offer a tunable in response to this observation
(also paper over in OpenJDK):

  Make thread stacks not use huge pages
  <https://bugs.openjdk.org/browse/JDK-8303215>

But this is specifically about RSS usage, and not directly about
reducing TLB misses etc.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE
  2024-02-01 15:34       ` Florian Weimer
@ 2024-02-01 19:00         ` Yang Shi
  0 siblings, 0 replies; 13+ messages in thread
From: Yang Shi @ 2024-02-01 19:00 UTC (permalink / raw)
  To: Florian Weimer
  Cc: oliver.sang, riel, fengwei.yin, willy, cl, ying.huang, akpm,
	linux-kernel, linux-mm

On Thu, Feb 1, 2024 at 7:34 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * Yang Shi:
>
> > On Tue, Jan 30, 2024 at 11:53 PM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * Yang Shi:
> >>
> >> > From: Yang Shi <yang@os.amperecomputing.com>
> >> >
> >> > The commit efa7df3e3bb5 ("mm: align larger anonymous mappings on THP
> >> > boundaries") incured regression for stress-ng pthread benchmark [1].
> >> > It is because THP get allocated to pthread's stack area much more possible
> >> > than before.  Pthread's stack area is allocated by mmap without VM_GROWSDOWN
> >> > or VM_GROWSUP flag, so kernel can't tell whether it is a stack area or not.
> >> >
> >> > The MAP_STACK flag is used to mark the stack area, but it is a no-op on
> >> > Linux.  Mapping MAP_STACK to VM_NOHUGEPAGE to prevent from allocating
> >> > THP for such stack area.
> >>
> >> Doesn't this introduce a regression in the other direction, where
> >> workloads expect to use a hugepage TLB entry for the stack?
> >
> > Maybe, it is theoretically possible. But AFAICT, the real life
> > workloads performance usually gets hurt if THP is used for stack.
> > Willy has an example:
> >
> > https://lore.kernel.org/linux-mm/ZYPDwCcAjX+r+g6s@casper.infradead.org/#t
> >
> > And avoiding THP on stack is not new, VM_GROWSDOWN | VM_GROWSUP areas
> > have been applied before, this patch just extends this to MAP_STACK.
>
> If it's *always* beneficial then we should help it along in glibc as
> well.  We've started to offer a tunable in response to this observation
> (also paper over in OpenJDK):
>
>   Make thread stacks not use huge pages
>   <https://bugs.openjdk.org/browse/JDK-8303215>
>
> But this is specifically about RSS usage, and not directly about
> reducing TLB misses etc.

Thanks for the data point. Out of curiosity, what mmap flags are used
by JVM to indicate a stack? MAP_STACK? If so it should get
VM_NOHUGEPAGE due to this patch (of course, on older kernel
MADV_NOHUGEPAGE must be called by JVM).

Letting others, for example, glibc, call MADV_NOHUGEPAGE explicitly on
stack area is fine too, but it may take some time to get there...

>
> Thanks,
> Florian
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-02-01 19:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-21  6:59 [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yang Shi
2023-12-21  6:59 ` [PATCH 2/2] mm: mmap: map MAP_STACK to VM_NOHUGEPAGE Yang Shi
2024-01-10  1:36   ` Yin Fengwei
2024-01-16 19:22     ` Zach O'Keefe
2024-01-16 20:57       ` Yang Shi
2024-01-16 21:31         ` Andrew Morton
2024-01-31  7:53   ` Florian Weimer
2024-01-31 18:46     ` Yang Shi
2024-02-01 15:34       ` Florian Weimer
2024-02-01 19:00         ` Yang Shi
2024-01-10  1:35 ` [PATCH 1/2] mm: mmap: no need to call khugepaged_enter_vma() for stack Yin Fengwei
2024-01-15  5:50 ` Huang, Ying
2024-01-16 21:39   ` Yang Shi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).