linux-mm.kvack.org archive mirror
* [PATCH 0/2] mm,thp,shm: limit shmem THP alloc gfp_mask
@ 2020-11-05 19:15 Rik van Riel
  2020-11-05 19:15 ` [PATCH 1/2] mm,thp,shmem: " Rik van Riel
  2020-11-05 19:15 ` [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified Rik van Riel
  0 siblings, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2020-11-05 19:15 UTC (permalink / raw)
  To: hughd
  Cc: xuyu, akpm, mgorman, aarcange, willy, linux-kernel, kernel-team,
	linux-mm, vbabka, mhocko

The allocation flags of anonymous transparent huge pages can be controlled
through /sys/kernel/mm/transparent_hugepage/defrag, which can keep
the system from getting bogged down in the page reclaim and compaction
code when many THPs are being allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by those
configuration settings, and some workloads ended up with all CPUs stuck
on the LRU lock in the page reclaim code, trying to allocate dozens of
THPs simultaneously.

This patch applies the same configured limits to shmem
hugepage allocations, to prevent that from happening.

This way a THP defrag setting of "never" or "defer+madvise" will result
in quick allocation failures without direct reclaim when no 2MB free
pages are available.

With this patch applied, THP allocations for tmpfs will be a little
more aggressive than today for files mmapped with MADV_HUGEPAGE,
and a little less aggressive for files that are not mmapped or
mapped without that flag.
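
For reference, the defrag setting translates into a gfp mask for the THP
attempt roughly as follows; this is a condensed sketch of the existing
alloc_hugepage_direct_gfpmask() logic (renamed to vma_thp_gfp_mask() by
patch 1), not the verbatim kernel code:

gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
{
	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);

	/* "always": direct reclaim and synchronous compaction */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);

	/* "defer": only kick kswapd/kcompactd, fail the attempt quickly */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;

	/* "defer+madvise": direct reclaim only for MADV_HUGEPAGE vmas */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE_LIGHT | (vma_madvised ?
				__GFP_DIRECT_RECLAIM : __GFP_KSWAPD_RECLAIM);

	/* "madvise": direct reclaim for MADV_HUGEPAGE vmas, else nothing */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
		     &transparent_hugepage_flags))
		return GFP_TRANSHUGE_LIGHT |
		       (vma_madvised ? __GFP_DIRECT_RECLAIM : 0);

	/* "never": no reclaim, no compaction */
	return GFP_TRANSHUGE_LIGHT;
}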

v5: reduce gfp mask further if needed, to accommodate i915 (Matthew Wilcox)
v4: rename alloc_hugepage_direct_gfpmask to vma_thp_gfp_mask (Matthew Wilcox)
v3: fix NULL vma issue spotted by Hugh Dickins & tested
v2: move gfp calculation to shmem_getpage_gfp as suggested by Yu Xu




^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/2] mm,thp,shmem: limit shmem THP alloc gfp_mask
  2020-11-05 19:15 [PATCH 0/2] mm,thp,shm: limit shmem THP alloc gfp_mask Rik van Riel
@ 2020-11-05 19:15 ` Rik van Riel
  2020-11-12 10:52   ` Michal Hocko
  2020-11-05 19:15 ` [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2020-11-05 19:15 UTC (permalink / raw)
  To: hughd
  Cc: xuyu, akpm, mgorman, aarcange, willy, linux-kernel, kernel-team,
	linux-mm, vbabka, mhocko, Rik van Riel

The allocation flags of anonymous transparent huge pages can be controlled
through /sys/kernel/mm/transparent_hugepage/defrag, which can keep
the system from getting bogged down in the page reclaim and compaction
code when many THPs are being allocated simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by those
configuration settings, and some workloads ended up with all CPUs stuck
on the LRU lock in the page reclaim code, trying to allocate dozens of
THPs simultaneously.

This patch applies the same configured limits to shmem
hugepage allocations, to prevent that from happening.

This way a THP defrag setting of "never" or "defer+madvise" will result
in quick allocation failures without direct reclaim when no 2MB free
pages are available.

With this patch applied, THP allocations for tmpfs will be a little
more aggressive than today for files mmapped with MADV_HUGEPAGE,
and a little less aggressive for files that are not mmapped or
mapped without that flag.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
 include/linux/gfp.h | 2 ++
 mm/huge_memory.c    | 6 +++---
 mm/shmem.c          | 8 +++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c603237e006c..c7615c9ba03c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -614,6 +614,8 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
 
+extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
+
 #ifdef CONFIG_PM_SLEEP
 extern bool pm_suspended_storage(void);
 #else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9474dbc150ed..c5d03b2f2f2f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -649,9 +649,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
  *	    available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
 {
-	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);
 
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
@@ -744,7 +744,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 			pte_free(vma->vm_mm, pgtable);
 		return ret;
 	}
-	gfp = alloc_hugepage_direct_gfpmask(vma);
+	gfp = vma_thp_gfp_mask(vma);
 	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
diff --git a/mm/shmem.c b/mm/shmem.c
index 537c137698f8..6c3cb192a88d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1545,8 +1545,8 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		return NULL;
 
 	shmem_pseudo_vma_init(&pvma, info, hindex);
-	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-			HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
+	page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(),
+			       true);
 	shmem_pseudo_vma_destroy(&pvma);
 	if (page)
 		prep_transhuge_page(page);
@@ -1802,6 +1802,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	struct page *page;
 	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
+	gfp_t huge_gfp;
 	int error;
 	int once = 0;
 	int alloced = 0;
@@ -1887,7 +1888,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 alloc_huge:
-	page = shmem_alloc_and_acct_page(gfp, inode, index, true);
+	huge_gfp = vma_thp_gfp_mask(vma);
+	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
 		page = shmem_alloc_and_acct_page(gfp, inode,
-- 
2.25.4



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-05 19:15 [PATCH 0/2] mm,thp,shm: limit shmem THP alloc gfp_mask Rik van Riel
  2020-11-05 19:15 ` [PATCH 1/2] mm,thp,shmem: " Rik van Riel
@ 2020-11-05 19:15 ` Rik van Riel
  2020-11-06  3:05   ` Hillf Danton
  2020-11-12 11:22   ` Michal Hocko
  1 sibling, 2 replies; 13+ messages in thread
From: Rik van Riel @ 2020-11-05 19:15 UTC (permalink / raw)
  To: hughd
  Cc: xuyu, akpm, mgorman, aarcange, willy, linux-kernel, kernel-team,
	linux-mm, vbabka, mhocko, Rik van Riel

Matthew Wilcox pointed out that the i915 driver opportunistically
allocates tmpfs memory, but will happily reclaim some of its
pool if no memory is available.

Make sure the gfp mask used to opportunistically allocate a THP
is always at least as restrictive as the original gfp mask.

Signed-off-by: Rik van Riel <riel@surriel.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
---
 mm/shmem.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 6c3cb192a88d..ee3cea10c2a4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
 	return page;
 }
 
+/*
+ * Make sure huge_gfp is always more limited than limit_gfp.
+ * Some of the flags set permissions, while others set limitations.
+ */
+static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
+{
+	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
+	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
+	gfp_t result = huge_gfp & ~allowflags;
+
+	/*
+	 * Minimize the result gfp by taking the union with the deny flags,
+	 * and the intersection of the allow flags.
+	 */
+	result |= (limit_gfp & denyflags);
+	result |= (huge_gfp & limit_gfp) & allowflags;
+
+	return result;
+}
+
 static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		struct shmem_inode_info *info, pgoff_t index)
 {
@@ -1889,6 +1909,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 
 alloc_huge:
 	huge_gfp = vma_thp_gfp_mask(vma);
+	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
 	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
-- 
2.25.4



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-05 19:15 ` [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified Rik van Riel
@ 2020-11-06  3:05   ` Hillf Danton
  2020-11-06 17:53     ` Rik van Riel
  2020-11-12 11:22   ` Michal Hocko
  1 sibling, 1 reply; 13+ messages in thread
From: Hillf Danton @ 2020-11-06  3:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, mhocko

On Thu,  5 Nov 2020 14:15:08 -0500
> 
> Matthew Wilcox pointed out that the i915 driver opportunistically
> allocates tmpfs memory, but will happily reclaim some of its
> pool if no memory is available.
> 
> Make sure the gfp mask used to opportunistically allocate a THP
> is always at least as restrictive as the original gfp mask.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> ---
>  mm/shmem.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6c3cb192a88d..ee3cea10c2a4 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
>  	return page;
>  }
>  
> +/*
> + * Make sure huge_gfp is always more limited than limit_gfp.
> + * Some of the flags set permissions, while others set limitations.
> + */
> +static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> +{
> +	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
> +	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
> +	gfp_t result = huge_gfp & ~allowflags;
> +
> +	/*
> +	 * Minimize the result gfp by taking the union with the deny flags,
> +	 * and the intersection of the allow flags.
> +	 */
> +	result |= (limit_gfp & denyflags);

Currently NORETRY is always set regardless of i915, and if it's
determined in 1/2 then the i915 thing can be done like

	return huge_gfp | (limit_gfp & __GFP_RECLAIM);

on the assumption that 1/2 gives the baseline gfp shmem needs.

> +	result |= (huge_gfp & limit_gfp) & allowflags;
> +
> +	return result;
> +}
> +
>  static struct page *shmem_alloc_hugepage(gfp_t gfp,
>  		struct shmem_inode_info *info, pgoff_t index)
>  {
> @@ -1889,6 +1909,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  
>  alloc_huge:
>  	huge_gfp = vma_thp_gfp_mask(vma);
> +	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
>  	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
>  	if (IS_ERR(page)) {
>  alloc_nohuge:
> -- 
> 2.25.4


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-06  3:05   ` Hillf Danton
@ 2020-11-06 17:53     ` Rik van Riel
  2020-11-07  1:58       ` Hillf Danton
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2020-11-06 17:53 UTC (permalink / raw)
  To: Hillf Danton
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, mhocko

On Fri, 2020-11-06 at 11:05 +0800, Hillf Danton wrote:
> On Thu,  5 Nov 2020 14:15:08 -0500
> > Matthew Wilcox pointed out that the i915 driver opportunistically
> > allocates tmpfs memory, but will happily reclaim some of its
> > pool if no memory is available.
> > 
> > Make sure the gfp mask used to opportunistically allocate a THP
> > is always at least as restrictive as the original gfp mask.
> > 
> > Signed-off-by: Rik van Riel <riel@surriel.com>
> > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > ---
> >  mm/shmem.c | 21 +++++++++++++++++++++
> >  1 file changed, 21 insertions(+)
> > 
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 6c3cb192a88d..ee3cea10c2a4 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t
> > swap, gfp_t gfp,
> >  	return page;
> >  }
> >  
> > +/*
> > + * Make sure huge_gfp is always more limited than limit_gfp.
> > + * Some of the flags set permissions, while others set
> > limitations.
> > + */
> > +static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> > +{
> > +	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
> > +	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
> > +	gfp_t result = huge_gfp & ~allowflags;
> > +
> > +	/*
> > +	 * Minimize the result gfp by taking the union with the deny
> > flags,
> > +	 * and the intersection of the allow flags.
> > +	 */
> > +	result |= (limit_gfp & denyflags);
> 
> Currently NORETRY is always set regardless of i915 and if it's
> determined in 1/2 then the i915 thing can be done like
> 
> 	return huge_gfp | (limit_gfp & __GFP_RECLAIM);

No, if __GFP_KSWAPD_RECLAIM or __GFP_DIRECT_RECLAIM are
not set in either huge_gfp or limit_gfp, we want to ensure
the resulting gfp does not have it set, either.

Your suggested change would result in __GFP_KSWAPD_RECLAIM or
__GFP_DIRECT_RECLAIM getting set if it was set in either of the
input gfp variables, which is probably not the desired behavior.
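
To make that concrete, here is a small stand-alone userspace sketch of the
limit_gfp_mask() logic from this patch, compared against the plain OR. The
flag bit values below are made up for illustration and are not the kernel's
real __GFP_* definitions; only the combining logic matches the patch:

#include <stdio.h>

typedef unsigned int gfp_t;

/* illustrative bit values, not the kernel's real __GFP_* numbers */
#define __GFP_IO		0x01u
#define __GFP_FS		0x02u
#define __GFP_KSWAPD_RECLAIM	0x04u
#define __GFP_DIRECT_RECLAIM	0x08u
#define __GFP_RECLAIM		(__GFP_KSWAPD_RECLAIM | __GFP_DIRECT_RECLAIM)
#define __GFP_NOWARN		0x10u
#define __GFP_NORETRY		0x20u

/* same combining logic as limit_gfp_mask() in this patch */
static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
	gfp_t result = huge_gfp & ~allowflags;

	result |= (limit_gfp & denyflags);
	result |= (huge_gfp & limit_gfp) & allowflags;
	return result;
}

int main(void)
{
	/* defrag=defer style THP mask: kswapd reclaim only, no direct reclaim */
	gfp_t huge_gfp = __GFP_IO | __GFP_FS | __GFP_KSWAPD_RECLAIM;
	/* limiting mask: full reclaim allowed, plus NORETRY|NOWARN */
	gfp_t limit_gfp = __GFP_IO | __GFP_FS | __GFP_RECLAIM |
			  __GFP_NORETRY | __GFP_NOWARN;
	gfp_t masked = limit_gfp_mask(huge_gfp, limit_gfp);
	gfp_t ored = huge_gfp | (limit_gfp & __GFP_RECLAIM);

	/* intersection keeps direct reclaim off, union keeps NORETRY on */
	printf("masked: direct reclaim %s, noretry %s\n",
	       (masked & __GFP_DIRECT_RECLAIM) ? "set" : "clear",
	       (masked & __GFP_NORETRY) ? "set" : "clear");
	/* the plain OR would turn direct reclaim back on */
	printf("ored:   direct reclaim %s\n",
	       (ored & __GFP_DIRECT_RECLAIM) ? "set" : "clear");
	return 0;
}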


-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-06 17:53     ` Rik van Riel
@ 2020-11-07  1:58       ` Hillf Danton
  0 siblings, 0 replies; 13+ messages in thread
From: Hillf Danton @ 2020-11-07  1:58 UTC (permalink / raw)
  To: Rik van Riel
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, mhocko

On Fri, 06 Nov 2020 12:53:33 -0500 Rik van Riel wrote:
> On Fri, 2020-11-06 at 11:05 +0800, Hillf Danton wrote:
> > On Thu,  5 Nov 2020 14:15:08 -0500
> > > Matthew Wilcox pointed out that the i915 driver opportunistically
> > > allocates tmpfs memory, but will happily reclaim some of its
> > > pool if no memory is available.
> > >
> > > Make sure the gfp mask used to opportunistically allocate a THP
> > > is always at least as restrictive as the original gfp mask.
> > >
> > > Signed-off-by: Rik van Riel <riel@surriel.com>
> > > Suggested-by: Matthew Wilcox <willy@infradead.org>
> > > ---
> > >  mm/shmem.c | 21 +++++++++++++++++++++
> > >  1 file changed, 21 insertions(+)
> > >
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 6c3cb192a88d..ee3cea10c2a4 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t
> > > swap, gfp_t gfp,
> > >  	return page;
> > >  }
> > > 
> > > +/*
> > > + * Make sure huge_gfp is always more limited than limit_gfp.
> > > + * Some of the flags set permissions, while others set
> > > limitations.
> > > + */
> > > +static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> > > +{
> > > +	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
> > > +	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
> > > +	gfp_t result = huge_gfp & ~allowflags;
> > > +
> > > +	/*
> > > +	 * Minimize the result gfp by taking the union with the deny
> > > flags,
> > > +	 * and the intersection of the allow flags.
> > > +	 */
> > > +	result |= (limit_gfp & denyflags);
> >
> > Currently NORETRY is always set regardless of i915 and if it's
> > determined in 1/2 then the i915 thing can be done like
> >
> > 	return huge_gfp | (limit_gfp & __GFP_RECLAIM);
> 
> No, if __GFP_KSWAPD_RECLAIM or __GFP_DIRECT_RECLAIM are
> not set in either huge_gfp or limit_gfp, we want to ensure
> the resulting gfp does not have it set, either.

That means huge_gfp can play its game without considering i915 if
__GFP_RECLAIM is determined in 1/2 too.  Then things become
simpler because we have no need to check limit_gfp from the
beginning.
> 
> Your suggested change
> would result in __GFP_KSWAPD_RECLAIM
> or __GFP_DIRECT_RECLAIM getting set if it was set in either
> of the input gfp variables, which is probably not the desired
> behavior.

It makes sense only if we could not determine __GFP_RECLAIM
without considering i915. Now it is safe to ignore it.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm,thp,shmem: limit shmem THP alloc gfp_mask
  2020-11-05 19:15 ` [PATCH 1/2] mm,thp,shmem: " Rik van Riel
@ 2020-11-12 10:52   ` Michal Hocko
  2020-11-14  3:44     ` Rik van Riel
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2020-11-12 10:52 UTC (permalink / raw)
  To: Rik van Riel
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka

On Thu 05-11-20 14:15:07, Rik van Riel wrote:
> The allocation flags of anonymous transparent huge pages can be controlled
> through /sys/kernel/mm/transparent_hugepage/defrag, which can keep
> the system from getting bogged down in the page reclaim and compaction
> code when many THPs are being allocated simultaneously.
> 
> However, the gfp_mask for shmem THP allocations was not limited by those
> configuration settings, and some workloads ended up with all CPUs stuck
> on the LRU lock in the page reclaim code, trying to allocate dozens of
> THPs simultaneously.
> 
> This patch applies the same configured limits to shmem
> hugepage allocations, to prevent that from happening.

I believe you should also explain why we want to control defrag by the
global knob while the enable logic is per mount.

> This way a THP defrag setting of "never" or "defer+madvise" will result
> in quick allocation failures without direct reclaim when no 2MB free
> pages are available.
> 
> With this patch applied, THP allocations for tmpfs will be a little
> more aggressive than today for files mmapped with MADV_HUGEPAGE,
> and a little less aggressive for files that are not mmapped or
> mapped without that flag.

This begs some numbers. A little is a rather bad unit of performance. I do
agree that unifying those makes sense in general though.

> Signed-off-by: Rik van Riel <riel@surriel.com>
> ---
>  include/linux/gfp.h | 2 ++
>  mm/huge_memory.c    | 6 +++---
>  mm/shmem.c          | 8 +++++---
>  3 files changed, 10 insertions(+), 6 deletions(-)
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-05 19:15 ` [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified Rik van Riel
  2020-11-06  3:05   ` Hillf Danton
@ 2020-11-12 11:22   ` Michal Hocko
  2020-11-14  3:40     ` Rik van Riel
  1 sibling, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2020-11-12 11:22 UTC (permalink / raw)
  To: Rik van Riel
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, Andrey Grodzovsky, Chris Wilson

[Cc Chris for i915 and Andrey]

On Thu 05-11-20 14:15:08, Rik van Riel wrote:
> Matthew Wilcox pointed out that the i915 driver opportunistically
> allocates tmpfs memory, but will happily reclaim some of its
> pool if no memory is available.

It would be good to explicitly mention the requested gfp flags for those
allocations. i915 uses __GFP_NORETRY | __GFP_NOWARN, or GFP_KERNEL. Is
__shmem_rw really meant to not allocate from highmem/movable zones? Can
it ever be backed by THPs?

ttm might want __GFP_RETRY_MAYFAIL while shmem_read_mapping_page uses
the mapping gfp mask, which can be NOFS or something else. This is quite
messy already, and I suspect that they are mostly targeting regular order-0
requests. E.g. have a look at cb5f1a52caf23.

I am worried that these games with gfp flags will lead to unmaintainable
code later on. There is a clear disconnect between the core THP
allocation strategy and what drivers are asking for, and those
requirements might be really conflicting. Not to mention that the flags
might be different between regular and THP pages.

> Make sure the gfp mask used to opportunistically allocate a THP
> is always at least as restrictive as the original gfp mask.
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> ---
>  mm/shmem.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6c3cb192a88d..ee3cea10c2a4 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1531,6 +1531,26 @@ static struct page *shmem_swapin(swp_entry_t swap, gfp_t gfp,
>  	return page;
>  }
>  
> +/*
> + * Make sure huge_gfp is always more limited than limit_gfp.
> + * Some of the flags set permissions, while others set limitations.
> + */
> +static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
> +{
> +	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
> +	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
> +	gfp_t result = huge_gfp & ~allowflags;
> +
> +	/*
> +	 * Minimize the result gfp by taking the union with the deny flags,
> +	 * and the intersection of the allow flags.
> +	 */
> +	result |= (limit_gfp & denyflags);
> +	result |= (huge_gfp & limit_gfp) & allowflags;
> +
> +	return result;
> +}
> +
>  static struct page *shmem_alloc_hugepage(gfp_t gfp,
>  		struct shmem_inode_info *info, pgoff_t index)
>  {
> @@ -1889,6 +1909,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
>  
>  alloc_huge:
>  	huge_gfp = vma_thp_gfp_mask(vma);
> +	huge_gfp = limit_gfp_mask(huge_gfp, gfp);
>  	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
>  	if (IS_ERR(page)) {
>  alloc_nohuge:
> -- 
> 2.25.4

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-12 11:22   ` Michal Hocko
@ 2020-11-14  3:40     ` Rik van Riel
  2020-11-19  9:38       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2020-11-14  3:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, Andrey Grodzovsky, Chris Wilson

On Thu, 2020-11-12 at 12:22 +0100, Michal Hocko wrote:
> [Cc Chris for i915 and Andrey]
> 
> On Thu 05-11-20 14:15:08, Rik van Riel wrote:
> > Matthew Wilcox pointed out that the i915 driver opportunistically
> > allocates tmpfs memory, but will happily reclaim some of its
> > pool if no memory is available.
> 
> It would be good to explicitly mention the requested gfp flags for
> those
> allocations. i915 uses __GFP_NORETRY | __GFP_NOWARN, or GFP_KERNEL.
> Is
> __shmem_rw really meant to not allocate from highmem/movable zones?
> Can
> it be ever backed by THPs?

You are right, I need to copy the zone flags __GFP_DMA through
__GFP_MOVABLE straight from the limiting gfp_mask into the gfp_mask
used for THP allocations, and not use the default THP zone flags if
the caller specifies something else.

I'll send out a new version that fixes that.
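
For illustration, one way that could look on top of limit_gfp_mask();
this is only a sketch, the actual next version may differ:

static gfp_t limit_gfp_mask(gfp_t huge_gfp, gfp_t limit_gfp)
{
	gfp_t allowflags = __GFP_IO | __GFP_FS | __GFP_RECLAIM;
	gfp_t denyflags = __GFP_NOWARN | __GFP_NORETRY;
	/* take the zone bits (__GFP_DMA .. __GFP_MOVABLE) from the limit */
	gfp_t zoneflags = limit_gfp & GFP_ZONEMASK;
	gfp_t result = huge_gfp & ~(allowflags | GFP_ZONEMASK);

	/* allow allocations only from the originally specified zones */
	result |= zoneflags;

	/*
	 * Minimize the result gfp by taking the union with the deny flags,
	 * and the intersection of the allow flags.
	 */
	result |= (limit_gfp & denyflags);
	result |= (huge_gfp & limit_gfp) & allowflags;

	return result;
}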

> ttm might want __GFP_RETRY_MAYFAIL while shmem_read_mapping_page use
> the mapping gfp mask which can be NOFS or something else. This is
> quite
> messy already and I suspect that they are more targeting regular
> order-0
> requests. E.g. have a look at cb5f1a52caf23.
> 
> I am worried that these games with gfp flags will lead to
> unmaintainable
> code later on. There is a clear disconnect between the core THP
> allocation strategy and what drivers are asking for and those
> requirements might be really conflicting. Not to mention that flags
> might be different between regular and THP pages.

That is exactly why I want to make sure the THP allocations
are never more aggressive than the gfp flags the drivers
request, and the THP allocations may only ever be less
aggressive than the order 0 gfp_mask specified by the drivers.


-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm,thp,shmem: limit shmem THP alloc gfp_mask
  2020-11-12 10:52   ` Michal Hocko
@ 2020-11-14  3:44     ` Rik van Riel
  2020-11-19  9:37       ` Michal Hocko
  0 siblings, 1 reply; 13+ messages in thread
From: Rik van Riel @ 2020-11-14  3:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka

On Thu, 2020-11-12 at 11:52 +0100, Michal Hocko wrote:
> On Thu 05-11-20 14:15:07, Rik van Riel wrote:
> > 
> > This patch applies the same configured limits to
> > shmem
> > hugepage allocations, to prevent that from happening.
> 
> I believe you should also explain why we want to control defrag by
> the
> global knob while the enable logic is per mount.

I added that to the changelog for the next version of
the patches.

> > This way a THP defrag setting of "never" or "defer+madvise" will
> > result
> > in quick allocation failures without direct reclaim when no 2MB
> > free
> > pages are available.
> > 
> > With this patch applied, THP allocations for tmpfs will be a little
> > more aggressive than today for files mmapped with MADV_HUGEPAGE,
> > and a little less aggressive for files that are not mmapped or
> > mapped without that flag.
> 
> This begs some numbers. A little is a rather bad unit of performance. I
> do
> agree that unifying those makes sense in general though.

The aggressiveness is in changes to the gfp_mask, e.g. by
adding __GFP_NORETRY. How that translates into THP
allocation success rates is entirely dependent on the
workload and on what else is in memory at the time.

I am not sure any numbers I could gather will be
representative for anything but the workloads I am testing.

However, I did find an issue in hugepage_vma_check
that prevents khugepaged from collapsing pages on
shmem filesystems mounted with huge=always or
huge=within_size when transparent_hugepage/enabled
is set to [madvise].

The next version of the series will have a third
patch, in order to fix that.

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] mm,thp,shmem: limit shmem THP alloc gfp_mask
  2020-11-14  3:44     ` Rik van Riel
@ 2020-11-19  9:37       ` Michal Hocko
  0 siblings, 0 replies; 13+ messages in thread
From: Michal Hocko @ 2020-11-19  9:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka

On Fri 13-11-20 22:44:20, Rik van Riel wrote:
> On Thu, 2020-11-12 at 11:52 +0100, Michal Hocko wrote:
> > On Thu 05-11-20 14:15:07, Rik van Riel wrote:
> > > 
> > > This patch applies the same configured limits to
> > > shmem
> > > hugepage allocations, to prevent that from happening.
> > 
> > I believe you should also explain why we want to control defrag by
> > the
> > global knob while the enable logic is per mount.
> 
> I added that to the changelog for the next version of
> the patches.
> 
> > > This way a THP defrag setting of "never" or "defer+madvise" will
> > > result
> > > in quick allocation failures without direct reclaim when no 2MB
> > > free
> > > pages are available.
> > > 
> > > With this patch applied, THP allocations for tmpfs will be a little
> > > more aggressive than today for files mmapped with MADV_HUGEPAGE,
> > > and a little less aggressive for files that are not mmapped or
> > > mapped without that flag.
> > 
> > This begs some numbers. A little is a rather bad unit of performance. I
> > do
> > agree that unifying those makes sense in general though.
> 
> The aggressiveness is in changes to the gfp_mask, e.g. by
> adding __GFP_NORETRY. How that translates into THP
> allocation success rates is entirely dependent on the
> workload and on what else is in memory at the time.

Yes, and that is why I would argue for consistency with THP rather than
make claims that are hard to back up with numbers.

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-14  3:40     ` Rik van Riel
@ 2020-11-19  9:38       ` Michal Hocko
  2020-11-23 19:39         ` Rik van Riel
  0 siblings, 1 reply; 13+ messages in thread
From: Michal Hocko @ 2020-11-19  9:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, Andrey Grodzovsky, Chris Wilson

On Fri 13-11-20 22:40:40, Rik van Riel wrote:
> On Thu, 2020-11-12 at 12:22 +0100, Michal Hocko wrote:
> > [Cc Chris for i915 and Andrey]
> > 
> > On Thu 05-11-20 14:15:08, Rik van Riel wrote:
> > > Matthew Wilcox pointed out that the i915 driver opportunistically
> > > allocates tmpfs memory, but will happily reclaim some of its
> > > pool if no memory is available.
> > 
> > It would be good to explicitly mention the requested gfp flags for
> > those
> > allocations. i915 uses __GFP_NORETRY | __GFP_NOWARN, or GFP_KERNEL.
> > Is
> > __shmem_rw really meant to not allocate from highmem/movable zones?
> > Can
> > it be ever backed by THPs?
> 
> You are right, I need to copy the zone flags __GFP_DMA
> through
> __GFP_MOVABLE straight from the limiting gfp_mask
> into the gfp_mask used for THP allocations, and not use
> the default THP zone flags if the caller specifies something
> else.
> 
> I'll send out a new version that fixes that.

Can we take a step back here and check whether all this is actually
needed for those shmem users before adding more hacks here and
there?
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified
  2020-11-19  9:38       ` Michal Hocko
@ 2020-11-23 19:39         ` Rik van Riel
  0 siblings, 0 replies; 13+ messages in thread
From: Rik van Riel @ 2020-11-23 19:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: hughd, xuyu, akpm, mgorman, aarcange, willy, linux-kernel,
	kernel-team, linux-mm, vbabka, Andrey Grodzovsky, Chris Wilson

On Thu, 2020-11-19 at 10:38 +0100, Michal Hocko wrote:
> On Fri 13-11-20 22:40:40, Rik van Riel wrote:
> > On Thu, 2020-11-12 at 12:22 +0100, Michal Hocko wrote:
> > > [Cc Chris for i915 and Andrey]
> > > 
> > > On Thu 05-11-20 14:15:08, Rik van Riel wrote:
> > > > Matthew Wilcox pointed out that the i915 driver
> > > > opportunistically
> > > > allocates tmpfs memory, but will happily reclaim some of its
> > > > pool if no memory is available.
> > > 
> > > It would be good to explicitly mention the requested gfp flags
> > > for
> > > those
> > > allocations. i915 uses __GFP_NORETRY | __GFP_NOWARN, or
> > > GFP_KERNEL.
> > > Is
> > > __shmem_rw really meant to not allocate from highmem/movable
> > > zones?
> > > Can
> > > it be ever backed by THPs?
> > 
> > You are right, I need to copy the zone flags __GFP_DMA
> > through
> > __GFP_MOVABLE straight from the limiting gfp_mask
> > into the gfp_mask used for THP allocations, and not use
> > the default THP zone flags if the caller specifies something
> > else.
> > 
> > I'll send out a new version that fixes that.
> 
> Can we make one step back here and actually check whether all this is
> actually needed for those shmem users before adding more hacks here
> and
> there?

It doesn't look like that is needed, after all.

The i915 driver seems to support having its buffers in
highmem; the shmem_pwrite and shmem_pread functions
both do kmap/kunmap.

-- 
All Rights Reversed.


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread

Thread overview: 13+ messages
2020-11-05 19:15 [PATCH 0/2] mm,thp,shm: limit shmem THP alloc gfp_mask Rik van Riel
2020-11-05 19:15 ` [PATCH 1/2] mm,thp,shmem: " Rik van Riel
2020-11-12 10:52   ` Michal Hocko
2020-11-14  3:44     ` Rik van Riel
2020-11-19  9:37       ` Michal Hocko
2020-11-05 19:15 ` [PATCH 2/2] mm,thp,shm: limit gfp mask to no more than specified Rik van Riel
2020-11-06  3:05   ` Hillf Danton
2020-11-06 17:53     ` Rik van Riel
2020-11-07  1:58       ` Hillf Danton
2020-11-12 11:22   ` Michal Hocko
2020-11-14  3:40     ` Rik van Riel
2020-11-19  9:38       ` Michal Hocko
2020-11-23 19:39         ` Rik van Riel
