* [PATCH] mm: limit direct reclaim for higher order allocations
@ 2016-02-24 21:38 Rik van Riel
  2016-02-24 22:15 ` David Rientjes
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Rik van Riel @ 2016-02-24 21:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: hannes, akpm, vbabka, mgorman

For multi-page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
the kernel will do direct reclaim if compaction failed for any
reason. This worked fine when Linux systems had 128MB RAM, but
on my 24GB system I frequently see higher order allocations
free up over 3GB of memory, pushing all kinds of things into
swap, and slowing down applications.

It would be much better to limit the amount of reclaim done,
rather than cause excessive pageout activity.

When enough memory is free to do compaction for the highest order
allocation possible, bail out of the direct page reclaim code.

On smaller systems, this may be enough to obtain contiguous
free memory areas to satisfy small allocations, continuing our
strategy of relying on luck occasionally. On larger systems,
relying on luck like that has not been working for years.

Signed-off-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index fc62546096f9..8dd15d514761 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
 				continue;	/* Let kswapd poll it */
 
 			/*
-			 * If we already have plenty of memory free for
-			 * compaction in this zone, don't free any more.
-			 * Even though compaction is invoked for any
-			 * non-zero order, only frequent costly order
-			 * reclamation is disruptive enough to become a
-			 * noticeable problem, like transparent huge
-			 * page allocations.
+			 * For higher order allocations, free enough memory
+			 * to be able to do compaction for the largest possible
+			 * allocation. On smaller systems, this may be enough
+			 * that smaller allocations can skip compaction, if
+			 * enough adjacent pages get freed.
 			 */
-			if (IS_ENABLED(CONFIG_COMPACTION) &&
-			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
+			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
 			    zonelist_zone_idx(z) <= requested_highidx &&
-			    compaction_ready(zone, sc->order)) {
+			    compaction_ready(zone, MAX_ORDER)) {
 				sc->compaction_ready = true;
-				continue;
+				return true;
 			}
 
 			/*

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 21:38 [PATCH] mm: limit direct reclaim for higher order allocations Rik van Riel
@ 2016-02-24 22:15 ` David Rientjes
  2016-02-24 22:17   ` Rik van Riel
  2016-02-24 23:02   ` Andrew Morton
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 11+ messages in thread
From: David Rientjes @ 2016-02-24 22:15 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, hannes, akpm, vbabka, mgorman

On Wed, 24 Feb 2016, Rik van Riel wrote:

> For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> the kernel will do direct reclaim if compaction failed for any
> reason. This worked fine when Linux systems had 128MB RAM, but
> on my 24GB system I frequently see higher order allocations
> free up over 3GB of memory, pushing all kinds of things into
> swap, and slowing down applications.
> 

Just curious, are these higher order allocations typically done by the 
slub allocator or where are they coming from?

> It would be much better to limit the amount of reclaim done,
> rather than cause excessive pageout activity.
> 
> When enough memory is free to do compaction for the highest order
> allocation possible, bail out of the direct page reclaim code.
> 
> On smaller systems, this may be enough to obtain contiguous
> free memory areas to satisfy small allocations, continuing our
> strategy of relying on luck occasionally. On larger systems,
> relying on luck like that has not been working for years.
> 
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/vmscan.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fc62546096f9..8dd15d514761 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  				continue;	/* Let kswapd poll it */
>  
>  			/*
> -			 * If we already have plenty of memory free for
> -			 * compaction in this zone, don't free any more.
> -			 * Even though compaction is invoked for any
> -			 * non-zero order, only frequent costly order
> -			 * reclamation is disruptive enough to become a
> -			 * noticeable problem, like transparent huge
> -			 * page allocations.
> +			 * For higher order allocations, free enough memory
> +			 * to be able to do compaction for the largest possible
> +			 * allocation. On smaller systems, this may be enough
> +			 * that smaller allocations can skip compaction, if
> +			 * enough adjacent pages get freed.
>  			 */
> -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
>  			    zonelist_zone_idx(z) <= requested_highidx &&
> -			    compaction_ready(zone, sc->order)) {
> +			    compaction_ready(zone, MAX_ORDER)) {
>  				sc->compaction_ready = true;
> -				continue;
> +				return true;
>  			}
>  
>  			/*
> 


* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 22:15 ` David Rientjes
@ 2016-02-24 22:17   ` Rik van Riel
  2016-02-25  0:30     ` Joonsoo Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2016-02-24 22:17 UTC (permalink / raw)
  To: David Rientjes; +Cc: linux-kernel, hannes, akpm, vbabka, mgorman


On Wed, 2016-02-24 at 14:15 -0800, David Rientjes wrote:
> On Wed, 24 Feb 2016, Rik van Riel wrote:
> 
> > For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> > the kernel will do direct reclaim if compaction failed for any
> > reason. This worked fine when Linux systems had 128MB RAM, but
> > on my 24GB system I frequently see higher order allocations
> > free up over 3GB of memory, pushing all kinds of things into
> > swap, and slowing down applications.
> > 
> 
> Just curious, are these higher order allocations typically done by
> the 
> slub allocator or where are they coming from?

These are slab allocator ones, indeed.

The allocations seem to be order 2 and 3, mostly
on behalf of the inode cache and alloc_skb.

> > It would be much better to limit the amount of reclaim done,
> > rather than cause excessive pageout activity.
> > 
> > When enough memory is free to do compaction for the highest order
> > allocation possible, bail out of the direct page reclaim code.
> > 
> > On smaller systems, this may be enough to obtain contiguous
> > free memory areas to satisfy small allocations, continuing our
> > strategy of relying on luck occasionally. On larger systems,
> > relying on luck like that has not been working for years.
> > 
> > Signed-off-by: Rik van Riel <riel@redhat.com>
> > ---
> >  mm/vmscan.c | 19 ++++++++-----------
> >  1 file changed, 8 insertions(+), 11 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index fc62546096f9..8dd15d514761 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  				continue;	/* Let kswapd poll it */
> >  
> >  			/*
> > -			 * If we already have plenty of memory free for
> > -			 * compaction in this zone, don't free any more.
> > -			 * Even though compaction is invoked for any
> > -			 * non-zero order, only frequent costly order
> > -			 * reclamation is disruptive enough to become a
> > -			 * noticeable problem, like transparent huge
> > -			 * page allocations.
> > +			 * For higher order allocations, free enough memory
> > +			 * to be able to do compaction for the largest possible
> > +			 * allocation. On smaller systems, this may be enough
> > +			 * that smaller allocations can skip compaction, if
> > +			 * enough adjacent pages get freed.
> >  			 */
> > -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> > -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> > +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
> >  			    zonelist_zone_idx(z) <= requested_highidx &&
> > -			    compaction_ready(zone, sc->order)) {
> > +			    compaction_ready(zone, MAX_ORDER)) {
> >  				sc->compaction_ready = true;
> > -				continue;
> > +				return true;
> >  			}
> >  
> >  			/*
> > 
-- 
All Rights Reversed.




* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 21:38 [PATCH] mm: limit direct reclaim for higher order allocations Rik van Riel
@ 2016-02-24 23:02   ` Andrew Morton
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Andrew Morton @ 2016-02-24 23:02 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, hannes, vbabka, mgorman, linux-mm

On Wed, 24 Feb 2016 16:38:50 -0500 Rik van Riel <riel@redhat.com> wrote:

> For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> the kernel will do direct reclaim if compaction failed for any
> reason. This worked fine when Linux systems had 128MB RAM, but
> on my 24GB system I frequently see higher order allocations
> free up over 3GB of memory, pushing all kinds of things into
> swap, and slowing down applications.

hm.  Seems a pretty obvious flaw - why didn't we notice+fix it earlier?

> It would be much better to limit the amount of reclaim done,
> rather than cause excessive pageout activity.
> 
> When enough memory is free to do compaction for the highest order
> allocation possible, bail out of the direct page reclaim code.
> 
> On smaller systems, this may be enough to obtain contiguous
> free memory areas to satisfy small allocations, continuing our
> strategy of relying on luck occasionally. On larger systems,
> relying on luck like that has not been working for years.
> 

It would be nice to see some solid testing results on real-world
workloads?

(patch retained for linux-mm)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fc62546096f9..8dd15d514761 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  				continue;	/* Let kswapd poll it */
>  
>  			/*
> -			 * If we already have plenty of memory free for
> -			 * compaction in this zone, don't free any more.
> -			 * Even though compaction is invoked for any
> -			 * non-zero order, only frequent costly order
> -			 * reclamation is disruptive enough to become a
> -			 * noticeable problem, like transparent huge
> -			 * page allocations.
> +			 * For higher order allocations, free enough memory
> +			 * to be able to do compaction for the largest possible
> +			 * allocation. On smaller systems, this may be enough
> +			 * that smaller allocations can skip compaction, if
> +			 * enough adjacent pages get freed.
>  			 */
> -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
>  			    zonelist_zone_idx(z) <= requested_highidx &&
> -			    compaction_ready(zone, sc->order)) {
> +			    compaction_ready(zone, MAX_ORDER)) {
>  				sc->compaction_ready = true;
> -				continue;
> +				return true;
>  			}
>  
>  			/*



* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 23:02   ` Andrew Morton
@ 2016-02-24 23:28   ` Rik van Riel
  -1 siblings, 0 replies; 11+ messages in thread
From: Rik van Riel @ 2016-02-24 23:28 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, hannes, vbabka, mgorman, linux-mm


On Wed, 2016-02-24 at 15:02 -0800, Andrew Morton wrote:
> On Wed, 24 Feb 2016 16:38:50 -0500 Rik van Riel <riel@redhat.com>
> wrote:
> 
> > For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> > the kernel will do direct reclaim if compaction failed for any
> > reason. This worked fine when Linux systems had 128MB RAM, but
> > on my 24GB system I frequently see higher order allocations
> > free up over 3GB of memory, pushing all kinds of things into
> > swap, and slowing down applications.
> 
> hm.  Seems a pretty obvious flaw - why didn't we notice+fix it
> earlier?

I have heard complaints about suspicious pageout
behaviour before, but had not investigated it
until recently.

> > It would be much better to limit the amount of reclaim done,
> > rather than cause excessive pageout activity.
> > 
> > When enough memory is free to do compaction for the highest order
> > allocation possible, bail out of the direct page reclaim code.
> > 
> > On smaller systems, this may be enough to obtain contiguous
> > free memory areas to satisfy small allocations, continuing our
> > strategy of relying on luck occasionally. On larger systems,
> > relying on luck like that has not been working for years.
> > 
> 
> It would be nice to see some solid testing results on real-world
> workloads?

That's why I posted it.  I suspect my workload
is not nearly as demanding as the workloads many
other people have, and this is the kind of thing
that wants some serious testing.

It might also make sense to carry it in -mm for
two full release cycles before sending it to Linus.

> (patch retained for linux-mm)
> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index fc62546096f9..8dd15d514761 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
> >  				continue;	/* Let kswapd poll it */
> >  
> >  			/*
> > -			 * If we already have plenty of memory free for
> > -			 * compaction in this zone, don't free any more.
> > -			 * Even though compaction is invoked for any
> > -			 * non-zero order, only frequent costly order
> > -			 * reclamation is disruptive enough to become a
> > -			 * noticeable problem, like transparent huge
> > -			 * page allocations.
> > +			 * For higher order allocations, free enough memory
> > +			 * to be able to do compaction for the largest possible
> > +			 * allocation. On smaller systems, this may be enough
> > +			 * that smaller allocations can skip compaction, if
> > +			 * enough adjacent pages get freed.
> >  			 */
> > -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> > -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> > +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
> >  			    zonelist_zone_idx(z) <= requested_highidx &&
> > -			    compaction_ready(zone, sc->order)) {
> > +			    compaction_ready(zone, MAX_ORDER)) {
> >  				sc->compaction_ready = true;
> > -				continue;
> > +				return true;
> >  			}
> >  
> >  			/*
-- 
All Rights Reversed.




* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 22:17   ` Rik van Riel
@ 2016-02-25  0:30     ` Joonsoo Kim
  2016-02-25  2:47       ` Rik van Riel
  0 siblings, 1 reply; 11+ messages in thread
From: Joonsoo Kim @ 2016-02-25  0:30 UTC (permalink / raw)
  To: Rik van Riel; +Cc: David Rientjes, linux-kernel, hannes, akpm, vbabka, mgorman

On Wed, Feb 24, 2016 at 05:17:56PM -0500, Rik van Riel wrote:
> On Wed, 2016-02-24 at 14:15 -0800, David Rientjes wrote:
> > On Wed, 24 Feb 2016, Rik van Riel wrote:
> > 
> > > For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> > > the kernel will do direct reclaim if compaction failed for any
> > > reason. This worked fine when Linux systems had 128MB RAM, but
> > > on my 24GB system I frequently see higher order allocations
> > > free up over 3GB of memory, pushing all kinds of things into
> > > swap, and slowing down applications.
> > > 
> > 
> > Just curious, are these higher order allocations typically done by
> > the 
> > slub allocator or where are they coming from?
> 
> These are slab allocator ones, indeed.
> 
> The allocations seem to be order 2 and 3, mostly
> on behalf of the inode cache and alloc_skb.

Hello, Rik.

Could you tell me the kernel version you tested?

Commit 45eb00cd3a03 (mm/slub: don't wait for high-order page
allocation) changed the slub allocator's behaviour so that a high-order
allocation request by slub doesn't cause direct reclaim.

Thanks.


* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-25  0:30     ` Joonsoo Kim
@ 2016-02-25  2:47       ` Rik van Riel
  2016-02-25  4:42         ` Joonsoo Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Rik van Riel @ 2016-02-25  2:47 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: David Rientjes, linux-kernel, hannes, akpm, vbabka, mgorman


On Thu, 2016-02-25 at 09:30 +0900, Joonsoo Kim wrote:
> On Wed, Feb 24, 2016 at 05:17:56PM -0500, Rik van Riel wrote:
> > On Wed, 2016-02-24 at 14:15 -0800, David Rientjes wrote:
> > > On Wed, 24 Feb 2016, Rik van Riel wrote:
> > > 
> > > > For multi page allocations smaller than
> > > > PAGE_ALLOC_COSTLY_ORDER,
> > > > the kernel will do direct reclaim if compaction failed for any
> > > > reason. This worked fine when Linux systems had 128MB RAM, but
> > > > on my 24GB system I frequently see higher order allocations
> > > > free up over 3GB of memory, pushing all kinds of things into
> > > > swap, and slowing down applications.
> > > >  
> > > 
> > > Just curious, are these higher order allocations typically done
> > > by
> > > the 
> > > slub allocator or where are they coming from?
> > 
> > These are slab allocator ones, indeed.
> > 
> > The allocations seem to be order 2 and 3, mostly
> > on behalf of the inode cache and alloc_skb.
> 
> Hello, Rik.
> 
> Could you tell me the kernel version you tested?
> 
> Commit 45eb00cd3a03 (mm/slub: don't wait for high-order page
> allocation) changes slub allocator's behaviour that high order
> allocation request by slub doesn't cause direct reclaim.

The system I observed the problem on has a
4.2-based kernel on it. That would explain it.

Are we sure the problem is limited just to
slub, though?

-- 
All Rights Reversed.




* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-25  2:47       ` Rik van Riel
@ 2016-02-25  4:42         ` Joonsoo Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Joonsoo Kim @ 2016-02-25  4:42 UTC (permalink / raw)
  To: Rik van Riel; +Cc: David Rientjes, linux-kernel, hannes, akpm, vbabka, mgorman

On Wed, Feb 24, 2016 at 09:47:27PM -0500, Rik van Riel wrote:
> On Thu, 2016-02-25 at 09:30 +0900, Joonsoo Kim wrote:
> > On Wed, Feb 24, 2016 at 05:17:56PM -0500, Rik van Riel wrote:
> > > On Wed, 2016-02-24 at 14:15 -0800, David Rientjes wrote:
> > > > On Wed, 24 Feb 2016, Rik van Riel wrote:
> > > > 
> > > > > For multi page allocations smaller than
> > > > > PAGE_ALLOC_COSTLY_ORDER,
> > > > > the kernel will do direct reclaim if compaction failed for any
> > > > > reason. This worked fine when Linux systems had 128MB RAM, but
> > > > > on my 24GB system I frequently see higher order allocations
> > > > > free up over 3GB of memory, pushing all kinds of things into
> > > > > swap, and slowing down applications.
> > > > >  
> > > > 
> > > > Just curious, are these higher order allocations typically done
> > > > by
> > > > the 
> > > > slub allocator or where are they coming from?
> > > 
> > > These are slab allocator ones, indeed.
> > > 
> > > The allocations seem to be order 2 and 3, mostly
> > > on behalf of the inode cache and alloc_skb.
> > 
> > Hello, Rik.
> > 
> > Could you tell me the kernel version you tested?
> > 
> > Commit 45eb00cd3a03 (mm/slub: don't wait for high-order page
> > allocation) changes slub allocator's behaviour that high order
> > allocation request by slub doesn't cause direct reclaim.
> 
> The system I observed the problem on has a
> 4.2 based kernel on it. That would explain.
> 
> Are we sure the problem is limited just to
> slub, though?

AFAIK, there is one more notable place that requests high-order pages:
the allocation for thread_info. However, it would be much less aggressive
than the slub one. Please refer to the THREAD_SIZE_ORDER definition.

If we need to fix this situation, I think that it is better to make
shrink_zone_memcg() consider the requested allocation order. Entering
direct reclaim means that async compaction has already failed for this
low order. Sync compaction has much more power than the async one, but
it is possible that compaction would not work well at that time. Because
this low order allocation is something we take care about, unlike a
PAGE_ALLOC_COSTLY_ORDER allocation, I think that a small amount of
reclaim is better than just skipping it.

Thanks.


* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 21:38 [PATCH] mm: limit direct reclaim for higher order allocations Rik van Riel
  2016-02-24 22:15 ` David Rientjes
  2016-02-24 23:02   ` Andrew Morton
@ 2016-02-25 14:43 ` Michal Hocko
  2016-03-07 15:42 ` Vlastimil Babka
  3 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2016-02-25 14:43 UTC (permalink / raw)
  To: Rik van Riel; +Cc: linux-kernel, hannes, akpm, vbabka, mgorman

On Wed 24-02-16 16:38:50, Rik van Riel wrote:
> For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> the kernel will do direct reclaim if compaction failed for any
> reason. This worked fine when Linux systems had 128MB RAM, but
> on my 24GB system I frequently see higher order allocations
> free up over 3GB of memory, pushing all kinds of things into
> swap, and slowing down applications.
> 
> It would be much better to limit the amount of reclaim done,
> rather than cause excessive pageout activity.
> 
> When enough memory is free to do compaction for the highest order
> allocation possible, bail out of the direct page reclaim code.
> 
> On smaller systems, this may be enough to obtain contiguous
> free memory areas to satisfy small allocations, continuing our
> strategy of relying on luck occasionally. On larger systems,
> relying on luck like that has not been working for years.

I guess I have seen a similar problem just from a different direction
though. With my oom detection rework I have started seeing premature
OOM killing for higher order requests (mostly order-2 from fork).
The thing is that the oom rework has limited the number of the
reclaim/compaction retries to a finite number. Currently we are relying
on zone_reclaimable which can keep the reclaim in a loop for a long time
reclaiming order-0 pages while compaction doesn't bother to compact at
all. The reason is most probably that the compaction is mainly focused
on THP and doesn't care about !costly high order allocations. Wouldn't
it be better if the compaction tried harder for these requests rather
than falling back to the reclaim which is not guaranteed to help much?
We can compact pages if they are on the LRU even without reclaiming
them, right?

That being said, shouldn't we rather have a look at compaction than the
reclaim path?

> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
>  mm/vmscan.c | 19 ++++++++-----------
>  1 file changed, 8 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fc62546096f9..8dd15d514761 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>  				continue;	/* Let kswapd poll it */
>  
>  			/*
> -			 * If we already have plenty of memory free for
> -			 * compaction in this zone, don't free any more.
> -			 * Even though compaction is invoked for any
> -			 * non-zero order, only frequent costly order
> -			 * reclamation is disruptive enough to become a
> -			 * noticeable problem, like transparent huge
> -			 * page allocations.
> +			 * For higher order allocations, free enough memory
> +			 * to be able to do compaction for the largest possible
> +			 * allocation. On smaller systems, this may be enough
> +			 * that smaller allocations can skip compaction, if
> +			 * enough adjacent pages get freed.
>  			 */
> -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
>  			    zonelist_zone_idx(z) <= requested_highidx &&
> -			    compaction_ready(zone, sc->order)) {
> +			    compaction_ready(zone, MAX_ORDER)) {
>  				sc->compaction_ready = true;
> -				continue;
> +				return true;
>  			}
>  
>  			/*

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm: limit direct reclaim for higher order allocations
  2016-02-24 21:38 [PATCH] mm: limit direct reclaim for higher order allocations Rik van Riel
                   ` (2 preceding siblings ...)
  2016-02-25 14:43 ` Michal Hocko
@ 2016-03-07 15:42 ` Vlastimil Babka
  3 siblings, 0 replies; 11+ messages in thread
From: Vlastimil Babka @ 2016-03-07 15:42 UTC (permalink / raw)
  To: Rik van Riel, linux-kernel; +Cc: hannes, akpm, mgorman

On 02/24/2016 10:38 PM, Rik van Riel wrote:
> For multi page allocations smaller than PAGE_ALLOC_COSTLY_ORDER,
> the kernel will do direct reclaim if compaction failed for any
> reason. This worked fine when Linux systems had 128MB RAM, but
> on my 24GB system I frequently see higher order allocations
> free up over 3GB of memory, pushing all kinds of things into
> swap, and slowing down applications.
>
> It would be much better to limit the amount of reclaim done,
> rather than cause excessive pageout activity.
>
> When enough memory is free to do compaction for the highest order
> allocation possible, bail out of the direct page reclaim code.
>
> On smaller systems, this may be enough to obtain contiguous
> free memory areas to satisfy small allocations, continuing our
> strategy of relying on luck occasionally. On larger systems,
> relying on luck like that has not been working for years.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>

So the main point of this patch is the change from "continue" to "return 
true", right? This will prevent looking at other zones, but I guess 
that's not the reason why, without this patch, reclaim frees 3GB of your 
24GB?

What I suspect more is should_continue_reclaim() where it wants to 
reclaim (2UL << sc->order) pages regardless of watermark, or compaction 
status. But that one is called from shrink_zone(), and shrink_zones() 
should not call shrink_zone() if compaction is ready, even before this 
patch. Perhaps if multiple processes manage to enter shrink_zone() 
simultaneously, they could over-reclaim due to that?

> ---
>   mm/vmscan.c | 19 ++++++++-----------
>   1 file changed, 8 insertions(+), 11 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index fc62546096f9..8dd15d514761 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2584,20 +2584,17 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>   				continue;	/* Let kswapd poll it */
>
>   			/*
> -			 * If we already have plenty of memory free for
> -			 * compaction in this zone, don't free any more.
> -			 * Even though compaction is invoked for any
> -			 * non-zero order, only frequent costly order
> -			 * reclamation is disruptive enough to become a
> -			 * noticeable problem, like transparent huge
> -			 * page allocations.
> +			 * For higher order allocations, free enough memory
> +			 * to be able to do compaction for the largest possible
> +			 * allocation. On smaller systems, this may be enough
> +			 * that smaller allocations can skip compaction, if
> +			 * enough adjacent pages get freed.
>   			 */
> -			if (IS_ENABLED(CONFIG_COMPACTION) &&
> -			    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
> +			if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
>   			    zonelist_zone_idx(z) <= requested_highidx &&
> -			    compaction_ready(zone, sc->order)) {
> +			    compaction_ready(zone, MAX_ORDER)) {
>   				sc->compaction_ready = true;
> -				continue;
> +				return true;
>   			}
>
>   			/*
>

