linux-kernel.vger.kernel.org archive mirror
* [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
@ 2017-04-18  0:06 David Rientjes
  2017-04-18  1:36 ` Minchan Kim
  2017-04-18  7:11 ` Michal Hocko
  0 siblings, 2 replies; 19+ messages in thread
From: David Rientjes @ 2017-04-18  0:06 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
not swap anon pages just because free+file is low'") reintroduces is to
prefer swapping anonymous memory rather than thrashing the file lru.

If all anonymous memory is unevictable, however, this insistence on
SCAN_ANON ends up thrashing that lru instead.

Check that enough evictable anon memory is actually on this lruvec before
insisting on SCAN_ANON.  SWAP_CLUSTER_MAX is used as the threshold to
determine if only scanning anon is beneficial.

Otherwise, fall back to balanced reclaim so the file lru doesn't remain
untouched.
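The decision this changelog describes can be sketched as a small userspace simulation (an editorial illustration only; `pick_scan_balance()` and its plain integer parameters are hypothetical stand-ins for the pgdat counters that get_scan_count() reads in the kernel):

```c
#include <assert.h>

#define SWAP_CLUSTER_MAX 32UL	/* the kernel's reclaim batch size */

enum scan_balance { SCAN_FRACT, SCAN_ANON };

/*
 * Force SCAN_ANON only when free + file cannot cover the node's combined
 * high watermarks AND this lruvec holds at least SWAP_CLUSTER_MAX
 * evictable anon pages; otherwise fall back to balanced (fractional)
 * reclaim so a tiny anon lru is not thrashed.
 */
enum scan_balance pick_scan_balance(unsigned long pgdatfree,
				    unsigned long pgdatfile,
				    unsigned long total_high_wmark,
				    unsigned long evictable_anon)
{
	if (evictable_anon >= SWAP_CLUSTER_MAX &&
	    pgdatfile + pgdatfree <= total_high_wmark)
		return SCAN_ANON;
	return SCAN_FRACT;
}
```

With a depleted file lru and plenty of evictable anon this returns SCAN_ANON; with a near-empty anon lru it falls back to SCAN_FRACT instead.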

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/vmscan.c | 41 +++++++++++++++++++++++------------------
 1 file changed, 23 insertions(+), 18 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2186,26 +2186,31 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	 * anon pages.  Try to detect this based on file LRU size.
 	 */
 	if (global_reclaim(sc)) {
-		unsigned long pgdatfile;
-		unsigned long pgdatfree;
-		int z;
-		unsigned long total_high_wmark = 0;
-
-		pgdatfree = sum_zone_node_page_state(pgdat->node_id, NR_FREE_PAGES);
-		pgdatfile = node_page_state(pgdat, NR_ACTIVE_FILE) +
-			   node_page_state(pgdat, NR_INACTIVE_FILE);
-
-		for (z = 0; z < MAX_NR_ZONES; z++) {
-			struct zone *zone = &pgdat->node_zones[z];
-			if (!managed_zone(zone))
-				continue;
+		anon = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, sc->reclaim_idx) +
+		       lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx);
+		if (likely(anon >= SWAP_CLUSTER_MAX)) {
+			unsigned long total_high_wmark = 0;
+			unsigned long pgdatfile;
+			unsigned long pgdatfree;
+			int z;
+
+			pgdatfree = sum_zone_node_page_state(pgdat->node_id,
+							     NR_FREE_PAGES);
+			pgdatfile = node_page_state(pgdat, NR_ACTIVE_FILE) +
+				    node_page_state(pgdat, NR_INACTIVE_FILE);
+
+			for (z = 0; z < MAX_NR_ZONES; z++) {
+				struct zone *zone = &pgdat->node_zones[z];
+				if (!managed_zone(zone))
+					continue;
 
-			total_high_wmark += high_wmark_pages(zone);
-		}
+				total_high_wmark += high_wmark_pages(zone);
+			}
 
-		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
-			scan_balance = SCAN_ANON;
-			goto out;
+			if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
+				scan_balance = SCAN_ANON;
+				goto out;
+			}
 		}
 	}
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-18  0:06 [patch] mm, vmscan: avoid thrashing anon lru when free + file is low David Rientjes
@ 2017-04-18  1:36 ` Minchan Kim
  2017-04-18 21:32   ` David Rientjes
  2017-04-18  7:11 ` Michal Hocko
  1 sibling, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2017-04-18  1:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

Hello David,

On Mon, Apr 17, 2017 at 05:06:20PM -0700, David Rientjes wrote:
> The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> not swap anon pages just because free+file is low'") reintroduces is to
> prefer swapping anonymous memory rather than thrashing the file lru.
> 
> If all anonymous memory is unevictable, however, this insistence on

"unevictable" means hot workingset, not (mlocked and increased refcount
by some driver)?
I got confused.

> SCAN_ANON ends up thrashing that lru instead.

Sounds reasonable.

> 
> Check that enough evictable anon memory is actually on this lruvec before
> insisting on SCAN_ANON.  SWAP_CLUSTER_MAX is used as the threshold to
> determine if only scanning anon is beneficial.

Why do you use SWAP_CLUSTER_MAX instead of (high wmark + free) as for
file-backed pages?
Considering that anonymous pages are more likely to become workingset
because they are mapped, IMO a condition at least as strong as the
file-LRU one would be better to prevent anon LRU thrashing.

> 
> Otherwise, fall back to balanced reclaim so the file lru doesn't remain
> untouched.
> 
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/vmscan.c | 41 +++++++++++++++++++++++------------------
>  1 file changed, 23 insertions(+), 18 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2186,26 +2186,31 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	 * anon pages.  Try to detect this based on file LRU size.

Please update this comment, too.

>  	 */
>  	if (global_reclaim(sc)) {
> -		unsigned long pgdatfile;
> -		unsigned long pgdatfree;
> -		int z;
> -		unsigned long total_high_wmark = 0;
> -
> -		pgdatfree = sum_zone_node_page_state(pgdat->node_id, NR_FREE_PAGES);
> -		pgdatfile = node_page_state(pgdat, NR_ACTIVE_FILE) +
> -			   node_page_state(pgdat, NR_INACTIVE_FILE);
> -
> -		for (z = 0; z < MAX_NR_ZONES; z++) {
> -			struct zone *zone = &pgdat->node_zones[z];
> -			if (!managed_zone(zone))
> -				continue;
> +		anon = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, sc->reclaim_idx) +
> +		       lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx);
> +		if (likely(anon >= SWAP_CLUSTER_MAX)) {

With high_wmark, we can do this.

        if (global_reclaim(sc)) {
                pgdatfree = xxx;
                pgdatfile = xxx;
                total_high_wmark = xxx;

                if (pgdatfile + pgdatfree <= total_high_wmark) {
                        pgdatanon = xxx;
                        if (pgdatanon + pgdatfree > total_high_wmark) {
                                scan_balance = SCAN_ANON;
                                goto out;
                        }
                }
        }
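Minchan's pseudocode above can be made concrete as a userspace sketch (editorial illustration; the `xxx` placeholders in the mail stay abstract, so the plain parameters here are hypothetical stand-ins for the pgdat counters):

```c
#include <assert.h>

enum scan_balance { SCAN_FRACT, SCAN_ANON };

/*
 * Symmetric criterion: only force SCAN_ANON when anon + free could
 * actually bring the node back over its combined high watermarks,
 * mirroring the existing file + free check.
 */
enum scan_balance pick_scan_balance(unsigned long pgdatfree,
				    unsigned long pgdatfile,
				    unsigned long pgdatanon,
				    unsigned long total_high_wmark)
{
	if (pgdatfile + pgdatfree <= total_high_wmark &&
	    pgdatanon + pgdatfree > total_high_wmark)
		return SCAN_ANON;
	return SCAN_FRACT;
}
```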


> +			unsigned long total_high_wmark = 0;
> +			unsigned long pgdatfile;
> +			unsigned long pgdatfree;
> +			int z;
> +
> +			pgdatfree = sum_zone_node_page_state(pgdat->node_id,
> +							     NR_FREE_PAGES);
> +			pgdatfile = node_page_state(pgdat, NR_ACTIVE_FILE) +
> +				    node_page_state(pgdat, NR_INACTIVE_FILE);
> +
> +			for (z = 0; z < MAX_NR_ZONES; z++) {
> +				struct zone *zone = &pgdat->node_zones[z];
> +				if (!managed_zone(zone))
> +					continue;
>  
> -			total_high_wmark += high_wmark_pages(zone);
> -		}
> +				total_high_wmark += high_wmark_pages(zone);
> +			}
>  
> -		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> -			scan_balance = SCAN_ANON;
> -			goto out;
> +			if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> +				scan_balance = SCAN_ANON;
> +				goto out;
> +			}
>  		}
>  	}
>  
> 


* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-18  0:06 [patch] mm, vmscan: avoid thrashing anon lru when free + file is low David Rientjes
  2017-04-18  1:36 ` Minchan Kim
@ 2017-04-18  7:11 ` Michal Hocko
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-04-18  7:11 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

On Mon 17-04-17 17:06:20, David Rientjes wrote:
> The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> not swap anon pages just because free+file is low'") reintroduces is to
> prefer swapping anonymous memory rather than thrashing the file lru.
> 
> If all anonymous memory is unevictable, however, this insistence on
> SCAN_ANON ends up thrashing that lru instead.

Why would the anonymous memory be unevictable? If the swap is depleted
then we enforce file scanning AFAIR. Are those pages pinned somehow, by
who? It would be great if you could describe the workload which triggers
a problem which you are trying to fix.

> Check that enough evictable anon memory is actually on this lruvec before
> insisting on SCAN_ANON.  SWAP_CLUSTER_MAX is used as the threshold to
> determine if only scanning anon is beneficial.
>
> Otherwise, fall back to balanced reclaim so the file lru doesn't remain
> untouched.

Why should we treat anonymous and file pages any differently here? In
other words, why should file pages check against the high wmark and
anonymous pages against SWAP_CLUSTER_MAX?

[...]
-- 
Michal Hocko
SUSE Labs


* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-18  1:36 ` Minchan Kim
@ 2017-04-18 21:32   ` David Rientjes
  2017-04-19  0:14     ` Minchan Kim
  2017-04-19  7:04     ` [patch] " Michal Hocko
  0 siblings, 2 replies; 19+ messages in thread
From: David Rientjes @ 2017-04-18 21:32 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

On Tue, 18 Apr 2017, Minchan Kim wrote:

> > The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> > not swap anon pages just because free+file is low'") reintroduces is to
> > prefer swapping anonymous memory rather than thrashing the file lru.
> > 
> > If all anonymous memory is unevictable, however, this insistence on
> 
> "unevictable" means hot workingset, not (mlocked and increased refcount
> by some driver)?
> I got confused.
> 

For my purposes, it's mlocked, but I think this thrashing is possible 
anytime we fail the file lru heuristic and the evictable anon lrus are 
very small themselves.  I'll update the changelog to make this explicit.

> > Check that enough evictable anon memory is actually on this lruvec before
> > insisting on SCAN_ANON.  SWAP_CLUSTER_MAX is used as the threshold to
> > determine if only scanning anon is beneficial.
> 
> Why do you use SWAP_CLUSTER_MAX instead of (high wmark + free) as for
> file-backed pages?
> Considering that anonymous pages are more likely to become workingset
> because they are mapped, IMO a condition at least as strong as the
> file-LRU one would be better to prevent anon LRU thrashing.
> 

If the suggestion is checking
NR_ACTIVE_ANON + NR_INACTIVE_ANON > total_high_wmark pages, it would be a
separate heuristic to address a problem that I'm not having :)  My issue is
specifically when NR_ACTIVE_FILE + NR_INACTIVE_FILE < total_high_wmark and
NR_ACTIVE_ANON + NR_INACTIVE_ANON is very large, but none of it is on this
lruvec's evictable lrus.

This is the reason why I chose lruvec_lru_size() rather than per-node 
statistics.  The argument could also be made for the file lrus in the 
get_scan_count() heuristic that forces SCAN_ANON, but I have not met such 
an issue (yet).  I could follow-up with that change or incorporate it into 
a v2 of this patch if you'd prefer.

In other words, I want get_scan_count() to not force SCAN_ANON and 
fallback to SCAN_FRACT, absent other heuristics, if the amount of 
evictable anon is below a certain threshold for this lruvec.  I 
arbitrarily chose SWAP_CLUSTER_MAX to be conservative, but I could easily 
compare to total_high_wmark as well, although I would consider that more 
aggressive.

So we're in global reclaim, our file lrus are below thresholds, but we 
don't want to force SCAN_ANON for all lruvecs if there's not enough to 
reclaim from evictable anon.  Do you have a suggestion for how to 
implement this logic other than this patch?

> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2186,26 +2186,31 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
> >  	 * anon pages.  Try to detect this based on file LRU size.
> 
> Please update this comment, too.
> 

Ok, I've added: "Try to detect this based on file LRU size, but do not 
limit scanning to anon if it is too small itself."


* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-18 21:32   ` David Rientjes
@ 2017-04-19  0:14     ` Minchan Kim
  2017-04-19 23:24       ` David Rientjes
  2017-04-19  7:04     ` [patch] " Michal Hocko
  1 sibling, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2017-04-19  0:14 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

Hi David,

On Tue, Apr 18, 2017 at 02:32:56PM -0700, David Rientjes wrote:
> On Tue, 18 Apr 2017, Minchan Kim wrote:
> 
> > > The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> > > not swap anon pages just because free+file is low'") reintroduces is to
> > > prefer swapping anonymous memory rather than thrashing the file lru.
> > > 
> > > If all anonymous memory is unevictable, however, this insistence on
> > 
> > "unevictable" means hot workingset, not (mlocked and increased refcount
> > by some driver)?
> > I got confused.
> > 
> 
> For my purposes, it's mlocked, but I think this thrashing is possible 
> anytime we fail the file lru heuristic and the evictable anon lrus are 
> very small themselves.  I'll update the changelog to make this explicit.

I understood now. Thanks for clarifying.

> 
> > > Check that enough evictable anon memory is actually on this lruvec before
> > > insisting on SCAN_ANON.  SWAP_CLUSTER_MAX is used as the threshold to
> > > determine if only scanning anon is beneficial.
> > 
> > Why do you use SWAP_CLUSTER_MAX instead of (high wmark + free) as for
> > file-backed pages?
> > Considering that anonymous pages are more likely to become workingset
> > because they are mapped, IMO a condition at least as strong as the
> > file-LRU one would be better to prevent anon LRU thrashing.
> > 
> 
> If the suggestion is checking
> NR_ACTIVE_ANON + NR_INACTIVE_ANON > total_high_wmark pages, it would be a
> separate heuristic to address a problem that I'm not having :)  My issue is
> specifically when NR_ACTIVE_FILE + NR_INACTIVE_FILE < total_high_wmark and
> NR_ACTIVE_ANON + NR_INACTIVE_ANON is very large, but none of it is on this
> lruvec's evictable lrus.

I understand it as meaning the pages are not on any eligible LRU list. Right?
I will write the comment below assuming that is right.

> 
> This is the reason why I chose lruvec_lru_size() rather than per-node 
> statistics.  The argument could also be made for the file lrus in the 
> get_scan_count() heuristic that forces SCAN_ANON, but I have not met such 
> an issue (yet).  I could follow-up with that change or incorporate it into 
> a v2 of this patch if you'd prefer.

I don't think we need to fix that part because the logic is to keep
some amount of file-backed workingset regardless of the eligible
zones.

> 
> In other words, I want get_scan_count() to not force SCAN_ANON and 
> fallback to SCAN_FRACT, absent other heuristics, if the amount of 
> evictable anon is below a certain threshold for this lruvec.  I 
> arbitrarily chose SWAP_CLUSTER_MAX to be conservative, but I could easily 
> compare to total_high_wmark as well, although I would consider that more 
> aggressive.

I realize your problem now. It's a rather different heuristic, so there is
no need to align it with the file lru. But SWAP_CLUSTER_MAX is too
conservative, IMHO.

How about this?

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 24efcc20af91..5d2f3fa41e92 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2174,8 +2174,17 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		}
 
 		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
-			scan_balance = SCAN_ANON;
-			goto out;
+			/*
+			 * Force SCAN_ANON if the inactive anonymous LRU lists
+			 * of the eligible zones have enough pages.  Otherwise,
+			 * thrashing can happen on a small anonymous LRU list.
+			 */
+			if (!inactive_list_is_low(lruvec, false, NULL, sc, false) &&
+			     lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
+					>> sc->priority) {
+				scan_balance = SCAN_ANON;
+				goto out;
+			}
 		}
 	}
 

Thanks.
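The `>> sc->priority` sizing test in the diff above passes only when the inactive anon list holds at least 2^priority pages. A tiny standalone check of just that shift (editorial illustration; `anon_list_big_enough()` is a hypothetical name):

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* the kernel's starting reclaim priority */

/*
 * lruvec_lru_size(...) >> sc->priority is nonzero only when the list
 * holds at least 2^priority pages, so the threshold relaxes as reclaim
 * priority drops toward zero.
 */
bool anon_list_big_enough(unsigned long inactive_anon, int priority)
{
	return (inactive_anon >> priority) != 0;
}
```

At DEF_PRIORITY the inactive anon list must hold at least 4096 pages (16MB with 4KB pages) before SCAN_ANON is forced, and the bar lowers as reclaim grows more desperate.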


* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-18 21:32   ` David Rientjes
  2017-04-19  0:14     ` Minchan Kim
@ 2017-04-19  7:04     ` Michal Hocko
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-04-19  7:04 UTC (permalink / raw)
  To: David Rientjes
  Cc: Minchan Kim, Andrew Morton, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Tue 18-04-17 14:32:56, David Rientjes wrote:
[...]
> If the suggestion is checking
> NR_ACTIVE_ANON + NR_INACTIVE_ANON > total_high_wmark pages, it would be a
> separate heuristic to address a problem that I'm not having :)  My issue is
> specifically when NR_ACTIVE_FILE + NR_INACTIVE_FILE < total_high_wmark and
> NR_ACTIVE_ANON + NR_INACTIVE_ANON is very large, but none of it is on this
> lruvec's evictable lrus.

Hmm, why are those pages not moved to the unevictable LRU lists?

> This is the reason why I chose lruvec_lru_size() rather than per-node 
> statistics.  The argument could also be made for the file lrus in the 
> get_scan_count() heuristic that forces SCAN_ANON, but I have not met such 
> an issue (yet).  I could follow-up with that change or incorporate it into 
> a v2 of this patch if you'd prefer.
> 
> In other words, I want get_scan_count() to not force SCAN_ANON and 
> fallback to SCAN_FRACT, absent other heuristics, if the amount of 
> evictable anon is below a certain threshold for this lruvec.  I 
> arbitrarily chose SWAP_CLUSTER_MAX to be conservative, but I could easily 
> compare to total_high_wmark as well, although I would consider that more 
> aggressive.
> 
> So we're in global reclaim, our file lrus are below thresholds, but we 
> don't want to force SCAN_ANON for all lruvecs if there's not enough to 
> reclaim from evictable anon.  Do you have a suggestion for how to 
> implement this logic other than this patch?

I agree that forcing SCAN_ANON without looking at the ANON lru size is
not optimal but I would rather see the same criterion for both anon and
file. get_scan_count is full of magic heuristics which tend to break for
different workloads. Let's not add another magic on top please.
-- 
Michal Hocko
SUSE Labs


* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-19  0:14     ` Minchan Kim
@ 2017-04-19 23:24       ` David Rientjes
  2017-04-20  6:09         ` Minchan Kim
  0 siblings, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-04-19 23:24 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

On Wed, 19 Apr 2017, Minchan Kim wrote:

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 24efcc20af91..5d2f3fa41e92 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2174,8 +2174,17 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  		}
>  
>  		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> -			scan_balance = SCAN_ANON;
> -			goto out;
> +			/*
> +			 * Force SCAN_ANON if the inactive anonymous LRU lists
> +			 * of the eligible zones have enough pages.  Otherwise,
> +			 * thrashing can happen on a small anonymous LRU list.
> +			 */
> +			if (!inactive_list_is_low(lruvec, false, NULL, sc, false) &&
> +			     lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
> +					>> sc->priority) {
> +				scan_balance = SCAN_ANON;
> +				goto out;
> +			}
>  		}
>  	}
>  

Hi Minchan,

This looks good and it correctly biases against SCAN_ANON for my workload 
that was thrashing the anon lrus.  Feel free to use parts of my changelog 
if you'd like.

Tested-by: David Rientjes <rientjes@google.com>


* Re: [patch] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-19 23:24       ` David Rientjes
@ 2017-04-20  6:09         ` Minchan Kim
  2017-05-01 21:34           ` [patch v2] " David Rientjes
  0 siblings, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2017-04-20  6:09 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

Hi David,

On Wed, Apr 19, 2017 at 04:24:48PM -0700, David Rientjes wrote:
> On Wed, 19 Apr 2017, Minchan Kim wrote:
> 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 24efcc20af91..5d2f3fa41e92 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2174,8 +2174,17 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
> >  		}
> >  
> >  		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> > -			scan_balance = SCAN_ANON;
> > -			goto out;
> > +			/*
> > +			 * Force SCAN_ANON if the inactive anonymous LRU lists
> > +			 * of the eligible zones have enough pages.  Otherwise,
> > +			 * thrashing can happen on a small anonymous LRU list.
> > +			 */
> > +			if (!inactive_list_is_low(lruvec, false, NULL, sc, false) &&
> > +			     lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
> > +					>> sc->priority) {
> > +				scan_balance = SCAN_ANON;
> > +				goto out;
> > +			}
> >  		}
> >  	}
> >  
> 
> Hi Minchan,
> 
> This looks good and it correctly biases against SCAN_ANON for my workload 
> that was thrashing the anon lrus.  Feel free to use parts of my changelog 
> if you'd like.

Thanks for the testing!
Considering how hard such a problem is to find, the credit should be
entirely yours, so please send the patch with a detailed description.
Feel free to add my Suggested-by. :)

Thanks!


* [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-04-20  6:09         ` Minchan Kim
@ 2017-05-01 21:34           ` David Rientjes
  2017-05-02  8:02             ` Michal Hocko
                               ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: David Rientjes @ 2017-05-01 21:34 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
not swap anon pages just because free+file is low'") reintroduces is to
prefer swapping anonymous memory rather than thrashing the file lru.

If the anonymous inactive lru for the set of eligible zones is considered
low, however, or the length of the list for the given reclaim priority
does not allow for effective anonymous-only reclaiming, then avoid
forcing SCAN_ANON.  Forcing SCAN_ANON will end up thrashing the small
list and leave unreclaimed memory on the file lrus.

If the inactive list is insufficient, fall back to balanced reclaim so the
file lru doesn't remain untouched.

Suggested-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 to akpm: this issue has been possible since at least 3.15, so it's
 probably not high priority for 4.12 but applies cleanly if it can sneak
 in

 mm/vmscan.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2204,8 +2204,17 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		}
 
 		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
-			scan_balance = SCAN_ANON;
-			goto out;
+			/*
+			 * Force SCAN_ANON if there are enough inactive
+			 * anonymous pages on the LRU in eligible zones.
+			 * Otherwise, the small LRU gets thrashed.
+			 */
+			if (!inactive_list_is_low(lruvec, false, sc, false) &&
+			    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
+					>> sc->priority) {
+				scan_balance = SCAN_ANON;
+				goto out;
+			}
 		}
 	}
 


* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-01 21:34           ` [patch v2] " David Rientjes
@ 2017-05-02  8:02             ` Michal Hocko
  2017-05-02 20:41               ` David Rientjes
  2017-05-31 15:20             ` Michal Hocko
  2017-06-02 20:36             ` Andrew Morton
  2 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2017-05-02  8:02 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Mon 01-05-17 14:34:21, David Rientjes wrote:
[...]
> @@ -2204,8 +2204,17 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  		}
>  
>  		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> -			scan_balance = SCAN_ANON;
> -			goto out;
> +			/*
> +			 * Force SCAN_ANON if there are enough inactive
> +			 * anonymous pages on the LRU in eligible zones.
> +			 * Otherwise, the small LRU gets thrashed.
> +			 */
> +			if (!inactive_list_is_low(lruvec, false, sc, false) &&
> +			    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
> +					>> sc->priority) {
> +				scan_balance = SCAN_ANON;
> +				goto out;
> +			}

I have already asked and my questions were ignored. So let me ask again
and hopefully not get ignored this time: why do we need a different
criterion for anon pages than for file pages? I do agree that blindly
scanning anon pages when file pages are low is very suboptimal but this
adds yet another heuristic without _any_ numbers. Why cannot we simply
treat anon and file pages equally? Something like the following

	if (pgdatfile + pgdatanon + pgdatfree > 2*total_high_wmark) {
		scan_balance = SCAN_FILE;
		if (pgdatfile < pgdatanon)
			scan_balance = SCAN_ANON;
		goto out;
	}

Also it would help to describe the workload which can trigger this
behavior so that we can compare numbers before and after this patch.
-- 
Michal Hocko
SUSE Labs


* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-02  8:02             ` Michal Hocko
@ 2017-05-02 20:41               ` David Rientjes
  2017-05-03  6:15                 ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-05-02 20:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Tue, 2 May 2017, Michal Hocko wrote:

> I have already asked and my questions were ignored. So let me ask again
> and hopefully not get ignored this time: why do we need a different
> criterion for anon pages than for file pages?

The preference in get_scan_count() as already implemented is to reclaim 
from file pages if there is enough memory on the inactive list to reclaim.  
That is unchanged with this patch.

> I do agree that blindly
> scanning anon pages when file pages are low is very suboptimal but this
> adds yet another heuristic without _any_ numbers. Why cannot we simply
> treat anon and file pages equally? Something like the following
> 
> 	if (pgdatfile + pgdatanon + pgdatfree > 2*total_high_wmark) {
> 		scan_balance = SCAN_FILE;
> 		if (pgdatfile < pgdatanon)
> 			scan_balance = SCAN_ANON;
> 		goto out;
> 	}
> 

This would be substantially worse than the current code because it 
thrashes the anon lru when anon outnumbers file pages rather than at the
point we fall under the high watermarks for all eligible zones.  If you 
tested your suggestion, you could see gigabytes of memory left untouched 
on the file lru.  Anonymous memory is more probable to be part of the 
working set.

> Also it would help to describe the workload which can trigger this
> behavior so that we can compare numbers before and after this patch.

Any workload that fills system RAM with anonymous memory that cannot be 
reclaimed will thrash the anon lru without this patch.


* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-02 20:41               ` David Rientjes
@ 2017-05-03  6:15                 ` Michal Hocko
  2017-05-03  7:06                   ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2017-05-03  6:15 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Tue 02-05-17 13:41:23, David Rientjes wrote:
> On Tue, 2 May 2017, Michal Hocko wrote:
> 
> > I have already asked and my questions were ignored. So let me ask again
> > and hopefully not get ignored this time: why do we need a different
> > criterion for anon pages than for file pages?
> 
> The preference in get_scan_count() as already implemented is to reclaim 
> from file pages if there is enough memory on the inactive list to reclaim.  
> That is unchanged with this patch.

My fault, I was too vague. My question was basically why we should use
a different criterion for SCAN_ANON than for SCAN_FILE.

> > I do agree that blindly
> > scanning anon pages when file pages are low is very suboptimal but this
> > adds yet another heuristic without _any_ numbers. Why cannot we simply
> > treat anon and file pages equally? Something like the following
> > 
> > 	if (pgdatfile + pgdatanon + pgdatfree > 2*total_high_wmark) {
> > 		scan_balance = SCAN_FILE;
> > 		if (pgdatfile < pgdatanon)
> > 			scan_balance = SCAN_ANON;
> > 		goto out;
> > 	}
> > 
> 
> This would be substantially worse than the current code because it 
> thrashes the anon lru when anon outnumbers file pages rather than at the
> point we fall under the high watermarks for all eligible zones.  If you 
> tested your suggestion, you could see gigabytes of memory left untouched 
> on the file lru.  Anonymous memory is more probable to be part of the 
> working set.

This was meant more as an example of the direction I was thinking in,
definitely not a final patch. I will think about it more and come up with
a more complete proposal.

> > Also it would help to describe the workload which can trigger this
> > behavior so that we can compare numbers before and after this patch.
> 
> Any workload that fills system RAM with anonymous memory that cannot be 
> reclaimed will thrash the anon lru without this patch.

I have already asked, but I do not understand why this anon memory
couldn't be reclaimed. Who is pinning it? Why can't it be swapped out?
If it is mlocked it should be moved to the unevictable LRU. What am I
missing?

-- 
Michal Hocko
SUSE Labs


* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-03  6:15                 ` Michal Hocko
@ 2017-05-03  7:06                   ` Michal Hocko
  2017-05-03  8:49                     ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2017-05-03  7:06 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Wed 03-05-17 08:15:28, Michal Hocko wrote:
> On Tue 02-05-17 13:41:23, David Rientjes wrote:
> > On Tue, 2 May 2017, Michal Hocko wrote:
[...]
> > > I do agree that blindly
> > > scanning anon pages when file pages are low is very suboptimal but this
> > > adds yet another heuristic without _any_ numbers. Why cannot we simply
> > > treat anon and file pages equally? Something like the following
> > > 
> > > 	if (pgdatfile + pgdatanon + pgdatfree > 2*total_high_wmark) {
> > > 		scan_balance = SCAN_FILE;
> > > 		if (pgdatfile < pgdatanon)
> > > 			scan_balance = SCAN_ANON;
> > > 		goto out;
> > > 	}
> > > 
> > 
> > This would be substantially worse than the current code because it 
> > thrashes the anon lru whenever anon outnumbers file pages, rather than 
> > at the point we fall under the high watermarks for all eligible zones.  
> > If you tested your suggestion, you could see gigabytes of memory left 
> > untouched on the file lru.  Anonymous memory is more likely to be part 
> > of the working set.
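
To make the disagreement concrete, here is a toy model of the two
heuristics (standalone C; the function names, page counts, and watermark
are illustrative only, not kernel code):

```c
#include <assert.h>

/* Same ordering as the kernel's enum in mm/vmscan.c. */
enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* Current heuristic: force anon-only scanning only once free + file
 * has dropped below the sum of the high watermarks. */
static enum scan_balance current_pick(unsigned long file, unsigned long anon,
				      unsigned long free, unsigned long high_wmark)
{
	(void)anon;
	if (file + free <= high_wmark)
		return SCAN_ANON;
	return SCAN_FRACT;	/* fall through to proportional reclaim */
}

/* The alternative sketched in the quoted mail above. */
static enum scan_balance proposed_pick(unsigned long file, unsigned long anon,
				       unsigned long free, unsigned long high_wmark)
{
	if (file + anon + free > 2 * high_wmark)
		return file < anon ? SCAN_ANON : SCAN_FILE;
	return SCAN_FRACT;
}
```

With gigabytes of reclaimable page cache but even more anon (say
file=500000, anon=2000000, free=100000 pages against a 50000-page
watermark), proposed_pick() already returns SCAN_ANON while
current_pick() still balances; that is the thrashing scenario described
above.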
> 
> This was meant more as an example of the direction I was thinking in,
> definitely not a final patch. I will think about it more and come up
> with a more complete proposal.

This is still untested but should be much closer to what I had in
mind.
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 24efcc20af91..bcdad30f942d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2174,8 +2174,14 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 		}
 
 		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
-			scan_balance = SCAN_ANON;
-			goto out;
+			unsigned long pgdatanon;
+
+			pgdatanon = node_page_state(pgdat, NR_ACTIVE_ANON) +
+				node_page_state(pgdat, NR_INACTIVE_ANON);
+			if (pgdatanon + pgdatfree > total_high_wmark) {
+				scan_balance = SCAN_ANON;
+				goto out;
+			}
 		}
 	}
 

-- 
Michal Hocko
SUSE Labs

* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-03  7:06                   ` Michal Hocko
@ 2017-05-03  8:49                     ` Michal Hocko
  2017-05-03 22:52                       ` David Rientjes
  0 siblings, 1 reply; 19+ messages in thread
From: Michal Hocko @ 2017-05-03  8:49 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Wed 03-05-17 09:06:56, Michal Hocko wrote:
[...]
> This is still untested but should be much closer to what I've had in
> mind.
> ---
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 24efcc20af91..bcdad30f942d 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2174,8 +2174,14 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  		}
>  
>  		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> -			scan_balance = SCAN_ANON;
> -			goto out;
> +			unsigned long pgdatanon;
> +
> +			pgdatanon = node_page_state(pgdat, NR_ACTIVE_ANON) +
> +				node_page_state(pgdat, NR_INACTIVE_ANON);
> +			if (pgdatanon + pgdatfree > total_high_wmark) {
> +				scan_balance = SCAN_ANON;
> +				goto out;
> +			}
>  		}
>  	}

After thinking about it some more, I've realized that this just makes
the situation more obscure than necessary. It also doesn't achieve my
original intention of treating the biased anon and file LRUs the same.
Now that I've digested the change more thoroughly I am willing to ack
your patch. So feel free to add
Acked-by: Michal Hocko <mhocko@suse.com>
And sorry about the diversion here, but I am always nervous when
touching get_scan_count() because it tends to lead to subtle issues.

Maybe we could make this aspect of the biased LRUs more explicit by
doing the following rather than duplicating the condition. What do you
think?
---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 24efcc20af91..f3ec8760dc06 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2113,16 +2113,14 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	u64 denominator = 0;	/* gcc */
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	unsigned long anon_prio, file_prio;
-	enum scan_balance scan_balance;
+	enum scan_balance scan_balance = SCAN_FILE;
 	unsigned long anon, file;
 	unsigned long ap, fp;
 	enum lru_list lru;
 
 	/* If we have no swap space, do not bother scanning anon pages. */
-	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
-		scan_balance = SCAN_FILE;
+	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0)
 		goto out;
-	}
 
 	/*
 	 * Global reclaim will swap to prevent OOM even with no
@@ -2131,10 +2129,8 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	 * using the memory controller's swap limit feature would be
 	 * too expensive.
 	 */
-	if (!global_reclaim(sc) && !swappiness) {
-		scan_balance = SCAN_FILE;
+	if (!global_reclaim(sc) && !swappiness)
 		goto out;
-	}
 
 	/*
 	 * Do not apply any pressure balancing cleverness when the
@@ -2147,8 +2143,9 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 	}
 
 	/*
-	 * Prevent the reclaimer from falling into the cache trap: as
-	 * cache pages start out inactive, every cache fault will tip
+	 * We usually want to bias page cache reclaim over anonymous
+	 * memory. Prevent the reclaimer from falling into the cache trap:
+	 * as cache pages start out inactive, every cache fault will tip
 	 * the scan balance towards the file LRU.  And as the file LRU
 	 * shrinks, so does the window for rotation from references.
 	 * This means we have a runaway feedback loop where a tiny
@@ -2173,26 +2170,24 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
 			total_high_wmark += high_wmark_pages(zone);
 		}
 
-		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
+		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark))
 			scan_balance = SCAN_ANON;
-			goto out;
-		}
 	}
 
 	/*
-	 * If there is enough inactive page cache, i.e. if the size of the
-	 * inactive list is greater than that of the active list *and* the
-	 * inactive list actually has some pages to scan on this priority, we
-	 * do not reclaim anything from the anonymous working set right now.
-	 * Without the second condition we could end up never scanning an
-	 * lruvec even if it has plenty of old anonymous pages unless the
-	 * system is under heavy pressure.
+	 * Make sure there are enough pages on the biased LRU before we go
+	 * and do an exclusive reclaim from that list, i.e. if the
+	 * size of the inactive list is greater than that of the active list
+	 * *and* the inactive list actually has some pages to scan on this
+	 * priority.
+	 * Without the second condition we could end up never scanning other
+	 * lruvecs even if they have plenty of old pages unless the system is
+	 * under heavy pressure.
 	 */
-	if (!inactive_list_is_low(lruvec, true, memcg, sc, false) &&
-	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
-		scan_balance = SCAN_FILE;
+	lru = LRU_INACTIVE_ANON + LRU_FILE * (scan_balance == SCAN_FILE);
+	if (!inactive_list_is_low(lruvec, is_file_lru(lru), memcg, sc, false) &&
+	    lruvec_lru_size(lruvec, lru, sc->reclaim_idx) >> sc->priority)
 		goto out;
-	}
 
 	scan_balance = SCAN_FRACT;
 
-- 
Michal Hocko
SUSE Labs

* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-03  8:49                     ` Michal Hocko
@ 2017-05-03 22:52                       ` David Rientjes
  2017-05-04 11:43                         ` Michal Hocko
  0 siblings, 1 reply; 19+ messages in thread
From: David Rientjes @ 2017-05-03 22:52 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Wed, 3 May 2017, Michal Hocko wrote:

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 24efcc20af91..f3ec8760dc06 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2113,16 +2113,14 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	u64 denominator = 0;	/* gcc */
>  	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>  	unsigned long anon_prio, file_prio;
> -	enum scan_balance scan_balance;
> +	enum scan_balance scan_balance = SCAN_FILE;
>  	unsigned long anon, file;
>  	unsigned long ap, fp;
>  	enum lru_list lru;
>  
>  	/* If we have no swap space, do not bother scanning anon pages. */
> -	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0) {
> -		scan_balance = SCAN_FILE;
> +	if (!sc->may_swap || mem_cgroup_get_nr_swap_pages(memcg) <= 0)
>  		goto out;
> -	}
>  
>  	/*
>  	 * Global reclaim will swap to prevent OOM even with no
> @@ -2131,10 +2129,8 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	 * using the memory controller's swap limit feature would be
>  	 * too expensive.
>  	 */
> -	if (!global_reclaim(sc) && !swappiness) {
> -		scan_balance = SCAN_FILE;
> +	if (!global_reclaim(sc) && !swappiness)
>  		goto out;
> -	}
>  
>  	/*
>  	 * Do not apply any pressure balancing cleverness when the

Good as a cleanup so far.

> @@ -2147,8 +2143,9 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	}
>  
>  	/*
> -	 * Prevent the reclaimer from falling into the cache trap: as
> -	 * cache pages start out inactive, every cache fault will tip
> +	 * We usually want to bias page cache reclaim over anonymous
> +	 * memory. Prevent the reclaimer from falling into the cache trap:
> +	 * as cache pages start out inactive, every cache fault will tip
>  	 * the scan balance towards the file LRU.  And as the file LRU
>  	 * shrinks, so does the window for rotation from references.
>  	 * This means we have a runaway feedback loop where a tiny

I think Minchan made a good point earlier about anon being more likely to 
be part of the working set since it is mapped, but this may be a biased 
opinion coming from me since I am primarily concerned with malloc.

> @@ -2173,26 +2170,24 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  			total_high_wmark += high_wmark_pages(zone);
>  		}
>  
> -		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> +		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark))
>  			scan_balance = SCAN_ANON;
> -			goto out;
> -		}
>  	}
>  
>  	/*
> -	 * If there is enough inactive page cache, i.e. if the size of the
> -	 * inactive list is greater than that of the active list *and* the
> -	 * inactive list actually has some pages to scan on this priority, we
> -	 * do not reclaim anything from the anonymous working set right now.
> -	 * Without the second condition we could end up never scanning an
> -	 * lruvec even if it has plenty of old anonymous pages unless the
> -	 * system is under heavy pressure.
> +	 * Make sure there are enough pages on the biased LRU before we go
> +	 * and do an exclusive reclaim from that list, i.e. if the
> +	 * size of the inactive list is greater than that of the active list
> +	 * *and* the inactive list actually has some pages to scan on this
> +	 * priority.
> +	 * Without the second condition we could end up never scanning other
> +	 * lruvecs even if they have plenty of old pages unless the system is
> +	 * under heavy pressure.
>  	 */
> -	if (!inactive_list_is_low(lruvec, true, memcg, sc, false) &&
> -	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
> -		scan_balance = SCAN_FILE;
> +	lru = LRU_INACTIVE_ANON + LRU_FILE * (scan_balance == SCAN_FILE);

This part seems to complicate the logic since it determines the lru under 
test based on the current setting of scan_balance.  I think I prefer 
individual heuristics with well-written comments, but others may feel 
differently about this.
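
For readers following along: the one-liner works because of how the LRU
lists are numbered. A simplified standalone model (the constants mirror
include/linux/mmzone.h at the time of this thread, but treat this as an
illustration rather than kernel code):

```c
#include <assert.h>

#define LRU_FILE 2	/* offset of the file lists from their anon twins */

enum lru_list {
	LRU_INACTIVE_ANON = 0,
	LRU_ACTIVE_ANON   = 1,
	LRU_INACTIVE_FILE = LRU_INACTIVE_ANON + LRU_FILE,
	LRU_ACTIVE_FILE   = LRU_ACTIVE_ANON + LRU_FILE,
};

enum scan_balance { SCAN_EQUAL, SCAN_FRACT, SCAN_ANON, SCAN_FILE };

/* The quoted expression: select the inactive list of whichever LRU the
 * scan is currently biased towards. */
static enum lru_list biased_inactive_lru(enum scan_balance scan_balance)
{
	return LRU_INACTIVE_ANON + LRU_FILE * (scan_balance == SCAN_FILE);
}
```

biased_inactive_lru(SCAN_FILE) yields LRU_INACTIVE_FILE and anything
else yields LRU_INACTIVE_ANON, which is exactly what makes the
expression terse but non-obvious.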

> +	if (!inactive_list_is_low(lruvec, is_file_lru(lru), memcg, sc, false) &&
> +	    lruvec_lru_size(lruvec, lru, sc->reclaim_idx) >> sc->priority)
>  		goto out;
> -	}
>  
>  	scan_balance = SCAN_FRACT;
>  

* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-03 22:52                       ` David Rientjes
@ 2017-05-04 11:43                         ` Michal Hocko
  0 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-05-04 11:43 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

On Wed 03-05-17 15:52:04, David Rientjes wrote:
> On Wed, 3 May 2017, Michal Hocko wrote:
[...]
> >  	/*
> > -	 * If there is enough inactive page cache, i.e. if the size of the
> > -	 * inactive list is greater than that of the active list *and* the
> > -	 * inactive list actually has some pages to scan on this priority, we
> > -	 * do not reclaim anything from the anonymous working set right now.
> > -	 * Without the second condition we could end up never scanning an
> > -	 * lruvec even if it has plenty of old anonymous pages unless the
> > -	 * system is under heavy pressure.
> > +	 * Make sure there are enough pages on the biased LRU before we go
> > +	 * and do an exclusive reclaim from that list, i.e. if the
> > +	 * size of the inactive list is greater than that of the active list
> > +	 * *and* the inactive list actually has some pages to scan on this
> > +	 * priority.
> > +	 * Without the second condition we could end up never scanning other
> > +	 * lruvecs even if they have plenty of old pages unless the system is
> > +	 * under heavy pressure.
> >  	 */
> > -	if (!inactive_list_is_low(lruvec, true, memcg, sc, false) &&
> > -	    lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, sc->reclaim_idx) >> sc->priority) {
> > -		scan_balance = SCAN_FILE;
> > +	lru = LRU_INACTIVE_ANON + LRU_FILE * (scan_balance == SCAN_FILE);
> 
> This part seems to complicate the logic since it determines the lru under 
> test based on the current setting of scan_balance.  I think I prefer 
> individual heuristics with well written comments, but others may feel 
> differently about this.

I do not claim the code would be more obvious than before, but it gets
rid of the duplication, which is usually a good thing. This size check
has the same reasoning regardless of the type of the LRU. But I am not
going to insist...
 
> > +	if (!inactive_list_is_low(lruvec, is_file_lru(lru), memcg, sc, false) &&
> > +	    lruvec_lru_size(lruvec, lru, sc->reclaim_idx) >> sc->priority)
> >  		goto out;
> > -	}
> >  
> >  	scan_balance = SCAN_FRACT;
> >  

-- 
Michal Hocko
SUSE Labs

* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-01 21:34           ` [patch v2] " David Rientjes
  2017-05-02  8:02             ` Michal Hocko
@ 2017-05-31 15:20             ` Michal Hocko
  2017-06-02 20:36             ` Andrew Morton
  2 siblings, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2017-05-31 15:20 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Minchan Kim, Johannes Weiner, Mel Gorman,
	linux-kernel, linux-mm

Andrew,
it seems that this patch has fallen through the cracks. I am sorry if
this was due to my review feedback; it turned out that I had missed the
point and later added my Acked-by. Sorry about that.

On Mon 01-05-17 14:34:21, David Rientjes wrote:
> The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> not swap anon pages just because free+file is low'") reintroduces is to
> prefer swapping anonymous memory rather than thrashing the file lru.
> 
> If the anonymous inactive lru for the set of eligible zones is considered
> low, however, or the length of the list for the given reclaim priority
> does not allow for effective anonymous-only reclaiming, then avoid
> forcing SCAN_ANON.  Forcing SCAN_ANON will end up thrashing the small
> list and leave unreclaimed memory on the file lrus.
> 
> If the inactive list is insufficient, fallback to balanced reclaim so the
> file lru doesn't remain untouched.
> 
> Suggested-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  to akpm: this issue has been possible since at least 3.15, so it's
>  probably not high priority for 4.12 but applies cleanly if it can sneak
>  in
> 
>  mm/vmscan.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2204,8 +2204,17 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  		}
>  
>  		if (unlikely(pgdatfile + pgdatfree <= total_high_wmark)) {
> -			scan_balance = SCAN_ANON;
> -			goto out;
> +			/*
> +			 * Force SCAN_ANON if there are enough inactive
> +			 * anonymous pages on the LRU in eligible zones.
> +			 * Otherwise, the small LRU gets thrashed.
> +			 */
> +			if (!inactive_list_is_low(lruvec, false, sc, false) &&
> +			    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
> +					>> sc->priority) {
> +				scan_balance = SCAN_ANON;
> +				goto out;
> +			}
>  		}
>  	}
>  
> 

-- 
Michal Hocko
SUSE Labs
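
As an aside, the gate the v2 patch adds boils down to a shift test: the
inactive anon list must still produce a non-zero scan target at the
current reclaim priority. A minimal sketch (standalone C; the helper
name is mine, and DEF_PRIORITY == 12 is assumed from the kernel
headers):

```c
#include <assert.h>
#include <stdbool.h>

#define DEF_PRIORITY 12	/* reclaim starts here and decreases under pressure */

/* Mirrors the lruvec_lru_size(...) >> sc->priority check in the patch:
 * with too few inactive anon pages the shifted size is 0, and forcing
 * SCAN_ANON would just thrash a tiny list. */
static bool anon_scan_worthwhile(unsigned long inactive_anon_pages, int priority)
{
	return (inactive_anon_pages >> priority) != 0;
}
```

At DEF_PRIORITY, anything under 4096 inactive anon pages fails the
check, so reclaim falls back to balanced scanning instead; as the
priority drops toward 0 under pressure, progressively smaller lists
qualify.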

* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-05-01 21:34           ` [patch v2] " David Rientjes
  2017-05-02  8:02             ` Michal Hocko
  2017-05-31 15:20             ` Michal Hocko
@ 2017-06-02 20:36             ` Andrew Morton
  2017-06-04 22:27               ` David Rientjes
  2 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2017-06-02 20:36 UTC (permalink / raw)
  To: David Rientjes
  Cc: Minchan Kim, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

On Mon, 1 May 2017 14:34:21 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:

> The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> not swap anon pages just because free+file is low'") reintroduces is to
> > prefer swapping anonymous memory rather than thrashing the file lru.
> 
> If the anonymous inactive lru for the set of eligible zones is considered
> low, however, or the length of the list for the given reclaim priority
> does not allow for effective anonymous-only reclaiming, then avoid
> forcing SCAN_ANON.  Forcing SCAN_ANON will end up thrashing the small
> list and leave unreclaimed memory on the file lrus.
> 
> If the inactive list is insufficient, fallback to balanced reclaim so the
> file lru doesn't remain untouched.
> 

--- a/mm/vmscan.c~mm-vmscan-avoid-thrashing-anon-lru-when-free-file-is-low-fix
+++ a/mm/vmscan.c
@@ -2233,7 +2233,7 @@ static void get_scan_count(struct lruvec
 			 * anonymous pages on the LRU in eligible zones.
 			 * Otherwise, the small LRU gets thrashed.
 			 */
-			if (!inactive_list_is_low(lruvec, false, sc, false) &&
+			if (!inactive_list_is_low(lruvec, false, memcg, sc, false) &&
 			    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
 					>> sc->priority) {
 				scan_balance = SCAN_ANON;

Worried.  Did you send the correct version?

* Re: [patch v2] mm, vmscan: avoid thrashing anon lru when free + file is low
  2017-06-02 20:36             ` Andrew Morton
@ 2017-06-04 22:27               ` David Rientjes
  0 siblings, 0 replies; 19+ messages in thread
From: David Rientjes @ 2017-06-04 22:27 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Johannes Weiner, Mel Gorman, linux-kernel, linux-mm

On Fri, 2 Jun 2017, Andrew Morton wrote:

> On Mon, 1 May 2017 14:34:21 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:
> 
> > The purpose of the code that commit 623762517e23 ("revert 'mm: vmscan: do
> > not swap anon pages just because free+file is low'") reintroduces is to
> > > prefer swapping anonymous memory rather than thrashing the file lru.
> > 
> > If the anonymous inactive lru for the set of eligible zones is considered
> > low, however, or the length of the list for the given reclaim priority
> > does not allow for effective anonymous-only reclaiming, then avoid
> > forcing SCAN_ANON.  Forcing SCAN_ANON will end up thrashing the small
> > list and leave unreclaimed memory on the file lrus.
> > 
> > If the inactive list is insufficient, fallback to balanced reclaim so the
> > file lru doesn't remain untouched.
> > 
> 
> --- a/mm/vmscan.c~mm-vmscan-avoid-thrashing-anon-lru-when-free-file-is-low-fix
> +++ a/mm/vmscan.c
> @@ -2233,7 +2233,7 @@ static void get_scan_count(struct lruvec
>  			 * anonymous pages on the LRU in eligible zones.
>  			 * Otherwise, the small LRU gets thrashed.
>  			 */
> -			if (!inactive_list_is_low(lruvec, false, sc, false) &&
> +			if (!inactive_list_is_low(lruvec, false, memcg, sc, false) &&
>  			    lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, sc->reclaim_idx)
>  					>> sc->priority) {
>  				scan_balance = SCAN_ANON;
> 
> Worried.  Did you send the correct version?
> 

The patch was written before commit 2a2e48854d70 ("mm: vmscan: fix 
IO/refault regression in cache workingset transition") was merged, which 
changed the signature of inactive_list_is_low().

Your rebase looks good.  It could have used NULL instead of memcg since 
this is only for global_reclaim() and memcg will always be NULL here, but 
that's just personal preference.

end of thread, other threads:[~2017-06-04 22:28 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-18  0:06 [patch] mm, vmscan: avoid thrashing anon lru when free + file is low David Rientjes
2017-04-18  1:36 ` Minchan Kim
2017-04-18 21:32   ` David Rientjes
2017-04-19  0:14     ` Minchan Kim
2017-04-19 23:24       ` David Rientjes
2017-04-20  6:09         ` Minchan Kim
2017-05-01 21:34           ` [patch v2] " David Rientjes
2017-05-02  8:02             ` Michal Hocko
2017-05-02 20:41               ` David Rientjes
2017-05-03  6:15                 ` Michal Hocko
2017-05-03  7:06                   ` Michal Hocko
2017-05-03  8:49                     ` Michal Hocko
2017-05-03 22:52                       ` David Rientjes
2017-05-04 11:43                         ` Michal Hocko
2017-05-31 15:20             ` Michal Hocko
2017-06-02 20:36             ` Andrew Morton
2017-06-04 22:27               ` David Rientjes
2017-04-19  7:04     ` [patch] " Michal Hocko
2017-04-18  7:11 ` Michal Hocko
