From: Michal Hocko <mhocko@kernel.org>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Rik van Riel <riel@redhat.com>, Mel Gorman <mgorman@suse.de>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andi Kleen <andi@firstfloor.org>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	kernel-team@fb.com
Subject: Re: [PATCH 07/10] mm: base LRU balancing on an explicit cost model
Date: Wed, 8 Jun 2016 14:51:37 +0200
Message-ID: <20160608125137.GH22570@dhcp22.suse.cz>
In-Reply-To: <20160606194836.3624-8-hannes@cmpxchg.org>

On Mon 06-06-16 15:48:33, Johannes Weiner wrote:
> Currently, scan pressure between the anon and file LRU lists is
> balanced based on a mixture of reclaim efficiency and a somewhat vague
> notion of "value" of having certain pages in memory over others. That
> concept of value is problematic, because it has caused us to count any
> event that remotely makes one LRU list more or less preferable for
> reclaim, even when these events are not directly comparable to each
> other and impose very different costs on the system - such as a
> referenced file page that we still deactivate and a referenced
> anonymous page that we actually rotate back to the head of the list.
> 
> There is also conceptual overlap with the LRU algorithm itself. By
> rotating recently used pages instead of reclaiming them, the algorithm
> already biases the applied scan pressure based on page value. Thus,
> when rebalancing scan pressure due to rotations, we should think of
> reclaim cost, and leave assessing the page value to the LRU algorithm.
> 
> Lastly, considering both value-increasing as well as value-decreasing
> events can sometimes cause the same type of event to be counted twice,
> i.e. how rotating a page increases the LRU value, while reclaiming it
> successfully decreases the value. In itself this will balance out fine,
> but it quietly skews the impact of events that are only recorded once.
> 
> The abstract metric of "value", the murky relationship with the LRU
> algorithm, and accounting both negative and positive events make the
> current pressure balancing model hard to reason about and modify.
> 
> In preparation for thrashing-based LRU balancing, this patch switches
> to a balancing model of accounting the concrete, actually observed
> cost of reclaiming one LRU over another. For now, that cost includes
> pages that are scanned but rotated back to the list head.

This makes a lot of sense to me.

> Subsequent
> patches will add consideration for IO caused by refaulting recently
> evicted pages. The idea is to primarily scan the LRU that thrashes the
> least, and secondarily scan the LRU that needs the least amount of
> work to free memory.
> 
> Rename struct zone_reclaim_stat to struct lru_cost, and move from two
> separate value ratios for the LRU lists to a relative LRU cost metric
> with a shared denominator.

I just do not like the too generic `numer'. I guess cost or price would
fit better and look better in the code as well. Up to you, though...
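
Something along these lines is what I had in mind, just as a sketch
(the field names are only a suggestion, of course):

	struct lru_cost {
		unsigned long	cost[2];	/* anon in [0], file in [1] */
		unsigned long	total;		/* shared denominator */
	};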

> Then make everything that affects the cost go through a new
> lru_note_cost() function.

Just curious, have you tried to measure the effect of this change on
its own, without the rest of the series? I do not expect it to show
large differences because we are not doing SCAN_FRACT most of the
time...
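
For reference, with this patch the scan fractions further down in
get_scan_count() come out as (just a sketch following the hunk below,
not exact code):

	/* the shared denominator ties the two ratios together */
	ap = anon_prio * (lruvec->balance.denom + 1);
	ap /= lruvec->balance.numer[0] + 1;

	fp = file_prio * (lruvec->balance.denom + 1);
	fp /= lruvec->balance.numer[1] + 1;

so a rotation recorded against one list lowers that list's share and
raises the other's at the same time, instead of going through two
independent rotated/scanned ratios.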

> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
>  include/linux/mmzone.h | 23 +++++++++++------------
>  include/linux/swap.h   |  2 ++
>  mm/swap.c              | 15 +++++----------
>  mm/vmscan.c            | 35 +++++++++++++++--------------------
>  4 files changed, 33 insertions(+), 42 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 02069c23486d..4d257d00fbf5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -191,22 +191,21 @@ static inline int is_active_lru(enum lru_list lru)
>  	return (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE);
>  }
>  
> -struct zone_reclaim_stat {
> -	/*
> -	 * The pageout code in vmscan.c keeps track of how many of the
> -	 * mem/swap backed and file backed pages are referenced.
> -	 * The higher the rotated/scanned ratio, the more valuable
> -	 * that cache is.
> -	 *
> -	 * The anon LRU stats live in [0], file LRU stats in [1]
> -	 */
> -	unsigned long		recent_rotated[2];
> -	unsigned long		recent_scanned[2];
> +/*
> + * This tracks cost of reclaiming one LRU type - file or anon - over
> + * the other. As the observed cost of pressure on one type increases,
> + * the scan balance in vmscan.c tips toward the other type.
> + *
> + * The recorded cost for anon is in numer[0], file in numer[1].
> + */
> +struct lru_cost {
> +	unsigned long		numer[2];
> +	unsigned long		denom;
>  };
>  
>  struct lruvec {
>  	struct list_head		lists[NR_LRU_LISTS];
> -	struct zone_reclaim_stat	reclaim_stat;
> +	struct lru_cost			balance;
>  	/* Evictions & activations on the inactive file list */
>  	atomic_long_t			inactive_age;
>  #ifdef CONFIG_MEMCG
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 178f084365c2..c461ce0533da 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -295,6 +295,8 @@ extern unsigned long nr_free_pagecache_pages(void);
>  
>  
>  /* linux/mm/swap.c */
> +extern void lru_note_cost(struct lruvec *lruvec, bool file,
> +			  unsigned int nr_pages);
>  extern void lru_cache_add(struct page *);
>  extern void lru_cache_putback(struct page *page);
>  extern void lru_add_page_tail(struct page *page, struct page *page_tail,
> diff --git a/mm/swap.c b/mm/swap.c
> index 814e3a2e54b4..645d21242324 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -249,15 +249,10 @@ void rotate_reclaimable_page(struct page *page)
>  	}
>  }
>  
> -static void update_page_reclaim_stat(struct lruvec *lruvec,
> -				     int file, int rotated,
> -				     unsigned int nr_pages)
> +void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages)
>  {
> -	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
> -
> -	reclaim_stat->recent_scanned[file] += nr_pages;
> -	if (rotated)
> -		reclaim_stat->recent_rotated[file] += nr_pages;
> +	lruvec->balance.numer[file] += nr_pages;
> +	lruvec->balance.denom += nr_pages;
>  }
>  
>  static void __activate_page(struct page *page, struct lruvec *lruvec,
> @@ -543,7 +538,7 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
>  
>  	if (active)
>  		__count_vm_event(PGDEACTIVATE);
> -	update_page_reclaim_stat(lruvec, file, 0, hpage_nr_pages(page));
> +	lru_note_cost(lruvec, !file, hpage_nr_pages(page));
>  }
>  
>  
> @@ -560,7 +555,7 @@ static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
>  		add_page_to_lru_list(page, lruvec, lru);
>  
>  		__count_vm_event(PGDEACTIVATE);
> -		update_page_reclaim_stat(lruvec, file, 0, hpage_nr_pages(page));
> +		lru_note_cost(lruvec, !file, hpage_nr_pages(page));
>  	}
>  }
>  
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 8503713bb60e..06e381e1004c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1492,7 +1492,6 @@ static int too_many_isolated(struct zone *zone, int file,
>  static noinline_for_stack void
>  putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
>  {
> -	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>  	struct zone *zone = lruvec_zone(lruvec);
>  	LIST_HEAD(pages_to_free);
>  
> @@ -1521,8 +1520,13 @@ putback_inactive_pages(struct lruvec *lruvec, struct list_head *page_list)
>  		if (is_active_lru(lru)) {
>  			int file = is_file_lru(lru);
>  			int numpages = hpage_nr_pages(page);
> -			reclaim_stat->recent_rotated[file] += numpages;
> +			/*
> +			 * Rotating pages costs CPU without actually
> +			 * progressing toward the reclaim goal.
> +			 */
> +			lru_note_cost(lruvec, file, numpages);
>  		}
> +
>  		if (put_page_testzero(page)) {
>  			__ClearPageLRU(page);
>  			__ClearPageActive(page);
> @@ -1577,7 +1581,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  	isolate_mode_t isolate_mode = 0;
>  	int file = is_file_lru(lru);
>  	struct zone *zone = lruvec_zone(lruvec);
> -	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>  
>  	while (unlikely(too_many_isolated(zone, file, sc))) {
>  		congestion_wait(BLK_RW_ASYNC, HZ/10);
> @@ -1601,7 +1604,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>  
>  	update_lru_size(lruvec, lru, -nr_taken);
>  	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> -	reclaim_stat->recent_scanned[file] += nr_taken;
>  
>  	if (global_reclaim(sc)) {
>  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
> @@ -1773,7 +1775,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  	LIST_HEAD(l_active);
>  	LIST_HEAD(l_inactive);
>  	struct page *page;
> -	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>  	unsigned long nr_rotated = 0;
>  	isolate_mode_t isolate_mode = 0;
>  	int file = is_file_lru(lru);
> @@ -1793,7 +1794,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  
>  	update_lru_size(lruvec, lru, -nr_taken);
>  	__mod_zone_page_state(zone, NR_ISOLATED_ANON + file, nr_taken);
> -	reclaim_stat->recent_scanned[file] += nr_taken;
>  
>  	if (global_reclaim(sc))
>  		__mod_zone_page_state(zone, NR_PAGES_SCANNED, nr_scanned);
> @@ -1851,7 +1851,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
>  	 * helps balance scan pressure between file and anonymous pages in
>  	 * get_scan_count.
>  	 */
> -	reclaim_stat->recent_rotated[file] += nr_rotated;
> +	lru_note_cost(lruvec, file, nr_rotated);
>  
>  	move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
>  	move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
> @@ -1947,7 +1947,6 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  			   unsigned long *lru_pages)
>  {
>  	int swappiness = mem_cgroup_swappiness(memcg);
> -	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
>  	u64 fraction[2];
>  	u64 denominator = 0;	/* gcc */
>  	struct zone *zone = lruvec_zone(lruvec);
> @@ -2072,14 +2071,10 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  		lruvec_lru_size(lruvec, LRU_INACTIVE_FILE);
>  
>  	spin_lock_irq(&zone->lru_lock);
> -	if (unlikely(reclaim_stat->recent_scanned[0] > anon / 4)) {
> -		reclaim_stat->recent_scanned[0] /= 2;
> -		reclaim_stat->recent_rotated[0] /= 2;
> -	}
> -
> -	if (unlikely(reclaim_stat->recent_scanned[1] > file / 4)) {
> -		reclaim_stat->recent_scanned[1] /= 2;
> -		reclaim_stat->recent_rotated[1] /= 2;
> +	if (unlikely(lruvec->balance.denom > (anon + file) / 8)) {
> +		lruvec->balance.numer[0] /= 2;
> +		lruvec->balance.numer[1] /= 2;
> +		lruvec->balance.denom /= 2;
>  	}
>  
>  	/*
> @@ -2087,11 +2082,11 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
>  	 * proportional to the fraction of recently scanned pages on
>  	 * each list that were recently referenced and in active use.
>  	 */
> -	ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
> -	ap /= reclaim_stat->recent_rotated[0] + 1;
> +	ap = anon_prio * (lruvec->balance.denom + 1);
> +	ap /= lruvec->balance.numer[0] + 1;
>  
> -	fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
> -	fp /= reclaim_stat->recent_rotated[1] + 1;
> +	fp = file_prio * (lruvec->balance.denom + 1);
> +	fp /= lruvec->balance.numer[1] + 1;
>  	spin_unlock_irq(&zone->lru_lock);
>  
>  	fraction[0] = ap;
> -- 
> 2.8.3

-- 
Michal Hocko
SUSE Labs
