From: Johannes Weiner <hannes@cmpxchg.org>
To: Joonsoo Kim <js1304@gmail.com>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Rik van Riel <riel@surriel.com>,
	Minchan Kim <minchan.kim@gmail.com>,
	Michal Hocko <mhocko@suse.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH 05/14] mm: workingset: let cache workingset challenge anon
Date: Wed, 27 May 2020 09:43:33 -0400	[thread overview]
Message-ID: <20200527134333.GF6781@cmpxchg.org> (raw)
In-Reply-To: <CAAmzW4MMLw=4BHRtt=g8BHpQLTaSmhG03Ct6WMjmDxVnOkNQYA@mail.gmail.com>

On Wed, May 27, 2020 at 11:06:47AM +0900, Joonsoo Kim wrote:
> On Thu, May 21, 2020 at 8:26 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > We activate cache refaults with reuse distances in pages smaller than
> > the size of the total cache. This allows new pages with competitive
> > access frequencies to establish themselves, as well as challenge and
> > potentially displace pages on the active list that have gone cold.
> >
> > However, that assumes that active cache can only replace other active
> > cache in a competition for the hottest memory. This is not a great
> > default assumption. The page cache might be thrashing while there are
> > enough completely cold and unused anonymous pages sitting around that
> > we'd only have to write to swap once to stop all IO from the cache.
> >
> > Activate cache refaults when their reuse distance in pages is smaller
> > than the total userspace workingset, including anonymous pages.
> 
> Hmm... I'm not sure about the correctness of this change.
> 
> IIUC, this patch leads to more activations in the file list, and more
> activations will challenge the anon list, since the rotation ratio for
> the file list will be increased.

Yes.
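The rule in the quoted patch description can be sketched in miniature. This is a hedged model, not the kernel code: the function names and the flat size parameters below are illustrative, and the real implementation works in pages with per-lruvec counters (see mm/workingset.c).

```c
#include <assert.h>
#include <stdbool.h>

/* Hedged model of the refault activation decision. Sizes are in
 * pages; names are illustrative, not the kernel's. */

/* Before the patch: a refault joins the workingset only if its
 * reuse distance fits within the active file list. */
static bool activate_before(unsigned long refault_distance,
			    unsigned long active_file)
{
	return refault_distance <= active_file;
}

/* After the patch: compare against the whole userspace workingset,
 * anon included, since swapping out cold anon can also make room. */
static bool activate_after(unsigned long refault_distance,
			   unsigned long active_file,
			   unsigned long anon)
{
	return refault_distance <= active_file + anon;
}
```

A refault whose distance exceeds the active file list but fits within file plus anon is now treated as workingset, which is exactly the behavior under discussion below.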

> However, this change breaks the active/inactive concept of the file
> list. The active/inactive separation is implemented by the in-list
> refault distance. The anon list size has no direct connection with the
> refault distance of the file list, so using the anon list size to
> detect the workingset for file pages breaks the concept.

This is intentional, because there IS a connection: they both take up
space in RAM, and they both cost IO to bring back once reclaimed.

When file is refaulting, it means we need to make more space for
cache. That space can come from stale active file pages. But what if
active cache is all hot, and meanwhile there are cold anon pages that
we could swap out once and then serve everything from RAM?

When file is refaulting, we should find the coldest data that is
taking up RAM and kick it out. It doesn't matter whether it's file or
anon: the goal is to free up RAM with the least amount of IO risk.

Remember that the file/anon split, and the inactive/active split, are
there to optimize reclaim. It doesn't mean that these memory pools are
independent from each other.

The file list is split in two because of use-once cache. The anon and
file lists are split because of different IO patterns, because we may
not have swap etc. But once we are out of use-once cache, have swap
space available, and have corrected for the different cost of IO,
there needs to be a relative order between all pages in the system to
find the optimal candidates to reclaim.

> My suspicion started with this counterexample.
> 
> Environment:
> anon: 500 MB (so hot) / 500 MB (so hot)
> file: 50 MB (hot) / 50 MB (cold)
> 
> Think about a situation where there is periodic access to another
> file (100 MB) with low frequency (refault distance is 500 MB).
> 
> Without your change, this periodic access doesn't cause thrashing of
> the cached active file pages, since the refault distance of the
> periodic access is larger than the size of the active file list.
> However, with your change, it causes thrashing on the file list.

It doesn't cause thrashing. It causes scanning, because that 100M file
IS thrashing: with or without my patch, that refault IO is occurring.

What this patch acknowledges is that the 100M file COULD fit fully
into memory, and not require any IO to serve, IFF 100M of the active
file or anon pages were cold and could be reclaimed or swapped out.
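Plugging the numbers from the example into the activation test makes the disagreement concrete. This is just arithmetic on the scenario, stated in MB for readability (the kernel works in pages), with illustrative constant names:

```c
#include <assert.h>

/* Sizes in MB, taken from the scenario above. */
enum {
	ANON         = 1000, /* 500 hot + 500 hot */
	ACTIVE_FILE  = 50,
	REFAULT_DIST = 500,  /* reuse distance of the periodic 100M file */
};

/* Does a refault at this distance count as workingset? */
static int activates(unsigned long distance, unsigned long workingset)
{
	return distance <= workingset;
}
```

Before the patch the threshold is ACTIVE_FILE, and 500 > 50, so the periodic file stays inactive and keeps refaulting quietly. After the patch the threshold is ACTIVE_FILE + ANON = 1050 MB, so 500 <= 1050 and the refaults activate, triggering the scan that either finds cold pages to evict or confirms the resident set is hot.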

In your example, the anon set is hot. We'll scan it slowly (at the
rate of IO from the other file) and rotate the pages that are in use -
which would be all of them. Likewise for the file - there will be some
deactivations, but mark_page_accessed() or the second chance algorithm
in page_check_references() for mapped pages will keep the hottest
pages active.
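The second-chance behavior alluded to here can be modeled in a few lines. This is a toy, not page_check_references(): the enum names are borrowed for flavor, and the real function also weighs mapping counts, VM_EXEC, and dirty state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy second-chance model: a page referenced once gets its bit
 * cleared and another lap around the list; a page referenced
 * again before reclaim reaches it is activated. */
struct page_model {
	bool referenced;         /* touched since the last scan? */
	bool second_chance_used;
};

enum pageref { PAGEREF_RECLAIM, PAGEREF_KEEP, PAGEREF_ACTIVATE };

static enum pageref check_references(struct page_model *p)
{
	if (!p->referenced)
		return PAGEREF_RECLAIM;  /* cold: evict */
	p->referenced = false;           /* clear the reference bit */
	if (p->second_chance_used)
		return PAGEREF_ACTIVATE; /* referenced twice: hot */
	p->second_chance_used = true;
	return PAGEREF_KEEP;             /* one more lap */
}
```

The point for the scenario above: a genuinely hot page survives the extra scanning, because it gets re-referenced before its second pass and is activated rather than reclaimed.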

In a slightly modified example, 400M of the anon set is hot and 100M
cold. Without my patch, we would never look for them and the second
file would be IO-bound forever. After my patch, we would scan anon,
eventually find the cold pages, swap them out, and then serve the
entire workingset from memory.

Does it cause more scanning than before in your scenario? Yes, but we
don't even know it's your scenario instead of mine until we actually
sample references of all memory. Not scanning is a false stable state.

And your scenario could easily change over time. Even if the amount of
repeatedly accessed pages stays larger than memory, and will always
require IO to serve, the relative access frequencies could change.
Some pages could become hotter, others colder. Without scanning, we
wouldn't adapt the LRU ordering and would cause more IO than necessary.


Thread overview: 45+ messages
2020-05-20 23:25 [PATCH 00/14] mm: balance LRU lists based on relative thrashing v2 Johannes Weiner
2020-05-20 23:25 ` [PATCH 01/14] mm: fix LRU balancing effect of new transparent huge pages Johannes Weiner
2020-05-27 19:48   ` Shakeel Butt
2020-05-20 23:25 ` [PATCH 02/14] mm: keep separate anon and file statistics on page reclaim activity Johannes Weiner
2020-05-20 23:25 ` [PATCH 03/14] mm: allow swappiness that prefers reclaiming anon over the file workingset Johannes Weiner
2020-05-20 23:25 ` [PATCH 04/14] mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() Johannes Weiner
2020-05-20 23:25 ` [PATCH 05/14] mm: workingset: let cache workingset challenge anon Johannes Weiner
2020-05-27  2:06   ` Joonsoo Kim
2020-05-27 13:43     ` Johannes Weiner [this message]
2020-05-28  7:16       ` Joonsoo Kim
2020-05-28 17:01         ` Johannes Weiner
2020-05-29  6:48           ` Joonsoo Kim
2020-05-29 15:12             ` Johannes Weiner
2020-06-01  6:14               ` Joonsoo Kim
2020-06-01 15:56                 ` Johannes Weiner
2020-06-01 20:44                   ` Johannes Weiner
2020-06-04 13:35                     ` Vlastimil Babka
2020-06-04 15:05                       ` Johannes Weiner
2020-06-12  3:19                         ` Joonsoo Kim
2020-06-15 13:41                           ` Johannes Weiner
2020-06-16  6:09                             ` Joonsoo Kim
2020-06-02  2:34                   ` Joonsoo Kim
2020-06-02 16:47                     ` Johannes Weiner
2020-06-03  5:40                       ` Joonsoo Kim
2020-05-20 23:25 ` [PATCH 06/14] mm: remove use-once cache bias from LRU balancing Johannes Weiner
2020-05-20 23:25 ` [PATCH 07/14] mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count() Johannes Weiner
2020-05-20 23:25 ` [PATCH 08/14] mm: base LRU balancing on an explicit cost model Johannes Weiner
2020-05-20 23:25 ` [PATCH 09/14] mm: deactivations shouldn't bias the LRU balance Johannes Weiner
2020-05-22 13:33   ` Qian Cai
2020-05-26 15:55     ` Johannes Weiner
2020-05-27  0:54       ` Qian Cai
2020-05-20 23:25 ` [PATCH 10/14] mm: only count actual rotations as LRU reclaim cost Johannes Weiner
2020-05-20 23:25 ` [PATCH 11/14] mm: balance LRU lists based on relative thrashing Johannes Weiner
2020-05-20 23:25 ` [PATCH 12/14] mm: vmscan: determine anon/file pressure balance at the reclaim root Johannes Weiner
2020-05-20 23:25 ` [PATCH 13/14] mm: vmscan: reclaim writepage is IO cost Johannes Weiner
2020-05-20 23:25 ` [PATCH 14/14] mm: vmscan: limit the range of LRU type balancing Johannes Weiner
