linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Joonsoo Kim <js1304@gmail.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: Linux Memory Management List <linux-mm@kvack.org>,
	Rik van Riel <riel@surriel.com>,
	 Minchan Kim <minchan.kim@gmail.com>,
	Michal Hocko <mhocko@suse.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	 LKML <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH 05/14] mm: workingset: let cache workingset challenge anon
Date: Wed, 3 Jun 2020 14:40:40 +0900	[thread overview]
Message-ID: <CAAmzW4P0HZw-K1jHogpCevDCfsSfm2SRSXYwUqznoDBNWa_NUQ@mail.gmail.com> (raw)
In-Reply-To: <20200602164726.GA225032@cmpxchg.org>

2020년 6월 3일 (수) 오전 1:48, Johannes Weiner <hannes@cmpxchg.org>님이 작성:
>
> On Tue, Jun 02, 2020 at 11:34:17AM +0900, Joonsoo Kim wrote:
> > 2020년 6월 2일 (화) 오전 12:56, Johannes Weiner <hannes@cmpxchg.org>님이 작성:
> > > On Mon, Jun 01, 2020 at 03:14:24PM +0900, Joonsoo Kim wrote:
> > > > But, I still think that modified refault activation equation isn't
> > > > safe. The next
> > > > problem I found is related to the scan ratio limit patch ("limit the range of
> > > > LRU type balancing") on this series. See the below example.
> > > >
> > > > anon: Hot (X M)
> > > > file: Hot (200 M) / dummy (200 M)
> > > > P: 1200 M (3 parts, each one 400 M, P1, P2, P3)
> > > > Access Pattern: A -> F(H) -> P1 -> A -> F(H) -> P2 -> ... ->
> > > >
> > > > Without this patch, A and F(H) are kept on the memory and look like
> > > > it's correct.
> > > >
> > > > With this patch and below fix, refault equation for Pn would be:
> > > >
> > > > Refault dist of Pn = 1200 (from file non-resident) + 1200 * anon scan
> > > > ratio (from anon non-resident)
> > > > anon + active file = X + 200
> > > > 1200 + 1200 * anon scan ratio (0.5 ~ 2) < X + 200
> > >
> > > That doesn't look quite right to me. The anon part of the refault
> > > distance is driven by X, so the left-hand of this formula contains X
> > > as well.
> > >
> > > 1000 file (1200M reuse distance, 200M in-core size) + F(H) reactivations + X * scan ratio < X + 1000
> >
> > As I said before, there is no X on left-hand of this formula. To
> > access all Pn and
> > re-access P1, we need 1200M file list scan and reclaim. More scan isn't needed.
> > With your patch "limit the range of LRU type balancing", scan ratio
> > between file/anon
> > list is limited to 0.5 ~ 2.0, so, maximum anon scan would be 1200 M *
> > 2.0, that is,
> > 2400 M and not bounded by X. That means that file list cannot be
> > stable with some X.
>
> Oh, no X on the left because you're talking about the number of pages
> scanned until the first refaults, which is fixed - so why are we still
> interpreting the refault distance against a variable anon size X?

Looks like I was confused again. Your formula is correct and mine is
wrong. My mistake is I thought that your patch "limit the range of LRU
type balancing"
which makes scan *ratio* 2:1 leads to actual scan *count* ratio
between anon/file to 2:1.
But, now I realized that 2:1 is just scan ratio and actual scan
*count* ratio could be far
larger with certain list size. It would be X * scan ratio in above example so my
explanation is wrong and you are right.

Sorry for making a trouble.

> Well, that's misleading. We compare against anon because part of the
> cache is already encoded in the refault distance. What we're really
> checking is access distance against total amount of available RAM.
>
> Consider this. We want to activate pages where
>
>         access_distance <= RAM
>
> and our measure of access distance is:
>
>         access_distance = refault_distance + inactive_file
>
> So the comparison becomes:
>
>         refault_distance + inactive_file < RAM
>
> which we simplify to:
>
>         refault_distance < active_file + anon
>
> There is a certain threshold for X simply because there is a certain
> threshold for RAM beyond which we need to start activating. X cannot
> be arbitrary, it must be X + cache filling up memory - after all we
> have page reclaim evicting pages.
>
> Again, this isn't new. In the current code, we activate when:
>
>         refault_distance < active_file
>
> which is
>
>         access_distance <= RAM - anon
>
> You can see, whether things are stable or not always depends on the
> existing workingset size. It's just a proxy for how much total RAM we
> have potentially available to the refaulting page.
>
> > If my lastly found example is a correct example (your confirm is required),
> > it is also related to the correctness issue since cold pages causes
> > eviction of the hot pages repeatedly.
>
> I think your example is correct, but it benefits from the VM
> arbitrarily making an assumption that has a 50/50 shot of being true.
>
> You and I know which pages are hot and which are cold because you
> designed the example.
>
> All the VM sees is this:
>
> - We have an established workingset that has previously shown an
>   access distance <= RAM and therefor was activated.
>
> - We now have another set that also appears to have an access distance
>   <= RAM. The only way to know for sure, however, is sample the
>   established workingset and compare the relative access frequencies.
>
> Currently, we just assume the incoming pages are colder. Clearly
> that's beneficial when it's true. Clearly that can be totally wrong.
>
> We must allow a fair comparison between these two sets.
>
> For cache, that's already the case - that's why I brought up the
> cache-only example: if refault distances are 50M and you have 60M of
> active cache, we activate all refaults and force an even competition
> between the established workingset and the new pages.
>
> Whether we can protect active file when anon needs to shrink first and
> can't (the activate/iocost split) that's a different question. But I'm
> no longer so sure after looking into it further.
>
> First, we would need two different refault distances: either we
> consider anon age and need to compare to active_file + anon, or we
> don't and compare to active_file only. We cannot mix willy nilly,
> because the metrics wouldn't be comparable. We don't have the space to
> store two different eviction timestamps, nor could we afford to cut
> the precision in half.
>
> Second, the additional page flag required to implement it.
>
> Third, it's somewhat moot because we still have the same behavior when
> active_file would need to shrink and can't. There can't be a stable
> state as long as refault distances <= active_file.
>
> > In this case, they (without patch, with patch) all have some correctness
> > issue so we need to judge which one is better in terms of overall impact.
> > I don't have strong opinion about it so it's up to you to decide the way to go.
>
> If my patch was simply changing the default assumption on which pages
> are hot and which are cold, I would agree with you - the pros would be
> equal to the cons, one way wouldn't be more correct than the other.
>
> But that isn't what my patch is doing. What it does is get rid of the
> assumption, to actually sample and compare the access frequencies when
> there isn't enough data to make an informed decision.
>
> That's a net improvement.

Okay. Now I think that this patch has a net improvement.

Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Thanks.


  reply	other threads:[~2020-06-03  5:40 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-20 23:25 [PATCH 00/14] mm: balance LRU lists based on relative thrashing v2 Johannes Weiner
2020-05-20 23:25 ` [PATCH 01/14] mm: fix LRU balancing effect of new transparent huge pages Johannes Weiner
2020-05-27 19:48   ` Shakeel Butt
2020-05-20 23:25 ` [PATCH 02/14] mm: keep separate anon and file statistics on page reclaim activity Johannes Weiner
2020-05-20 23:25 ` [PATCH 03/14] mm: allow swappiness that prefers reclaiming anon over the file workingset Johannes Weiner
2020-05-20 23:25 ` [PATCH 04/14] mm: fold and remove lru_cache_add_anon() and lru_cache_add_file() Johannes Weiner
2020-05-20 23:25 ` [PATCH 05/14] mm: workingset: let cache workingset challenge anon Johannes Weiner
2020-05-27  2:06   ` Joonsoo Kim
2020-05-27 13:43     ` Johannes Weiner
2020-05-28  7:16       ` Joonsoo Kim
2020-05-28 17:01         ` Johannes Weiner
2020-05-29  6:48           ` Joonsoo Kim
2020-05-29 15:12             ` Johannes Weiner
2020-06-01  6:14               ` Joonsoo Kim
2020-06-01 15:56                 ` Johannes Weiner
2020-06-01 20:44                   ` Johannes Weiner
2020-06-04 13:35                     ` Vlastimil Babka
2020-06-04 15:05                       ` Johannes Weiner
2020-06-12  3:19                         ` Joonsoo Kim
2020-06-15 13:41                           ` Johannes Weiner
2020-06-16  6:09                             ` Joonsoo Kim
2020-06-02  2:34                   ` Joonsoo Kim
2020-06-02 16:47                     ` Johannes Weiner
2020-06-03  5:40                       ` Joonsoo Kim [this message]
2020-05-20 23:25 ` [PATCH 06/14] mm: remove use-once cache bias from LRU balancing Johannes Weiner
2020-05-20 23:25 ` [PATCH 07/14] mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count() Johannes Weiner
2020-05-20 23:25 ` [PATCH 08/14] mm: base LRU balancing on an explicit cost model Johannes Weiner
2020-05-20 23:25 ` [PATCH 09/14] mm: deactivations shouldn't bias the LRU balance Johannes Weiner
2020-05-22 13:33   ` Qian Cai
2020-05-26 15:55     ` Johannes Weiner
2020-05-20 23:25 ` [PATCH 10/14] mm: only count actual rotations as LRU reclaim cost Johannes Weiner
2020-05-20 23:25 ` [PATCH 11/14] mm: balance LRU lists based on relative thrashing Johannes Weiner
2020-05-20 23:25 ` [PATCH 12/14] mm: vmscan: determine anon/file pressure balance at the reclaim root Johannes Weiner
2020-05-20 23:25 ` [PATCH 13/14] mm: vmscan: reclaim writepage is IO cost Johannes Weiner
2020-05-20 23:25 ` [PATCH 14/14] mm: vmscan: limit the range of LRU type balancing Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAmzW4P0HZw-K1jHogpCevDCfsSfm2SRSXYwUqznoDBNWa_NUQ@mail.gmail.com \
    --to=js1304@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=minchan.kim@gmail.com \
    --cc=riel@surriel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).