Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Shakeel Butt <shakeelb@google.com>,
	Rik van Riel <riel@surriel.com>, Michal Hocko <mhocko@suse.com>,
	linux-mm <linux-mm@kvack.org>,
	cgroups mailinglist <cgroups@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	kernel-team@fb.com
Subject: Re: [PATCH 2/3] mm: vmscan: detect file thrashing at the reclaim root
Date: Tue, 12 Nov 2019 12:45:33 -0500
Message-ID: <20191112174533.GA178331@cmpxchg.org> (raw)
In-Reply-To: <CAJuCfpFtr9ODyOEJWt+=z=fnR0j8CJPSfhN+50N=d4SjLO-Z7A@mail.gmail.com>

On Sun, Nov 10, 2019 at 06:01:18PM -0800, Suren Baghdasaryan wrote:
> On Thu, Nov 7, 2019 at 12:53 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> > We use refault information to determine whether the cache workingset
> > is stable or transitioning, and dynamically adjust the inactive:active
> > file LRU ratio so as to maximize protection from one-off cache during
> > stable periods, and minimize IO during transitions.
> >
> > With cgroups and their nested LRU lists, we currently don't do this
> > correctly. While recursive cgroup reclaim establishes a relative LRU
> > order among the pages of all involved cgroups, refaults only affect
> > the local LRU order in the cgroup in which they are occuring. As a
> > result, cache transitions can take longer in a cgrouped system as the
> > active pages of sibling cgroups aren't challenged when they should be.
> >
> > [ Right now, this is somewhat theoretical, because the siblings, under
> >   continued regular reclaim pressure, should eventually run out of
> >   inactive pages - and since inactive:active *size* balancing is also
> >   done on a cgroup-local level, we will challenge the active pages
> >   eventually in most cases. But the next patch will move that relative
> >   size enforcement to the reclaim root as well, and then this patch
> >   here will be necessary to propagate refault pressure to siblings. ]
> >
> > This patch moves refault detection to the root of reclaim. Instead of
> > remembering the cgroup owner of an evicted page, remember the cgroup
> > that caused the reclaim to happen. When refaults later occur, they'll
> > correctly influence the cross-cgroup LRU order that reclaim follows.
> 
> I spent some time thinking about the idea of calculating refault
> distance using target_memcg's inactive_age and then activating
> refaulted page in (possibly) another memcg and I am still having
> trouble convincing myself that this should work correctly. However I
> also was unable to convince myself otherwise... We use refault
> distance to calculate the deficit in inactive LRU space and then
> activate the refaulted page if that distance is less that
> active+inactive LRU size. However making that decision based on LRU
> sizes of one memcg and then activating the page in another one seems
> very counterintuitive to me. Maybe that's just me though...

It's not activating in a random, unrelated memcg - it's the parental
relationship that makes it work.

If you have a cgroup tree

	root
         |
         A
        / \
       B1 B2

and reclaim is driven by a limit in A, we are reclaiming the pages in
B1 and B2 as if they were on a single LRU list A (it's approximated by
the round-robin reclaim and has some caveats, but that's the idea).

So when a page that belongs to B2 gets evicted, it gets evicted from
virtual LRU list A. When it refaults later, we make the (in)active
size and distance comparisons against virtual LRU list A as well.

The pages on the physical LRU list B2 are not just ordered relative to
its B2 peers, they are also ordered relative to the pages in B1. And
that of course is necessary if we want fair competition between them
under shared reclaim pressure from A.


  reply index

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-07 20:53 [PATCH 0/3] mm: fix page aging across multiple cgroups Johannes Weiner
2019-11-07 20:53 ` [PATCH 1/3] mm: vmscan: move file exhaustion detection to the node level Johannes Weiner
2019-11-10 22:02   ` Suren Baghdasaryan
2019-11-10 22:09   ` Khadarnimcaan Khadarnimcaan
2019-11-07 20:53 ` [PATCH 2/3] mm: vmscan: detect file thrashing at the reclaim root Johannes Weiner
2019-11-11  2:01   ` Suren Baghdasaryan
2019-11-12 17:45     ` Johannes Weiner [this message]
2019-11-12 18:45       ` Suren Baghdasaryan
2019-11-12 18:59         ` Johannes Weiner
2019-11-12 20:35           ` Suren Baghdasaryan
2019-11-14 23:47   ` Shakeel Butt
2019-11-15 16:07     ` Johannes Weiner
2019-11-15 16:52       ` Shakeel Butt
2019-11-07 20:53 ` [PATCH 3/3] mm: vmscan: enforce inactive:active ratio " Johannes Weiner
2019-11-11  2:15   ` Suren Baghdasaryan
2019-11-12 18:00     ` Johannes Weiner
2019-11-12 19:13       ` Suren Baghdasaryan
2019-11-12 20:34         ` Suren Baghdasaryan
2019-11-15  0:29   ` Shakeel Butt
2019-11-27 22:16     ` Shakeel Butt

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191112174533.GA178331@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=aryabinin@virtuozzo.com \
    --cc=cgroups@vger.kernel.org \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=riel@surriel.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git