From: Andrey Ryabinin <aryabinin@virtuozzo.com>
To: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>, Rik van Riel <riel@surriel.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH RFC] mm/vmscan: try to protect active working set of cgroup from reclaim.
Date: Tue, 26 Feb 2019 18:36:38 +0300
Message-ID: <88207884-c643-eb2c-a784-6a7b11d0e7c7@virtuozzo.com>
In-Reply-To: <20190225040255.GA31684@castle.DHCP.thefacebook.com>



On 2/25/19 7:03 AM, Roman Gushchin wrote:
> On Fri, Feb 22, 2019 at 08:58:25PM +0300, Andrey Ryabinin wrote:
>> In the presence of more than one memory cgroup in the system, our
>> reclaim logic just sucks. When we hit a memory limit (global or a limit
>> on a cgroup with subgroups) we reclaim some memory from all cgroups.
>> This sucks because the cgroup that allocates more often always wins.
>> E.g. a job that allocates a lot of clean, rarely used page cache will
>> push out of memory other jobs with an active, relatively small,
>> all-in-memory working set.
>>
>> To prevent such situations we have memcg controls like low/max, etc.,
>> which are supposed to protect jobs or limit them so they do not hurt
>> others. But memory cgroups are very hard to configure right because it
>> requires precise knowledge of the workload, which may vary during
>> execution. E.g. setting a memory limit means that the job won't be able
>> to use all memory in the system for page cache even if the rest of the
>> system is idle. Basically our current scheme requires configuring every
>> single cgroup in the system.
>>
>> I think we can do better. The idea proposed by this patch is to reclaim
>> only inactive pages, and only from cgroups that have a big
>> (!inactive_is_low()) inactive list, and to go back to shrinking active
>> lists only if all inactive lists are low.
> 
> Hi Andrey!
> 
> It's definitely an interesting idea! However, let me raise some concerns:
> 1) What's considered active and inactive depends on memory pressure inside
> a cgroup.

There is no such dependency. High memory pressure may be generated both
by active and by inactive pages. We can also have a cgroup that creates no
pressure while holding almost only active (or only inactive) pages.

> Actually active pages in one cgroup (e.g. just deleted) can be colder
> than inactive pages in another (e.g. a memory-hungry cgroup with a tight
> memory.max).
> 

Well, yes, this is a drawback of having per-memcg LRUs.

> Also a workload inside a cgroup can to some extent control what goes
> to the active LRU. So it opens a way to get more memory unfairly by
> artificially promoting more pages to the active LRU. So a cgroup
> can get an unfair advantage over other cgroups.
> 

"Unfair" is usually a negative term, but in this case it very much depends
on the definition of "fair".

If fair means putting equal reclaim pressure on all cgroups, then yes, the
patch increases such unfairness, but such unfairness is a good thing.
Obviously it's more valuable to keep an actively used page in memory than a
page that is not used.

> Generally speaking, now we have a way to measure the memory pressure
> inside a cgroup. So, in theory, it should be possible to balance
> scanning effort based on memory pressure.
> 

Simply by design, the inactive pages are the first candidates for reclaim.
Any decision that doesn't take inactive pages into account would probably
be wrong.

E.g. take cgroup A, with an active job loading a big, active working set
that creates high memory pressure, and cgroup B, idle (no memory pressure)
with a huge unused cache.
It's definitely preferable to reclaim from B rather than from A.
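
To make the A/B example concrete, here is a rough userspace sketch of the
selection policy described above. It is not the patch itself and not kernel
code: struct cg_model, model_inactive_is_low() and select_targets() are
hypothetical names, and the "inactive smaller than active" test is only an
assumed stand-in for the kernel's inactive_is_low() check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cgroup's LRU sizes, in pages. */
struct cg_model {
	const char *name;
	unsigned long inactive;
	unsigned long active;
};

/*
 * Assumed stand-in for inactive_is_low(): the inactive list counts as
 * "low" when it is smaller than the active list. The real kernel check
 * is more involved; this fixed 1:1 ratio is an illustration only.
 */
static bool model_inactive_is_low(const struct cg_model *cg)
{
	return cg->inactive < cg->active;
}

/*
 * Pass 1: select only cgroups whose inactive list is not low.
 * Fallback: if no cgroup qualifies, select all of them, which models
 * going back to shrinking active lists.
 * Returns the number of selected cgroups.
 */
static size_t select_targets(const struct cg_model *cgs, size_t n,
			     bool *selected, bool *fallback)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		selected[i] = !model_inactive_is_low(&cgs[i]);
		if (selected[i])
			count++;
	}

	*fallback = (count == 0);
	if (*fallback) {
		for (i = 0; i < n; i++)
			selected[i] = true;
		count = n;
	}
	return count;
}
```

With A = { inactive 1000, active 50000 } and B = { inactive 80000,
active 2000 }, only B is selected and no fallback happens. If every cgroup
had a low inactive list, all of them would be selected and the fallback
flag would be set.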
