Re: [PATCH RFC] mm/vmscan: try to protect active working set of cgroup from reclaim.

From: Rik van Riel <riel@surriel.com>
To: Andrey Ryabinin <aryabinin@virtuozzo.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Roman Gushchin <guro@fb.com>, Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH RFC] mm/vmscan: try to protect active working set of cgroup from reclaim.
Date: Fri, 22 Feb 2019 13:56:13 -0500	[thread overview]
Message-ID: <f39cf989bc4c39c6065cd0896c99e20c63316b3b.camel@surriel.com> (raw)
In-Reply-To: <20190222175825.18657-1-aryabinin@virtuozzo.com>

[-- Attachment #1: Type: text/plain, Size: 2086 bytes --]

On Fri, 2019-02-22 at 20:58 +0300, Andrey Ryabinin wrote:
> In a presence of more than 1 memory cgroup in the system our reclaim
> logic is just suck. When we hit memory limit (global or a limit on
> cgroup with subgroups) we reclaim some memory from all cgroups.
> This is sucks because, the cgroup that allocates more often always
> wins.
> E.g. job that allocates a lot of clean rarely used page cache will
> push
> out of memory other jobs with active relatively small all in memory
> working set.
> 
> To prevent such situations we have memcg controls like low/max, etc
> which
> are supposed to protect jobs or limit them so they to not hurt
> others.
> But memory cgroups are very hard to configure right because it
> requires
> precise knowledge of the workload which may vary during the
> execution.
> E.g. setting memory limit means that job won't be able to use all
> memory
> in the system for page cache even if the rest the system is idle.
> Basically our current scheme requires to configure every single
> cgroup
> in the system.
> 
> I think we can do better. The idea proposed by this patch is to
> reclaim
> only inactive pages and only from cgroups that have big
> (!inactive_is_low()) inactive list. And go back to shrinking active
> lists
> only if all inactive lists are low.

Your general idea seems like a good one, but
the logic in the code seems a little convoluted
to me.

I wonder if we can simplify things a little, by
checking (when we enter page reclaim) whether
the pgdat has enough inactive pages based on
the node_page_state statistics, and basing our
decision whether or not to scan the active lists
off that.

As it stands, your patch seems like the kind of
code that makes perfect sense today, but which
will confuse people who look at the code two
years from now.

If the code could be made a little more explicit,
great. If there are good reasons to do things in
the fallback way your current patch does it, the
code could use some good comments explaining why :)

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]