From: Hillf Danton
To: Michal Hocko, Johannes Weiner
Cc: Andrew Morton, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    Shakeel Butt, Roman Gushchin, Matthew Wilcox, Hillf Danton
Subject: [RFC] mm: memcg: add priority for soft limit reclaiming
Date: Thu, 19 Sep 2019 21:13:32 +0800
Message-Id: <20190919131332.4180-1-hdanton@sina.com>

The memory controller is playing an increasingly important role in how
memory is used and how pages are reclaimed under memory pressure.

In daily work a memcg is often created for critical tasks, and its
preconfigured memory usage is supposed to be met even under memory
pressure. An administrator wants to be able to configure that the pages
consumed by memcg-B can be reclaimed by page allocations invoked by
memcg-C, but not by those invoked by memcg-A.

That configurability is addressed by adding a priority for soft limit
reclaiming, which makes sure that no pages are reclaimed from a memcg of
higher priority in favor of a memcg of lower priority.

By default, pages are reclaimed with no priority taken into account.
Once users turn it on, they are responsible for their smart activities,
in almost the same way as when they play realtime FIFO/RR games.

Priority is honored only in the direct reclaiming context, in order to
avoid churning the complex kswapd behavior.

Cc: Shakeel Butt
Cc: Roman Gushchin
Cc: Matthew Wilcox
Cc: Johannes Weiner
Cc: Michal Hocko
Signed-off-by: Hillf Danton
---

--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -230,6 +230,21 @@ struct mem_cgroup {
 	int		under_oom;
 
 	int	swappiness;
+	/*
+	 * slrp, soft limit reclaiming priority
+	 *
+	 * 0, by default, no slrp considered on soft reclaiming.
+	 *
+	 * 1-32, user configurable in ascending order,
+	 * no page will be reclaimed from memcg of higher slrp in
+	 * favor of memcg of lower slrp.
+	 *
+	 * only in direct reclaiming context now.
+	 */
+	int slrp;
+#define MEMCG_SLRP_MIN	1
+#define MEMCG_SLRP_MAX	32
+
 	/* OOM-Killer disable */
 	int		oom_kill_disable;
 
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -647,7 +647,8 @@ static void mem_cgroup_remove_from_trees
 }
 
 static struct mem_cgroup_per_node *
-__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+__mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz,
+				     int slrp)
 {
 	struct mem_cgroup_per_node *mz;
 
@@ -664,7 +665,7 @@ retry:
 	 * position in the tree.
 	 */
 	__mem_cgroup_remove_exceeded(mz, mctz);
-	if (!soft_limit_excess(mz->memcg) ||
+	if (!soft_limit_excess(mz->memcg) || mz->memcg->slrp > slrp ||
 	    !css_tryget_online(&mz->memcg->css))
 		goto retry;
 done:
@@ -672,12 +673,13 @@ done:
 }
 
 static struct mem_cgroup_per_node *
-mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
+mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz,
+				   int slrp)
 {
 	struct mem_cgroup_per_node *mz;
 
 	spin_lock_irq(&mctz->lock);
-	mz = __mem_cgroup_largest_soft_limit_node(mctz);
+	mz = __mem_cgroup_largest_soft_limit_node(mctz, slrp);
 	spin_unlock_irq(&mctz->lock);
 	return mz;
 }
@@ -2972,6 +2974,31 @@ static int mem_cgroup_resize_max(struct
 	return ret;
 }
 
+static int mem_cgroup_get_slrp(void)
+{
+	int slrp;
+
+	if (current->flags & PF_KTHREAD) {
+		/*
+		 * now slrp does not churn in background reclaiming to
+		 * make life simple
+		 */
+		slrp = 0;
+	} else {
+		struct mem_cgroup *memcg;
+
+		rcu_read_lock();
+		memcg = mem_cgroup_from_task(current);
+		if (!memcg || memcg == root_mem_cgroup)
+			slrp = 0;
+		else
+			slrp = memcg->slrp;
+		rcu_read_unlock();
+	}
+
+	return slrp;
+}
+
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					    gfp_t gfp_mask,
 					    unsigned long *total_scanned)
@@ -2980,6 +3007,7 @@ unsigned long mem_cgroup_soft_limit_recl
 	struct mem_cgroup_per_node *mz, *next_mz = NULL;
 	unsigned long reclaimed;
 	int loop = 0;
+	int slrp;
 	struct mem_cgroup_tree_per_node *mctz;
 	unsigned long excess;
 	unsigned long nr_scanned;
@@ -2997,6 +3025,7 @@ unsigned long mem_cgroup_soft_limit_recl
 	if (!mctz || RB_EMPTY_ROOT(&mctz->rb_root))
 		return 0;
 
+	slrp = mem_cgroup_get_slrp();
 	/*
 	 * This loop can run a while, specially if mem_cgroup's continuously
 	 * keep exceeding their soft limit and putting the system under
@@ -3006,7 +3035,7 @@ unsigned long mem_cgroup_soft_limit_recl
 		if (next_mz)
 			mz = next_mz;
 		else
-			mz = mem_cgroup_largest_soft_limit_node(mctz);
+			mz = mem_cgroup_largest_soft_limit_node(mctz, slrp);
 		if (!mz)
 			break;
 
@@ -3024,8 +3053,8 @@ unsigned long mem_cgroup_soft_limit_recl
 		 */
 		next_mz = NULL;
 		if (!reclaimed)
-			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
-
+			next_mz = __mem_cgroup_largest_soft_limit_node(mctz,
+							       slrp);
 		excess = soft_limit_excess(mz->memcg);
 		/*
 		 * One school of thought says that we should not add
@@ -5817,6 +5846,37 @@ static ssize_t memory_oom_group_write(st
 	return nbytes;
 }
 
+static int memory_slrp_show(struct seq_file *m, void *v)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_seq(m);
+
+	seq_printf(m, "%d\n", memcg->slrp);
+
+	return 0;
+}
+
+static ssize_t memory_slrp_write(struct kernfs_open_file *of,
+				char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	int ret, slrp;
+
+	buf = strstrip(buf);
+	if (!buf)
+		return -EINVAL;
+
+	ret = kstrtoint(buf, 0, &slrp);
+	if (ret)
+		return ret;
+
+	if (slrp < MEMCG_SLRP_MIN || MEMCG_SLRP_MAX < slrp)
+		return -EINVAL;
+
+	memcg->slrp = slrp;
+
+	return nbytes;
+}
+
 static struct cftype memory_files[] = {
 	{
 		.name = "current",
@@ -5870,6 +5930,12 @@ static struct cftype memory_files[] = {
 		.seq_show = memory_oom_group_show,
 		.write = memory_oom_group_write,
 	},
+	{
+		.name = "slrp",
+		.flags = CFTYPE_NOT_ON_ROOT | CFTYPE_NS_DELEGATABLE,
+		.seq_show = memory_slrp_show,
+		.write = memory_slrp_write,
+	},
 	{ }	/* terminate */
 };
 
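
To make the selection rule in the diff above easier to reason about, here is a minimal user-space sketch of it. It is only a model, not kernel code: struct model_memcg, reclaimer_slrp() and pick_victim() are hypothetical names invented for illustration, and a flat array stands in for the per-node soft limit rb-tree. It assumes the only inputs that matter are each candidate's soft limit excess and slrp, and it ignores locking, css reference counting and the removal of skipped nodes from the tree.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg queued on the soft limit tree. */
struct model_memcg {
	const char *name;
	unsigned long excess;	/* pages above the soft limit */
	int slrp;		/* 0 = not configured, 1..32 = priority */
};

/*
 * Models mem_cgroup_get_slrp(): kernel threads (background reclaim)
 * and tasks without a non-root memcg reclaim with slrp 0.
 */
static int reclaimer_slrp(bool is_kthread, const struct model_memcg *memcg)
{
	if (is_kthread || !memcg)
		return 0;
	return memcg->slrp;
}

/*
 * Models the filter added to __mem_cgroup_largest_soft_limit_node():
 * pick the candidate with the largest excess, skipping candidates with
 * no excess and candidates whose slrp is greater than the reclaimer's.
 * Returns NULL when no candidate is eligible.
 */
static const struct model_memcg *
pick_victim(const struct model_memcg *cands, size_t n, int slrp)
{
	const struct model_memcg *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!cands[i].excess || cands[i].slrp > slrp)
			continue;
		if (!best || cands[i].excess > best->excess)
			best = &cands[i];
	}
	return best;
}
```

Under this model, with candidates A (excess 400, slrp 3), B (excess 300, slrp 2) and C (excess 50, slrp 0), a reclaimer at slrp 3 picks A, one at slrp 2 picks B, and one at slrp 0 picks C. Because the comparison is slrp > reclaimer's slrp, a reclaimer at slrp 0, which includes every kernel thread, skips any memcg that has a priority configured.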