From: Tim Chen
To: Michal Hocko
Cc: Tim Chen, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang, Dan Williams, David Rientjes, Shakeel Butt, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 05/11] mm: Add soft_limit_top_tier tree for mem cgroup
Date: Mon, 5 Apr 2021 10:08:29 -0700
Message-Id: <04b7c9bce901d271eae216dcfbb928aadc8d48d0.1617642417.git.tim.c.chen@linux.intel.com>

Define a per-node soft_limit_toptier_tree, a red-black tree that sorts
and tracks cgroups by each group's excess over its top-tier soft limit.
A cgroup is added to the tree if it has exceeded its top-tier soft limit
and has used pages on the node.

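The ordering rule described above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel implementation: struct group, enum tier, struct excess_list and all function names are hypothetical stand-ins, and a small sorted array takes the place of the per-node red-black tree. It only shows how the excess is computed per tier and how a group is tracked when it is over its top-tier soft limit and has pages on the node.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
enum tier { TIER_ALL_MEMORY, TIER_TOPTIER };

struct group {
	unsigned long usage;              /* pages charged on any node */
	unsigned long soft_limit;         /* soft limit for all memory */
	unsigned long toptier_usage;      /* pages charged on top-tier nodes */
	unsigned long toptier_soft_limit; /* soft limit for top-tier memory */
};

/* Pages by which a group exceeds the soft limit of the chosen tier. */
static unsigned long excess_over_soft_limit(const struct group *g, enum tier t)
{
	unsigned long used = (t == TIER_TOPTIER) ? g->toptier_usage : g->usage;
	unsigned long limit = (t == TIER_TOPTIER) ? g->toptier_soft_limit
						  : g->soft_limit;

	return used > limit ? used - limit : 0;
}

#define MAX_TRACKED 8

/* Sorted array standing in for one node's red-black tree (assumption). */
struct excess_list {
	const struct group *entry[MAX_TRACKED];
	unsigned long excess[MAX_TRACKED];
	size_t count;
};

/*
 * Track a group only if it is over its top-tier soft limit and has pages
 * on this node. Entries stay in ascending order of excess, so the largest
 * offender is always last. Returns 1 if the group was inserted, else 0.
 */
static int track_if_exceeded(struct excess_list *l, const struct group *g,
			     unsigned long pages_on_node)
{
	unsigned long excess = excess_over_soft_limit(g, TIER_TOPTIER);
	size_t i;

	if (!excess || !pages_on_node)
		return 0;

	assert(l->count < MAX_TRACKED);

	/* Shift larger entries up by one slot to make room. */
	for (i = l->count; i > 0 && l->excess[i - 1] > excess; i--) {
		l->entry[i] = l->entry[i - 1];
		l->excess[i] = l->excess[i - 1];
	}
	l->entry[i] = g;
	l->excess[i] = excess;
	l->count++;
	return 1;
}

/* Group with the largest excess, or NULL if nothing is tracked. */
static const struct group *largest_excess(const struct excess_list *l)
{
	return l->count ? l->entry[l->count - 1] : NULL;
}
```

In this sketch the last array slot plays the role that the rightmost tree node plays when picking the group with the largest excess. A real red-black tree gives logarithmic insertion and removal, which the array does not.
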
Signed-off-by: Tim Chen
---
 mm/memcontrol.c | 68 +++++++++++++++++++++++++++++++++++++------------
 1 file changed, 52 insertions(+), 16 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68590f46fa76..90a78ff3fca8 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -122,6 +122,7 @@ struct mem_cgroup_tree {
 };
 
 static struct mem_cgroup_tree soft_limit_tree __read_mostly;
+static struct mem_cgroup_tree soft_limit_toptier_tree __read_mostly;
 
 /* for OOM */
 struct mem_cgroup_eventfd_list {
@@ -590,17 +591,27 @@ mem_cgroup_page_nodeinfo(struct mem_cgroup *memcg, struct page *page)
 }
 
 static struct mem_cgroup_tree_per_node *
-soft_limit_tree_node(int nid)
-{
-	return soft_limit_tree.rb_tree_per_node[nid];
+soft_limit_tree_node(int nid, enum node_states type)
+{
+	switch (type) {
+	case N_MEMORY:
+		return soft_limit_tree.rb_tree_per_node[nid];
+	case N_TOPTIER:
+		if (node_state(nid, N_TOPTIER))
+			return soft_limit_toptier_tree.rb_tree_per_node[nid];
+		else
+			return NULL;
+	default:
+		return NULL;
+	}
 }
 
 static struct mem_cgroup_tree_per_node *
-soft_limit_tree_from_page(struct page *page)
+soft_limit_tree_from_page(struct page *page, enum node_states type)
 {
 	int nid = page_to_nid(page);
 
-	return soft_limit_tree.rb_tree_per_node[nid];
+	return soft_limit_tree_node(nid, type);
 }
 
 static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
@@ -661,12 +672,24 @@ static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
 	spin_unlock_irqrestore(&mctz->lock, flags);
 }
 
-static unsigned long soft_limit_excess(struct mem_cgroup *memcg)
+static unsigned long soft_limit_excess(struct mem_cgroup *memcg, enum node_states type)
 {
-	unsigned long nr_pages = page_counter_read(&memcg->memory);
-	unsigned long soft_limit = READ_ONCE(memcg->soft_limit);
+	unsigned long nr_pages;
+	unsigned long soft_limit;
 	unsigned long excess = 0;
 
+	switch (type) {
+	case N_MEMORY:
+		nr_pages = page_counter_read(&memcg->memory);
+		soft_limit = READ_ONCE(memcg->soft_limit);
+		break;
+	case N_TOPTIER:
+		nr_pages = page_counter_read(&memcg->toptier);
+		soft_limit = READ_ONCE(memcg->toptier_soft_limit);
+		break;
+	default:
+		return 0;
+	}
 	if (nr_pages > soft_limit)
 		excess = nr_pages - soft_limit;
 
@@ -679,7 +702,7 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;
 
-	mctz = soft_limit_tree_from_page(page);
+	mctz = soft_limit_tree_from_page(page, N_MEMORY);
 	if (!mctz)
 		return;
 	/*
@@ -688,7 +711,7 @@ static void mem_cgroup_update_tree(struct mem_cgroup *memcg, struct page *page)
 	 */
 	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
 		mz = mem_cgroup_page_nodeinfo(memcg, page);
-		excess = soft_limit_excess(memcg);
+		excess = soft_limit_excess(memcg, N_MEMORY);
 		/*
 		 * We have to update the tree if mz is on RB-tree or
 		 * mem is over its softlimit.
@@ -718,7 +741,7 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
 
 	for_each_node(nid) {
 		mz = mem_cgroup_nodeinfo(memcg, nid);
-		mctz = soft_limit_tree_node(nid);
+		mctz = soft_limit_tree_node(nid, N_MEMORY);
 		if (mctz)
 			mem_cgroup_remove_exceeded(mz, mctz);
 	}
@@ -742,7 +765,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	 * position in the tree.
 	 */
 	__mem_cgroup_remove_exceeded(mz, mctz);
-	if (!soft_limit_excess(mz->memcg) ||
+	if (!soft_limit_excess(mz->memcg, N_MEMORY) ||
 	    !css_tryget(&mz->memcg->css))
 		goto retry;
 done:
@@ -1805,7 +1828,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
 		.pgdat = pgdat,
 	};
 
-	excess = soft_limit_excess(root_memcg);
+	excess = soft_limit_excess(root_memcg, N_MEMORY);
 
 	while (1) {
 		victim = mem_cgroup_iter(root_memcg, victim, &reclaim);
@@ -1834,7 +1857,7 @@ static int mem_cgroup_soft_reclaim(struct mem_cgroup *root_memcg,
 		total += mem_cgroup_shrink_node(victim, gfp_mask, false,
 					pgdat, &nr_scanned);
 		*total_scanned += nr_scanned;
-		if (!soft_limit_excess(root_memcg))
+		if (!soft_limit_excess(root_memcg, N_MEMORY))
 			break;
 	}
 	mem_cgroup_iter_break(root_memcg, victim);
@@ -3457,7 +3480,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 	if (order > 0)
 		return 0;
 
-	mctz = soft_limit_tree_node(pgdat->node_id);
+	mctz = soft_limit_tree_node(pgdat->node_id, N_MEMORY);
 
 	/*
 	 * Do not even bother to check the largest node if the root
@@ -3513,7 +3536,7 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 		if (!reclaimed)
 			next_mz = __mem_cgroup_largest_soft_limit_node(mctz);
 
-		excess = soft_limit_excess(mz->memcg);
+		excess = soft_limit_excess(mz->memcg, N_MEMORY);
 		/*
 		 * One school of thought says that we should not add
 		 * back the node to the tree if reclaim returns 0.
@@ -7189,6 +7212,19 @@ static int __init mem_cgroup_init(void)
 		rtpn->rb_rightmost = NULL;
 		spin_lock_init(&rtpn->lock);
 		soft_limit_tree.rb_tree_per_node[node] = rtpn;
+
+		if (!node_state(node, N_TOPTIER)) {
+			soft_limit_toptier_tree.rb_tree_per_node[node] = NULL;
+			continue;
+		}
+
+		rtpn = kzalloc_node(sizeof(*rtpn), GFP_KERNEL,
+				    node_online(node) ? node : NUMA_NO_NODE);
+
+		rtpn->rb_root = RB_ROOT;
+		rtpn->rb_rightmost = NULL;
+		spin_lock_init(&rtpn->lock);
+		soft_limit_toptier_tree.rb_tree_per_node[node] = rtpn;
 	}
 
 	return 0;
-- 
2.20.1