From: Hillf Danton
To: linux-mm
Cc: Andrew Morton, linux-kernel, Chris Down, Tejun Heo, Roman Gushchin,
	Michal Hocko, Johannes Weiner, Shakeel Butt, Matthew Wilcox,
	Minchan Kim, Mel Gorman, Hillf Danton
Subject: [RFC v1] memcg: add memcg lru for page reclaiming
Date: Mon, 21 Oct 2019 19:56:54 +0800
Message-Id: <20191021115654.14740-1-hdanton@sina.com>

Currently soft limit reclaim (slr) is frozen; see
Documentation/admin-guide/cgroup-v2.rst for the reasons.

Copying the page lru idea, a memcg lru is added for selecting the victim
memcg to reclaim pages from under memory pressure. For now it works in
parallel to slr, not only because the latter needs some time to reap,
but also because the coexistence makes it much easier to add the lru in
a straightforward manner.

An lru list paired with a spin lock is added, along with a couple of
helpers to add a memcg to the lru and to pick a victim from it. The
current memcg high_work provides the other things it needs.

V1 is based on 5.4-rc3.

Changes since v0
- add MEMCG_LRU in init/Kconfig
- drop changes in mm/vmscan.c
- make memcg lru work in parallel to slr

Cc: Chris Down
Cc: Tejun Heo
Cc: Roman Gushchin
Cc: Michal Hocko
Cc: Johannes Weiner
Cc: Shakeel Butt
Cc: Matthew Wilcox
Cc: Minchan Kim
Cc: Mel Gorman
Signed-off-by: Hillf Danton
---

--- a/init/Kconfig
+++ b/init/Kconfig
@@ -843,6 +843,14 @@ config MEMCG
 	help
 	  Provides control over the memory footprint of tasks in a cgroup.
 
+config MEMCG_LRU
+	bool
+	depends on MEMCG
+	help
+	  Select victim memcg on lru for page reclaiming.
+
+	  Say N if unsure.
+
 config MEMCG_SWAP
 	bool "Swap controller"
 	depends on MEMCG && SWAP
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -223,6 +223,10 @@ struct mem_cgroup {
 	/* Upper bound of normal memory consumption range */
 	unsigned long high;
 
+#ifdef CONFIG_MEMCG_LRU
+	struct list_head lru_node;
+#endif
+
 	/* Range enforcement for interrupt charges */
 	struct work_struct high_work;
 
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2338,14 +2338,54 @@ static int memcg_hotplug_cpu_dead(unsign
 	return 0;
 }
 
+#ifdef CONFIG_MEMCG_LRU
+static DEFINE_SPINLOCK(memcg_lru_lock);
+static LIST_HEAD(memcg_lru);	/* a copy of page lru */
+
+static void memcg_add_lru(struct mem_cgroup *memcg)
+{
+	spin_lock_irq(&memcg_lru_lock);
+	if (list_empty(&memcg->lru_node))
+		list_add_tail(&memcg->lru_node, &memcg_lru);
+	spin_unlock_irq(&memcg_lru_lock);
+}
+
+static struct mem_cgroup *memcg_pick_lru(void)
+{
+	struct mem_cgroup *memcg, *next;
+
+	spin_lock_irq(&memcg_lru_lock);
+
+	list_for_each_entry_safe(memcg, next, &memcg_lru, lru_node) {
+		list_del_init(&memcg->lru_node);
+
+		if (page_counter_read(&memcg->memory) > memcg->high) {
+			spin_unlock_irq(&memcg_lru_lock);
+			return memcg;
+		}
+	}
+	spin_unlock_irq(&memcg_lru_lock);
+
+	return NULL;
+}
+#endif
+
 static void reclaim_high(struct mem_cgroup *memcg,
 			 unsigned int nr_pages,
 			 gfp_t gfp_mask)
 {
+#ifdef CONFIG_MEMCG_LRU
+	struct mem_cgroup *start = memcg;
+#endif
 	do {
 		if (page_counter_read(&memcg->memory) <= memcg->high)
 			continue;
 		memcg_memory_event(memcg, MEMCG_HIGH);
+		if (IS_ENABLED(CONFIG_MEMCG_LRU))
+			if (start != memcg) {
+				memcg_add_lru(memcg);
+				return;
+			}
 		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
 	} while ((memcg = parent_mem_cgroup(memcg)));
 }
@@ -3158,6 +3198,13 @@ unsigned long mem_cgroup_soft_limit_recl
 	unsigned long excess;
 	unsigned long nr_scanned;
 
+	if (IS_ENABLED(CONFIG_MEMCG_LRU)) {
+		struct mem_cgroup *memcg = memcg_pick_lru();
+		if (memcg)
+			schedule_work(&memcg->high_work);
+		return 0;
+	}
+
 	if (order > 0)
 		return 0;
 
@@ -5068,6 +5115,8 @@ static struct mem_cgroup *mem_cgroup_all
 	if (memcg_wb_domain_init(memcg, GFP_KERNEL))
 		goto fail;
 
+	if (IS_ENABLED(CONFIG_MEMCG_LRU))
+		INIT_LIST_HEAD(&memcg->lru_node);
 	INIT_WORK(&memcg->high_work, high_work_func);
 	memcg->last_scanned_node = MAX_NUMNODES;
 	INIT_LIST_HEAD(&memcg->oom_notify);
--
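
For readers who want to poke at the add/pick behaviour outside the kernel,
here is a rough userspace sketch of what memcg_add_lru() and
memcg_pick_lru() do in the patch above. It is only a model under stated
assumptions: struct group, group_add_lru(), group_pick_lru() and
lru_demo() are hypothetical names that do not exist in the kernel, a
pthread mutex stands in for spin_lock_irq(), and a hand-rolled circular
list stands in for list_head.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct mem_cgroup (not the kernel type). */
struct group {
	unsigned long usage;		/* models page_counter_read(&memcg->memory) */
	unsigned long high;		/* models memcg->high */
	struct group *prev, *next;	/* models lru_node; self-linked when not queued */
};

/* List head and lock, mirroring memcg_lru and memcg_lru_lock. */
static struct group lru = { 0, 0, &lru, &lru };
static pthread_mutex_t lru_lock = PTHREAD_MUTEX_INITIALIZER;

static void group_init(struct group *g, unsigned long usage, unsigned long high)
{
	g->usage = usage;
	g->high = high;
	g->prev = g->next = g;		/* like INIT_LIST_HEAD() */
}

/* Queue at the tail unless the group is already queued. */
static void group_add_lru(struct group *g)
{
	pthread_mutex_lock(&lru_lock);
	if (g->next == g) {		/* like list_empty(&memcg->lru_node) */
		g->prev = lru.prev;
		g->next = &lru;
		lru.prev->next = g;
		lru.prev = g;
	}
	pthread_mutex_unlock(&lru_lock);
}

/*
 * Pop groups from the head and return the first one that is still above
 * its high mark. Groups at or below it are dropped from the list.
 */
static struct group *group_pick_lru(void)
{
	struct group *victim = NULL;

	pthread_mutex_lock(&lru_lock);
	while (lru.next != &lru) {
		struct group *g = lru.next;

		/* unlink and re-initialise, like list_del_init() */
		g->prev->next = g->next;
		g->next->prev = g->prev;
		g->prev = g->next = g;

		if (g->usage > g->high) {
			victim = g;
			break;
		}
	}
	pthread_mutex_unlock(&lru_lock);
	return victim;
}

/* Queue three groups and return the index of the first victim picked. */
int lru_demo(void)
{
	struct group g[3];
	struct group *victim;
	int i, first;

	group_init(&g[0], 10, 20);	/* below high: popped and skipped */
	group_init(&g[1], 30, 20);	/* above high: first victim */
	group_init(&g[2], 50, 20);	/* above high: stays queued for a later pick */
	for (i = 0; i < 3; i++)
		group_add_lru(&g[i]);
	group_add_lru(&g[1]);		/* already queued, so nothing changes */

	victim = group_pick_lru();
	first = victim ? (int)(victim - g) : -1;

	while (group_pick_lru())	/* drain so no stack entry stays linked */
		;
	return first;
}
```

As in the patch, a pick consumes every entry it walks past, so a group
that has dropped back under its high mark simply falls off the list and
is only queued again by a later add.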