Date: Mon, 21 Oct 2019 14:14:53 +0200
From: Michal Hocko
To: Hillf Danton
Cc: linux-mm, Andrew Morton, linux-kernel, Chris Down, Tejun Heo,
	Roman Gushchin, Johannes Weiner, Shakeel Butt, Matthew Wilcox,
	Minchan Kim, Mel Gorman
Subject: Re: [RFC v1] memcg: add memcg lru for page reclaiming
Message-ID: <20191021121453.GK9379@dhcp22.suse.cz>
In-Reply-To: <20191021115654.14740-1-hdanton@sina.com>

On Mon 21-10-19 19:56:54, Hillf Danton wrote:
> 
> Currently soft limit reclaim is frozen, see
> Documentation/admin-guide/cgroup-v2.rst for reasons.
> 
> Copying the page lru idea, memcg lru is added for selecting victim
> memcg to reclaim pages from under memory pressure. It now works in
> parallel to slr (soft limit reclaim), not only because the latter
> needs some time to reap but also because the coexistence makes it
> straightforward to add the lru.

This doesn't explain what problem/feature you would like to
fix/achieve. It also doesn't explain the overall design.
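
That said, for the record, here is my reading of the victim selection
scheme, reduced to a standalone userspace sketch. This is only my
interpretation: fake_memcg, add_lru and pick_lru are stand-in names of
mine, usage/high stand in for page_counter_read() and memcg->high, and
the spinlock is dropped because the demo is single threaded.

/*
 * My reading: over-high memcgs are queued FIFO on a global list;
 * the reclaimer pops from the head and skips any entry that has
 * meanwhile dropped back under its high limit.
 */
#include <stdio.h>

struct fake_memcg {
	const char *name;
	unsigned long usage;		/* stand-in for page_counter_read() */
	unsigned long high;
	struct fake_memcg *next;	/* stand-in for the lru_node list_head */
};

static struct fake_memcg *lru_head, *lru_tail;

/* queue at the tail unless already queued, like memcg_add_lru() */
static void add_lru(struct fake_memcg *m)
{
	if (m->next || m == lru_tail)
		return;
	if (lru_tail)
		lru_tail->next = m;
	else
		lru_head = m;
	lru_tail = m;
}

/* pop until an entry is still over its high limit, like memcg_pick_lru() */
static struct fake_memcg *pick_lru(void)
{
	while (lru_head) {
		struct fake_memcg *m = lru_head;

		lru_head = m->next;
		if (!lru_head)
			lru_tail = NULL;
		m->next = NULL;
		if (m->usage > m->high)
			return m;
	}
	return NULL;
}

int main(void)
{
	struct fake_memcg a = { "A", 100, 80, NULL };
	struct fake_memcg b = { "B",  50, 80, NULL };	/* back under high */
	struct fake_memcg c = { "C", 200, 80, NULL };
	struct fake_memcg *victim;

	add_lru(&a);
	add_lru(&b);
	add_lru(&c);

	while ((victim = pick_lru()))
		printf("reclaim from %s\n", victim->name);	/* A, then C */
	return 0;
}

If that reading is right, memcgs queued from reclaim_high() are only
reclaimed later, asynchronously, via schedule_work() on their
high_work. That is exactly the kind of design point I would expect the
changelog to spell out.
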
> A lru list paired with a spin lock is added; the existing memcg
> high_work provides the other things it needs. A couple of helpers
> add a memcg to the lru and pick a victim from it.
> 
> V1 is based on 5.4-rc3.
> 
> Changes since v0
> - add MEMCG_LRU in init/Kconfig
> - drop changes in mm/vmscan.c
> - make memcg lru work in parallel to slr
> 
> Cc: Chris Down
> Cc: Tejun Heo
> Cc: Roman Gushchin
> Cc: Michal Hocko
> Cc: Johannes Weiner
> Cc: Shakeel Butt
> Cc: Matthew Wilcox
> Cc: Minchan Kim
> Cc: Mel Gorman
> Signed-off-by: Hillf Danton
> ---
> 
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -843,6 +843,14 @@ config MEMCG
>  	help
>  	  Provides control over the memory footprint of tasks in a cgroup.
>  
> +config MEMCG_LRU
> +	bool
> +	depends on MEMCG
> +	help
> +	  Select victim memcg on lru for page reclaiming.
> +
> +	  Say N if unsure.
> +
>  config MEMCG_SWAP
>  	bool "Swap controller"
>  	depends on MEMCG && SWAP
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -223,6 +223,10 @@ struct mem_cgroup {
>  	/* Upper bound of normal memory consumption range */
>  	unsigned long high;
>  
> +#ifdef CONFIG_MEMCG_LRU
> +	struct list_head lru_node;
> +#endif
> +
>  	/* Range enforcement for interrupt charges */
>  	struct work_struct high_work;
>  
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -2338,14 +2338,54 @@ static int memcg_hotplug_cpu_dead(unsign
>  	return 0;
>  }
>  
> +#ifdef CONFIG_MEMCG_LRU
> +static DEFINE_SPINLOCK(memcg_lru_lock);
> +static LIST_HEAD(memcg_lru);	/* a copy of page lru */
> +
> +static void memcg_add_lru(struct mem_cgroup *memcg)
> +{
> +	spin_lock_irq(&memcg_lru_lock);
> +	if (list_empty(&memcg->lru_node))
> +		list_add_tail(&memcg->lru_node, &memcg_lru);
> +	spin_unlock_irq(&memcg_lru_lock);
> +}
> +
> +static struct mem_cgroup *memcg_pick_lru(void)
> +{
> +	struct mem_cgroup *memcg, *next;
> +
> +	spin_lock_irq(&memcg_lru_lock);
> +
> +	list_for_each_entry_safe(memcg, next, &memcg_lru, lru_node) {
> +		list_del_init(&memcg->lru_node);
> +
> +		if (page_counter_read(&memcg->memory) > memcg->high) {
> +			spin_unlock_irq(&memcg_lru_lock);
> +			return memcg;
> +		}
> +	}
> +	spin_unlock_irq(&memcg_lru_lock);
> +
> +	return NULL;
> +}
> +#endif
> +
>  static void reclaim_high(struct mem_cgroup *memcg,
>  			 unsigned int nr_pages,
>  			 gfp_t gfp_mask)
>  {
> +#ifdef CONFIG_MEMCG_LRU
> +	struct mem_cgroup *start = memcg;
> +#endif
>  	do {
>  		if (page_counter_read(&memcg->memory) <= memcg->high)
>  			continue;
>  		memcg_memory_event(memcg, MEMCG_HIGH);
> +		if (IS_ENABLED(CONFIG_MEMCG_LRU))
> +			if (start != memcg) {
> +				memcg_add_lru(memcg);
> +				return;
> +			}
>  		try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, true);
>  	} while ((memcg = parent_mem_cgroup(memcg)));
>  }
> @@ -3158,6 +3198,13 @@ unsigned long mem_cgroup_soft_limit_recl
>  	unsigned long excess;
>  	unsigned long nr_scanned;
>  
> +	if (IS_ENABLED(CONFIG_MEMCG_LRU)) {
> +		struct mem_cgroup *memcg = memcg_pick_lru();
> +		if (memcg)
> +			schedule_work(&memcg->high_work);
> +		return 0;
> +	}
> +
>  	if (order > 0)
>  		return 0;
>  
> @@ -5068,6 +5115,8 @@ static struct mem_cgroup *mem_cgroup_all
>  	if (memcg_wb_domain_init(memcg, GFP_KERNEL))
>  		goto fail;
>  
> +	if (IS_ENABLED(CONFIG_MEMCG_LRU))
> +		INIT_LIST_HEAD(&memcg->lru_node);
>  	INIT_WORK(&memcg->high_work, high_work_func);
>  	memcg->last_scanned_node = MAX_NUMNODES;
>  	INIT_LIST_HEAD(&memcg->oom_notify);
> --
> 

-- 
Michal Hocko
SUSE Labs