From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hillf Danton <hdanton@sina.com>
To: Michal Hocko
Cc: Hillf Danton, linux-mm, Andrew Morton, linux-kernel, Chris Down,
 Tejun Heo, Roman Gushchin, Johannes Weiner, Shakeel Butt,
 Matthew Wilcox, Minchan Kim, Mel Gorman
Subject: Re: [RFC v1] memcg: add memcg lru for page reclaiming
Date: Wed, 23 Oct 2019 12:44:48 +0800
Message-Id: <20191023044448.16484-1-hdanton@sina.com>
In-Reply-To: <20191022133050.15620-1-hdanton@sina.com>

On Tue, 22 Oct 2019 15:58:32 +0200 Michal Hocko wrote:
>
> On Tue 22-10-19 21:30:50, Hillf Danton wrote:
> >
> > On Mon, 21 Oct 2019 14:14:53 +0200 Michal Hocko wrote:
> > >
> > > On Mon 21-10-19 19:56:54, Hillf Danton wrote:
> > > >
> > > > Currently soft limit reclaim is frozen, see
> > > > Documentation/admin-guide/cgroup-v2.rst for reasons.
> > > >
> > > > Copying the page lru idea, a memcg lru is added for selecting the
> > > > victim memcg to reclaim pages from under memory pressure. It works
> > > > in parallel to soft limit reclaim, not only because the latter needs
> > > > some time to reap but also because the coexistence makes it a lot
> > > > easier to add the lru in a straightforward manner.
> > >
> > > This doesn't explain what is the problem/feature you would like to
> > > fix/achieve. It also doesn't explain the overall design.
> >
> > 1, memcg lru makes page reclaiming hierarchy aware
>
> Is that a problem statement or a design goal?

A problem in soft limit reclaim, as described in cgroup-v2.rst, that is
addressed in this RFC.
> > While doing the high work, memcgs are currently reclaimed one after
> > another up through the hierarchy;
>
> Which is the design because it is the memcg where the high limit got
> hit. The hierarchical behavior ensures that the subtree of that memcg is
> reclaimed and we try to spread the reclaim fairly over the tree.

Yeah, that code can scarcely escape a standing ovation. None of its
merits is lost in the RFC, except that the spiral up the memcg hierarchy
is broken into two parts: the top half, which rips pages off the first
victim, and the bottom half, which queues the victim's first ancestor on
the lru (the ice box storing the cakes baked for kswapd); see below for
the reasons.

> > in this RFC after ripping pages off
> > the first victim, the work finishes with the first ancestor of the
> > victim added to the lru.
> >
> > Reclaiming is deferred until kswapd becomes active.
>
> This is a wrong assumption because the high limit might be configured
> way before kswapd is woken up.

This change was introduced because a high limit breach does not look like
a serious problem in the absence of memory pressure. Let's do the hard
work, reclaiming one memcg at a time up through the hierarchy, when
kswapd becomes active. That is also why the bottom half was introduced.

> > 2, memcg lru tries much to avoid overreclaim
>
> Again, is this a problem statement or a design goal?

Another problem in soft limit reclaim, as described in cgroup-v2.rst,
that is addressed in the RFC.

> > Only one memcg is picked off the lru in FIFO mode under memory
> > pressure, and MEMCG_CHARGE_BATCH pages are reclaimed one memcg at
> > a time.
>
> And why is this preferred over SWAP_CLUSTER_MAX

No change is made to the current high work behavior in terms of
MEMCG_CHARGE_BATCH; try_to_free_mem_cgroup_pages() takes care of both.

> and the whole subtree reclaim that we do currently?

We stop climbing up the hierarchy once kswapd snaps its fingers: "Cut.
Work done."

Thanks
Hillf