Date: Sat, 5 Oct 2019 17:10:56 -0700
From: Andrew Morton
To: Honglei Wang
Cc: linux-mm@kvack.org, vdavydov.dev@gmail.com, hannes@cmpxchg.org, mhocko@kernel.org
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size
Message-Id: <20191005171056.f96adf25459757a907b32dd7@linux-foundation.org>
In-Reply-To: <20190905071034.16822-1-honglei.wang@oracle.com>
References: <20190905071034.16822-1-honglei.wang@oracle.com>

On Thu, 5 Sep 2019 15:10:34 +0800 Honglei Wang wrote:

> lruvec_lru_size() is involving lruvec_page_state_local() to get the
> lru_size in the current code. It's base on lruvec_stat_local.count[]
> of mem_cgroup_per_node. This counter is updated in batch. It won't
> do charge if the number of coming pages doesn't meet the needs of
> MEMCG_CHARGE_BATCH who's defined as 32 now.
>
> The testcase in LTP madvise09[1] fails due to small block memory is
> not charged. It creates a new memcgroup and sets up 32 MADV_FREE
> pages. Then it forks child who will introduce memory pressure in the
> memcgroup. The MADV_FREE pages are expected to be released under the
> pressure, but 32 is not more than MEMCG_CHARGE_BATCH and these pages
> won't be charged in lruvec_stat_local.count[] until some more pages
> come in to satisfy the needs of batch charging. So these MADV_FREE
> pages can't be freed in memory pressure which is a bit conflicted
> with the definition of MADV_FREE.
>
> Getting lru_size base on lru_zone_size of mem_cgroup_per_node which
> is not updated in batch can make it a bit more accurate in similar
> scenario.

I redid the changelog somewhat:

: lruvec_lru_size() is invoking lruvec_page_state_local() to get the
: lru_size.  It is based on lruvec_stat_local.count[] of
: mem_cgroup_per_node.  This counter is updated in a batched way.  It
: won't be charged if the number of incoming pages doesn't meet the
: needs of MEMCG_CHARGE_BATCH, which is defined as 32.
:
: The testcase in LTP madvise09[1] fails because small blocks of memory are
: not charged.  It creates a new memcgroup and sets up 32 MADV_FREE pages.
: Then it forks a child who will introduce memory pressure in the memcgroup.
: The MADV_FREE pages are expected to be released under the pressure, but
: 32 is not more than MEMCG_CHARGE_BATCH and these pages won't be charged
: in lruvec_stat_local.count[] until some more pages come in to satisfy the
: needs of batch charging.  So these MADV_FREE pages can't be freed under
: memory pressure, which conflicts with the definition of MADV_FREE.
:
: Get the lru_size based on lru_zone_size of mem_cgroup_per_node, which is
: not updated via batching, to make it more accurate in this scenario.
:
: This is effectively a partial reversion of 1a61ab8038e72 ("mm: memcontrol:
: replace zone summing with lruvec_page_state()").
:
: [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise09.c

> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -354,12 +354,13 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
>   */
>  unsigned long lruvec_lru_size(struct lruvec *lruvec, enum lru_list lru, int zone_idx)
>  {
> -	unsigned long lru_size;
> +	unsigned long lru_size = 0;
>  	int zid;
>  
> -	if (!mem_cgroup_disabled())
> -		lru_size = lruvec_page_state_local(lruvec, NR_LRU_BASE + lru);
> -	else
> +	if (!mem_cgroup_disabled()) {
> +		for (zid = 0; zid < MAX_NR_ZONES; zid++)
> +			lru_size += mem_cgroup_get_zone_lru_size(lruvec, lru, zid);
> +	} else
>  		lru_size = node_page_state(lruvec_pgdat(lruvec), NR_LRU_BASE + lru);
>  
>  	for (zid = zone_idx + 1; zid < MAX_NR_ZONES; zid++) {

Do we think this problem is serious enough to warrant backporting into earlier kernels?