From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 9 Oct 2019 16:16:44 +0200
From: Michal Hocko
To: Honglei Wang
Cc: linux-mm@kvack.org, vdavydov.dev@gmail.com, hannes@cmpxchg.org
Subject: Re: [PATCH v2] mm/vmscan: get number of pages on the LRU list in memcgroup base on lru_zone_size
Message-ID: <20191009141644.GD6681@dhcp22.suse.cz>
References: <20190905071034.16822-1-honglei.wang@oracle.com> <20191007142805.GM2381@dhcp22.suse.cz> <991b4719-a2a0-9efe-de02-56a928752fe3@oracle.com>
In-Reply-To: <991b4719-a2a0-9efe-de02-56a928752fe3@oracle.com>

On Tue 08-10-19 17:34:03, Honglei Wang wrote:
> How about we describe it like this:
>
> Getting the lru_size based on lru_zone_size of mem_cgroup_per_node,
> which is not updated via batching, can help any related code path get
> a more precise lru size in the mem_cgroup case. This way the memory
> reclaim code won't ignore small blocks of memory (say, less than
> MEMCG_CHARGE_BATCH pages) on the lru list.

I am sorry, but this doesn't really explain the problem nor justify the
patch. Let's have a look at where we are first.

lruvec_lru_size provides an estimate of the number of pages on the given
LRU that qualify for the given zone index. Note the "estimate" part,
because that is an optimization for the updater paths, which tend to be
really hot. Here we are consistent between the global and the memcg
cases.

Now we can look at the differences between the two cases. The global LRU
case relies on periodic syncing from a kworker context. This gives no
guarantee on the timing, and as such we cannot really rely on it to be
precise.
The memcg path batches updates up to MEMCG_CHARGE_BATCH (32) pages and
propagates the value up the hierarchy. There is no periodic sync-up, so
the unsynced state might stay around forever if no new accounting events
happen.

Now, does it really matter? 32 pages should be negligible for normal
workloads (read: those where MEMCG_CHARGE_BATCH << limits). So we can
talk about whether the other usecases are really sensible. Do we really
want to support memcgs with a hard limit set to 10 pages? I would say I
am not really convinced, because I have a hard time seeing a real
application other than some artificial testing. On the other hand, there
is a really non-trivial effort needed to make such usecases work - just
consider all the potential caching/batching that we do for performance
reasons.

That being said, making lruvec_lru_size more precise doesn't sound like
a bad idea in general. But it comes with an additional cost, which
shouldn't really matter much with the current code because it shouldn't
be used from hot paths. But is that really the case? Have you done a
full audit? Is this going to stay that way? These are important
questions to answer in the changelog to justify the change properly.

I hope this makes more sense now.
-- 
Michal Hocko
SUSE Labs