From: Shakeel Butt
Date: Thu, 16 Jul 2020 18:37:12 -0700
Subject: Re: [patch] mm, memcg: provide an anon_reclaimable stat
To: David Rientjes
Cc: SeongJae Park, Andrew Morton, Yang Shi, Michal Hocko, Yang Shi, Roman Gushchin, Greg Thelen, Johannes Weiner, Vladimir Davydov, Cgroups, Linux MM
References: <20200715071522.19663-1-sjpark@amazon.com>

On Thu, Jul 16, 2020 at 2:28 PM David Rientjes wrote:
>
> On Thu, 16 Jul 2020, Shakeel Butt wrote:
>
> > > Userspace can lack insight into the amount of memory that can be reclaimed
> > > from a memcg based on values from memory.stat.  Two specific examples:
> > >
> > >  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
> > >    inactive file LRU that can be quickly reclaimed under memory pressure
> > >    but otherwise shows up as mapped anon in memory.stat, and
> > >
> > >  - Memory on deferred split queues (thp) that are compound pages that can
> > >    be split and uncharged from the memcg under memory pressure, but
> > >    otherwise shows up as charged anon LRU memory in memory.stat.
> > >
> > > Both of this anonymous usage is also charged to memory.current.
> > >
> > > Userspace can currently derive this information but it depends on kernel
> > > implementation details for how this memory is handled for the purposes of
> > > reclaim (anon on inactive file LRU or unmapped anon on the LRU).
> > >
> > > For the purposes of writing portable userspace code that does not need to
> > > have insight into the kernel implementation for reclaimable memory, this
> > > exports a stat that reveals the amount of anonymous memory that can be
> > > reclaimed and uncharged from the memcg to start new applications.
> > >
> > > As the kernel implementation evolves for memory that can be reclaimed
> > > under memory pressure, this stat can be kept consistent.
> > >
> > > Signed-off-by: David Rientjes
> > > ---
> > >  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
> > >  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
> > >  2 files changed, 37 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
> > >  		Amount of memory used in anonymous mappings backed by
> > >  		transparent hugepages
> > >
> > > +	  anon_reclaimable
> > > +		The amount of charged anonymous memory that can be reclaimed
> > > +		under memory pressure without swap.  This currently includes
> > > +		lazy freeable memory (MADV_FREE) and compound pages that can be
> > > +		split and uncharged.
> > > +
> > >  	  inactive_anon, active_anon, inactive_file, active_file, unevictable
> > >  		Amount of memory, swap-backed and filesystem-backed,
> > >  		on the internal memory management lists used by the
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
> > >  	return false;
> > >  }
> > >
> > > +/*
> > > + * Returns the amount of anon memory that is charged to the memcg that is
> > > + * reclaimable under memory pressure without swap, in pages.
> > > + */
> > > +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> > > +{
> > > +	long deferred, lazyfree;
> > > +
> > > +	/*
> > > +	 * Deferred pages are charged anonymous pages that are on the LRU but
> > > +	 * are unmapped.  These compound pages are split under memory pressure.
> > > +	 */
> > > +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> > > +				memcg_page_state(memcg, NR_INACTIVE_ANON) -
> > > +				memcg_page_state(memcg, NR_ANON_MAPPED), 0);
> >
> > Please note that the NR_ANON_MAPPED does not include tmpfs memory but
> > NR_[IN]ACTIVE_ANON does include the tmpfs.
> >
> > > +	/*
> > > +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> > > +	 * LRU and can be reclaimed under memory pressure.
> > > +	 */
> > > +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> > > +				memcg_page_state(memcg, NR_INACTIVE_FILE) -
> > > +				memcg_page_state(memcg, NR_FILE_PAGES), 0);
> >
> > Similarly NR_FILE_PAGES includes tmpfs memory but NR_[IN]ACTIVE_FILE does not.
> >
>
> Ah, so this adds to the motivation of providing the anon_reclaimable stat
> because the calculation becomes even more convoluted and completely based
> on the kernel implementation details for both lazyfree memory and deferred
> split queues.

Yes, I agree.

> Did you have a calculation in mind for
> memcg_anon_reclaimable()?

For deferred, "memcg->deferred_split_queue.split_queue_len" should be
usable. For lazyfree, NR_ACTIVE_FILE + NR_INACTIVE_FILE + NR_SHMEM -
NR_FILE_PAGES seems like the right formula.
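
To illustrate why the NR_SHMEM term matters in the lazyfree formula, here is
a minimal userspace sketch, not kernel code. It relies only on what the
thread states: NR_FILE_PAGES counts tmpfs (shmem) pages while the file LRU
counters do not, and lazyfree pages sit on the file LRU without being
counted in NR_FILE_PAGES. The struct memcg_counters type and both function
names are hypothetical; in the kernel the values would presumably come from
memcg_page_state(). The deferred split queue term is left out because the
thread does not say how split_queue_len would be converted to pages.

```c
/*
 * Hypothetical snapshot of per-memcg counters, in pages.
 * The field names mirror the kernel stat items discussed in the thread.
 */
struct memcg_counters {
	long active_file;	/* NR_ACTIVE_FILE */
	long inactive_file;	/* NR_INACTIVE_FILE */
	long shmem;		/* NR_SHMEM */
	long file_pages;	/* NR_FILE_PAGES, includes shmem */
};

/* Clamp negative intermediate results to zero, like max_t(long, x, 0). */
static long clamp_nonneg(long v)
{
	return v > 0 ? v : 0;
}

/*
 * Formula suggested in the reply:
 *   file LRU  = (file_pages - shmem) + lazyfree
 *   lazyfree  = active_file + inactive_file + shmem - file_pages
 */
long lazyfree_pages(const struct memcg_counters *c)
{
	return clamp_nonneg(c->active_file + c->inactive_file +
			    c->shmem - c->file_pages);
}

/*
 * Formula from the quoted patch, without the shmem correction.
 * It undercounts whenever the memcg has tmpfs pages charged.
 */
long lazyfree_pages_original(const struct memcg_counters *c)
{
	return clamp_nonneg(c->active_file + c->inactive_file -
			    c->file_pages);
}
```

As a worked case under these assumptions: with 100 pages of regular page
cache, 40 pages of tmpfs and 25 lazyfree pages, the file LRUs hold 125 pages
and NR_FILE_PAGES is 140. The corrected formula gives 125 + 40 - 140 = 25,
while the original one gives 125 - 140, which is clamped to 0.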