Date: Fri, 17 Jul 2020 10:34:19 +0200
From: Michal Hocko
To: David Rientjes
Cc: SeongJae Park, Andrew Morton, Yang Shi, Shakeel Butt, Yang Shi,
    Roman Gushchin, Greg Thelen, Johannes Weiner, Vladimir Davydov,
    cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, memcg: provide an anon_reclaimable stat
Message-ID: <20200717083419.GD10655@dhcp22.suse.cz>
References: <20200715071522.19663-1-sjpark@amazon.com>

On Thu 16-07-20 13:58:19, David Rientjes wrote:
> Userspace can lack insight into the amount of memory that can be reclaimed
> from a memcg based on values from memory.stat. Two specific examples:
>
>  - Lazy freeable memory (MADV_FREE) that are clean anonymous pages on the
>    inactive file LRU that can be quickly reclaimed under memory pressure
>    but otherwise shows up as mapped anon in memory.stat, and
>
>  - Memory on deferred split queues (thp) that are compound pages that can
>    be split and uncharged from the memcg under memory pressure, but
>    otherwise shows up as charged anon LRU memory in memory.stat.
>
> Both of this anonymous usage is also charged to memory.current.
>
> Userspace can currently derive this information but it depends on kernel
> implementation details for how this memory is handled for the purposes of
> reclaim (anon on inactive file LRU or unmapped anon on the LRU).
>
> For the purposes of writing portable userspace code that does not need to
> have insight into the kernel implementation for reclaimable memory, this
> exports a stat that reveals the amount of anonymous memory that can be
> reclaimed and uncharged from the memcg to start new applications.
>
> As the kernel implementation evolves for memory that can be reclaimed
> under memory pressure, this stat can be kept consistent.

Please be much more specific about the expected usage. You have
mentioned something in the email thread but this really belongs in the
changelog.

Why is anonymous memory that is reclaimable without swap special compared
with, say, any other clean and easily reclaimable cache? What if swap is
available?

> Signed-off-by: David Rientjes
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  6 +++++
>  mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
>  2 files changed, 37 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
>  		Amount of memory used in anonymous mappings backed by
>  		transparent hugepages
>  
> +	  anon_reclaimable
> +		The amount of charged anonymous memory that can be reclaimed
> +		under memory pressure without swap. This currently includes
> +		lazy freeable memory (MADV_FREE) and compound pages that can be
> +		split and uncharged.
> +
>  	  inactive_anon, active_anon, inactive_file, active_file, unevictable
>  		Amount of memory, swap-backed and filesystem-backed,
>  		on the internal memory management lists used by the
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
>  	return false;
>  }
>  
> +/*
> + * Returns the amount of anon memory that is charged to the memcg that is
> + * reclaimable under memory pressure without swap, in pages.
> + */
> +static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
> +{
> +	long deferred, lazyfree;
> +
> +	/*
> +	 * Deferred pages are charged anonymous pages that are on the LRU but
> +	 * are unmapped. These compound pages are split under memory pressure.
> +	 */
> +	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
> +			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
> +			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
> +	/*
> +	 * Lazyfree pages are charged clean anonymous pages that are on the file
> +	 * LRU and can be reclaimed under memory pressure.
> +	 */
> +	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
> +			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
> +			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
> +
> +	return deferred + lazyfree;
> +}
> +
>  static char *memory_stat_format(struct mem_cgroup *memcg)
>  {
>  	struct seq_buf s;
> @@ -1363,6 +1389,9 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  	 * Provide statistics on the state of the memory subsystem as
>  	 * well as cumulative event counters that show past behavior.
>  	 *
> +	 * All values in this buffer are read individually, so no implied
> +	 * consistency amongst them.
> +	 *
>  	 * This list is ordered following a combination of these gradients:
>  	 * 1) generic big picture -> specifics and details
>  	 * 2) reflecting userspace activity -> reflecting kernel heuristics
> @@ -1405,6 +1434,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
>  			       (u64)memcg_page_state(memcg, NR_ANON_THPS) *
>  			       HPAGE_PMD_SIZE);
>  #endif
> +	seq_buf_printf(&s, "anon_reclaimable %llu\n",
> +		       (u64)memcg_anon_reclaimable(memcg) * PAGE_SIZE);
>  
>  	for (i = 0; i < NR_LRU_LISTS; i++)
>  		seq_buf_printf(&s, "%s %llu\n", lru_list_name(i),

-- 
Michal Hocko
SUSE Labs
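
For reference, the userspace derivation that the quoted changelog alludes
to can be sketched roughly as below. This is only a sketch under
assumptions, not part of the patch: it assumes that memory.stat reports
"anon" from NR_ANON_MAPPED and "file" from NR_FILE_PAGES, both in bytes,
as the arithmetic in memcg_anon_reclaimable() implies. The helper names
stat_value() and anon_reclaimable_estimate() are hypothetical. Since the
counters are read individually, the result can only be an estimate.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key value" at the start of a line in
 * memory.stat-style text and return the value, or 0 if the key is absent.
 */
static long long stat_value(const char *text, const char *key)
{
	size_t klen = strlen(key);
	const char *line = text;

	assert(text != NULL && key != NULL);

	while (line && *line) {
		/* Require a space after the key so "anon" skips "anon_thp". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			long long v = 0;

			if (sscanf(line + klen + 1, "%lld", &v) == 1)
				return v;
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return 0;
}

/* Clamp negative differences to zero, like max_t(long, ..., 0). */
static long long clamp_zero(long long v)
{
	return v > 0 ? v : 0;
}

/*
 * Hypothetical userspace mirror of memcg_anon_reclaimable(), in bytes.
 * Assumes "anon" and "file" correspond to NR_ANON_MAPPED and
 * NR_FILE_PAGES; this depends on the kernel version.
 */
static long long anon_reclaimable_estimate(const char *text)
{
	/* Anon pages on the LRU that are no longer mapped. */
	long long deferred = clamp_zero(stat_value(text, "active_anon") +
					stat_value(text, "inactive_anon") -
					stat_value(text, "anon"));
	/* Pages on the file LRU that are not accounted as file pages. */
	long long lazyfree = clamp_zero(stat_value(text, "active_file") +
					stat_value(text, "inactive_file") -
					stat_value(text, "file"));

	return deferred + lazyfree;
}
```

A caller would read a cgroup's memory.stat into a NUL-terminated buffer
and pass it to anon_reclaimable_estimate(). This is exactly the kind of
code that depends on the LRU placement details discussed above.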