From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 16 Jul 2020 13:58:19 -0700 (PDT)
From: David Rientjes
To: SeongJae Park, Andrew Morton
cc: Yang Shi, Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin,
    Greg Thelen, Johannes Weiner, Vladimir Davydov,
    cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: [patch] mm, memcg: provide an anon_reclaimable stat
In-Reply-To:
Message-ID:
References: <20200715071522.19663-1-sjpark@amazon.com>
User-Agent: Alpine 2.23 (DEB 453 2020-06-18)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII

Userspace can lack insight into the amount of memory that can be reclaimed
from a memcg based on values from memory.stat.  Two specific examples:

 - Lazy freeable memory (MADV_FREE): clean anonymous pages on the inactive
   file LRU that can be quickly reclaimed under memory pressure but
   otherwise show up as mapped anon in memory.stat, and

 - Memory on deferred split queues (thp): compound pages that can be split
   and uncharged from the memcg under memory pressure, but otherwise show
   up as charged anon LRU memory in memory.stat.

Both forms of anonymous usage are also charged to memory.current.

Userspace can currently derive this information, but doing so depends on
kernel implementation details for how this memory is handled for the
purposes of reclaim (anon on the inactive file LRU or unmapped anon on the
LRU).

For the purposes of writing portable userspace code that does not need
insight into the kernel implementation for reclaimable memory, this exports
a stat that reveals the amount of anonymous memory that can be reclaimed
and uncharged from the memcg to start new applications.  As the kernel
implementation of what can be reclaimed under memory pressure evolves, this
stat can be kept consistent.

Signed-off-by: David Rientjes
---
 Documentation/admin-guide/cgroup-v2.rst |  6 +++++
 mm/memcontrol.c                         | 31 +++++++++++++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1296,6 +1296,12 @@ PAGE_SIZE multiple when read back.
 	  Amount of memory used in anonymous mappings backed by
 	  transparent hugepages
 
+	  anon_reclaimable
+		The amount of charged anonymous memory that can be reclaimed
+		under memory pressure without swap.  This currently includes
+		lazy freeable memory (MADV_FREE) and compound pages that can be
+		split and uncharged.
+
 	  inactive_anon, active_anon, inactive_file, active_file, unevictable
 		Amount of memory, swap-backed and filesystem-backed,
 		on the internal memory management lists used by the
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1350,6 +1350,32 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 	return false;
 }
 
+/*
+ * Returns the amount of anon memory that is charged to the memcg that is
+ * reclaimable under memory pressure without swap, in pages.
+ */
+static unsigned long memcg_anon_reclaimable(struct mem_cgroup *memcg)
+{
+	long deferred, lazyfree;
+
+	/*
+	 * Deferred pages are charged anonymous pages that are on the LRU but
+	 * are unmapped.  These compound pages are split under memory pressure.
+	 */
+	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
+			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
+			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
+	/*
+	 * Lazyfree pages are charged clean anonymous pages that are on the file
+	 * LRU and can be reclaimed under memory pressure.
+	 */
+	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
+			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
+			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
+
+	return deferred + lazyfree;
+}
+
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
 	struct seq_buf s;
@@ -1363,6 +1389,9 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	 * Provide statistics on the state of the memory subsystem as
 	 * well as cumulative event counters that show past behavior.
 	 *
+	 * All values in this buffer are read individually, so no implied
+	 * consistency amongst them.
+	 *
 	 * This list is ordered following a combination of these gradients:
 	 * 1) generic big picture -> specifics and details
 	 * 2) reflecting userspace activity -> reflecting kernel heuristics
@@ -1405,6 +1434,8 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 			       (u64)memcg_page_state(memcg, NR_ANON_THPS) *
 			       HPAGE_PMD_SIZE);
 #endif
+	seq_buf_printf(&s, "anon_reclaimable %llu\n",
+		       (u64)memcg_anon_reclaimable(memcg) * PAGE_SIZE);
 
 	for (i = 0; i < NR_LRU_LISTS; i++)
 		seq_buf_printf(&s, "%s %llu\n", lru_list_name(i),
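
For reference, here is an illustrative userspace sketch (untested, not part
of the patch) of how the new stat could be consumed: it reads
anon_reclaimable from memory.stat and, on kernels without this patch, falls
back to deriving the same estimate from the LRU counters as described in
the changelog.  The cgroup path is only an example.

/* Illustrative only: read anon_reclaimable, or approximate it on older kernels. */
#include <stdio.h>
#include <string.h>

/* Scan memory.stat for "key value"; returns 0 and sets *out on success. */
static int stat_value(FILE *f, const char *key, unsigned long long *out)
{
	char name[64];
	unsigned long long val;

	rewind(f);
	while (fscanf(f, "%63s %llu", name, &val) == 2) {
		if (!strcmp(name, key)) {
			*out = val;
			return 0;
		}
	}
	return -1;
}

int main(void)
{
	/* Example cgroup path; adjust for the memcg of interest. */
	FILE *f = fopen("/sys/fs/cgroup/example/memory.stat", "r");
	unsigned long long reclaimable;

	if (!f) {
		perror("fopen");
		return 1;
	}

	if (stat_value(f, "anon_reclaimable", &reclaimable)) {
		/*
		 * Kernel without this patch: derive the estimate from the
		 * LRU counters (all memory.stat values are in bytes).
		 */
		unsigned long long aa = 0, ia = 0, anon = 0;
		unsigned long long af = 0, inf = 0, file = 0;
		unsigned long long deferred, lazyfree;

		stat_value(f, "active_anon", &aa);
		stat_value(f, "inactive_anon", &ia);
		stat_value(f, "anon", &anon);
		stat_value(f, "active_file", &af);
		stat_value(f, "inactive_file", &inf);
		stat_value(f, "file", &file);

		deferred = aa + ia > anon ? aa + ia - anon : 0;
		lazyfree = af + inf > file ? af + inf - file : 0;
		reclaimable = deferred + lazyfree;
	}

	printf("anon_reclaimable: %llu bytes\n", reclaimable);
	fclose(f);
	return 0;
}

The fallback arm mirrors the deferred-split and lazyfree arithmetic in
memcg_anon_reclaimable() above, which is exactly the implementation detail
the new stat is meant to hide from userspace.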
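
As a further illustration (again hypothetical, not part of the patch), a
small test program like the following creates lazy freeable memory that
should then be reflected in anon_reclaimable for the test's memcg; the
mapping size is arbitrary and the program assumes a Linux build where
MADV_FREE is available.

/* Hypothetical demo: create MADV_FREE (lazy freeable) anonymous memory. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 256UL << 20;	/* 256MB, arbitrary */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Touch the pages so they are actually allocated and charged. */
	memset(p, 1, len);

	/*
	 * Mark them lazily freeable: clean anon pages move to the inactive
	 * file LRU and should be reflected in anon_reclaimable.
	 */
	if (madvise(p, len, MADV_FREE)) {
		perror("madvise");
		return 1;
	}

	pause();	/* keep the mapping around while memory.stat is inspected */
	return 0;
}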