Date: Tue, 14 Jul 2020 20:18:52 -0700 (PDT)
From: David Rientjes
To: Andrew Morton, Yang Shi
Cc: Michal Hocko, Shakeel Butt, Yang Shi, Roman Gushchin, Greg Thelen,
    Johannes Weiner, Vladimir Davydov, cgroups@vger.kernel.org,
    linux-mm@kvack.org
Subject: [patch] mm, memcg: provide a stat to describe reclaimable memory

MemAvailable in /proc/meminfo provides some guidance on the amount of
memory that can be made available for starting new applications (see
Documentation/filesystems/proc.rst).

Userspace can lack insight into the amount of memory that can be reclaimed
from a memcg based on values from memory.stat, however.  Two specific
examples:

 - Lazy freeable memory (MADV_FREE): clean anonymous pages on the
   inactive file LRU that can be quickly reclaimed under memory pressure
   but otherwise show up as mapped anon in memory.stat, and

 - Memory on deferred split queues (thp): compound pages that can be
   split and uncharged from the memcg under memory pressure, but
   otherwise show up as charged anon LRU memory in memory.stat.

Userspace can currently derive this information and use the same heuristic
as MemAvailable by doing this:

	deferred = (active_anon + inactive_anon) - anon
	lazyfree = (active_file + inactive_file) - file
	avail = deferred + lazyfree + (file + slab_reclaimable) / 2

But this depends on implementation details of how the kernel handles this
memory for the purposes of reclaim (anon on the inactive file LRU, or
unmapped anon on the LRU).

To allow portable userspace code that does not need insight into the
kernel implementation of reclaimable memory, this exports a metric that
estimates the amount of memory that can be reclaimed and uncharged from
the memcg to start new applications.

As the kernel implementation evolves for memory that can be reclaimed
under memory pressure, this metric can be kept consistent.

Signed-off-by: David Rientjes
---
 Documentation/admin-guide/cgroup-v2.rst | 12 +++++++++
 mm/memcontrol.c                         | 35 +++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1314,6 +1314,18 @@ PAGE_SIZE multiple when read back.
 		Part of "slab" that cannot be reclaimed on memory
 		pressure.
 
+	  avail
+		An estimate of how much memory can be made available for
+		starting new applications, similar to MemAvailable from
+		/proc/meminfo (Documentation/filesystems/proc.rst).
+
+		This is derived by assuming that half of page cache and
+		reclaimable slab can be uncharged without significantly
+		impacting the workload, similar to MemAvailable.  It also
+		factors in the amount of lazy freeable memory (MADV_FREE) and
+		compound pages that can be split and uncharged under memory
+		pressure.
+
 	  pgfault
 		Total number of page faults incurred
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1350,6 +1350,35 @@ static bool mem_cgroup_wait_acct_move(struct mem_cgroup *memcg)
 	return false;
 }
 
+/*
+ * Returns an estimate of the amount of available memory that can be reclaimed
+ * for a memcg, in pages.
+ */
+static unsigned long mem_cgroup_avail(struct mem_cgroup *memcg)
+{
+	long deferred, lazyfree;
+
+	/*
+	 * Deferred pages are charged anonymous pages that are on the LRU but
+	 * are unmapped.  These compound pages are split under memory pressure.
+	 */
+	deferred = max_t(long, memcg_page_state(memcg, NR_ACTIVE_ANON) +
+			       memcg_page_state(memcg, NR_INACTIVE_ANON) -
+			       memcg_page_state(memcg, NR_ANON_MAPPED), 0);
+	/*
+	 * Lazyfree pages are charged clean anonymous pages that are on the file
+	 * LRU and can be reclaimed under memory pressure.
+	 */
+	lazyfree = max_t(long, memcg_page_state(memcg, NR_ACTIVE_FILE) +
+			       memcg_page_state(memcg, NR_INACTIVE_FILE) -
+			       memcg_page_state(memcg, NR_FILE_PAGES), 0);
+
+	/* Using same heuristic as si_mem_available() */
+	return (unsigned long)deferred + (unsigned long)lazyfree +
+	       (memcg_page_state(memcg, NR_FILE_PAGES) +
+		memcg_page_state(memcg, NR_SLAB_RECLAIMABLE)) / 2;
+}
+
 static char *memory_stat_format(struct mem_cgroup *memcg)
 {
 	struct seq_buf s;
@@ -1417,6 +1446,12 @@ static char *memory_stat_format(struct mem_cgroup *memcg)
 	seq_buf_printf(&s, "slab_unreclaimable %llu\n",
 		       (u64)memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE) *
 		       PAGE_SIZE);
+	/*
+	 * All values in this buffer are read individually, no implied
+	 * consistency amongst them.
+	 */
+	seq_buf_printf(&s, "avail %llu\n",
+		       (u64)mem_cgroup_avail(memcg) * PAGE_SIZE);
 
 	/* Accumulated memory events */
 
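
For reference, the userspace derivation quoted in the changelog can be
sketched as below.  This is only an illustration, not part of the patch:
the helper names (stat_value, clamped_sub, memcg_avail_estimate) are
hypothetical, the input is assumed to be the raw "key value" text read
from a cgroup v2 memory.stat file, and the clamping at zero mirrors the
max_t() calls in mem_cgroup_avail() rather than anything the changelog
formula states.  Values are in bytes, as in memory.stat.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find "key value" at the start of a line in
 * memory.stat-formatted text.  Returns 0 if the key is absent.
 */
static unsigned long long stat_value(const char *stat, const char *key)
{
	size_t klen = strlen(key);
	const char *line = stat;

	while (line && *line) {
		/* Require a space after the key so "anon" != "anon_thp". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			unsigned long long v = 0;

			if (sscanf(line + klen + 1, "%llu", &v) == 1)
				return v;
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return 0;
}

/* Subtract without wrapping; counters are read individually and may race. */
static unsigned long long clamped_sub(unsigned long long a,
				      unsigned long long b)
{
	return a > b ? a - b : 0;
}

/* Changelog formula: deferred + lazyfree + (file + slab_reclaimable) / 2 */
unsigned long long memcg_avail_estimate(const char *stat)
{
	unsigned long long deferred, lazyfree;

	deferred = clamped_sub(stat_value(stat, "active_anon") +
			       stat_value(stat, "inactive_anon"),
			       stat_value(stat, "anon"));
	lazyfree = clamped_sub(stat_value(stat, "active_file") +
			       stat_value(stat, "inactive_file"),
			       stat_value(stat, "file"));

	return deferred + lazyfree +
	       (stat_value(stat, "file") +
		stat_value(stat, "slab_reclaimable")) / 2;
}
```

A caller would read the memcg's memory.stat into a NUL-terminated buffer
and pass it to memcg_avail_estimate().  The point of the patch is that
userspace should not need to carry this logic, since it encodes how the
kernel currently places these pages on the LRUs.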