From: Chunsheng Luo
To: gregkh@linuxfoundation.org
Cc: rafael@kernel.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, Chunsheng Luo
Subject: [PATCH] meminfo: provide estimated per-node's available memory
Date: Sun, 4 Feb 2024 03:34:14 -0500
Message-ID: <20240204083414.107799-1-luochunsheng@ustc.edu>
X-Mailer: git-send-email 2.43.0

Provide an estimate of each node's available memory, in addition to the
system-wide estimate already exposed by /proc/meminfo.

As with commit 34e431b0ae39 ("/proc/meminfo: provide estimated available
memory"), it is more convenient to provide such an estimate in
/sys/bus/node/devices/nodeX/meminfo than to have every consumer compute
it. If things change in the future, we only have to change it in one
place.

Example output from /sys/bus/node/devices/node1/meminfo:

  Node 1 MemTotal: 4084480 kB
  Node 1 MemFree: 3348820 kB
  Node 1 MemAvailable: 3647972 kB
  Node 1 MemUsed: 735660 kB
  ....

Link: https://github.com/numactl/numactl/issues/210
Signed-off-by: Chunsheng Luo
---
 drivers/base/node.c |  4 ++++
 include/linux/mm.h  |  1 +
 mm/show_mem.c       | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 48 insertions(+)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 1c05640461dd..ba27f25d2b81 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -372,11 +372,13 @@ static ssize_t node_read_meminfo(struct device *dev,
 	int len = 0;
 	int nid = dev->id;
 	struct pglist_data *pgdat = NODE_DATA(nid);
+	long available;
 	struct sysinfo i;
 	unsigned long sreclaimable, sunreclaimable;
 	unsigned long swapcached = 0;
 
 	si_meminfo_node(&i, nid);
+	available = si_mem_node_available(nid);
 	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
 	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
 #ifdef CONFIG_SWAP
@@ -385,6 +387,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 	len = sysfs_emit_at(buf, len,
 			    "Node %d MemTotal: %8lu kB\n"
 			    "Node %d MemFree: %8lu kB\n"
+			    "Node %d MemAvailable: %8lu kB\n"
 			    "Node %d MemUsed: %8lu kB\n"
 			    "Node %d SwapCached: %8lu kB\n"
 			    "Node %d Active: %8lu kB\n"
@@ -397,6 +400,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 			    "Node %d Mlocked: %8lu kB\n",
 			    nid, K(i.totalram),
 			    nid, K(i.freeram),
+			    nid, K(available),
 			    nid, K(i.totalram - i.freeram),
 			    nid, K(swapcached),
 			    nid, K(node_page_state(pgdat, NR_ACTIVE_ANON) +
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..3caef083fe5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3202,6 +3202,7 @@ static inline void show_mem(void)
 extern long si_mem_available(void);
 extern void si_meminfo(struct sysinfo * val);
 extern void si_meminfo_node(struct sysinfo *val, int nid);
+extern long si_mem_node_available(int nid);
 #ifdef __HAVE_ARCH_RESERVED_KERNEL_PAGES
 extern unsigned long arch_reserved_kernel_pages(void);
 #endif
diff --git a/mm/show_mem.c b/mm/show_mem.c
index 8dcfafbd283c..37d4c7212b06 100644
--- a/mm/show_mem.c
+++ b/mm/show_mem.c
@@ -86,6 +86,49 @@ void si_meminfo(struct sysinfo *val)
 EXPORT_SYMBOL(si_meminfo);
 
 #ifdef CONFIG_NUMA
+long si_mem_node_available(int nid)
+{
+	int zone_type;
+	long available;
+	unsigned long pagecache;
+	unsigned long wmark_low = 0;
+	unsigned long reclaimable;
+	pg_data_t *pgdat = NODE_DATA(nid);
+
+	for (zone_type = 0; zone_type < MAX_NR_ZONES; zone_type++)
+		wmark_low += low_wmark_pages((&pgdat->node_zones[zone_type]));
+
+	/*
+	 * Estimate the amount of memory available for userspace allocations,
+	 * without causing swapping for mbind process.
+	 */
+	available = sum_zone_node_page_state(nid, NR_FREE_PAGES) - pgdat->totalreserve_pages;
+
+	/*
+	 * Not all the page cache can be freed, otherwise the system will
+	 * start swapping or thrashing. Assume at least half of the page
+	 * cache, or the low watermark worth of cache, needs to stay.
+	 */
+	pagecache = node_page_state(pgdat, NR_ACTIVE_FILE) +
+		node_page_state(pgdat, NR_INACTIVE_FILE);
+	pagecache -= min(pagecache / 2, wmark_low);
+	available += pagecache;
+
+	/*
+	 * Part of the reclaimable slab and other kernel memory consists of
+	 * items that are in use, and cannot be freed. Cap this estimate at the
+	 * low watermark.
+	 */
+	reclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B) +
+		node_page_state(pgdat, NR_KERNEL_MISC_RECLAIMABLE);
+	reclaimable -= min(reclaimable / 2, wmark_low);
+	available += reclaimable;
+
+	if (available < 0)
+		available = 0;
+	return available;
+}
+
 void si_meminfo_node(struct sysinfo *val, int nid)
 {
 	int zone_type; /* needs to be signed */
-- 
2.43.0
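
For anyone who wants to check the arithmetic outside the kernel, the estimate
computed by si_mem_node_available() can be sketched in plain userspace C as
below. This is only an illustration of the formula in the patch, not kernel
code: struct node_counters, min_ul() and estimate_node_available() are
hypothetical names, and the counters (all in pages) are assumed to have been
collected by the caller rather than read from a pglist_data.

```c
/*
 * Hypothetical snapshot of the per-node counters that the patch reads.
 * All values are in pages. This struct does not exist in the kernel.
 */
struct node_counters {
	unsigned long free_pages;              /* NR_FREE_PAGES summed over the node */
	unsigned long totalreserve_pages;      /* pgdat->totalreserve_pages */
	unsigned long active_file;             /* NR_ACTIVE_FILE */
	unsigned long inactive_file;           /* NR_INACTIVE_FILE */
	unsigned long slab_reclaimable;        /* NR_SLAB_RECLAIMABLE_B, in pages */
	unsigned long kernel_misc_reclaimable; /* NR_KERNEL_MISC_RECLAIMABLE */
	unsigned long wmark_low;               /* sum of the zones' low watermarks */
};

/* Illustrative helper standing in for the kernel's min() macro. */
static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/* Same steps as the patch: free minus reserve, plus the freeable share
 * of page cache and reclaimable kernel memory, clamped at zero. */
static long estimate_node_available(const struct node_counters *c)
{
	long available;
	unsigned long pagecache, reclaimable;

	available = (long)c->free_pages - (long)c->totalreserve_pages;

	/* Keep back half of the page cache or the low watermark, whichever is smaller. */
	pagecache = c->active_file + c->inactive_file;
	pagecache -= min_ul(pagecache / 2, c->wmark_low);
	available += (long)pagecache;

	/* Apply the same holdback to reclaimable slab and misc kernel memory. */
	reclaimable = c->slab_reclaimable + c->kernel_misc_reclaimable;
	reclaimable -= min_ul(reclaimable / 2, c->wmark_low);
	available += (long)reclaimable;

	return available < 0 ? 0 : available;
}
```

With 1000 free pages, 100 reserved, 400 pages of file cache, 50 reclaimable
pages and a low watermark of 50, the sketch yields 900 + 350 + 25 = 1275 pages.
A node whose reserve exceeds its free pages and which has nothing reclaimable
is reported as 0 rather than a negative value.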