From: Luis Chamberlain <mcgrof@kernel.org>
To: akpm@linux-foundation.org, jhubbard@nvidia.com, vbabka@suse.cz,
	mgorman@suse.de, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, dave@stgolabs.net,
	p.raghav@samsung.com, da.gomez@samsung.com, mcgrof@kernel.org
Subject: [RFC] mm/vmstat: add a single value debugfs fragmentation metric
Date: Wed, 13 Mar 2024 17:57:10 -0700
Message-ID: <20240314005710.2964798-1-mcgrof@kernel.org>

In considering the general impact of new features or enhancements on
memory fragmentation, I had asked what metric we could use that is a
single value. John Hubbard provided one [0], however we'd need to
tally up used folios per order as well, and tallying this up would be
expensive today. The value would also only tell us how fragmented a
system's memory is. We can instead just use the existing fragmentation
index but generalize a single value from it. This tells us more: when
generalized to one value it can tell us both how likely memory
allocations are to fail due to external fragmentation and how likely
they are to fail due to low memory.
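
For readers who want to experiment with the per-order index outside the
kernel, here is a minimal userspace sketch of the calculation as I read it
in mm/vmstat.c. The function name and the flat parameter list are
assumptions made for illustration; the kernel works on a per-zone free-list
summary rather than on loose counters. The result is in thousandths, and
-1000 marks an order for which a suitable free block already exists.

```c
#include <stdio.h>

/*
 * Hypothetical userspace mirror of the per-order fragmentation index.
 * Returns thousandths: toward 0 means a failure would be due to lack of
 * memory, toward 1000 means it would be due to fragmentation, and -1000
 * means a block of the requested order is already free.
 */
static int frag_index(unsigned int order, unsigned long free_pages,
		      unsigned long free_blocks_total,
		      unsigned long free_blocks_suitable)
{
	unsigned long long requested = 1ULL << order;

	if (!free_blocks_total)
		return 0;
	if (free_blocks_suitable)
		return -1000;

	return 1000 - (int)((1000 + free_pages * 1000ULL / requested) /
			    free_blocks_total);
}
```

For example, 64 free pages spread over 64 order-0 blocks cannot satisfy an
order-4 request, and the sketch reports a high index for that case.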

Today we expose an external fragmentation index per node and per zone,
for each supported order. This value is useful to tell us how externally
fragmented a system might be across its full memory topology (nodes and
zones). Obviously, that topology can vary per system and per architecture,
for instance two separate x86 systems may have:

cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 1, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000

cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone      DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone    DMA32 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
Node 0, zone   Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000

The number of zones we have may also change over time.

This puts a bit of onus on userspace if all it wants is a general sense
of how externally fragmented a system is overall. Provide a single
averaged value of the fragmentation index to simplify measurements in
userspace.
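
To illustrate the work userspace has to do today, here is a rough sketch
that parses the extfrag_index text format shown above and reduces it to one
number. Both helper names are hypothetical, and the sketch takes a flat
mean over every value, which is simpler than (and not identical to) the
per-zone, then per-node averaging done in the patch.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Add every per-order value on one line to *sum; return how many were found. */
static int extfrag_sum_line(const char *line, double *sum)
{
	const char *p = strstr(line, "zone");
	char *end;
	int n = 0;

	if (!p)
		return 0;

	/* Skip the "zone" keyword, the padding, and the zone name itself. */
	p += strlen("zone");
	while (*p == ' ')
		p++;
	while (*p && *p != ' ')
		p++;

	/* Everything left on the line is a list of decimal values. */
	for (;;) {
		double v = strtod(p, &end);

		if (end == p)
			break;
		*sum += v;
		n++;
		p = end;
	}
	return n;
}

/* Flat mean over all values in the stream; returns -1 if none were found. */
static int extfrag_average(FILE *f, double *avg)
{
	char line[512];
	double sum = 0.0;
	int n = 0;

	while (fgets(line, sizeof(line), f))
		n += extfrag_sum_line(line, &sum);
	if (!n)
		return -1;
	*avg = sum / n;
	return 0;
}
```

A caller would fopen() /sys/kernel/debug/extfrag/extfrag_index (root is
typically required for debugfs) and pass the stream to extfrag_average().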

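To make it easier for humans to grok, scale it to a value between -100 and
100. As with the per-order index, values near 100 mean allocations fail due
to fragmentation (super fragmented), values near 0 mean they fail due to
lack of memory, and -100 means a suitable free block exists at every order.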
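
The scaling itself is simple, but printing a signed fixed-point value with
integer division and modulo needs some care, because in C both the quotient
and the remainder of a negative number are negative. Below is a small
sketch of one way to do it, assuming the averaged index is held in
thousandths (-1000..1000); the helper name is made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Format an averaged index in thousandths (-1000..1000) as a value in
 * -100.0..100.0. The sign is printed separately so that small negative
 * inputs such as -5 come out as "-0.5" rather than "0.-5".
 */
static int format_frag_pct(char *buf, size_t size, int index)
{
	return snprintf(buf, size, "%s%d.%d", index < 0 ? "-" : "",
			abs(index) / 10, abs(index) % 10);
}
```

With this, 555 is printed as 55.5 and -1000 as -100.0.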

[0] https://lore.kernel.org/all/5ac6a387-0ca7-45ca-bebc-c3bdd48452cb@nvidia.com/T/#u

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 mm/vmstat.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/mm/vmstat.c b/mm/vmstat.c
index 582f89b37ccf..e80983772c83 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2262,6 +2262,54 @@ static const struct seq_operations extfrag_sops = {
 
 DEFINE_SEQ_ATTRIBUTE(extfrag);
 
+static ssize_t read_extfrag_pct(struct file *file, char __user *user_buf,
+			     size_t count, loff_t *ppos)
+{
+	char buf[32];
+	unsigned int len;
+	pg_data_t *pgdat;
+	struct zone *zone;
+	unsigned long flags;
+	unsigned int order;
+	unsigned int num_pgdats = 0;
+	int index_total = 0;
+
+	for_each_online_pgdat(pgdat) {
+		int index_pgt = 0;
+		int num_zones = 0;
+		num_pgdats++;
+		for_each_populated_zone_pgdat(zone, pgdat) {
+			int index = 0;
+
+			num_zones++;
+			spin_lock_irqsave(&zone->lock, flags);
+			for (order = 0; order < NR_PAGE_ORDERS; ++order) {
+				index += fragmentation_index(zone, order);
+			}
+			spin_unlock_irqrestore(&zone->lock, flags);
+
+			index_pgt += index / NR_PAGE_ORDERS;
+		}
+
+		BUG_ON(!num_zones);
+
+		index_total += index_pgt / num_zones;
+	}
+
+	index_total = index_total / num_pgdats;
+
+	len = scnprintf(buf, sizeof(buf), "%s%d.%d\n", index_total < 0 ? "-" : "",
+			abs(index_total) / 10, abs(index_total) % 10);
+	return simple_read_from_buffer(user_buf, count, ppos, buf, len);
+}
+
+static const struct file_operations extfrag_pct_fops = {
+	.read = read_extfrag_pct,
+	.open = simple_open,
+	.owner = THIS_MODULE,
+	.llseek = default_llseek,
+};
+
 static int __init extfrag_debug_init(void)
 {
 	struct dentry *extfrag_debug_root;
@@ -2274,6 +2322,9 @@ static int __init extfrag_debug_init(void)
 	debugfs_create_file("extfrag_index", 0444, extfrag_debug_root, NULL,
 			    &extfrag_fops);
 
+	debugfs_create_file("extfrag_pct", 0444, extfrag_debug_root, NULL,
+			    &extfrag_pct_fops);
+
 	return 0;
 }
 
-- 
2.43.0

