From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Mel Gorman <mgorman@suse.de>,
	Michal Hocko <mhocko@kernel.org>
Cc: "Andi Kleen" <ak@linux.intel.com>,
	"Aneesh Kumar" <aneesh.kumar@linux.ibm.com>,
	"Barry Song" <21cnbao@gmail.com>,
	"Catalin Marinas" <catalin.marinas@arm.com>,
	"Dave Hansen" <dave.hansen@linux.intel.com>,
	"Hillf Danton" <hdanton@sina.com>,
	"Jens Axboe" <axboe@kernel.dk>,
	"Jesse Barnes" <jsbarnes@google.com>,
	"Jonathan Corbet" <corbet@lwn.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Matthew Wilcox" <willy@infradead.org>,
	"Michael Larabel" <Michael@michaellarabel.com>,
	"Mike Rapoport" <rppt@kernel.org>,
	"Rik van Riel" <riel@surriel.com>,
	"Vlastimil Babka" <vbabka@suse.cz>,
	"Will Deacon" <will@kernel.org>,
	"Ying Huang" <ying.huang@intel.com>,
	linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	page-reclaim@google.com, x86@kernel.org,
	"Yu Zhao" <yuzhao@google.com>,
	"Brian Geffon" <bgeffon@google.com>,
	"Jan Alexander Steffens" <heftig@archlinux.org>,
	"Oleksandr Natalenko" <oleksandr@natalenko.name>,
	"Steven Barrett" <steven@liquorix.net>,
	"Suleiman Souhlal" <suleiman@google.com>,
	"Daniel Byrne" <djbyrne@mtu.edu>,
	"Donald Carr" <d@chaos-reins.com>,
	"Holger Hoffstätte" <holger@applied-asynchrony.com>,
	"Konstantin Kharlamov" <Hi-Angel@yandex.ru>,
	"Shuang Zhai" <szhai2@cs.rochester.edu>,
	"Sofia Trinh" <sofia.trinh@edi.works>
Subject: [PATCH v7 11/12] mm: multigenerational LRU: debugfs interface
Date: Tue, 8 Feb 2022 01:19:01 -0700
Message-ID: <20220208081902.3550911-12-yuzhao@google.com>
In-Reply-To: <20220208081902.3550911-1-yuzhao@google.com>

Add /sys/kernel/debug/lru_gen for working set estimation and proactive
reclaim. These features are required to optimize job scheduling (bin
packing) in data centers [1][2].

Compared with the page table-based approach and the PFN-based approach,
e.g., mm/damon/[vp]addr.c, this lruvec-based approach has the following
advantages:
1) It offers better choices because it's aware of memcgs, NUMA nodes,
   shared mappings and unmapped page cache.
2) It's more scalable because it's O(nr_hot_pages), whereas the
   PFN-based approach is O(nr_total_pages).
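For illustration, a usage sketch (the memcg ID and the numeric values
below are hypothetical; the command format follows the parser in
lru_gen_seq_write() and run_cmd() in this patch):

    # working set estimation: trigger aging on memcg 29, node 0, by
    # creating a new generation while 5 is the current max_seq
    echo '+ 29 0 5' >/sys/kernel/debug/lru_gen

    # proactive reclaim: evict on memcg 29, node 0, until generations
    # up to 3 are gone, with swappiness 200 and at most 4096 pages
    echo '- 29 0 3 200 4096' >/sys/kernel/debug/lru_gen

The last two arguments are optional; omitting them falls back to the
memcg's swappiness and an unbounded page count. Multiple commands can
be batched in a single write, separated by commas, semicolons or
newlines.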
Add /sys/kernel/debug/lru_gen_full for debugging.
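For reference, a sketch of the read-side layout (all values invented;
the fields follow lru_gen_seq_show(): a per-memcg header, a per-node
header, then one line per generation with its sequence number, its age
in milliseconds, and its anon and file page counts):

    memcg    29 /user.slice
     node     0
              5        920        4960       12032
              6        700        1334        2940
              7        250         459         132

lru_gen_full prints the same skeleton plus the per-tier refault
statistics and the mm walk stats emitted by lru_gen_seq_show_full().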
[1] https://research.google/pubs/pub48551/
[2] https://www.cs.cmu.edu/~dskarlat/publications/tmo_asplos22.pdf

Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Brian Geffon <bgeffon@google.com>
Acked-by: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Acked-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Acked-by: Steven Barrett <steven@liquorix.net>
Acked-by: Suleiman Souhlal <suleiman@google.com>
Tested-by: Daniel Byrne <djbyrne@mtu.edu>
Tested-by: Donald Carr <d@chaos-reins.com>
Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Tested-by: Shuang Zhai <szhai2@cs.rochester.edu>
Tested-by: Sofia Trinh <sofia.trinh@edi.works>
---
 include/linux/nodemask.h |   1 +
 mm/vmscan.c              | 353 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 354 insertions(+)

diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index 567c3ddba2c4..90840c459abc 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -486,6 +486,7 @@ static inline int num_node_state(enum node_states state)
 #define first_online_node	0
 #define first_memory_node	0
 #define next_online_node(nid)	(MAX_NUMNODES)
+#define next_memory_node(nid)	(MAX_NUMNODES)
 #define nr_node_ids		1U
 #define nr_online_nodes		1U
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4d37d63668b5..3dfa938a4c4a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -52,6 +52,8 @@
 #include <linux/psi.h>
 #include <linux/pagewalk.h>
 #include <linux/shmem_fs.h>
+#include <linux/ctype.h>
+#include <linux/debugfs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -5220,6 +5222,354 @@ static struct attribute_group lru_gen_attr_group = {
 	.attrs = lru_gen_attrs,
 };
 
+/******************************************************************************
+ *                          debugfs interface
+ ******************************************************************************/
+
+static void *lru_gen_seq_start(struct seq_file *m, loff_t *pos)
+{
+	struct mem_cgroup *memcg;
+	loff_t nr_to_skip = *pos;
+
+	m->private = kvmalloc(PATH_MAX, GFP_KERNEL);
+	if (!m->private)
+		return ERR_PTR(-ENOMEM);
+
+	memcg = mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		int nid;
+
+		for_each_node_state(nid, N_MEMORY) {
+			if (!nr_to_skip--)
+				return get_lruvec(memcg, nid);
+		}
+	} while ((memcg = mem_cgroup_iter(NULL, memcg, NULL)));
+
+	return NULL;
+}
+
+static void lru_gen_seq_stop(struct seq_file *m, void *v)
+{
+	if (!IS_ERR_OR_NULL(v))
+		mem_cgroup_iter_break(NULL, lruvec_memcg(v));
+
+	kvfree(m->private);
+	m->private = NULL;
+}
+
+static void *lru_gen_seq_next(struct seq_file *m, void *v, loff_t *pos)
+{
+	int nid = lruvec_pgdat(v)->node_id;
+	struct mem_cgroup *memcg = lruvec_memcg(v);
+
+	++*pos;
+
+	nid = next_memory_node(nid);
+	if (nid == MAX_NUMNODES) {
+		memcg = mem_cgroup_iter(NULL, memcg, NULL);
+		if (!memcg)
+			return NULL;
+
+		nid = first_memory_node;
+	}
+
+	return get_lruvec(memcg, nid);
+}
+
+static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
+				  unsigned long max_seq, unsigned long *min_seq,
+				  unsigned long seq)
+{
+	int i;
+	int type, tier;
+	int hist = lru_hist_from_seq(seq);
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+
+	for (tier = 0; tier < MAX_NR_TIERS; tier++) {
+		seq_printf(m, "            %10d", tier);
+		for (type = 0; type < ANON_AND_FILE; type++) {
+			unsigned long n[3] = {};
+
+			if (seq == max_seq) {
+				n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]);
+				n[1] = READ_ONCE(lrugen->avg_total[type][tier]);
+
+				seq_printf(m, " %10luR %10luT %10lu ", n[0], n[1], n[2]);
+			} else if (seq == min_seq[type] || NR_HIST_GENS > 1) {
+				n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]);
+				n[1] = atomic_long_read(&lrugen->evicted[hist][type][tier]);
+				if (tier)
+					n[2] = READ_ONCE(lrugen->promoted[hist][type][tier - 1]);
+
+				seq_printf(m, " %10lur %10lue %10lup", n[0], n[1], n[2]);
+			} else
+				seq_puts(m, "          0           0           0 ");
+		}
+		seq_putc(m, '\n');
+	}
+
+	seq_puts(m, "                      ");
+	for (i = 0; i < NR_MM_STATS; i++) {
+		if (seq == max_seq && NR_HIST_GENS == 1)
+			seq_printf(m, " %10lu%c", READ_ONCE(lruvec->mm_state.stats[hist][i]),
+				   toupper(MM_STAT_CODES[i]));
+		else if (seq != max_seq && NR_HIST_GENS > 1)
+			seq_printf(m, " %10lu%c", READ_ONCE(lruvec->mm_state.stats[hist][i]),
+				   MM_STAT_CODES[i]);
+		else
+			seq_puts(m, "          0 ");
+	}
+	seq_putc(m, '\n');
+}
+
+static int lru_gen_seq_show(struct seq_file *m, void *v)
+{
+	unsigned long seq;
+	bool full = !debugfs_real_fops(m->file)->write;
+	struct lruvec *lruvec = v;
+	struct lru_gen_struct *lrugen = &lruvec->lrugen;
+	int nid = lruvec_pgdat(lruvec)->node_id;
+	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
+	DEFINE_MAX_SEQ(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	if (nid == first_memory_node) {
+		const char *path = memcg ? m->private : "";
+
+#ifdef CONFIG_MEMCG
+		if (memcg)
+			cgroup_path(memcg->css.cgroup, m->private, PATH_MAX);
+#endif
+		seq_printf(m, "memcg %5hu %s\n", mem_cgroup_id(memcg), path);
+	}
+
+	seq_printf(m, " node %5d\n", nid);
+
+	if (!full)
+		seq = min_seq[TYPE_ANON];
+	else if (max_seq >= MAX_NR_GENS)
+		seq = max_seq - MAX_NR_GENS + 1;
+	else
+		seq = 0;
+
+	for (; seq <= max_seq; seq++) {
+		int gen, type, zone;
+		unsigned int msecs;
+
+		gen = lru_gen_from_seq(seq);
+		msecs = jiffies_to_msecs(jiffies - READ_ONCE(lrugen->timestamps[gen]));
+
+		seq_printf(m, " %10lu %10u", seq, msecs);
+
+		for (type = 0; type < ANON_AND_FILE; type++) {
+			long size = 0;
+
+			if (seq < min_seq[type]) {
+				seq_puts(m, "         -0 ");
+				continue;
+			}
+
+			for (zone = 0; zone < MAX_NR_ZONES; zone++)
+				size += READ_ONCE(lrugen->nr_pages[gen][type][zone]);
+
+			seq_printf(m, " %10lu ", max(size, 0L));
+		}
+
+		seq_putc(m, '\n');
+
+		if (full)
+			lru_gen_seq_show_full(m, lruvec, max_seq, min_seq, seq);
+	}
+
+	return 0;
+}
+
+static const struct seq_operations lru_gen_seq_ops = {
+	.start = lru_gen_seq_start,
+	.stop = lru_gen_seq_stop,
+	.next = lru_gen_seq_next,
+	.show = lru_gen_seq_show,
+};
+
+static int run_aging(struct lruvec *lruvec, unsigned long seq, struct scan_control *sc,
+		     bool can_swap, bool full_scan)
+{
+	DEFINE_MAX_SEQ(lruvec);
+
+	if (seq == max_seq)
+		try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, full_scan);
+
+	return seq > max_seq ? -EINVAL : 0;
+}
+
+static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_control *sc,
+			int swappiness, unsigned long nr_to_reclaim)
+{
+	struct blk_plug plug;
+	int err = -EINTR;
+	DEFINE_MAX_SEQ(lruvec);
+
+	if (max_seq < seq + MIN_NR_GENS)
+		return -EINVAL;
+
+	sc->nr_reclaimed = 0;
+
+	blk_start_plug(&plug);
+
+	while (!signal_pending(current)) {
+		DEFINE_MIN_SEQ(lruvec);
+
+		if (seq < min_seq[!swappiness] || sc->nr_reclaimed >= nr_to_reclaim ||
+		    !evict_folios(lruvec, sc, swappiness, NULL)) {
+			err = 0;
+			break;
+		}
+
+		cond_resched();
+	}
+
+	blk_finish_plug(&plug);
+
+	return err;
+}
+
+static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq,
+		   struct scan_control *sc, int swappiness, unsigned long opt)
+{
+	struct lruvec *lruvec;
+	int err = -EINVAL;
+	struct mem_cgroup *memcg = NULL;
+
+	if (!mem_cgroup_disabled()) {
+		rcu_read_lock();
+		memcg = mem_cgroup_from_id(memcg_id);
+#ifdef CONFIG_MEMCG
+		if (memcg && !css_tryget(&memcg->css))
+			memcg = NULL;
+#endif
+		rcu_read_unlock();
+
+		if (!memcg)
+			goto done;
+	}
+	if (memcg_id != mem_cgroup_id(memcg))
+		goto done;
+
+	if (nid < 0 || nid >= MAX_NUMNODES || !node_state(nid, N_MEMORY))
+		goto done;
+
+	lruvec = get_lruvec(memcg, nid);
+
+	if (swappiness < 0)
+		swappiness = get_swappiness(memcg);
+	else if (swappiness > 200)
+		goto done;
+
+	switch (cmd) {
+	case '+':
+		err = run_aging(lruvec, seq, sc, swappiness, opt);
+		break;
+	case '-':
+		err = run_eviction(lruvec, seq, sc, swappiness, opt);
+		break;
+	}
+done:
+	mem_cgroup_put(memcg);
+
+	return err;
+}
+
+static ssize_t lru_gen_seq_write(struct file *file, const char __user *src,
+				 size_t len, loff_t *pos)
+{
+	void *buf;
+	char *cur, *next;
+	unsigned int flags;
+	int err = 0;
+	struct scan_control sc = {
+		.may_writepage = true,
+		.may_unmap = true,
+		.may_swap = true,
+		.reclaim_idx = MAX_NR_ZONES - 1,
+		.gfp_mask = GFP_KERNEL,
+	};
+
+	buf = kvmalloc(len + 1, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	if (copy_from_user(buf, src, len)) {
+		kvfree(buf);
+		return -EFAULT;
+	}
+
+	next = buf;
+	next[len] = '\0';
+
+	sc.reclaim_state.mm_walk = alloc_mm_walk();
+	if (!sc.reclaim_state.mm_walk) {
+		kvfree(buf);
+		return -ENOMEM;
+	}
+
+	flags = memalloc_noreclaim_save();
+	set_task_reclaim_state(current, &sc.reclaim_state);
+
+	while ((cur = strsep(&next, ",;\n"))) {
+		int n;
+		int end;
+		char cmd;
+		unsigned int memcg_id;
+		unsigned int nid;
+		unsigned long seq;
+		unsigned int swappiness = -1;
+		unsigned long opt = -1;
+
+		cur = skip_spaces(cur);
+		if (!*cur)
+			continue;
+
+		n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid,
+			   &seq, &end, &swappiness, &end, &opt, &end);
+		if (n < 4 || cur[end]) {
+			err = -EINVAL;
+			break;
+		}
+
+		err = run_cmd(cmd, memcg_id, nid, seq, &sc, swappiness, opt);
+		if (err)
+			break;
+	}
+
+	set_task_reclaim_state(current, NULL);
+	memalloc_noreclaim_restore(flags);
+
+	free_mm_walk(sc.reclaim_state.mm_walk);
+	kvfree(buf);
+
+	return err ? : len;
+}
+
+static int lru_gen_seq_open(struct inode *inode, struct file *file)
+{
+	return seq_open(file, &lru_gen_seq_ops);
+}
+
+static const struct file_operations lru_gen_rw_fops = {
+	.open = lru_gen_seq_open,
+	.read = seq_read,
+	.write = lru_gen_seq_write,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
+static const struct file_operations lru_gen_ro_fops = {
+	.open = lru_gen_seq_open,
+	.read = seq_read,
+	.llseek = seq_lseek,
+	.release = seq_release,
+};
+
 /******************************************************************************
  *                          initialization
  ******************************************************************************/
@@ -5289,6 +5639,9 @@ static int __init init_lru_gen(void)
 	if (sysfs_create_group(mm_kobj, &lru_gen_attr_group))
 		pr_err("lru_gen: failed to create sysfs group\n");
 
+	debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops);
+	debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops);
+
 	return 0;
 };
 late_initcall(init_lru_gen);
-- 
2.35.0.263.gb82422642f-goog