From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-26.3 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED,USER_AGENT_GIT,USER_IN_DEF_DKIM_WL autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 29BC1C4338F for ; Wed, 18 Aug 2021 06:31:56 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 0A15E60F11 for ; Wed, 18 Aug 2021 06:31:56 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240218AbhHRGcZ (ORCPT ); Wed, 18 Aug 2021 02:32:25 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50180 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S239959AbhHRGcE (ORCPT ); Wed, 18 Aug 2021 02:32:04 -0400 Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com [IPv6:2607:f8b0:4864:20::b49]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 539CAC0617AD for ; Tue, 17 Aug 2021 23:31:24 -0700 (PDT) Received: by mail-yb1-xb49.google.com with SMTP id w201-20020a25dfd2000000b00594695384d1so1788890ybg.20 for ; Tue, 17 Aug 2021 23:31:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Sa/+kI2QQ5yzW+73rZnyPuODQVSPnlfsCdgMQjH8k3w=; b=p/kWK0lL+i7CF7DanrRtUfMJFmmz8fQT4ClShk8ocdN5YWEXJPS5ufC7DcnocXjud4 mfpRwtLsUt85zDj/Vel9Sy3Jjj3hox0+z3qxcKB3JAPlTaBCLqAbVsujMvLixzDj8xpA 7YbURlMIa/XunHEgChappMMLMe1W0v1tpJNhoqYo6dtXDFTS7fgcidPpm1UdfhuQt+Cm GUlkTg8AKW4VeZqM8zlDD46cHZQg9YvXeKCIo/LYj1mk7JTr8d8wRdeibTuEBDC5GgLg sFPtmD9SiU8NbM6KRor52pD3S3prQeHrpScJtPebeGIPNqCVE2YafoCfIG0lG5wFHL7H fyuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Sa/+kI2QQ5yzW+73rZnyPuODQVSPnlfsCdgMQjH8k3w=; b=XSBcldpaev5rOm6CUxnhulJSpJgsLtBVPjegEC/y6Osh82oaS6h0EXyhX0KFDdVBXQ cAW9p5jfnpKNRB1bTDmf6VHwIrTEi1gScj/fgA5bxLg2uEg6jiu+k4n3EnFFA3DAtFGD eX0aqIGGxYLugnnRJYgqz1tS0uWWSa1UE0yt+nHFX9jCR8I9iOCqKKw1tpLLD6dD7VXE S33ueH8fP5rm//ljm+ODytBoNmPYQXCNxp+nteGt1OpMi+75HEUxr1TahEnL5Bnf/f8G 4O6jqK3bfc26Qi0J8Jo6e7QUbu6vqq38aQCn14KFxJAx7AtVa7aTlZyix+NYvGD4hFjy UkLQ== X-Gm-Message-State: AOAM532ibpIzxsBTmT+WufIfsWStU5Haf46q7iY/0DFGts2by83YCUiE EDKTLMy/uAkmlbpFVsQ8g2aDUaxv4kc= X-Google-Smtp-Source: ABdhPJz8Y4PWzvx36wd+Xufm5SH2b/iFjdnqIog9P1g4iCjFBZjzAPAITo1V9qboEamgEU0upnvDj06FKHU= X-Received: from yuzhao.bld.corp.google.com ([2620:15c:183:200:41f0:f89:87cd:8bd0]) (user=yuzhao job=sendgmr) by 2002:a25:bc82:: with SMTP id e2mr8957312ybk.307.1629268283542; Tue, 17 Aug 2021 23:31:23 -0700 (PDT) Date: Wed, 18 Aug 2021 00:31:05 -0600 In-Reply-To: <20210818063107.2696454-1-yuzhao@google.com> Message-Id: <20210818063107.2696454-10-yuzhao@google.com> Mime-Version: 1.0 References: <20210818063107.2696454-1-yuzhao@google.com> X-Mailer: git-send-email 2.33.0.rc1.237.g0d66db33f3-goog Subject: [PATCH v4 09/11] mm: multigenerational lru: user interface From: Yu Zhao To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Hillf Danton , page-reclaim@google.com, Yu Zhao , Konstantin Kharlamov Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Add /sys/kernel/mm/lru_gen/enabled to enable and disable the multigenerational lru at runtime. Add /sys/kernel/mm/lru_gen/min_ttl_ms to protect the working set of a given number of milliseconds. The OOM killer is invoked if this working set cannot be kept in memory. Add /sys/kernel/debug/lru_gen to monitor the multigenerational lru and invoke the aging and the eviction. This file has the following output: memcg memcg_id memcg_path node node_id min_gen birth_time anon_size file_size ... max_gen birth_time anon_size file_size min_gen is the oldest generation number and max_gen is the youngest generation number. birth_time is in milliseconds. anon_size and file_size are in pages. This file takes the following input: + memcg_id node_id max_gen [swappiness] - memcg_id node_id min_gen [swappiness] [nr_to_reclaim] The first command line invokes the aging, which scans PTEs for accessed pages and then creates the next generation max_gen+1. A swap file and a non-zero swappiness, which overrides vm.swappiness, are required to scan PTEs mapping anon pages. The second command line invokes the eviction, which evicts generations less than or equal to min_gen. min_gen should be less than max_gen-1 as max_gen and max_gen-1 are not fully aged and therefore cannot be evicted. nr_to_reclaim can be used to limit the number of pages to evict. Multiple command lines are supported, as is concatenation with delimiters "," and ";". Signed-off-by: Yu Zhao Tested-by: Konstantin Kharlamov --- include/linux/nodemask.h | 1 + mm/vmscan.c | 412 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 413 insertions(+) diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 567c3ddba2c4..90840c459abc 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -486,6 +486,7 @@ static inline int num_node_state(enum node_states state) #define first_online_node 0 #define first_memory_node 0 #define next_online_node(nid) (MAX_NUMNODES) +#define next_memory_node(nid) (MAX_NUMNODES) #define nr_node_ids 1U #define nr_online_nodes 1U diff --git a/mm/vmscan.c b/mm/vmscan.c index 2f1fffbd2d61..c6d539a73d00 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -52,6 +52,8 @@ #include #include #include +#include +#include #include #include @@ -4732,6 +4734,410 @@ static int __meminit __maybe_unused mem_notifier(struct notifier_block *self, return NOTIFY_DONE; } +/****************************************************************************** + * sysfs interface + ******************************************************************************/ + +static ssize_t show_min_ttl(struct kobject *kobj, struct kobj_attribute *attr, char *buf) +{ + return sprintf(buf, "%u\n", jiffies_to_msecs(READ_ONCE(lru_gen_min_ttl))); +} + +static ssize_t store_min_ttl(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t len) +{ + unsigned int msecs; + + if (kstrtouint(buf, 10, &msecs)) + return -EINVAL; + + WRITE_ONCE(lru_gen_min_ttl, msecs_to_jiffies(msecs)); + + return len; +} + +static struct kobj_attribute lru_gen_min_ttl_attr = __ATTR( + min_ttl_ms, 0644, show_min_ttl, store_min_ttl +); + +static ssize_t show_enable(struct kobject *kobj, struct kobj_attribute *attr, char *buf) +{ + return snprintf(buf, PAGE_SIZE, "%d\n", lru_gen_enabled()); +} + +static ssize_t store_enable(struct kobject *kobj, struct kobj_attribute *attr, + const char *buf, size_t len) +{ + int enable; + + if (kstrtoint(buf, 10, &enable)) + return -EINVAL; + + lru_gen_set_state(enable, true, false); + + return len; +} + +static struct kobj_attribute lru_gen_enabled_attr = __ATTR( + enabled, 0644, show_enable, store_enable +); + +static struct attribute *lru_gen_attrs[] = { + &lru_gen_min_ttl_attr.attr, + &lru_gen_enabled_attr.attr, + NULL +}; + +static struct attribute_group lru_gen_attr_group = { + .name = "lru_gen", + .attrs = lru_gen_attrs, +}; + +/****************************************************************************** + * debugfs interface + ******************************************************************************/ + +static void *lru_gen_seq_start(struct seq_file *m, loff_t *pos) +{ + struct mem_cgroup *memcg; + loff_t nr_to_skip = *pos; + + m->private = kvmalloc(PATH_MAX, GFP_KERNEL); + if (!m->private) + return ERR_PTR(-ENOMEM); + + memcg = mem_cgroup_iter(NULL, NULL, NULL); + do { + int nid; + + for_each_node_state(nid, N_MEMORY) { + if (!nr_to_skip--) + return mem_cgroup_lruvec(memcg, NODE_DATA(nid)); + } + } while ((memcg = mem_cgroup_iter(NULL, memcg, NULL))); + + return NULL; +} + +static void lru_gen_seq_stop(struct seq_file *m, void *v) +{ + if (!IS_ERR_OR_NULL(v)) + mem_cgroup_iter_break(NULL, lruvec_memcg(v)); + + kvfree(m->private); + m->private = NULL; +} + +static void *lru_gen_seq_next(struct seq_file *m, void *v, loff_t *pos) +{ + int nid = lruvec_pgdat(v)->node_id; + struct mem_cgroup *memcg = lruvec_memcg(v); + + ++*pos; + + nid = next_memory_node(nid); + if (nid == MAX_NUMNODES) { + memcg = mem_cgroup_iter(NULL, memcg, NULL); + if (!memcg) + return NULL; + + nid = first_memory_node; + } + + return mem_cgroup_lruvec(memcg, NODE_DATA(nid)); +} + +static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, + unsigned long max_seq, unsigned long *min_seq, + unsigned long seq) +{ + int i; + int type, tier; + int hist = lru_hist_from_seq(seq); + struct lrugen *lrugen = &lruvec->evictable; + int nid = lruvec_pgdat(lruvec)->node_id; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + struct lru_gen_mm_list *mm_list = get_mm_list(memcg); + + for (tier = 0; tier < MAX_NR_TIERS; tier++) { + seq_printf(m, " %10d", tier); + for (type = 0; type < ANON_AND_FILE; type++) { + unsigned long n[3] = {}; + + if (seq == max_seq) { + n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); + n[1] = READ_ONCE(lrugen->avg_total[type][tier]); + + seq_printf(m, " %10luR %10luT %10lu ", n[0], n[1], n[2]); + } else if (seq == min_seq[type] || NR_STAT_GENS > 1) { + n[0] = atomic_long_read(&lrugen->refaulted[hist][type][tier]); + n[1] = atomic_long_read(&lrugen->evicted[hist][type][tier]); + if (tier) + n[2] = READ_ONCE(lrugen->protected[hist][type][tier - 1]); + + seq_printf(m, " %10lur %10lue %10lup", n[0], n[1], n[2]); + } else + seq_puts(m, " 0 0 0 "); + } + seq_putc(m, '\n'); + } + + seq_puts(m, " "); + for (i = 0; i < NR_MM_STATS; i++) { + if (i == 6) + seq_puts(m, "\n "); + + if (seq == max_seq && NR_STAT_GENS == 1) + seq_printf(m, " %10lu%c", READ_ONCE(mm_list->nodes[nid].stats[hist][i]), + toupper(MM_STAT_CODES[i])); + else if (seq != max_seq && NR_STAT_GENS > 1) + seq_printf(m, " %10lu%c", READ_ONCE(mm_list->nodes[nid].stats[hist][i]), + MM_STAT_CODES[i]); + else + seq_puts(m, " 0 "); + } + seq_putc(m, '\n'); +} + +static int lru_gen_seq_show(struct seq_file *m, void *v) +{ + unsigned long seq; + bool full = !debugfs_real_fops(m->file)->write; + struct lruvec *lruvec = v; + struct lrugen *lrugen = &lruvec->evictable; + int nid = lruvec_pgdat(lruvec)->node_id; + struct mem_cgroup *memcg = lruvec_memcg(lruvec); + DEFINE_MAX_SEQ(lruvec); + DEFINE_MIN_SEQ(lruvec); + + if (nid == first_memory_node) { + const char *path = memcg ? m->private : ""; + +#ifdef CONFIG_MEMCG + if (memcg) + cgroup_path(memcg->css.cgroup, m->private, PATH_MAX); +#endif + seq_printf(m, "memcg %5hu %s\n", mem_cgroup_id(memcg), path); + } + + seq_printf(m, " node %5d\n", nid); + + if (!full) + seq = min(min_seq[0], min_seq[1]); + else if (max_seq >= MAX_NR_GENS) + seq = max_seq - MAX_NR_GENS + 1; + else + seq = 0; + + for (; seq <= max_seq; seq++) { + int gen, type, zone; + unsigned int msecs; + + gen = lru_gen_from_seq(seq); + msecs = jiffies_to_msecs(jiffies - READ_ONCE(lrugen->timestamps[gen])); + + seq_printf(m, " %10lu %10u", seq, msecs); + + for (type = 0; type < ANON_AND_FILE; type++) { + long size = 0; + + if (seq < min_seq[type]) { + seq_puts(m, " -0 "); + continue; + } + + for (zone = 0; zone < MAX_NR_ZONES; zone++) + size += READ_ONCE(lrugen->sizes[gen][type][zone]); + + seq_printf(m, " %10lu ", max(size, 0L)); + } + + seq_putc(m, '\n'); + + if (full) + lru_gen_seq_show_full(m, lruvec, max_seq, min_seq, seq); + } + + return 0; +} + +static const struct seq_operations lru_gen_seq_ops = { + .start = lru_gen_seq_start, + .stop = lru_gen_seq_stop, + .next = lru_gen_seq_next, + .show = lru_gen_seq_show, +}; + +static int run_aging(struct lruvec *lruvec, unsigned long seq, int swappiness) +{ + struct scan_control sc = {}; + DEFINE_MAX_SEQ(lruvec); + + if (seq == max_seq) + try_to_inc_max_seq(lruvec, max_seq, &sc, swappiness); + + return seq > max_seq ? -EINVAL : 0; +} + +static int run_eviction(struct lruvec *lruvec, unsigned long seq, int swappiness, + unsigned long nr_to_reclaim) +{ + unsigned int flags; + struct blk_plug plug; + int err = -EINTR; + long nr_to_scan = LONG_MAX; + struct scan_control sc = { + .nr_to_reclaim = nr_to_reclaim, + .may_writepage = 1, + .may_unmap = 1, + .may_swap = 1, + .reclaim_idx = MAX_NR_ZONES - 1, + .gfp_mask = GFP_KERNEL, + }; + DEFINE_MAX_SEQ(lruvec); + + if (seq >= max_seq - 1) + return -EINVAL; + + flags = memalloc_noreclaim_save(); + + blk_start_plug(&plug); + + while (!signal_pending(current)) { + DEFINE_MIN_SEQ(lruvec); + + if (seq < min(min_seq[!swappiness], min_seq[swappiness < 200]) || + !evict_pages(lruvec, &sc, swappiness, &nr_to_scan)) { + err = 0; + break; + } + + cond_resched(); + } + + blk_finish_plug(&plug); + + memalloc_noreclaim_restore(flags); + + return err; +} + +static int run_cmd(char cmd, int memcg_id, int nid, unsigned long seq, + int swappiness, unsigned long nr_to_reclaim) +{ + struct lruvec *lruvec; + int err = -EINVAL; + struct mem_cgroup *memcg = NULL; + + if (!mem_cgroup_disabled()) { + rcu_read_lock(); + memcg = mem_cgroup_from_id(memcg_id); +#ifdef CONFIG_MEMCG + if (memcg && !css_tryget(&memcg->css)) + memcg = NULL; +#endif + rcu_read_unlock(); + + if (!memcg) + goto done; + } + if (memcg_id != mem_cgroup_id(memcg)) + goto done; + + if (nid < 0 || nid >= MAX_NUMNODES || !node_state(nid, N_MEMORY)) + goto done; + + lruvec = mem_cgroup_lruvec(memcg, NODE_DATA(nid)); + + if (swappiness == -1) + swappiness = get_swappiness(memcg); + else if (swappiness > 200U) + goto done; + + switch (cmd) { + case '+': + err = run_aging(lruvec, seq, swappiness); + break; + case '-': + err = run_eviction(lruvec, seq, swappiness, nr_to_reclaim); + break; + } +done: + mem_cgroup_put(memcg); + + return err; +} + +static ssize_t lru_gen_seq_write(struct file *file, const char __user *src, + size_t len, loff_t *pos) +{ + void *buf; + char *cur, *next; + int err = 0; + + buf = kvmalloc(len + 1, GFP_USER); + if (!buf) + return -ENOMEM; + + if (copy_from_user(buf, src, len)) { + kvfree(buf); + return -EFAULT; + } + + next = buf; + next[len] = '\0'; + + while ((cur = strsep(&next, ",;\n"))) { + int n; + int end; + char cmd; + unsigned int memcg_id; + unsigned int nid; + unsigned long seq; + unsigned int swappiness = -1; + unsigned long nr_to_reclaim = -1; + + cur = skip_spaces(cur); + if (!*cur) + continue; + + n = sscanf(cur, "%c %u %u %lu %n %u %n %lu %n", &cmd, &memcg_id, &nid, + &seq, &end, &swappiness, &end, &nr_to_reclaim, &end); + if (n < 4 || cur[end]) { + err = -EINVAL; + break; + } + + err = run_cmd(cmd, memcg_id, nid, seq, swappiness, nr_to_reclaim); + if (err) + break; + } + + kvfree(buf); + + return err ? : len; +} + +static int lru_gen_seq_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &lru_gen_seq_ops); +} + +static const struct file_operations lru_gen_rw_fops = { + .open = lru_gen_seq_open, + .read = seq_read, + .write = lru_gen_seq_write, + .llseek = seq_lseek, + .release = seq_release, +}; + +static const struct file_operations lru_gen_ro_fops = { + .open = lru_gen_seq_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; + /****************************************************************************** * initialization ******************************************************************************/ @@ -4772,6 +5178,12 @@ static int __init init_lru_gen(void) if (hotplug_memory_notifier(mem_notifier, 0)) pr_err("lru_gen: failed to subscribe hotplug notifications\n"); + if (sysfs_create_group(mm_kobj, &lru_gen_attr_group)) + pr_err("lru_gen: failed to create sysfs group\n"); + + debugfs_create_file("lru_gen", 0644, NULL, NULL, &lru_gen_rw_fops); + debugfs_create_file("lru_gen_full", 0444, NULL, NULL, &lru_gen_ro_fops); + return 0; }; /* -- 2.33.0.rc1.237.g0d66db33f3-goog