From: Tim Chen <tim.c.chen@linux.intel.com>
To: Davidlohr Bueso <dave@stgolabs.net>, linux-mm@kvack.org
Cc: mhocko@kernel.org, akpm@linux-foundation.org,
rientjes@google.com, yosryahmed@google.com, hannes@cmpxchg.org,
shakeelb@google.com, dave.hansen@linux.intel.com,
roman.gushchin@linux.dev, gthelen@google.com,
a.manzanares@samsung.com, heekwon.p@samsung.com,
gim.jongmin@samsung.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 4/6] mm: introduce per-node proactive reclaim interface
Date: Mon, 18 Apr 2022 17:00:55 -0700 [thread overview]
Message-ID: <c7520aec8bd41550a520e411a829de892807dcb5.camel@linux.intel.com> (raw)
In-Reply-To: <20220416053902.68517-5-dave@stgolabs.net>
On Fri, 2022-04-15 at 22:39 -0700, Davidlohr Bueso wrote:
> This patch introduces a mechanism to trigger memory reclaim
> as a per-node sysfs interface, inspired by compaction's
> equivalent; ie:
>
> echo 1G > /sys/devices/system/node/nodeX/reclaim
>
I think it will be more flexible to specify a node mask
as a parameter along with amount of memory with the
memory.reclaim memcg interface proposed by Yosry. Doing it node
by node is more cumbersome. It is just a special case
of reclaiming from root cgroup for a specific node.
Wei Gu, YIng and I have some discssions on this
https://lore.kernel.org/all/df6110a09cacc80ee1cbe905a71273a5f3953e16.camel@linux.intel.com/
Tim
> It is based on the discussions from David's thread[1] as
> well as the current upstreaming of the memcg[2] interface
> (which has nice explanations for the benefits of userspace
> reclaim overall). In both cases conclusions were that either
> way of inducing proactive reclaim should be KISS, and can be
> later extended. So this patch does not allow the user much
> fine tuning beyond the size of the reclaim, such as anon/file
> or whether or semantics of demotion.
>
> [1] https://lore.kernel.org/all/5df21376-7dd1-bf81-8414-32a73cea45dd@google.com/
> [2] https://lore.kernel.org/all/20220408045743.1432968-1-yosryahmed@google.com/
>
> Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
> ---
> Documentation/ABI/stable/sysfs-devices-node | 10 ++++
> drivers/base/node.c | 2 +
> include/linux/swap.h | 16 ++++++
> mm/vmscan.c | 59 +++++++++++++++++++++
> 4 files changed, 87 insertions(+)
>
> diff --git a/Documentation/ABI/stable/sysfs-devices-node b/Documentation/ABI/stable/sysfs-devices-node
> index 8db67aa472f1..3c935e1334f7 100644
> --- a/Documentation/ABI/stable/sysfs-devices-node
> +++ b/Documentation/ABI/stable/sysfs-devices-node
> @@ -182,3 +182,13 @@ Date: November 2021
> Contact: Jarkko Sakkinen <jarkko@kernel.org>
> Description:
> The total amount of SGX physical memory in bytes.
> +
> +What: /sys/devices/system/node/nodeX/reclaim
> +Date: April 2022
> +Contact: Davidlohr Bueso <dave@stgolabs.net>
> +Description:
> + Write the amount of bytes to induce memory reclaim in this node.
> + This file accepts a single key, the number of bytes to reclaim.
> + When it completes successfully, the specified amount or more memory
> + will have been reclaimed, and -EAGAIN if less bytes are reclaimed
> + than the specified amount.
> diff --git a/drivers/base/node.c b/drivers/base/node.c
> index 6cdf25fd26c3..d80c478e2a6e 100644
> --- a/drivers/base/node.c
> +++ b/drivers/base/node.c
> @@ -670,6 +670,7 @@ static int register_node(struct node *node, int num)
>
> hugetlb_register_node(node);
> compaction_register_node(node);
> + reclaim_register_node(node);
> return 0;
> }
>
> @@ -685,6 +686,7 @@ void unregister_node(struct node *node)
> hugetlb_unregister_node(node); /* no-op, if memoryless node */
> node_remove_accesses(node);
> node_remove_caches(node);
> + reclaim_unregister_node(node);
> device_unregister(&node->dev);
> }
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 27093b477c5f..cca43ae6d770 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -398,6 +398,22 @@ extern unsigned long shrink_all_memory(unsigned long nr_pages);
> extern int vm_swappiness;
> long remove_mapping(struct address_space *mapping, struct folio *folio);
>
> +#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> +extern int reclaim_register_node(struct node *node);
> +extern void reclaim_unregister_node(struct node *node);
> +
> +#else
> +
> +static inline int reclaim_register_node(struct node *node)
> +{
> + return 0;
> +}
> +
> +static inline void reclaim_unregister_node(struct node *node)
> +{
> +}
> +#endif /* CONFIG_SYSFS && CONFIG_NUMA */
> +
> extern unsigned long reclaim_pages(struct list_head *page_list);
> #ifdef CONFIG_NUMA
> extern int node_reclaim_mode;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 1735c302831c..3539f8a0f0ea 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4819,3 +4819,62 @@ void check_move_unevictable_pages(struct pagevec *pvec)
> }
> }
> EXPORT_SYMBOL_GPL(check_move_unevictable_pages);
> +
> +#if defined(CONFIG_SYSFS) && defined(CONFIG_NUMA)
> +static ssize_t reclaim_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + int err, nid = dev->id;
> + gfp_t gfp_mask = GFP_KERNEL;
> + struct pglist_data *pgdat = NODE_DATA(nid);
> + unsigned long nr_to_reclaim, nr_reclaimed = 0;
> + unsigned int nr_retries = MAX_RECLAIM_RETRIES;
> + struct scan_control sc = {
> + .gfp_mask = current_gfp_context(gfp_mask),
> + .reclaim_idx = gfp_zone(gfp_mask),
> + .priority = NODE_RECLAIM_PRIORITY,
> + .may_writepage = !laptop_mode,
> + .may_unmap = 1,
> + .may_swap = 1,
> + };
> +
> + buf = strstrip((char *)buf);
> + err = page_counter_memparse(buf, "", &nr_to_reclaim);
> + if (err)
> + return err;
> +
> + sc.nr_to_reclaim = max(nr_to_reclaim, SWAP_CLUSTER_MAX);
> +
> + while (nr_reclaimed < nr_to_reclaim) {
> + unsigned long reclaimed;
> +
> + if (test_and_set_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags))
> + return -EAGAIN;
> +
> + /* does cond_resched() */
> + reclaimed = __node_reclaim(pgdat, gfp_mask,
> + nr_to_reclaim - nr_reclaimed, &sc);
> +
> + clear_bit(PGDAT_RECLAIM_LOCKED, &pgdat->flags);
> +
> + if (!reclaimed && !nr_retries--)
> + break;
> +
> + nr_reclaimed += reclaimed;
> + }
> +
> + return nr_reclaimed < nr_to_reclaim ? -EAGAIN : count;
> +}
> +
> +static DEVICE_ATTR_WO(reclaim);
> +int reclaim_register_node(struct node *node)
> +{
> + return device_create_file(&node->dev, &dev_attr_reclaim);
> +}
> +
> +void reclaim_unregister_node(struct node *node)
> +{
> + return device_remove_file(&node->dev, &dev_attr_reclaim);
> +}
> +#endif
next prev parent reply other threads:[~2022-04-19 0:01 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-16 5:38 [PATCH RFC lsfmm 0/6] mm: proactive reclaim and memory tiering topics Davidlohr Bueso
2022-04-16 5:38 ` [PATCH 1/6] drivers/base/node: cleanup register_node() Davidlohr Bueso
2022-04-25 22:30 ` Adam Manzanares
2022-05-03 18:17 ` David Hildenbrand
2022-05-04 4:33 ` David Rientjes
2022-04-16 5:38 ` [PATCH 2/6] mm/vmscan: use node_is_toptier helper in node_reclaim Davidlohr Bueso
2022-04-25 22:32 ` Adam Manzanares
2022-05-04 4:33 ` David Rientjes
2022-05-04 7:26 ` Jagdish Gediya
2022-05-31 11:50 ` Aneesh Kumar K.V
2022-06-01 6:12 ` Ying Huang
2022-06-01 14:00 ` Davidlohr Bueso
2022-04-16 5:38 ` [PATCH 3/6] mm: make __node_reclaim() more flexible Davidlohr Bueso
2022-04-16 5:39 ` [PATCH 4/6] mm: introduce per-node proactive reclaim interface Davidlohr Bueso
2022-04-19 0:00 ` Tim Chen [this message]
2022-04-16 5:39 ` [PATCH 5/6] mm/migration: export demotion_path of a node via sysfs Davidlohr Bueso
2022-04-22 17:31 ` Yang Shi
2022-04-22 17:33 ` Yang Shi
2022-04-22 17:50 ` Davidlohr Bueso
2022-04-17 3:49 ` [PATCH 6/6] mm/migrate: export whether or not node is toptier in sysf Davidlohr Bueso
2022-04-18 15:34 ` Dave Hansen
2022-04-18 16:45 ` Davidlohr Bueso
2022-04-18 16:50 ` Dave Hansen
2022-04-18 17:01 ` Davidlohr Bueso
2022-04-22 17:37 ` Yang Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=c7520aec8bd41550a520e411a829de892807dcb5.camel@linux.intel.com \
--to=tim.c.chen@linux.intel.com \
--cc=a.manzanares@samsung.com \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@linux.intel.com \
--cc=dave@stgolabs.net \
--cc=gim.jongmin@samsung.com \
--cc=gthelen@google.com \
--cc=hannes@cmpxchg.org \
--cc=heekwon.p@samsung.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=rientjes@google.com \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
--cc=yosryahmed@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).