* [PATCH RFC v2] Opportunistic memory reclaim
@ 2020-10-05  8:13 Andrea Righi
  2020-10-05  8:13 ` [PATCH RFC v2 1/2] mm: memcontrol: make shrink_all_memory() memcg aware Andrea Righi
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Andrea Righi @ 2020-10-05  8:13 UTC (permalink / raw)
  To: Michal Hocko, Vladimir Davydov
  Cc: Li Zefan, Tejun Heo, Johannes Weiner, Andrew Morton,
	Luigi Semenzato, Rafael J . Wysocki, cgroups, linux-mm,
	linux-kernel, linux-doc

## Overview

Opportunistic memory reclaim aims to introduce a new interface that
allows user-space to trigger an artificial memory pressure condition and
force the kernel to reclaim memory (dropping page cache pages, swapping
out anonymous memory, etc.).

### Motivation

Reclaiming memory in advance to prepare the system to be more responsive
when needed.

### Use cases

 - Reduce system memory footprint
 - Speed up hibernation time
 - Speed up VM migration time
 - Prioritize responsiveness of foreground applications vs background
   applications
 - Prepare the system to be more responsive during large allocation
   bursts

## Interface

This feature is provided by adding a new file to each memcg:
memory.swap.reclaim.

Writing a number to this file forces a memcg to reclaim memory up to
that number of bytes ("max" means as much memory as possible). Reading
from this file returns the number of bytes reclaimed in the last
opportunistic memory reclaim attempt.

Memory reclaim can be interrupted by sending a signal to the process
that is writing to memory.swap.reclaim (e.g., to set a timeout for the
whole memory reclaim run).
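
As a rough illustration of the interface described above, here is a minimal user-space sketch that writes an amount to the reclaim file with a timeout and reads the result back. The helper names (`request_reclaim`, `last_reclaimed`) are made up for this example, and the file path is whatever the caller passes in (with this RFC applied and cgroup v2 mounted in the usual place, it would presumably be something like `/sys/fs/cgroup/<group>/memory.swap.reclaim`, but that is an assumption).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: SIGALRM must not kill us, it only has to be delivered. */
static void on_alarm(int sig)
{
	(void)sig;
}

/*
 * Write `amount` (a byte count or "max") to the reclaim file at `path`.
 * A SIGALRM is scheduled after `timeout_sec` seconds so that a long
 * reclaim run gets interrupted. Returns 0 on success, -1 on error.
 */
int request_reclaim(const char *path, const char *amount, unsigned timeout_sec)
{
	struct sigaction sa;
	int fd, saved_errno;
	ssize_t ret;

	assert(path && amount);

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	alarm(timeout_sec);		/* arm the timeout */
	ret = write(fd, amount, strlen(amount));
	saved_errno = errno;
	alarm(0);			/* disarm it again */
	close(fd);
	errno = saved_errno;
	return ret < 0 ? -1 : 0;
}

/* Read back the number of bytes reclaimed by the last attempt, or -1. */
long long last_reclaimed(const char *path)
{
	FILE *f = fopen(path, "r");
	long long bytes = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%lld", &bytes) != 1)
		bytes = -1;
	fclose(f);
	return bytes;
}
```

A caller would do something like `request_reclaim(path, "max", 30)` followed by `last_reclaimed(path)`. Since an interrupted run may still report success from `write()`, the value read back is the reliable way to see how much was actually reclaimed.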

## Example usage

This feature has been successfully used to improve hibernation time of
cloud computing instances.

Certain cloud providers allow running "spot instances": low-priority
instances that run when there are spare resources available and can be
stopped at any time to prioritize other more privileged instances [2].

Hibernation can be used to stop these low-priority instances nicely,
rather than losing state when the instance is shut down. Being able to
quickly stop low-priority instances can be critical to provide a better
quality of service in the overall cloud infrastructure [1].

The main bottleneck of hibernation is the I/O generated to write all of
main memory (the hibernation image) to persistent storage.

Opportunistic memory reclaim can be used to reduce the size of the
hibernation image in advance, for example when the system has been idle
for a certain amount of time, so that when a hibernation request arrives
the kernel has already saved most of the memory to the swap device
(caches have been dropped, etc.) and hibernation can complete quickly.

## Testing and results

Here is a simple test case to show the effectiveness of this feature.

Environment:
```
   - VM (kvm):
     8GB of RAM
     disk speed: 100 MB/s
     8GB swap file on ext4 (/swapfile)
```

Test case:
```
  - allocate 85% of memory
  - wait for 60s, almost idle
  - hibernate and resume the system (measuring the time)
```

Result:
  - average of 10 runs tested with `/sys/power/image_size=default` and
    `/sys/power/image_size=0`:
```
                                 5.9-vanilla   5.9-mm_reclaim
                                 -----------   --------------
  [hibernate] image_size=default      49.07s            3.40s
     [resume] image_size=default      18.35s            7.13s

  [hibernate] image_size=0            71.55s            4.72s
     [resume] image_size=0             7.49s            7.41s
```

NOTE #1: in the `5.9-mm_reclaim` case a simple user-space daemon detects
when the system is idle for a certain amount of time and triggers the
opportunistic memory reclaim.

NOTE #2: `/sys/power/image_size=0` can be used with `5.9-vanilla` to
speed up resume time (because it shrinks the hibernation image even
further) at the cost of increasing hibernation time; with `5.9-mm_reclaim`
performance is nearly identical in both cases, because the hibernation
image has already been reduced to the minimum when the hibernation
request arrives.
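
To put the table into perspective, the speedup factors can be worked out with a few lines of C. This is only arithmetic on the averages reported above; the function names are made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Speedup factor: how many times faster the patched kernel was. */
double speedup(double vanilla_sec, double patched_sec)
{
	assert(patched_sec > 0.0);
	return vanilla_sec / patched_sec;
}

/* Print the speedup for each row of the results table. */
void print_speedups(void)
{
	printf("hibernate, image_size=default: %.1fx\n", speedup(49.07, 3.40));
	printf("resume,    image_size=default: %.1fx\n", speedup(18.35, 7.13));
	printf("hibernate, image_size=0:       %.1fx\n", speedup(71.55, 4.72));
	printf("resume,    image_size=0:       %.1fx\n", speedup(7.49, 7.41));
}
```

With these numbers hibernation comes out roughly 14x to 15x faster, resume with the default image size about 2.6x faster, and resume with `image_size=0` essentially unchanged.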

## Conclusion

Being able to trigger memory reclaim from user-space makes it possible
to prepare the system in advance to be more responsive when needed.

This feature has been used with positive test results to speed up
hibernation time of cloud computing instances, but it can also provide
benefits to other use cases, for example:

 - prioritize responsiveness of foreground applications vs background
   applications

 - improve system responsiveness during large allocation bursts
   (preparing system by reclaiming memory in advance, e.g. using some
   idle cycles)

 - reduce overall system memory footprint (especially in VM / cloud
   computing environments)

## See also

 - [1] https://lwn.net/Articles/821158/
 - [2] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html
 - [3] user-space tools/scripts: https://github.com/arighi/opportunistic-memory-reclaim
 - [4] previous version: https://lore.kernel.org/lkml/20200601160636.148346-1-andrea.righi@canonical.com/

## Changes in v2:
 - move ABI from hibernation to memcg (since this feature can be used
   also in other contexts and it's not hibernation-specific)
 - drop memory release functionality (to re-load swapped out pages,
   since it ended up not being very useful)
 - add the possibility to show the amount of memory reclaimed in the
   last attempt (per-memcg)

----------------------------------------------------------------
Andrea Righi (2):
      mm: memcontrol: make shrink_all_memory() memcg aware
      mm: memcontrol: introduce opportunistic memory reclaim

 Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++++
 include/linux/memcontrol.h              |  4 +++
 include/linux/swap.h                    |  9 ++++-
 mm/memcontrol.c                         | 59 +++++++++++++++++++++++++++++++++
 mm/vmscan.c                             |  6 ++--
 5 files changed, 92 insertions(+), 4 deletions(-)




* [PATCH RFC v2 1/2] mm: memcontrol: make shrink_all_memory() memcg aware
  2020-10-05  8:13 [PATCH RFC v2] Opportunistic memory reclaim Andrea Righi
@ 2020-10-05  8:13 ` Andrea Righi
  2020-10-05  8:13 ` [PATCH RFC v2 2/2] mm: memcontrol: introduce opportunistic memory reclaim Andrea Righi
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2020-10-05  8:13 UTC (permalink / raw)
  To: Michal Hocko, Vladimir Davydov
  Cc: Li Zefan, Tejun Heo, Johannes Weiner, Andrew Morton,
	Luigi Semenzato, Rafael J . Wysocki, cgroups, linux-mm,
	linux-kernel, linux-doc

Allow specifying a memcg when calling shrink_all_memory(), to reclaim
memory from a specific cgroup.

Moreover, make shrink_all_memory() always available, so that it no
longer depends on CONFIG_HIBERNATION being enabled.

This is required by the opportunistic memory reclaim feature.

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
 include/linux/swap.h | 9 ++++++++-
 mm/vmscan.c          | 6 +++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 661046994db4..1490b09a6e6c 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -368,7 +368,14 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 						gfp_t gfp_mask, bool noswap,
 						pg_data_t *pgdat,
 						unsigned long *nr_scanned);
-extern unsigned long shrink_all_memory(unsigned long nr_pages);
+extern unsigned long
+__shrink_all_memory(unsigned long nr_pages, struct mem_cgroup *memcg);
+
+static inline unsigned long shrink_all_memory(unsigned long nr_pages)
+{
+	return __shrink_all_memory(nr_pages, NULL);
+}
+
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 466fc3144fff..ac04d5e16c42 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3986,7 +3986,6 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
 	wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
-#ifdef CONFIG_HIBERNATION
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
  * freed pages.
@@ -3995,7 +3994,8 @@ void wakeup_kswapd(struct zone *zone, gfp_t gfp_flags, int order,
  * LRU order by reclaiming preferentially
  * inactive > active > active referenced > active mapped
  */
-unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
+unsigned long
+__shrink_all_memory(unsigned long nr_to_reclaim, struct mem_cgroup *memcg)
 {
 	struct scan_control sc = {
 		.nr_to_reclaim = nr_to_reclaim,
@@ -4006,6 +4006,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 		.may_unmap = 1,
 		.may_swap = 1,
 		.hibernation_mode = 1,
+		.target_mem_cgroup = memcg,
 	};
 	struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
 	unsigned long nr_reclaimed;
@@ -4023,7 +4024,6 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 
 	return nr_reclaimed;
 }
-#endif /* CONFIG_HIBERNATION */
 
 /*
  * This kswapd start function will be called by init and node-hot-add.
-- 
2.27.0




* [PATCH RFC v2 2/2] mm: memcontrol: introduce opportunistic memory reclaim
  2020-10-05  8:13 [PATCH RFC v2] Opportunistic memory reclaim Andrea Righi
  2020-10-05  8:13 ` [PATCH RFC v2 1/2] mm: memcontrol: make shrink_all_memory() memcg aware Andrea Righi
@ 2020-10-05  8:13 ` Andrea Righi
  2020-10-05  8:35 ` [PATCH RFC v2] Opportunistic " Michal Hocko
  2020-10-05 11:25 ` Chris Down
  3 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2020-10-05  8:13 UTC (permalink / raw)
  To: Michal Hocko, Vladimir Davydov
  Cc: Li Zefan, Tejun Heo, Johannes Weiner, Andrew Morton,
	Luigi Semenzato, Rafael J . Wysocki, cgroups, linux-mm,
	linux-kernel, linux-doc

Opportunistic memory reclaim allows user-space to trigger an artificial
memory pressure condition and force the system to reclaim memory (drop
caches, swap out anonymous memory, etc.).

This feature is provided by adding a new file to each memcg:
memory.swap.reclaim.

Writing a number to this file forces a memcg to reclaim memory up to
that number of bytes ("max" means as much memory as possible). Reading
from this file returns the number of bytes reclaimed in the last
opportunistic memory reclaim attempt.

Memory reclaim can be interrupted by sending a signal to the process
that is writing to memory.swap.reclaim (e.g., to set a timeout for the
whole memory reclaim run).

Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 18 ++++++++
 include/linux/memcontrol.h              |  4 ++
 mm/memcontrol.c                         | 59 +++++++++++++++++++++++++
 3 files changed, 81 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index baa07b30845e..2850a5cb4b1e 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1409,6 +1409,24 @@ PAGE_SIZE multiple when read back.
 	Swap usage hard limit.  If a cgroup's swap usage reaches this
 	limit, anonymous memory of the cgroup will not be swapped out.
 
+  memory.swap.reclaim
+        A read-write single value file that can be used to trigger
+        opportunistic memory reclaim.
+
+        The string written to this file represents the amount of memory to be
+        reclaimed (special value "max" means "as much memory as possible").
+
+        When opportunistic memory reclaim is started the system will be put
+        into an artificial memory pressure condition and memory will be
+        reclaimed by dropping clean page cache pages, swapping out anonymous
+        pages, etc.
+
+        NOTE: it is possible to interrupt the memory reclaim sending a signal
+        to the writer of this file.
+
+        Reading from memory.swap.reclaim returns the amount of bytes reclaimed
+        in the last attempt.
+
   memory.swap.events
 	A read-only flat-keyed file which exists on non-root cgroups.
 	The following entries are defined.  Unless specified
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d0b036123c6a..0c90d989bdc1 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -306,6 +306,10 @@ struct mem_cgroup {
 	bool			tcpmem_active;
 	int			tcpmem_pressure;
 
+#ifdef CONFIG_MEMCG_SWAP
+	unsigned long		nr_swap_reclaimed;
+#endif
+
 #ifdef CONFIG_MEMCG_KMEM
         /* Index in the kmem_cache->memcg_params.memcg_caches array */
 	int kmemcg_id;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6877c765b8d0..b98e9bbd61b0 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7346,6 +7346,60 @@ static int swap_events_show(struct seq_file *m, void *v)
 	return 0;
 }
 
+/*
+ * Try to reclaim some memory in the system, stop when one of the following
+ * conditions occurs:
+ *  - at least "nr_pages" have been reclaimed
+ *  - no more pages can be reclaimed
+ *  - current task explicitly interrupted by a signal (e.g., user space
+ *    timeout)
+ *
+ *  @nr_pages - number of pages to be reclaimed (PAGE_COUNTER_MAX, i.e. "max"
+ *  written to the file, means "as many pages as possible").
+ */
+static unsigned long
+do_mm_reclaim(struct mem_cgroup *memcg, unsigned long nr_pages)
+{
+	unsigned long nr_reclaimed = 0;
+
+	while (nr_pages > 0) {
+		unsigned long reclaimed;
+
+		if (signal_pending(current))
+			break;
+		reclaimed = __shrink_all_memory(nr_pages, memcg);
+		if (!reclaimed)
+			break;
+		nr_reclaimed += reclaimed;
+		nr_pages -= min_t(unsigned long, reclaimed, nr_pages);
+	}
+	return nr_reclaimed;
+}
+
+static ssize_t swap_reclaim_write(struct kernfs_open_file *of,
+				  char *buf, size_t nbytes, loff_t off)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(of_css(of));
+	unsigned long nr_to_reclaim;
+	int err;
+
+	buf = strstrip(buf);
+	err = page_counter_memparse(buf, "max", &nr_to_reclaim);
+	if (err)
+		return err;
+	memcg->nr_swap_reclaimed = do_mm_reclaim(memcg, nr_to_reclaim);
+
+	return nbytes;
+}
+
+static u64 swap_reclaim_read(struct cgroup_subsys_state *css,
+			     struct cftype *cft)
+{
+	struct mem_cgroup *memcg = mem_cgroup_from_css(css);
+
+	return memcg->nr_swap_reclaimed << PAGE_SHIFT;
+}
+
 static struct cftype swap_files[] = {
 	{
 		.name = "swap.current",
@@ -7370,6 +7424,11 @@ static struct cftype swap_files[] = {
 		.file_offset = offsetof(struct mem_cgroup, swap_events_file),
 		.seq_show = swap_events_show,
 	},
+	{
+		.name = "swap.reclaim",
+		.write = swap_reclaim_write,
+		.read_u64 = swap_reclaim_read,
+	},
 	{ }	/* terminate */
 };
 
-- 
2.27.0
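
For reference, the stop conditions of the do_mm_reclaim() loop in this patch can be modelled in user space. This is only a sketch: `model_reclaim` and `pool_reclaim` are hypothetical names, the callback stands in for __shrink_all_memory(), and the signal_pending() check is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for __shrink_all_memory(): tries to reclaim up to `want` pages. */
typedef unsigned long (*reclaim_fn)(unsigned long want, void *ctx);

/*
 * Keep asking for the remaining pages until the target is met or a pass
 * makes no progress. Returns the total number of pages reclaimed.
 */
unsigned long model_reclaim(unsigned long nr_pages, reclaim_fn reclaim, void *ctx)
{
	unsigned long nr_reclaimed = 0;

	assert(reclaim != NULL);
	while (nr_pages > 0) {
		unsigned long reclaimed = reclaim(nr_pages, ctx);

		if (!reclaimed)
			break;
		nr_reclaimed += reclaimed;
		/* A pass may free more than requested; clamp to avoid underflow. */
		nr_pages -= reclaimed < nr_pages ? reclaimed : nr_pages;
	}
	return nr_reclaimed;
}

/* Toy reclaimer: frees at most 32 pages per pass from a finite pool. */
unsigned long pool_reclaim(unsigned long want, void *ctx)
{
	unsigned long *pool = ctx;
	unsigned long n = want < 32 ? want : 32;

	if (n > *pool)
		n = *pool;
	*pool -= n;
	return n;
}
```

Note that in this model a target of 0 pages reclaims nothing, because the loop body never runs; "as much as possible" is expressed by passing a very large target and letting the no-progress check end the loop.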




* Re: [PATCH RFC v2] Opportunistic memory reclaim
  2020-10-05  8:13 [PATCH RFC v2] Opportunistic memory reclaim Andrea Righi
  2020-10-05  8:13 ` [PATCH RFC v2 1/2] mm: memcontrol: make shrink_all_memory() memcg aware Andrea Righi
  2020-10-05  8:13 ` [PATCH RFC v2 2/2] mm: memcontrol: introduce opportunistic memory reclaim Andrea Righi
@ 2020-10-05  8:35 ` Michal Hocko
  2020-10-05  8:44   ` Andrea Righi
  2020-10-05 11:25 ` Chris Down
  3 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2020-10-05  8:35 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Vladimir Davydov, Li Zefan, Tejun Heo, Johannes Weiner,
	Andrew Morton, Luigi Semenzato, Rafael J . Wysocki, cgroups,
	linux-mm, linux-kernel, linux-doc

A similar thing has been proposed recently by Shakeel
http://lkml.kernel.org/r/20200909215752.1725525-1-shakeelb@google.com
Please have a look at the follow up discussion.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH RFC v2] Opportunistic memory reclaim
  2020-10-05  8:35 ` [PATCH RFC v2] Opportunistic " Michal Hocko
@ 2020-10-05  8:44   ` Andrea Righi
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2020-10-05  8:44 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Vladimir Davydov, Li Zefan, Tejun Heo, Johannes Weiner,
	Andrew Morton, Luigi Semenzato, Rafael J . Wysocki, cgroups,
	linux-mm, linux-kernel, linux-doc

On Mon, Oct 05, 2020 at 10:35:16AM +0200, Michal Hocko wrote:
> A similar thing has been proposed recently by Shakeel
> http://lkml.kernel.org/r/20200909215752.1725525-1-shakeelb@google.com
> Please have a look at the follow up discussion.

Thanks for pointing this out, I wasn't aware of that patch and yes, it's
definitely similar. I'll follow up on that.

-Andrea



* Re: [PATCH RFC v2] Opportunistic memory reclaim
  2020-10-05  8:13 [PATCH RFC v2] Opportunistic memory reclaim Andrea Righi
                   ` (2 preceding siblings ...)
  2020-10-05  8:35 ` [PATCH RFC v2] Opportunistic " Michal Hocko
@ 2020-10-05 11:25 ` Chris Down
  2020-10-05 13:51   ` Andrea Righi
  3 siblings, 1 reply; 9+ messages in thread
From: Chris Down @ 2020-10-05 11:25 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Michal Hocko, Vladimir Davydov, Li Zefan, Tejun Heo,
	Johannes Weiner, Andrew Morton, Luigi Semenzato,
	Rafael J . Wysocki, cgroups, linux-mm, linux-kernel, linux-doc

Andrea Righi writes:
>This feature has been successfully used to improve hibernation time of
>cloud computing instances.
>
>Certain cloud providers allow to run "spot instances": low-priority
>instances that run when there are spare resources available and can be
>stopped at any time to prioritize other more privileged instances [2].
>
>Hibernation can be used to stop these low-priority instances nicely,
>rather than losing state when the instance is shut down. Being able to
>quickly stop low-priority instances can be critical to provide a better
>quality of service in the overall cloud infrastructure [1].
>
>The main bottleneck of hibernation is represented by the I/O generated
>to write all the main memory (hibernation image) to a persistent
>storage.
>
>Opportunistic memory reclaimed can be used to reduce the size of the
>hibernation image in advance, for example if the system is idle for a
>certain amount of time, so if an hibernation request happens, the kernel
>has already saved most of the memory to the swap device (caches have
>been dropped, etc.) and hibernation can complete quickly.

Hmm, why does this need to be implemented in kernelspace? We already have
userspace shrinkers using memory pressure information as part of PID control
(e.g. senpai). Using memory.high and pressure information looks a lot
easier to reason about than having to choose an absolute number ahead of time
and hoping it works.
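
For context, a user-space controller of this kind would typically start from the PSI data in a cgroup's memory.pressure file. Below is a minimal sketch of parsing one line of that file; `psi_avg10` is a hypothetical helper name, and this is not how senpai itself is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of a PSI file such as a cgroup's memory.pressure, e.g.
 *   some avg10=0.12 avg60=0.05 avg300=0.01 total=12345
 * `kind` selects the line type ("some" or "full"). Returns 0 and stores
 * the 10-second average on success, -1 if the line does not match.
 */
int psi_avg10(const char *line, const char *kind, double *avg10)
{
	char name[8];
	double a10;

	assert(line && kind && avg10);
	if (sscanf(line, "%7s avg10=%lf", name, &a10) != 2)
		return -1;
	if (strcmp(name, kind) != 0)
		return -1;
	*avg10 = a10;
	return 0;
}
```

A control loop would read this value periodically and tighten or relax memory.high depending on whether the pressure is below or above its target.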



* Re: [PATCH RFC v2] Opportunistic memory reclaim
  2020-10-05 11:25 ` Chris Down
@ 2020-10-05 13:51   ` Andrea Righi
  2020-10-05 14:46     ` Chris Down
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea Righi @ 2020-10-05 13:51 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, Vladimir Davydov, Li Zefan, Tejun Heo,
	Johannes Weiner, Andrew Morton, Luigi Semenzato,
	Rafael J . Wysocki, cgroups, linux-mm, linux-kernel, linux-doc

On Mon, Oct 05, 2020 at 12:25:55PM +0100, Chris Down wrote:
> Andrea Righi writes:
> > This feature has been successfully used to improve hibernation time of
> > cloud computing instances.
> > 
> > Certain cloud providers allow to run "spot instances": low-priority
> > instances that run when there are spare resources available and can be
> > stopped at any time to prioritize other more privileged instances [2].
> > 
> > Hibernation can be used to stop these low-priority instances nicely,
> > rather than losing state when the instance is shut down. Being able to
> > quickly stop low-priority instances can be critical to provide a better
> > quality of service in the overall cloud infrastructure [1].
> > 
> > The main bottleneck of hibernation is represented by the I/O generated
> > to write all the main memory (hibernation image) to a persistent
> > storage.
> > 
> > Opportunistic memory reclaimed can be used to reduce the size of the
> > hibernation image in advance, for example if the system is idle for a
> > certain amount of time, so if an hibernation request happens, the kernel
> > has already saved most of the memory to the swap device (caches have
> > been dropped, etc.) and hibernation can complete quickly.
> 
> Hmm, why does this need to be implemented in kernelspace? We already have
> userspace shrinkers using memory pressure information as part of PID control
> already (eg. senpai). Using memory.high and pressure information looks a lot
> easier to reason about than having to choose an absolute number ahead of
> time and hoping it works.

senpai is focused on estimating the ideal memory requirements without
affecting performance, which covers the use case of reducing the memory
footprint.

In my specific use-case (hibernation) I would let the system use as much
memory as possible if it's doing any activity (reclaiming memory only
when the kernel decides that it needs to reclaim memory) and apply a
more aggressive memory reclaiming policy when the system is mostly idle.

I could probably implement this behavior by adjusting memory.high
dynamically, like senpai, but I'm worried about sudden large
allocations that would require raising memory.high quickly enough. I
think the user-space triggered memory reclaim approach is a safer
solution from this perspective.

-Andrea



* Re: [PATCH RFC v2] Opportunistic memory reclaim
  2020-10-05 13:51   ` Andrea Righi
@ 2020-10-05 14:46     ` Chris Down
  2020-10-05 15:39       ` Andrea Righi
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Down @ 2020-10-05 14:46 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Michal Hocko, Vladimir Davydov, Li Zefan, Tejun Heo,
	Johannes Weiner, Andrew Morton, Luigi Semenzato,
	Rafael J . Wysocki, cgroups, linux-mm, linux-kernel, linux-doc

Andrea Righi writes:
>senpai is focused at estimating the ideal memory requirements without
>affecting performance. And this covers the use case about reducing
>memory footprint.
>
>In my specific use-case (hibernation) I would let the system use as much
>memory as possible if it's doing any activity (reclaiming memory only
>when the kernel decides that it needs to reclaim memory) and apply a
>more aggressive memory reclaiming policy when the system is mostly idle.

From this description, I don't see any reason why it needs to be implemented in
kernel space. All of that information is available to userspace, and all of the 
knobs are there.

As it is I'm afraid of the "only when the system is mostly idle" comment, 
because it's usually after such periods that applications need to do large 
retrievals, and now they're going to be in slowpath (eg. periodic jobs).

Such tradeoffs for a specific situation might be fine in userspace as a 
distribution maintainer, but codifying them in the kernel seems premature to 
me, especially for a knob which we will have to maintain forever onwards.

>I could probably implement this behavior adjusting memory.high
>dynamically, like senpai, but I'm worried about potential sudden large
>allocations that may require to respond faster at increasing
>memory.high. I think the user-space triggered memory reclaim approach is
>a safer solution from this perspective.

Have you seen Shakeel's recent "per-memcg reclaim interface" patches? I suspect 
they may help you there.



* Re: [PATCH RFC v2] Opportunistic memory reclaim
  2020-10-05 14:46     ` Chris Down
@ 2020-10-05 15:39       ` Andrea Righi
  0 siblings, 0 replies; 9+ messages in thread
From: Andrea Righi @ 2020-10-05 15:39 UTC (permalink / raw)
  To: Chris Down
  Cc: Michal Hocko, Vladimir Davydov, Li Zefan, Tejun Heo,
	Johannes Weiner, Andrew Morton, Luigi Semenzato,
	Rafael J . Wysocki, cgroups, linux-mm, linux-kernel, linux-doc

On Mon, Oct 05, 2020 at 03:46:12PM +0100, Chris Down wrote:
> Andrea Righi writes:
> > senpai is focused at estimating the ideal memory requirements without
> > affecting performance. And this covers the use case about reducing
> > memory footprint.
> > 
> > In my specific use-case (hibernation) I would let the system use as much
> > memory as possible if it's doing any activity (reclaiming memory only
> > when the kernel decides that it needs to reclaim memory) and apply a
> > more aggressive memory reclaiming policy when the system is mostly idle.
> 
> From this description, I don't see any reason why it needs to be implemented
> in kernel space. All of that information is available to userspace, and all
> of the knobs are there.
> 
> As it is I'm afraid of the "only when the system is mostly idle" comment,
> because it's usually after such periods that applications need to do large
> retrievals, and now they're going to be in slowpath (eg. periodic jobs).

True, but with memory.high there's the risk of thrashing some applications
badly if I'm not fast enough at raising memory.high.

However, something that I definitely want to try is to move all
the memory hogs to a cgroup, set memory.high to a very small value and
then immediately set it back to 'max'. The effect should be pretty much
the same as calling shrink_all_memory(), which is what I'm doing with my
memory.swap.reclaim.
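
A minimal sketch of that experiment, assuming cgroup v2 and that a write to memory.high reclaims before it returns (which is my understanding of recent kernels, but treat it as an assumption). The helper names are hypothetical and the path of the cgroup's memory.high file is supplied by the caller.

```c
#include <assert.h>
#include <stdio.h>

/* Write a string value to a cgroup control file; returns 0 on success. */
int write_knob(const char *path, const char *value)
{
	FILE *f;
	int ok;

	assert(path && value);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(value, f) >= 0;
	/* fclose() flushes the buffer, so a rejected value is reported here. */
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Squeeze a cgroup by lowering memory.high, then lift the limit again.
 * `high_path` is the cgroup's memory.high file; `low_value` is the
 * temporary limit in bytes, given as a string.
 */
int squeeze_cgroup(const char *high_path, const char *low_value)
{
	int ret = write_knob(high_path, low_value);

	/* Always try to restore "max", even if the first write failed. */
	if (write_knob(high_path, "max") < 0)
		ret = -1;
	return ret;
}
```

The window between the two writes is where the thrashing risk sits: any task in the cgroup that allocates during it is throttled against the low limit, so restoring "max" unconditionally matters.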

> 
> Such tradeoffs for a specific situation might be fine in userspace as a
> distribution maintainer, but codifying them in the kernel seems premature to
> me, especially for a knob which we will have to maintain forever onwards.
> 
> > I could probably implement this behavior adjusting memory.high
> > dynamically, like senpai, but I'm worried about potential sudden large
> > allocations that may require to respond faster at increasing
> > memory.high. I think the user-space triggered memory reclaim approach is
> > a safer solution from this perspective.
> 
> Have you seen Shakeel's recent "per-memcg reclaim interface" patches? I
> suspect they may help you there.

Yes, Michal pointed me to his work; it's basically the same approach
that I'm using.

I started this work with a patch that was hibernation specific
(https://lore.kernel.org/lkml/20200601160636.148346-1-andrea.righi@canonical.com/);
this v2 was the natural evolution of my previous work and I didn't
notice that something similar had been posted in the meantime.

Anyway, I already contacted Shakeel, so we won't duplicate the efforts
in the future. :)

Thanks for your feedback!
-Andrea


