All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] mm: vmpressure: make vmpressure_window a tunable.
@ 2016-02-03 10:06 ` Martijn Coenen
  0 siblings, 0 replies; 8+ messages in thread
From: Martijn Coenen @ 2016-02-03 10:06 UTC (permalink / raw)
  To: linux-mm
  Cc: Anton Vorontsov, Andrew Morton, Michal Hocko, Johannes Weiner,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

The window size used for calculating vm pressure
events was previously fixed at 512 pages. The
window size has a big impact on the rate of notifications
sent off to userspace, in particular when using the
"low" level. On machines with a lot of memory, the
current value may be excessive.

On the other hand, making the window size depend on
machine size does not allow userspace to change the
notification rate based on the current state of the
system. For example, when a lot of memory is still
available, userspace may want to increase the window
since it's not interested in receiving notifications
for every 2MB scanned.

This patch makes vmpressure_window a sysctl tunable.

Signed-off-by: Martijn Coenen <maco@google.com>
---
  Documentation/sysctl/vm.txt | 15 +++++++++++++++
  include/linux/vmpressure.h  |  1 +
  kernel/sysctl.c             | 11 +++++++++++
  mm/vmpressure.c             |  5 ++---
  4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 89a887c..0fa4846 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -60,6 +60,7 @@ Currently, these files are in /proc/sys/vm:
  - swappiness
  - user_reserve_kbytes
  - vfs_cache_pressure
+- vmpressure_window
  - zone_reclaim_mode

  ==============================================================
@@ -805,6 +806,20 @@ ten times more freeable objects than there are.

  ==============================================================

+vmpressure_window
+
+The vmpressure algorithm calculates vm pressure by looking
+at the number of pages reclaimed vs the number of pages scanned.
+The vmpressure_window tunable specifies the minimum amount
+of pages that needs to be scanned before sending any vmpressure
+event. Setting a small window size can cause a lot of false
+positives; setting a large window size may delay notifications
+for too long.
+
+The default value is 512 pages.
+
+==============================================================
+
  zone_reclaim_mode:

  Zone_reclaim_mode allows someone to set more or less aggressive approaches  
to
diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 3347cc3..b5341d0 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -29,6 +29,7 @@ struct vmpressure {
  struct mem_cgroup;

  #ifdef CONFIG_MEMCG
+extern unsigned long vmpressure_win;
  extern void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
  		       unsigned long scanned, unsigned long reclaimed);
  extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 97715fd..64938ad 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -51,6 +51,7 @@
  #include <linux/dnotify.h>
  #include <linux/syscalls.h>
  #include <linux/vmstat.h>
+#include <linux/vmpressure.h>
  #include <linux/nfs_fs.h>
  #include <linux/acpi.h>
  #include <linux/reboot.h>
@@ -1590,6 +1591,16 @@ static struct ctl_table vm_table[] = {
  		.extra2		= (void *)&mmap_rnd_compat_bits_max,
  	},
  #endif
+#ifdef CONFIG_MEMCG
+	{
+		.procname	= "vmpressure_window",
+		.data		= &vmpressure_win,
+		.maxlen		= sizeof(vmpressure_win),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra1		= &one,
+	},
+#endif
  	{ }
  };

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 9a6c070..bda6af9 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -35,10 +35,9 @@
   * As the vmscan reclaimer logic works with chunks which are multiple of
   * SWAP_CLUSTER_MAX, it makes sense to use it for the window size as well.
   *
- * TODO: Make the window size depend on machine size, as we do for vmstat
- * thresholds. Currently we set it to 512 pages (2MB for 4KB pages).
+ * The window size is a tunable sysctl.
   */
-static const unsigned long vmpressure_win = SWAP_CLUSTER_MAX * 16;
+unsigned long __read_mostly vmpressure_win = SWAP_CLUSTER_MAX * 16;

  /*
   * These thresholds are used when we account memory pressure through
-- 
2.7.0.rc3.207.g0ac5344

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH] mm: vmpressure: make vmpressure_window a tunable.
@ 2016-02-03 10:06 ` Martijn Coenen
  0 siblings, 0 replies; 8+ messages in thread
From: Martijn Coenen @ 2016-02-03 10:06 UTC (permalink / raw)
  To: linux-mm
  Cc: Anton Vorontsov, Andrew Morton, Michal Hocko, Johannes Weiner,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

The window size used for calculating vm pressure
events was previously fixed at 512 pages. The
window size has a big impact on the rate of notifications
sent off to userspace, in particular when using the
"low" level. On machines with a lot of memory, the
current value may be excessive.

On the other hand, making the window size depend on
machine size does not allow userspace to change the
notification rate based on the current state of the
system. For example, when a lot of memory is still
available, userspace may want to increase the window
since it's not interested in receiving notifications
for every 2MB scanned.

This patch makes vmpressure_window a sysctl tunable.

Signed-off-by: Martijn Coenen <maco@google.com>
---
  Documentation/sysctl/vm.txt | 15 +++++++++++++++
  include/linux/vmpressure.h  |  1 +
  kernel/sysctl.c             | 11 +++++++++++
  mm/vmpressure.c             |  5 ++---
  4 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 89a887c..0fa4846 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -60,6 +60,7 @@ Currently, these files are in /proc/sys/vm:
  - swappiness
  - user_reserve_kbytes
  - vfs_cache_pressure
+- vmpressure_window
  - zone_reclaim_mode

  ==============================================================
@@ -805,6 +806,20 @@ ten times more freeable objects than there are.

  ==============================================================

+vmpressure_window
+
+The vmpressure algorithm calculates vm pressure by looking
+at the number of pages reclaimed vs the number of pages scanned.
+The vmpressure_window tunable specifies the minimum amount
+of pages that needs to be scanned before sending any vmpressure
+event. Setting a small window size can cause a lot of false
+positives; setting a large window size may delay notifications
+for too long.
+
+The default value is 512 pages.
+
+==============================================================
+
  zone_reclaim_mode:

  Zone_reclaim_mode allows someone to set more or less aggressive approaches  
to
diff --git a/include/linux/vmpressure.h b/include/linux/vmpressure.h
index 3347cc3..b5341d0 100644
--- a/include/linux/vmpressure.h
+++ b/include/linux/vmpressure.h
@@ -29,6 +29,7 @@ struct vmpressure {
  struct mem_cgroup;

  #ifdef CONFIG_MEMCG
+extern unsigned long vmpressure_win;
  extern void vmpressure(gfp_t gfp, struct mem_cgroup *memcg, bool tree,
  		       unsigned long scanned, unsigned long reclaimed);
  extern void vmpressure_prio(gfp_t gfp, struct mem_cgroup *memcg, int prio);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 97715fd..64938ad 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -51,6 +51,7 @@
  #include <linux/dnotify.h>
  #include <linux/syscalls.h>
  #include <linux/vmstat.h>
+#include <linux/vmpressure.h>
  #include <linux/nfs_fs.h>
  #include <linux/acpi.h>
  #include <linux/reboot.h>
@@ -1590,6 +1591,16 @@ static struct ctl_table vm_table[] = {
  		.extra2		= (void *)&mmap_rnd_compat_bits_max,
  	},
  #endif
+#ifdef CONFIG_MEMCG
+	{
+		.procname	= "vmpressure_window",
+		.data		= &vmpressure_win,
+		.maxlen		= sizeof(vmpressure_win),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra1		= &one,
+	},
+#endif
  	{ }
  };

diff --git a/mm/vmpressure.c b/mm/vmpressure.c
index 9a6c070..bda6af9 100644
--- a/mm/vmpressure.c
+++ b/mm/vmpressure.c
@@ -35,10 +35,9 @@
   * As the vmscan reclaimer logic works with chunks which are multiple of
   * SWAP_CLUSTER_MAX, it makes sense to use it for the window size as well.
   *
- * TODO: Make the window size depend on machine size, as we do for vmstat
- * thresholds. Currently we set it to 512 pages (2MB for 4KB pages).
+ * The window size is a tunable sysctl.
   */
-static const unsigned long vmpressure_win = SWAP_CLUSTER_MAX * 16;
+unsigned long __read_mostly vmpressure_win = SWAP_CLUSTER_MAX * 16;

  /*
   * These thresholds are used when we account memory pressure through
-- 
2.7.0.rc3.207.g0ac5344

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: vmpressure: make vmpressure_window a tunable.
  2016-02-03 10:06 ` Martijn Coenen
@ 2016-02-03 16:19   ` Johannes Weiner
  -1 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2016-02-03 16:19 UTC (permalink / raw)
  To: Martijn Coenen
  Cc: linux-mm, Anton Vorontsov, Andrew Morton, Michal Hocko,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

On Wed, Feb 03, 2016 at 11:06:20AM +0100, Martijn Coenen wrote:
> The window size used for calculating vm pressure
> events was previously fixed at 512 pages. The
> window size has a big impact on the rate of notifications
> sent off to userspace, in particular when using the
> "low" level. On machines with a lot of memory, the
> current value may be excessive.
> 
> On the other hand, making the window size depend on
> machine size does not allow userspace to change the
> notification rate based on the current state of the
> system. For example, when a lot of memory is still
> available, userspace may want to increase the window
> since it's not interested in receiving notifications
> for every 2MB scanned.
>
> This patch makes vmpressure_window a sysctl tunable.

If the machine is just cleaning up use-once cache, frequent events
make no sense. And if the machine is struggling, the notifications
better be in time.

That's hardly a tunable. It's a factor that needs constant dynamic
adjustment depending on VM state. The same state this mechanism is
supposed to report. If we can't get this right, how will userspace?

A better approach here would be to 1) find a minimum window size that
makes us confident that there are no false positives - this is likely
to be based on machine size, maybe the low watermark? - and 2) limit
reporting of lower levels, so you're not flooded with ALLGOOD! events.

VMPRESSURE_CRITICAL: report every vmpressure_win
VMPRESSURE_MEDIUM: report every vmpressure_win*2
VMPRESSURE_LOW: report every vmpressure_win*4

Pick your favorite scaling factor here.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: vmpressure: make vmpressure_window a tunable.
@ 2016-02-03 16:19   ` Johannes Weiner
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2016-02-03 16:19 UTC (permalink / raw)
  To: Martijn Coenen
  Cc: linux-mm, Anton Vorontsov, Andrew Morton, Michal Hocko,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

On Wed, Feb 03, 2016 at 11:06:20AM +0100, Martijn Coenen wrote:
> The window size used for calculating vm pressure
> events was previously fixed at 512 pages. The
> window size has a big impact on the rate of notifications
> sent off to userspace, in particular when using the
> "low" level. On machines with a lot of memory, the
> current value may be excessive.
> 
> On the other hand, making the window size depend on
> machine size does not allow userspace to change the
> notification rate based on the current state of the
> system. For example, when a lot of memory is still
> available, userspace may want to increase the window
> since it's not interested in receiving notifications
> for every 2MB scanned.
>
> This patch makes vmpressure_window a sysctl tunable.

If the machine is just cleaning up use-once cache, frequent events
make no sense. And if the machine is struggling, the notifications
better be in time.

That's hardly a tunable. It's a factor that needs constant dynamic
adjustment depending on VM state. The same state this mechanism is
supposed to report. If we can't get this right, how will userspace?

A better approach here would be to 1) find a minimum window size that
makes us confident that there are no false positives - this is likely
to be based on machine size, maybe the low watermark? - and 2) limit
reporting of lower levels, so you're not flooded with ALLGOOD! events.

VMPRESSURE_CRITICAL: report every vmpressure_win
VMPRESSURE_MEDIUM: report every vmpressure_win*2
VMPRESSURE_LOW: report every vmpressure_win*4

Pick your favorite scaling factor here.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: vmpressure: make vmpressure_window a tunable.
  2016-02-03 16:19   ` Johannes Weiner
@ 2016-02-04 11:18     ` Martijn Coenen
  -1 siblings, 0 replies; 8+ messages in thread
From: Martijn Coenen @ 2016-02-04 11:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Anton Vorontsov, Andrew Morton, Michal Hocko,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

On Wed, Feb 3, 2016 at 5:19 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> If the machine is just cleaning up use-once cache, frequent events
> make no sense. And if the machine is struggling, the notifications
> better be in time.
>
> That's hardly a tunable. It's a factor that needs constant dynamic
> adjustment depending on VM state. The same state this mechanism is
> supposed to report. If we can't get this right, how will userspace?

I tend to agree for the "machine is struggling" case; notifications
had better be in time so userspace can take the right action. But one
prime use for the "low" notification level is maintaining cache
levels, and in that scenario I can imagine the rate at which you want
to receive notifications can be very application-dependent.

For a bit more context, we'd like to use these events for implementing
a user-space low memory killer in Android (and get rid of the one in
staging). What we've found so far is that the "medium" level doesn't
trigger as often as we'd like: by the time we get it the page cache
may have been drained to such low levels that the device will have to
fetch pretty much everything from flash on the next app launch. I
think that's just the way the medium level was defined. The "low"
level on the other hand fires events almost constantly, and we spend a
lot of time waking up, looking at memory state, and then doing
nothing. My first idea was to make the window size dependent on
machine size; but my worry is that this will be somewhat specific to
our use of these pressure events. Maybe on Android devices it's okay
to generate events for every say 1% of main memory being scanned for
reclaim, but how do we know this is a decent value for other uses?

My other concern with changing the window size directly is that there
may be existing users of the API which would suddenly get different
behavior.

One other way to maintain the cache levels may be to not actually look
at vm pressure events, but to just look at the state of the system for
every X bytes allocated.

>
>
> A better approach here would be to 1) find a minimum window size that
> makes us confident that there are no false positives - this is likely
> to be based on machine size, maybe the low watermark? - and 2) limit
> reporting of lower levels, so you're not flooded with ALLGOOD! events.
>
> VMPRESSURE_CRITICAL: report every vmpressure_win
> VMPRESSURE_MEDIUM: report every vmpressure_win*2
> VMPRESSURE_LOW: report every vmpressure_win*4
>
> Pick your favorite scaling factor here.

I like this idea; I'm happy to come up with a window size and scaling
factors that we think works well, and get your feedback on that. My
only concern again would be that what works well for us may not work
well for others.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: vmpressure: make vmpressure_window a tunable.
@ 2016-02-04 11:18     ` Martijn Coenen
  0 siblings, 0 replies; 8+ messages in thread
From: Martijn Coenen @ 2016-02-04 11:18 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Anton Vorontsov, Andrew Morton, Michal Hocko,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

On Wed, Feb 3, 2016 at 5:19 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> If the machine is just cleaning up use-once cache, frequent events
> make no sense. And if the machine is struggling, the notifications
> better be in time.
>
> That's hardly a tunable. It's a factor that needs constant dynamic
> adjustment depending on VM state. The same state this mechanism is
> supposed to report. If we can't get this right, how will userspace?

I tend to agree for the "machine is struggling" case; notifications
had better be in time so userspace can take the right action. But one
prime use for the "low" notification level is maintaining cache
levels, and in that scenario I can imagine the rate at which you want
to receive notifications can be very application-dependent.

For a bit more context, we'd like to use these events for implementing
a user-space low memory killer in Android (and get rid of the one in
staging). What we've found so far is that the "medium" level doesn't
trigger as often as we'd like: by the time we get it the page cache
may have been drained to such low levels that the device will have to
fetch pretty much everything from flash on the next app launch. I
think that's just the way the medium level was defined. The "low"
level on the other hand fires events almost constantly, and we spend a
lot of time waking up, looking at memory state, and then doing
nothing. My first idea was to make the window size dependent on
machine size; but my worry is that this will be somewhat specific to
our use of these pressure events. Maybe on Android devices it's okay
to generate events for every say 1% of main memory being scanned for
reclaim, but how do we know this is a decent value for other uses?

My other concern with changing the window size directly is that there
may be existing users of the API which would suddenly get different
behavior.

One other way to maintain the cache levels may be to not actually look
at vm pressure events, but to just look at the state of the system for
every X bytes allocated.

>
>
> A better approach here would be to 1) find a minimum window size that
> makes us confident that there are no false positives - this is likely
> to be based on machine size, maybe the low watermark? - and 2) limit
> reporting of lower levels, so you're not flooded with ALLGOOD! events.
>
> VMPRESSURE_CRITICAL: report every vmpressure_win
> VMPRESSURE_MEDIUM: report every vmpressure_win*2
> VMPRESSURE_LOW: report every vmpressure_win*4
>
> Pick your favorite scaling factor here.

I like this idea; I'm happy to come up with a window size and scaling
factors that we think works well, and get your feedback on that. My
only concern again would be that what works well for us may not work
well for others.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: vmpressure: make vmpressure_window a tunable.
  2016-02-04 11:18     ` Martijn Coenen
@ 2016-02-04 20:25       ` Johannes Weiner
  -1 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2016-02-04 20:25 UTC (permalink / raw)
  To: Martijn Coenen
  Cc: linux-mm, Anton Vorontsov, Andrew Morton, Michal Hocko,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

On Thu, Feb 04, 2016 at 12:18:34PM +0100, Martijn Coenen wrote:
> I like this idea; I'm happy to come up with a window size and scaling
> factors that we think works well, and get your feedback on that. My
> only concern again would be that what works well for us may not work
> well for others.

Thanks for doing this. There is a good chance that this will work just
fine for others as well, so I think it's preferable to speculatively
change the implementation than adding ABI for potentially no reason.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] mm: vmpressure: make vmpressure_window a tunable.
@ 2016-02-04 20:25       ` Johannes Weiner
  0 siblings, 0 replies; 8+ messages in thread
From: Johannes Weiner @ 2016-02-04 20:25 UTC (permalink / raw)
  To: Martijn Coenen
  Cc: linux-mm, Anton Vorontsov, Andrew Morton, Michal Hocko,
	Vladimir Davydov, linux-kernel, linux-doc, Jonathan Corbet

On Thu, Feb 04, 2016 at 12:18:34PM +0100, Martijn Coenen wrote:
> I like this idea; I'm happy to come up with a window size and scaling
> factors that we think works well, and get your feedback on that. My
> only concern again would be that what works well for us may not work
> well for others.

Thanks for doing this. There is a good chance that this will work just
fine for others as well, so I think it's preferable to speculatively
change the implementation than adding ABI for potentially no reason.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2016-02-04 20:27 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 10:06 [PATCH] mm: vmpressure: make vmpressure_window a tunable Martijn Coenen
2016-02-03 10:06 ` Martijn Coenen
2016-02-03 16:19 ` Johannes Weiner
2016-02-03 16:19   ` Johannes Weiner
2016-02-04 11:18   ` Martijn Coenen
2016-02-04 11:18     ` Martijn Coenen
2016-02-04 20:25     ` Johannes Weiner
2016-02-04 20:25       ` Johannes Weiner

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.