linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] staging, android: remove lowmemory killer from the tree
@ 2017-02-22 12:01 Michal Hocko
  2017-02-23 20:24 ` John Stultz
  2017-03-09  9:15 ` Michal Hocko
  0 siblings, 2 replies; 21+ messages in thread
From: Michal Hocko @ 2017-02-22 12:01 UTC (permalink / raw)
  To: Greg KH
  Cc: Arve Hjønnevåg, Riley Andrews, devel, LKML, linux-mm,
	Michal Hocko

From: Michal Hocko <mhocko@suse.com>

Lowmemory killer is sitting in the staging tree since 2008 without any
serious interest for fixing issues brought up by the MM folks. The main
objection is that the implementation is basically broken by design:
	- it hooks into slab shrinker API which is not suitable for this
	  purpose. lowmem_count implementation just shows this nicely.
	  There is no scaling based on the memory pressure and no
	  feedback to the generic shrinker infrastructure.
	  Moreover lowmem_scan is called way too often for the heavy
	  work it performs.
	- it is not reclaim context aware - no NUMA and/or memcg
	  awareness.

As the code stands right now it just adds a maintenance overhead when
core MM changes have to update lowmemorykiller.c as well. It also seems
that the alternative LMK implementation will be solely in the userspace
so this code has no perspective it seems. The staging tree is supposed
to be for a code which needs to be put in shape before it can be merged
which is not the case here obviously.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 drivers/staging/android/Kconfig           |  10 --
 drivers/staging/android/Makefile          |   1 -
 drivers/staging/android/lowmemorykiller.c | 212 ------------------------------
 include/linux/sched.h                     |   4 -
 4 files changed, 227 deletions(-)
 delete mode 100644 drivers/staging/android/lowmemorykiller.c

diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
index 6c00d6f765c6..71a50b99caff 100644
--- a/drivers/staging/android/Kconfig
+++ b/drivers/staging/android/Kconfig
@@ -14,16 +14,6 @@ config ASHMEM
 	  It is, in theory, a good memory allocator for low-memory devices,
 	  because it can discard shared memory units when under memory pressure.
 
-config ANDROID_LOW_MEMORY_KILLER
-	bool "Android Low Memory Killer"
-	---help---
-	  Registers processes to be killed when low memory conditions, this is useful
-	  as there is no particular swap space on android.
-
-	  The registered process will kill according to the priorities in android init
-	  scripts (/init.rc), and it defines priority values with minimum free memory size
-	  for each priority.
-
 source "drivers/staging/android/ion/Kconfig"
 
 endif # if ANDROID
diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
index 7ed1be798909..7cf1564a49a5 100644
--- a/drivers/staging/android/Makefile
+++ b/drivers/staging/android/Makefile
@@ -3,4 +3,3 @@ ccflags-y += -I$(src)			# needed for trace events
 obj-y					+= ion/
 
 obj-$(CONFIG_ASHMEM)			+= ashmem.o
-obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)	+= lowmemorykiller.o
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
deleted file mode 100644
index ec3b66561412..000000000000
--- a/drivers/staging/android/lowmemorykiller.c
+++ /dev/null
@@ -1,212 +0,0 @@
-/* drivers/misc/lowmemorykiller.c
- *
- * The lowmemorykiller driver lets user-space specify a set of memory thresholds
- * where processes with a range of oom_score_adj values will get killed. Specify
- * the minimum oom_score_adj values in
- * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
- * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
- * separated list of numbers in ascending order.
- *
- * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
- * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
- * processes with a oom_score_adj value of 8 or higher when the free memory
- * drops below 4096 pages and kill processes with a oom_score_adj value of 0 or
- * higher when the free memory drops below 1024 pages.
- *
- * The driver considers memory used for caches to be free, but if a large
- * percentage of the cached memory is locked this can be very inaccurate
- * and processes may not get killed until the normal oom killer is triggered.
- *
- * Copyright (C) 2007-2008 Google, Inc.
- *
- * This software is licensed under the terms of the GNU General Public
- * License version 2, as published by the Free Software Foundation, and
- * may be copied, distributed, and modified under those terms.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- */
-
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/init.h>
-#include <linux/moduleparam.h>
-#include <linux/kernel.h>
-#include <linux/mm.h>
-#include <linux/oom.h>
-#include <linux/sched.h>
-#include <linux/swap.h>
-#include <linux/rcupdate.h>
-#include <linux/profile.h>
-#include <linux/notifier.h>
-
-static u32 lowmem_debug_level = 1;
-static short lowmem_adj[6] = {
-	0,
-	1,
-	6,
-	12,
-};
-
-static int lowmem_adj_size = 4;
-static int lowmem_minfree[6] = {
-	3 * 512,	/* 6MB */
-	2 * 1024,	/* 8MB */
-	4 * 1024,	/* 16MB */
-	16 * 1024,	/* 64MB */
-};
-
-static int lowmem_minfree_size = 4;
-
-static unsigned long lowmem_deathpending_timeout;
-
-#define lowmem_print(level, x...)			\
-	do {						\
-		if (lowmem_debug_level >= (level))	\
-			pr_info(x);			\
-	} while (0)
-
-static unsigned long lowmem_count(struct shrinker *s,
-				  struct shrink_control *sc)
-{
-	return global_node_page_state(NR_ACTIVE_ANON) +
-		global_node_page_state(NR_ACTIVE_FILE) +
-		global_node_page_state(NR_INACTIVE_ANON) +
-		global_node_page_state(NR_INACTIVE_FILE);
-}
-
-static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
-{
-	struct task_struct *tsk;
-	struct task_struct *selected = NULL;
-	unsigned long rem = 0;
-	int tasksize;
-	int i;
-	short min_score_adj = OOM_SCORE_ADJ_MAX + 1;
-	int minfree = 0;
-	int selected_tasksize = 0;
-	short selected_oom_score_adj;
-	int array_size = ARRAY_SIZE(lowmem_adj);
-	int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
-	int other_file = global_node_page_state(NR_FILE_PAGES) -
-				global_node_page_state(NR_SHMEM) -
-				total_swapcache_pages();
-
-	if (lowmem_adj_size < array_size)
-		array_size = lowmem_adj_size;
-	if (lowmem_minfree_size < array_size)
-		array_size = lowmem_minfree_size;
-	for (i = 0; i < array_size; i++) {
-		minfree = lowmem_minfree[i];
-		if (other_free < minfree && other_file < minfree) {
-			min_score_adj = lowmem_adj[i];
-			break;
-		}
-	}
-
-	lowmem_print(3, "lowmem_scan %lu, %x, ofree %d %d, ma %hd\n",
-		     sc->nr_to_scan, sc->gfp_mask, other_free,
-		     other_file, min_score_adj);
-
-	if (min_score_adj == OOM_SCORE_ADJ_MAX + 1) {
-		lowmem_print(5, "lowmem_scan %lu, %x, return 0\n",
-			     sc->nr_to_scan, sc->gfp_mask);
-		return 0;
-	}
-
-	selected_oom_score_adj = min_score_adj;
-
-	rcu_read_lock();
-	for_each_process(tsk) {
-		struct task_struct *p;
-		short oom_score_adj;
-
-		if (tsk->flags & PF_KTHREAD)
-			continue;
-
-		p = find_lock_task_mm(tsk);
-		if (!p)
-			continue;
-
-		if (task_lmk_waiting(p) &&
-		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
-			task_unlock(p);
-			rcu_read_unlock();
-			return 0;
-		}
-		oom_score_adj = p->signal->oom_score_adj;
-		if (oom_score_adj < min_score_adj) {
-			task_unlock(p);
-			continue;
-		}
-		tasksize = get_mm_rss(p->mm);
-		task_unlock(p);
-		if (tasksize <= 0)
-			continue;
-		if (selected) {
-			if (oom_score_adj < selected_oom_score_adj)
-				continue;
-			if (oom_score_adj == selected_oom_score_adj &&
-			    tasksize <= selected_tasksize)
-				continue;
-		}
-		selected = p;
-		selected_tasksize = tasksize;
-		selected_oom_score_adj = oom_score_adj;
-		lowmem_print(2, "select '%s' (%d), adj %hd, size %d, to kill\n",
-			     p->comm, p->pid, oom_score_adj, tasksize);
-	}
-	if (selected) {
-		task_lock(selected);
-		send_sig(SIGKILL, selected, 0);
-		if (selected->mm)
-			task_set_lmk_waiting(selected);
-		task_unlock(selected);
-		lowmem_print(1, "Killing '%s' (%d), adj %hd,\n"
-				 "   to free %ldkB on behalf of '%s' (%d) because\n"
-				 "   cache %ldkB is below limit %ldkB for oom_score_adj %hd\n"
-				 "   Free memory is %ldkB above reserved\n",
-			     selected->comm, selected->pid,
-			     selected_oom_score_adj,
-			     selected_tasksize * (long)(PAGE_SIZE / 1024),
-			     current->comm, current->pid,
-			     other_file * (long)(PAGE_SIZE / 1024),
-			     minfree * (long)(PAGE_SIZE / 1024),
-			     min_score_adj,
-			     other_free * (long)(PAGE_SIZE / 1024));
-		lowmem_deathpending_timeout = jiffies + HZ;
-		rem += selected_tasksize;
-	}
-
-	lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
-		     sc->nr_to_scan, sc->gfp_mask, rem);
-	rcu_read_unlock();
-	return rem;
-}
-
-static struct shrinker lowmem_shrinker = {
-	.scan_objects = lowmem_scan,
-	.count_objects = lowmem_count,
-	.seeks = DEFAULT_SEEKS * 16
-};
-
-static int __init lowmem_init(void)
-{
-	register_shrinker(&lowmem_shrinker);
-	return 0;
-}
-device_initcall(lowmem_init);
-
-/*
- * not really modular, but the easiest way to keep compat with existing
- * bootargs behaviour is to continue using module_param here.
- */
-module_param_named(cost, lowmem_shrinker.seeks, int, 0644);
-module_param_array_named(adj, lowmem_adj, short, &lowmem_adj_size, 0644);
-module_param_array_named(minfree, lowmem_minfree, uint, &lowmem_minfree_size,
-			 0644);
-module_param_named(debug_level, lowmem_debug_level, uint, 0644);
-
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e93594b88130..3cc6c650fa6a 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2347,7 +2347,6 @@ static inline void memalloc_noio_restore(unsigned int flags)
 #define PFA_NO_NEW_PRIVS 0	/* May not gain new privileges. */
 #define PFA_SPREAD_PAGE  1      /* Spread page cache over cpuset */
 #define PFA_SPREAD_SLAB  2      /* Spread some slab caches over cpuset */
-#define PFA_LMK_WAITING  3      /* Lowmemorykiller is waiting */
 
 
 #define TASK_PFA_TEST(name, func)					\
@@ -2371,9 +2370,6 @@ TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
 TASK_PFA_SET(SPREAD_SLAB, spread_slab)
 TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
 
-TASK_PFA_TEST(LMK_WAITING, lmk_waiting)
-TASK_PFA_SET(LMK_WAITING, lmk_waiting)
-
 /*
  * task->jobctl flags
  */
-- 
2.11.0

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-22 12:01 [PATCH] staging, android: remove lowmemory killer from the tree Michal Hocko
@ 2017-02-23 20:24 ` John Stultz
  2017-02-23 20:28   ` Todd Kjos
                     ` (2 more replies)
  2017-03-09  9:15 ` Michal Hocko
  1 sibling, 3 replies; 21+ messages in thread
From: John Stultz @ 2017-02-23 20:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Greg KH, Arve Hjønnevåg, Riley Andrews, devel, LKML,
	Linux-MM, Michal Hocko, Todd Kjos, Android Kernel Team,
	Martijn Coenen, Rom Lemarchand

On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko <mhocko@kernel.org> wrote:
> From: Michal Hocko <mhocko@suse.com>
>
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
>         - it hooks into slab shrinker API which is not suitable for this
>           purpose. lowmem_count implementation just shows this nicely.
>           There is no scaling based on the memory pressure and no
>           feedback to the generic shrinker infrastructure.
>           Moreover lowmem_scan is called way too often for the heavy
>           work it performs.
>         - it is not reclaim context aware - no NUMA and/or memcg
>           awareness.
>
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well. It also seems
> that the alternative LMK implementation will be solely in the userspace
> so this code has no perspective it seems. The staging tree is supposed
> to be for a code which needs to be put in shape before it can be merged
> which is not the case here obviously.

So, just for context, Android does have a userland LMK daemon (using
the mempressure notifiers) as you mentioned, but unfortunately I'm
unaware of any devices that ship with that implementation.

This is reportedly because while the mempressure notifiers provide a
the signal to userspace, the work the deamon then has to do to look up
per process memory usage, in order to figure out who is best to kill
at that point was too costly and resulted in poor device performance.

So for shipping Android devices, the LMK is still needed. However, its
not critical for basic android development, as the system will
function without it. Additionally I believe most vendors heavily
customize the LMK in their vendor tree, so the value of having it in
staging might be relatively low.

It would be great however to get a discussion going here on what the
ulmkd needs from the kernel in order to efficiently determine who best
to kill, and how we might best implement that.

thanks
-john

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-23 20:24 ` John Stultz
@ 2017-02-23 20:28   ` Todd Kjos
  2017-02-23 20:36   ` Martijn Coenen
  2017-02-24  9:38   ` Michal Hocko
  2 siblings, 0 replies; 21+ messages in thread
From: Todd Kjos @ 2017-02-23 20:28 UTC (permalink / raw)
  To: John Stultz
  Cc: Michal Hocko, Greg KH, Arve Hjønnevåg, Riley Andrews,
	devel, LKML, Linux-MM, Michal Hocko, Android Kernel Team,
	Martijn Coenen, Rom Lemarchand, Tim Murray

[-- Attachment #1: Type: text/plain, Size: 2362 bytes --]

+timmurray

On Thu, Feb 23, 2017 at 12:24 PM, John Stultz <john.stultz@linaro.org>
wrote:

> On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Lowmemory killer is sitting in the staging tree since 2008 without any
> > serious interest for fixing issues brought up by the MM folks. The main
> > objection is that the implementation is basically broken by design:
> >         - it hooks into slab shrinker API which is not suitable for this
> >           purpose. lowmem_count implementation just shows this nicely.
> >           There is no scaling based on the memory pressure and no
> >           feedback to the generic shrinker infrastructure.
> >           Moreover lowmem_scan is called way too often for the heavy
> >           work it performs.
> >         - it is not reclaim context aware - no NUMA and/or memcg
> >           awareness.
> >
> > As the code stands right now it just adds a maintenance overhead when
> > core MM changes have to update lowmemorykiller.c as well. It also seems
> > that the alternative LMK implementation will be solely in the userspace
> > so this code has no perspective it seems. The staging tree is supposed
> > to be for a code which needs to be put in shape before it can be merged
> > which is not the case here obviously.
>
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.
>
> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.
>
> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it. Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.
>
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.
>
> thanks
> -john
>

[-- Attachment #2: Type: text/html, Size: 3046 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-23 20:24 ` John Stultz
  2017-02-23 20:28   ` Todd Kjos
@ 2017-02-23 20:36   ` Martijn Coenen
  2017-02-24  9:34     ` Michal Hocko
  2017-02-24 12:19     ` peter enderborg
  2017-02-24  9:38   ` Michal Hocko
  2 siblings, 2 replies; 21+ messages in thread
From: Martijn Coenen @ 2017-02-23 20:36 UTC (permalink / raw)
  To: John Stultz
  Cc: Michal Hocko, Greg KH, Arve Hjønnevåg, Riley Andrews,
	devel, LKML, Linux-MM, Michal Hocko, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org> wrote:
>
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.

I've previously worked on enabling userspace lmkd for a previous
release, but ran into some issues there (see below).

> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.

In particular, mempressure requires memory cgroups to function, and we
saw performance regressions due to the accounting done in mem cgroups.
At the time we didn't have enough time left to solve this before the
release, and we reverted back to kernel lmkd.

>
> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it.

It will function, but it most likely will perform horribly (as the
page cache will be trashed to such a level that the system will be
unusable).

>Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.
>
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.

The two main issues I think we need to address are:
1) Getting the right granularity of events from the kernel; I once
tried to submit a patch upstream to address this:
https://lkml.org/lkml/2016/2/24/582
2) Find out where exactly the memory cgroup overhead is coming from,
and how to reduce it or work around it to acceptable levels for
Android. This was also on 3.10, and maybe this has long been fixed or
improved in more recent kernel versions.

I don't have cycles to work on this now, but I'm happy to talk to
whoever picks this up on the Android side.

Thanks,
Martijn

>
> thanks
> -john

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-23 20:36   ` Martijn Coenen
@ 2017-02-24  9:34     ` Michal Hocko
  2017-02-24 18:38       ` Tim Murray
  2017-02-24 12:19     ` peter enderborg
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2017-02-24  9:34 UTC (permalink / raw)
  To: Martijn Coenen
  Cc: John Stultz, Greg KH, Arve Hjønnevåg, Riley Andrews,
	devel, LKML, Linux-MM, Todd Kjos, Android Kernel Team,
	Rom Lemarchand, Tim Murray

On Thu 23-02-17 21:36:00, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org> wrote:
[...]
> > This is reportedly because while the mempressure notifiers provide a
> > the signal to userspace, the work the deamon then has to do to look up
> > per process memory usage, in order to figure out who is best to kill
> > at that point was too costly and resulted in poor device performance.
> 
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.

I would be more than interested to hear details. We used to have some
visible charge path performance footprint but this should be gone now.

[...]
> > It would be great however to get a discussion going here on what the
> > ulmkd needs from the kernel in order to efficiently determine who best
> > to kill, and how we might best implement that.
> 
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582

Not only that, the implementation of tht vmpressure needs some serious
rethinking as well. The current one can hit critical events
unexpectedly. The calculation also doesn't consider slab reclaim
sensibly.

> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.

3e32cb2e0a12 ("mm: memcontrol: lockless page counters") has improved
situation a lot as all the charging is lockless since then (3.19).
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-23 20:24 ` John Stultz
  2017-02-23 20:28   ` Todd Kjos
  2017-02-23 20:36   ` Martijn Coenen
@ 2017-02-24  9:38   ` Michal Hocko
  2 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2017-02-24  9:38 UTC (permalink / raw)
  To: John Stultz
  Cc: Greg KH, Arve Hjønnevåg, Riley Andrews, devel, LKML,
	Linux-MM, Todd Kjos, Android Kernel Team, Martijn Coenen,
	Rom Lemarchand

On Thu 23-02-17 12:24:57, John Stultz wrote:
> On Wed, Feb 22, 2017 at 4:01 AM, Michal Hocko <mhocko@kernel.org> wrote:
> > From: Michal Hocko <mhocko@suse.com>
> >
> > Lowmemory killer is sitting in the staging tree since 2008 without any
> > serious interest for fixing issues brought up by the MM folks. The main
> > objection is that the implementation is basically broken by design:
> >         - it hooks into slab shrinker API which is not suitable for this
> >           purpose. lowmem_count implementation just shows this nicely.
> >           There is no scaling based on the memory pressure and no
> >           feedback to the generic shrinker infrastructure.
> >           Moreover lowmem_scan is called way too often for the heavy
> >           work it performs.
> >         - it is not reclaim context aware - no NUMA and/or memcg
> >           awareness.
> >
> > As the code stands right now it just adds a maintenance overhead when
> > core MM changes have to update lowmemorykiller.c as well. It also seems
> > that the alternative LMK implementation will be solely in the userspace
> > so this code has no perspective it seems. The staging tree is supposed
> > to be for a code which needs to be put in shape before it can be merged
> > which is not the case here obviously.
> 
> So, just for context, Android does have a userland LMK daemon (using
> the mempressure notifiers) as you mentioned, but unfortunately I'm
> unaware of any devices that ship with that implementation.
> 
> This is reportedly because while the mempressure notifiers provide a
> the signal to userspace, the work the deamon then has to do to look up
> per process memory usage, in order to figure out who is best to kill
> at that point was too costly and resulted in poor device performance.

What was the expensive part?

> So for shipping Android devices, the LMK is still needed. However, its
> not critical for basic android development, as the system will
> function without it. Additionally I believe most vendors heavily
> customize the LMK in their vendor tree, so the value of having it in
> staging might be relatively low.

This is even a stronger reason to drop it from the tree. We do not want
to maintain the code which is not used in fact.
 
> It would be great however to get a discussion going here on what the
> ulmkd needs from the kernel in order to efficiently determine who best
> to kill, and how we might best implement that.

I would really like to see this happen and, to be honest, it should have
happened quite some time ago (around the time when the lmk was merged to
the staging tree).
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-23 20:36   ` Martijn Coenen
  2017-02-24  9:34     ` Michal Hocko
@ 2017-02-24 12:19     ` peter enderborg
  2017-02-24 12:28       ` Michal Hocko
  1 sibling, 1 reply; 21+ messages in thread
From: peter enderborg @ 2017-02-24 12:19 UTC (permalink / raw)
  To: Martijn Coenen, John Stultz
  Cc: Michal Hocko, Greg KH, Arve Hjønnevåg, Riley Andrews,
	devel, LKML, Linux-MM, Michal Hocko, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org> wrote:
>> So, just for context, Android does have a userland LMK daemon (using
>> the mempressure notifiers) as you mentioned, but unfortunately I'm
>> unaware of any devices that ship with that implementation.
> I've previously worked on enabling userspace lmkd for a previous
> release, but ran into some issues there (see below).
>
>> This is reportedly because while the mempressure notifiers provide a
>> the signal to userspace, the work the deamon then has to do to look up
>> per process memory usage, in order to figure out who is best to kill
>> at that point was too costly and resulted in poor device performance.
> In particular, mempressure requires memory cgroups to function, and we
> saw performance regressions due to the accounting done in mem cgroups.
> At the time we didn't have enough time left to solve this before the
> release, and we reverted back to kernel lmkd.
>
>> So for shipping Android devices, the LMK is still needed. However, its
>> not critical for basic android development, as the system will
>> function without it.
> It will function, but it most likely will perform horribly (as the
> page cache will be trashed to such a level that the system will be
> unusable).
>
>> Additionally I believe most vendors heavily
>> customize the LMK in their vendor tree, so the value of having it in
>> staging might be relatively low.
>>
>> It would be great however to get a discussion going here on what the
>> ulmkd needs from the kernel in order to efficiently determine who best
>> to kill, and how we might best implement that.
> The two main issues I think we need to address are:
> 1) Getting the right granularity of events from the kernel; I once
> tried to submit a patch upstream to address this:
> https://lkml.org/lkml/2016/2/24/582
> 2) Find out where exactly the memory cgroup overhead is coming from,
> and how to reduce it or work around it to acceptable levels for
> Android. This was also on 3.10, and maybe this has long been fixed or
> improved in more recent kernel versions.
>
> I don't have cycles to work on this now, but I'm happy to talk to
> whoever picks this up on the Android side.
I sent some patches that is different approach. It still uses shrinkers
but it has a kernel part that do the kill part better than the old one
but it does it the android way. The future for this is get it triggered
with other path's than slab shrinker. But we will not continue unless
we get google-android to be part of it. Hocko objected heavy on
the patches but seems not to see that we need something to
do the job before we can disconnect from shrinker.

> Thanks,
> Martijn
>
>> thanks
>> -john


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 12:19     ` peter enderborg
@ 2017-02-24 12:28       ` Michal Hocko
  2017-02-24 13:16         ` peter enderborg
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2017-02-24 12:28 UTC (permalink / raw)
  To: peter enderborg
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On Fri 24-02-17 13:19:46, peter enderborg wrote:
> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
> > On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org> wrote:
> >> So, just for context, Android does have a userland LMK daemon (using
> >> the mempressure notifiers) as you mentioned, but unfortunately I'm
> >> unaware of any devices that ship with that implementation.
> > I've previously worked on enabling userspace lmkd for a previous
> > release, but ran into some issues there (see below).
> >
> >> This is reportedly because while the mempressure notifiers provide a
> >> the signal to userspace, the work the deamon then has to do to look up
> >> per process memory usage, in order to figure out who is best to kill
> >> at that point was too costly and resulted in poor device performance.
> > In particular, mempressure requires memory cgroups to function, and we
> > saw performance regressions due to the accounting done in mem cgroups.
> > At the time we didn't have enough time left to solve this before the
> > release, and we reverted back to kernel lmkd.
> >
> >> So for shipping Android devices, the LMK is still needed. However, its
> >> not critical for basic android development, as the system will
> >> function without it.
> > It will function, but it most likely will perform horribly (as the
> > page cache will be trashed to such a level that the system will be
> > unusable).
> >
> >> Additionally I believe most vendors heavily
> >> customize the LMK in their vendor tree, so the value of having it in
> >> staging might be relatively low.
> >>
> >> It would be great however to get a discussion going here on what the
> >> ulmkd needs from the kernel in order to efficiently determine who best
> >> to kill, and how we might best implement that.
> > The two main issues I think we need to address are:
> > 1) Getting the right granularity of events from the kernel; I once
> > tried to submit a patch upstream to address this:
> > https://lkml.org/lkml/2016/2/24/582
> > 2) Find out where exactly the memory cgroup overhead is coming from,
> > and how to reduce it or work around it to acceptable levels for
> > Android. This was also on 3.10, and maybe this has long been fixed or
> > improved in more recent kernel versions.
> >
> > I don't have cycles to work on this now, but I'm happy to talk to
> > whoever picks this up on the Android side.
> I sent some patches that is different approach. It still uses shrinkers
> but it has a kernel part that do the kill part better than the old one
> but it does it the android way. The future for this is get it triggered
> with other path's than slab shrinker. But we will not continue unless
> we get google-android to be part of it. Hocko objected heavy on
> the patches but seems not to see that we need something to
> do the job before we can disconnect from shrinker.

Yeah, I strongly believe that the chosen approach is completely wrong.
Both in abusing the shrinker interface and abusing oom_score_adj as the
only criterion for the oom victim selection.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 12:28       ` Michal Hocko
@ 2017-02-24 13:16         ` peter enderborg
  2017-02-24 14:11           ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: peter enderborg @ 2017-02-24 13:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On 02/24/2017 01:28 PM, Michal Hocko wrote:
> On Fri 24-02-17 13:19:46, peter enderborg wrote:
>> On 02/23/2017 09:36 PM, Martijn Coenen wrote:
>>> On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org> wrote:
>>>> So, just for context, Android does have a userland LMK daemon (using
>>>> the mempressure notifiers) as you mentioned, but unfortunately I'm
>>>> unaware of any devices that ship with that implementation.
>>> I've previously worked on enabling userspace lmkd for a previous
>>> release, but ran into some issues there (see below).
>>>
>>>> This is reportedly because while the mempressure notifiers provide a
>>>> the signal to userspace, the work the deamon then has to do to look up
>>>> per process memory usage, in order to figure out who is best to kill
>>>> at that point was too costly and resulted in poor device performance.
>>> In particular, mempressure requires memory cgroups to function, and we
>>> saw performance regressions due to the accounting done in mem cgroups.
>>> At the time we didn't have enough time left to solve this before the
>>> release, and we reverted back to kernel lmkd.
>>>
>>>> So for shipping Android devices, the LMK is still needed. However, its
>>>> not critical for basic android development, as the system will
>>>> function without it.
>>> It will function, but it most likely will perform horribly (as the
>>> page cache will be trashed to such a level that the system will be
>>> unusable).
>>>
>>>> Additionally I believe most vendors heavily
>>>> customize the LMK in their vendor tree, so the value of having it in
>>>> staging might be relatively low.
>>>>
>>>> It would be great however to get a discussion going here on what the
>>>> ulmkd needs from the kernel in order to efficiently determine who best
>>>> to kill, and how we might best implement that.
>>> The two main issues I think we need to address are:
>>> 1) Getting the right granularity of events from the kernel; I once
>>> tried to submit a patch upstream to address this:
>>> https://lkml.org/lkml/2016/2/24/582
>>> 2) Find out where exactly the memory cgroup overhead is coming from,
>>> and how to reduce it or work around it to acceptable levels for
>>> Android. This was also on 3.10, and maybe this has long been fixed or
>>> improved in more recent kernel versions.
>>>
>>> I don't have cycles to work on this now, but I'm happy to talk to
>>> whoever picks this up on the Android side.
>> I sent some patches that is different approach. It still uses shrinkers
>> but it has a kernel part that do the kill part better than the old one
>> but it does it the android way. The future for this is get it triggered
>> with other path's than slab shrinker. But we will not continue unless
>> we get google-android to be part of it. Hocko objected heavy on
>> the patches but seems not to see that we need something to
>> do the job before we can disconnect from shrinker.
> Yeah, I strongly believe that the chosen approach is completely wrong.
> Both in abusing the shrinker interface and abusing oom_score_adj as the
> only criterion for the oom victim selection.

No one is arguing that shrinker is not problematic. And would be great if it is removed from lmk.
The oom_score_adj is the way user-space tells the kernel what the user-space has as prio. And android
is using that very much. It's a core part. I have never seen it be used on other
linux system so what is the intended usage of oom_score_adj? Is this really abusing?

I think I can help out with removing  shrinker from lmk. Not using oom_score_adj is harder and
has a bigger impact on android, except the trivial solution by adding replacement
oom_user_prio and use that within android and kernel.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 13:16         ` peter enderborg
@ 2017-02-24 14:11           ` Michal Hocko
  2017-02-24 14:42             ` peter enderborg
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2017-02-24 14:11 UTC (permalink / raw)
  To: peter enderborg
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On Fri 24-02-17 14:16:34, peter enderborg wrote:
> On 02/24/2017 01:28 PM, Michal Hocko wrote:
[...]
> > Yeah, I strongly believe that the chosen approach is completely wrong.
> > Both in abusing the shrinker interface and abusing oom_score_adj as the
> > only criterion for the oom victim selection.
> 
> No one is arguing that shrinker is not problematic. And would be great
> if it is removed from lmk.  The oom_score_adj is the way user-space
> tells the kernel what the user-space has as prio. And android is using
> that very much. It's a core part.

Is there any documentation which describes how this is done?

> I have never seen it be used on
> other linux system so what is the intended usage of oom_score_adj? Is
> this really abusing?

oom_score_adj is used to _adjust_ the calculated oom score. It is not a
criterion on its own, well, except for the extreme sides of the range
which are defined to enforce resp. disallow selecting the task. The
global oom killer calculates the oom score as a function of the memory
consumption. Your patch simply ignores the memory consumption (and uses
pids to sort tasks with the same oom score which is just mind boggling)
and that is what I call the abuse. The oom score calculation might
change in future, of course, but all consumers of the oom_score_adj
really have to agree on the base which is adjusted by this tunable
otherwise you can see a lot of unexpected behavior.

I would even argue that nobody outside of mm/oom_kill.c should really
have any business with this tunable.  You can of course tweak the value
from the userspace and help to chose a better oom victim this way but
that is it.

Anyway, I guess we are getting quite off-topic here.

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 14:11           ` Michal Hocko
@ 2017-02-24 14:42             ` peter enderborg
  2017-02-24 15:03               ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: peter enderborg @ 2017-02-24 14:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On 02/24/2017 03:11 PM, Michal Hocko wrote:
> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> [...]
>>> Yeah, I strongly believe that the chosen approach is completely wrong.
>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
>>> only criterion for the oom victim selection.
>> No one is arguing that shrinker is not problematic. And would be great
>> if it is removed from lmk.  The oom_score_adj is the way user-space
>> tells the kernel what the user-space has as prio. And android is using
>> that very much. It's a core part.
> Is there any documentation which describes how this is done?
>
>> I have never seen it be used on
>> other linux system so what is the intended usage of oom_score_adj? Is
>> this really abusing?
> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> criterion on its own, well, except for the extreme sides of the range
> which are defined to enforce resp. disallow selecting the task. The
> global oom killer calculates the oom score as a function of the memory
> consumption. Your patch simply ignores the memory consumption (and uses
> pids to sort tasks with the same oom score which is just mind boggling)
How much it uses is of very little importance for android. The score
used are only for apps and their services. System related are not touched by
android lmk. The pid is only to have a unique key to be able to have it fast within a rbtree.
One idea was to use task_pid to get a strict age of process to get a round robin
but since it does not matter i skipped that idea since it does not matter.
> and that is what I call the abuse. The oom score calculation might
> change in future, of course, but all consumers of the oom_score_adj
> really have to agree on the base which is adjusted by this tunable
> otherwise you can see a lot of unexpected behavior.
Then can we just define a range that is strictly for user-space?
> I would even argue that nobody outside of mm/oom_kill.c should really
> have any business with this tunable.  You can of course tweak the value
> from the userspace and help to chose a better oom victim this way but
> that is it.
Why only help? If userspace can give an exact order to kernel that
must be a good thing; other wise kernel have to guess and when
can that be better? 
> Anyway, I guess we are getting quite off-topic here.
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 14:42             ` peter enderborg
@ 2017-02-24 15:03               ` Michal Hocko
  2017-02-24 15:40                 ` peter enderborg
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2017-02-24 15:03 UTC (permalink / raw)
  To: peter enderborg
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On Fri 24-02-17 15:42:49, peter enderborg wrote:
> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> > On Fri 24-02-17 14:16:34, peter enderborg wrote:
> >> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> > [...]
> >>> Yeah, I strongly believe that the chosen approach is completely wrong.
> >>> Both in abusing the shrinker interface and abusing oom_score_adj as the
> >>> only criterion for the oom victim selection.
> >> No one is arguing that shrinker is not problematic. And would be great
> >> if it is removed from lmk.  The oom_score_adj is the way user-space
> >> tells the kernel what the user-space has as prio. And android is using
> >> that very much. It's a core part.
> > Is there any documentation which describes how this is done?
> >
> >> I have never seen it be used on
> >> other linux system so what is the intended usage of oom_score_adj? Is
> >> this really abusing?
> > oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> > criterion on its own, well, except for the extreme sides of the range
> > which are defined to enforce resp. disallow selecting the task. The
> > global oom killer calculates the oom score as a function of the memory
> > consumption. Your patch simply ignores the memory consumption (and uses
> > pids to sort tasks with the same oom score which is just mind boggling)
>
> How much it uses is of very little importance for android.

But it is relevant for the global oom killer which is the main consumer of
the oom_score_adj.

> The score
> used are only for apps and their services. System related are not
> touched by android lmk. The pid is only to have a unique key to be
> able to have it fast within a rbtree.  One idea was to use task_pid to
> get a strict age of process to get a round robin but since it does not
> matter i skipped that idea since it does not matter.

Pid will not tell you anything about the age. Pids do wrap around.

> > and that is what I call the abuse. The oom score calculation might
> > change in future, of course, but all consumers of the oom_score_adj
> > really have to agree on the base which is adjusted by this tunable
> > otherwise you can see a lot of unexpected behavior.
>
> Then can we just define a range that is strictly for user-space?

This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
is usable.

> > I would even argue that nobody outside of mm/oom_kill.c should really
> > have any business with this tunable.  You can of course tweak the value
> > from the userspace and help to chose a better oom victim this way but
> > that is it.
>
> Why only help? If userspace can give an exact order to kernel that
> must be a good thing; other wise kernel have to guess and when
> can that be better? 

Because userspace doesn't know who is the best victim in 99% cases.
Android might be different, although, I am a bit skeptical - especially
after hearing quite some complains about random application being
killed... If you do believe that you know better then, by all means,
implement your custom user space LMK and chose the oom victim on a
different basis but try to understand that the global OOM killer is the
last resort measure to make the system usable again. There is a good
reason why the kernel uses the current badness calculation. The previous
implementation which considered the process age ad other things was just
too random to have a understandable behavior.

In any case playing nasty games with the oom killer tunables might and
will lead, well, to unexpected behavior.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 15:03               ` Michal Hocko
@ 2017-02-24 15:40                 ` peter enderborg
  2017-02-24 15:52                   ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: peter enderborg @ 2017-02-24 15:40 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On 02/24/2017 04:03 PM, Michal Hocko wrote:
> On Fri 24-02-17 15:42:49, peter enderborg wrote:
>> On 02/24/2017 03:11 PM, Michal Hocko wrote:
>>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
>>>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
>>> [...]
>>>>> Yeah, I strongly believe that the chosen approach is completely wrong.
>>>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
>>>>> only criterion for the oom victim selection.
>>>> No one is arguing that shrinker is not problematic. And would be great
>>>> if it is removed from lmk.  The oom_score_adj is the way user-space
>>>> tells the kernel what the user-space has as prio. And android is using
>>>> that very much. It's a core part.
>>> Is there any documentation which describes how this is done?
>>>
>>>> I have never seen it be used on
>>>> other linux system so what is the intended usage of oom_score_adj? Is
>>>> this really abusing?
>>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
>>> criterion on its own, well, except for the extreme sides of the range
>>> which are defined to enforce resp. disallow selecting the task. The
>>> global oom killer calculates the oom score as a function of the memory
>>> consumption. Your patch simply ignores the memory consumption (and uses
>>> pids to sort tasks with the same oom score which is just mind boggling)
>> How much it uses is of very little importance for android.
> But it is relevant for the global oom killer which is the main consumer of
> the oom_score_adj.
>
>> The score
>> used are only for apps and their services. System related are not
>> touched by android lmk. The pid is only to have a unique key to be
>> able to have it fast within a rbtree.  One idea was to use task_pid to
>> get a strict age of process to get a round robin but since it does not
>> matter i skipped that idea since it does not matter.
> Pid will not tell you anything about the age. Pids do wrap around.
>
>>> and that is what I call the abuse. The oom score calculation might
>>> change in future, of course, but all consumers of the oom_score_adj
>>> really have to agree on the base which is adjusted by this tunable
>>> otherwise you can see a lot of unexpected behavior.
>> Then can we just define a range that is strictly for user-space?
> This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> is usable.
So we use them in userspace and kernel space but where is the abuse then?
>>> I would even argue that nobody outside of mm/oom_kill.c should really
>>> have any business with this tunable.  You can of course tweak the value
>>> from the userspace and help to chose a better oom victim this way but
>>> that is it.
>> Why only help? If userspace can give an exact order to kernel that
>> must be a good thing; other wise kernel have to guess and when
>> can that be better? 
> Because userspace doesn't know who is the best victim in 99% cases.
If user-space does not tell kernel what to it have to guess, android
user-space does, and maybe other should too.
> Android might be different, although, I am a bit skeptical - especially
> after hearing quite some complains about random application being
> killed... If you do believe that you know better then, by all means,
> implement your custom user space LMK and chose the oom victim on a
> different basis but try to understand that the global OOM killer is the
> last resort measure to make the system usable again. There is a good
> reason why the kernel uses the current badness calculation. The previous
> implementation which considered the process age ad other things was just
> too random to have a understandable behavior.
I think it make sense that there is only one way to describe what is
important what is not. And oom_kill is the last resort is one problem
for android. Android lowmemorykiller balance memory usage and
tries to be more proactive and that is why shrinkers work so well.
> In any case playing nasty games with the oom killer tunables might and
> will lead, well, to unexpected behavior.

I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
we then be "safe"?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 15:40                 ` peter enderborg
@ 2017-02-24 15:52                   ` Michal Hocko
  0 siblings, 0 replies; 21+ messages in thread
From: Michal Hocko @ 2017-02-24 15:52 UTC (permalink / raw)
  To: peter enderborg
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand, Tim Murray

On Fri 24-02-17 16:40:13, peter enderborg wrote:
> On 02/24/2017 04:03 PM, Michal Hocko wrote:
> > On Fri 24-02-17 15:42:49, peter enderborg wrote:
> >> On 02/24/2017 03:11 PM, Michal Hocko wrote:
> >>> On Fri 24-02-17 14:16:34, peter enderborg wrote:
> >>>> On 02/24/2017 01:28 PM, Michal Hocko wrote:
> >>> [...]
> >>>>> Yeah, I strongly believe that the chosen approach is completely wrong.
> >>>>> Both in abusing the shrinker interface and abusing oom_score_adj as the
> >>>>> only criterion for the oom victim selection.
> >>>> No one is arguing that shrinker is not problematic. And would be great
> >>>> if it is removed from lmk.  The oom_score_adj is the way user-space
> >>>> tells the kernel what the user-space has as prio. And android is using
> >>>> that very much. It's a core part.
> >>> Is there any documentation which describes how this is done?
> >>>
> >>>> I have never seen it be used on
> >>>> other linux system so what is the intended usage of oom_score_adj? Is
> >>>> this really abusing?
> >>> oom_score_adj is used to _adjust_ the calculated oom score. It is not a
> >>> criterion on its own, well, except for the extreme sides of the range
> >>> which are defined to enforce resp. disallow selecting the task. The
> >>> global oom killer calculates the oom score as a function of the memory
> >>> consumption. Your patch simply ignores the memory consumption (and uses
> >>> pids to sort tasks with the same oom score which is just mind boggling)
> >> How much it uses is of very little importance for android.
> > But it is relevant for the global oom killer which is the main consumer of
> > the oom_score_adj.
> >
> >> The score
> >> used are only for apps and their services. System related are not
> >> touched by android lmk. The pid is only to have a unique key to be
> >> able to have it fast within a rbtree.  One idea was to use task_pid to
> >> get a strict age of process to get a round robin but since it does not
> >> matter i skipped that idea since it does not matter.
> > Pid will not tell you anything about the age. Pids do wrap around.
> >
> >>> and that is what I call the abuse. The oom score calculation might
> >>> change in future, of course, but all consumers of the oom_score_adj
> >>> really have to agree on the base which is adjusted by this tunable
> >>> otherwise you can see a lot of unexpected behavior.
> >> Then can we just define a range that is strictly for user-space?
> > This is already well defined. The whole range OOM_SCORE_ADJ_{MIN,MAX}
> > is usable.
>
> So we use them in userspace and kernel space but where is the abuse then?

I believe I have already answered that.

> >>> I would even argue that nobody outside of mm/oom_kill.c should really
> >>> have any business with this tunable.  You can of course tweak the value
> >>> from the userspace and help to chose a better oom victim this way but
> >>> that is it.
> >> Why only help? If userspace can give an exact order to kernel that
> >> must be a good thing; other wise kernel have to guess and when
> >> can that be better? 
> > Because userspace doesn't know who is the best victim in 99% cases.
>
> If user-space does not tell kernel what to it have to guess, android
> user-space does, and maybe other should too.

I believe they would do if this was so simple. This is not as easy to
answer as you might think. If you ask, everybody will consider their
task important enough to be killed.

[...]

> > In any case playing nasty games with the oom killer tunables might and
> > will lead, well, to unexpected behavior.
> 
> I don't follow. If we only use values  OOM_SCORE_ADJ_{MIN,MAX} can
> we then be "safe"?

I didn't say that. I was just trying to say that you cannot give a
single knob two different semantics depending on who is using them. That
simply won't work. So if you believe that your LMK knows better then try
to use something else for your heuristics.

Anyway, this is still offtopic I believe.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24  9:34     ` Michal Hocko
@ 2017-02-24 18:38       ` Tim Murray
  2017-02-24 18:42         ` Rom Lemarchand
  0 siblings, 1 reply; 21+ messages in thread
From: Tim Murray @ 2017-02-24 18:38 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Martijn Coenen, John Stultz, Greg KH, Arve Hjønnevåg,
	Riley Andrews, devel, LKML, Linux-MM, Todd Kjos,
	Android Kernel Team, Rom Lemarchand

[-- Attachment #1: Type: text/plain, Size: 3408 bytes --]

Hi all, I've recently been looking at lowmemorykiller, userspace lmkd, and
memory cgroups on Android.

First of all, no, an Android device will probably not function without a
kernel or userspace version of lowmemorykiller. Android userspace expects
that if there are many apps running in the background on a machine and the
foreground app allocates additional memory, something on the system will
kill background apps to free up more memory. If this doesn't happen, I
expect that at the very least you'll see page cache thrashing, and you'll
probably see the OOM killer run regularly, which has a tendency to cause
Android userspace to restart. To the best of my knowledge, no device has
shipped with a userspace lmkd.

Second, yes, the current design and implementation of lowmemorykiller are
unsatisfactory. I now have some concrete evidence that the design of
lowmemorykiller is directly responsible for some very negative user-visible
behaviors (particularly the triggers for when to kill), so I'm currently
working on an overhaul to the Android memory model that would use mem
cgroups and userspace lmkd to make smarter decisions about reclaim vs
killing. Yes, this means that we would move to vmpressure (which will
require improvements to vmpressure). I can't give a firm ETA for this, as
it's still in the prototype phase, but the initial results are promising.

On Fri, Feb 24, 2017 at 1:34 AM, Michal Hocko <mhocko@kernel.org> wrote:

> On Thu 23-02-17 21:36:00, Martijn Coenen wrote:
> > On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org>
> wrote:
> [...]
> > > This is reportedly because while the mempressure notifiers provide a
> > > the signal to userspace, the work the deamon then has to do to look up
> > > per process memory usage, in order to figure out who is best to kill
> > > at that point was too costly and resulted in poor device performance.
> >
> > In particular, mempressure requires memory cgroups to function, and we
> > saw performance regressions due to the accounting done in mem cgroups.
> > At the time we didn't have enough time left to solve this before the
> > release, and we reverted back to kernel lmkd.
>
> I would be more than interested to hear details. We used to have some
> visible charge path performance footprint but this should be gone now.
>
> [...]
> > > It would be great however to get a discussion going here on what the
> > > ulmkd needs from the kernel in order to efficiently determine who best
> > > to kill, and how we might best implement that.
> >
> > The two main issues I think we need to address are:
> > 1) Getting the right granularity of events from the kernel; I once
> > tried to submit a patch upstream to address this:
> > https://lkml.org/lkml/2016/2/24/582
>
> Not only that, the implementation of tht vmpressure needs some serious
> rethinking as well. The current one can hit critical events
> unexpectedly. The calculation also doesn't consider slab reclaim
> sensibly.
>
> > 2) Find out where exactly the memory cgroup overhead is coming from,
> > and how to reduce it or work around it to acceptable levels for
> > Android. This was also on 3.10, and maybe this has long been fixed or
> > improved in more recent kernel versions.
>
> 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") has improved
> situation a lot as all the charging is lockless since then (3.19).
> --
> Michal Hocko
> SUSE Labs
>

[-- Attachment #2: Type: text/html, Size: 4299 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 18:38       ` Tim Murray
@ 2017-02-24 18:42         ` Rom Lemarchand
  2017-03-04  2:06           ` Tim Murray
  0 siblings, 1 reply; 21+ messages in thread
From: Rom Lemarchand @ 2017-02-24 18:42 UTC (permalink / raw)
  To: Tim Murray, Suren Baghdasaryan
  Cc: Michal Hocko, Martijn Coenen, John Stultz, Greg KH,
	Arve Hjønnevåg, Riley Andrews, devel, LKML, Linux-MM,
	Todd Kjos, Android Kernel Team

[-- Attachment #1: Type: text/plain, Size: 3586 bytes --]

+surenb

On Fri, Feb 24, 2017 at 10:38 AM, Tim Murray <timmurray@google.com> wrote:

> Hi all, I've recently been looking at lowmemorykiller, userspace lmkd, and
> memory cgroups on Android.
>
> First of all, no, an Android device will probably not function without a
> kernel or userspace version of lowmemorykiller. Android userspace expects
> that if there are many apps running in the background on a machine and the
> foreground app allocates additional memory, something on the system will
> kill background apps to free up more memory. If this doesn't happen, I
> expect that at the very least you'll see page cache thrashing, and you'll
> probably see the OOM killer run regularly, which has a tendency to cause
> Android userspace to restart. To the best of my knowledge, no device has
> shipped with a userspace lmkd.
>
> Second, yes, the current design and implementation of lowmemorykiller are
> unsatisfactory. I now have some concrete evidence that the design of
> lowmemorykiller is directly responsible for some very negative user-visible
> behaviors (particularly the triggers for when to kill), so I'm currently
> working on an overhaul to the Android memory model that would use mem
> cgroups and userspace lmkd to make smarter decisions about reclaim vs
> killing. Yes, this means that we would move to vmpressure (which will
> require improvements to vmpressure). I can't give a firm ETA for this, as
> it's still in the prototype phase, but the initial results are promising.
>
> On Fri, Feb 24, 2017 at 1:34 AM, Michal Hocko <mhocko@kernel.org> wrote:
>
>> On Thu 23-02-17 21:36:00, Martijn Coenen wrote:
>> > On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org>
>> wrote:
>> [...]
>> > > This is reportedly because while the mempressure notifiers provide a
>> > > the signal to userspace, the work the deamon then has to do to look up
>> > > per process memory usage, in order to figure out who is best to kill
>> > > at that point was too costly and resulted in poor device performance.
>> >
>> > In particular, mempressure requires memory cgroups to function, and we
>> > saw performance regressions due to the accounting done in mem cgroups.
>> > At the time we didn't have enough time left to solve this before the
>> > release, and we reverted back to kernel lmkd.
>>
>> I would be more than interested to hear details. We used to have some
>> visible charge path performance footprint but this should be gone now.
>>
>> [...]
>> > > It would be great however to get a discussion going here on what the
>> > > ulmkd needs from the kernel in order to efficiently determine who best
>> > > to kill, and how we might best implement that.
>> >
>> > The two main issues I think we need to address are:
>> > 1) Getting the right granularity of events from the kernel; I once
>> > tried to submit a patch upstream to address this:
>> > https://lkml.org/lkml/2016/2/24/582
>>
>> Not only that, the implementation of tht vmpressure needs some serious
>> rethinking as well. The current one can hit critical events
>> unexpectedly. The calculation also doesn't consider slab reclaim
>> sensibly.
>>
>> > 2) Find out where exactly the memory cgroup overhead is coming from,
>> > and how to reduce it or work around it to acceptable levels for
>> > Android. This was also on 3.10, and maybe this has long been fixed or
>> > improved in more recent kernel versions.
>>
>> 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") has improved
>> situation a lot as all the charging is lockless since then (3.19).
>> --
>> Michal Hocko
>> SUSE Labs
>>
>
>

[-- Attachment #2: Type: text/html, Size: 4728 bytes --]

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-24 18:42         ` Rom Lemarchand
@ 2017-03-04  2:06           ` Tim Murray
  0 siblings, 0 replies; 21+ messages in thread
From: Tim Murray @ 2017-03-04  2:06 UTC (permalink / raw)
  To: Rom Lemarchand
  Cc: Suren Baghdasaryan, Michal Hocko, Martijn Coenen, John Stultz,
	Greg KH, Arve Hjønnevåg, Riley Andrews, devel, LKML,
	Linux-MM, Todd Kjos, Android Kernel Team

Hi all,

I mentioned before that I had some ideas to overhaul lowmemorykiller,
which would have the side effect of getting it out of the kernel. I've
been working through some prototypes over the past few weeks (actually
started before Michal sent his patch out), and I'd appreciate some
feedback on what I'd like to do so I can start working on more
complete patches.

First, Michal has mentioned why the current lowmemorykiller
implementation is bad. However, the design and implementation of
lowmemorykiller is bad for Android users as well. Rather than fixing
lowmemorykiller in the kernel or enabling an equivalent
reimplementation of lowmemorykiller in userspace, I think we can solve
the Android problems and remove lowmemorykiller from the tree at the
same time.

What's wrong with lowmemorykiller from an Android user's POV?

1. lowmemorykiller can be way too aggressive when there are transient
spikes in memory consumption. LMK relies on hand-tuned thresholds to
determine when to kill a process, but hitting the threshold shouldn't
always imply a kill. For example, on some current high-end Android
devices, lowmemorykiller will start to kill oom_score_adj 200
processes once there is less than 112MB in the page cache and less
than 112MB of free pages. oom_score_adj 200 is used for processes that
are important and visible to the user but not the currently-used
foreground app; music playback or camera post-processing for some apps
usually runs as adj 200. This threshold means that even if the system
would quiesce at 110MB in the page cache and 110MB of free pages,
something important to the user may die. This is bad!

2. lowmemorykiller can be way too passive on lower memory devices.
Because lowmemorykiller has a shared threshold for the amount of free
pages and the size of the page cache before it will kill a process,
there is a scenario that we hit all the time that results in low
memory devices becoming unusable. Assume the current application and
supporting system software need X bytes in the page cache in order to
provide reasonable UI performance, and X is larger than the zone_high
watermark that stops kswapd. The number of free pages can drop below
zone_low and kswapd will start evicting pages from the page cache;
however, because the working set is actually of size X, those pages
will be paged back in about as quickly as they can be paged out. This
manifests as kswapd constantly evicting file pages and the foreground
UI workload constantly waiting on page faults. Meanwhile, even if
there are very unimportant processes to kill, lowmemorykiller won't do
anything to kill them.

#2 can be addressed somewhat by separating the limits for number of
free pages and the size of the page cache, but then lowmemorykiller
would have two sets of arbitrary hand-tuned values and still no
knowledge of kswapd/reclaim. It doesn't make sense to do that if we
can avoid it.

We have plenty of evidence for both of these on real Android devices.
I'm bringing up these issues to not only explain the problems that
we'd like to solve, but also to provide some evidence that we're
serious about fixing lowmemorykiller once and for all.

Here's where I'd like to go.

First of all, lowmemorykiller should not be in the kernel, and Android
should move to per-app mem cgroups and kill unnecessary background
tasks when under memory pressure from userspace, not the kernel.

Second, Android has good knowledge of what's important to the user and
what's not. I'd like the ability to use that information to drive
decisions about reclaiming memory, so kswapd can shrink the mem
cgroups associated with background tasks before moving on to
foreground tasks. As far as I can tell, what I'm suggesting isn't a
soft limit or something like that. We don't have specific limits on
memory consumption for particular processes, and there's no size we
definitely want to get background processes to via reclaim before we
start reclaiming from foreground or persistent processes. In practice,
I think this looks like a per-memory-cgroup reclaim priority. I have a
prototype of this where I've added a new knob called memory.priority
from 0 to 10 that serves two purposes:

- Skip reclaiming from higher-priority cgroups entirely until the
priority from shrink_zone is high enough.
- Reduce the number of pages scanned from higher-priority cgroups once
they are eligible for reclamation.

This would let kswapd reclaim from applications that aren't critical
to the user while still occasionally reclaiming from persistent
processes (evicting pages that are used very rarely from
always-running system processes). This would effectively reduce the
size of backgrounded applications without impacting UI performance--a
huge improvement over what Android can do today.

Third, assuming we can do this kind of prioritized reclaim, I'd like
more information available via vmpressure (or similar) about the
current state of kswapd in terms of what priority it's trying to
reclaim. If lmkd knew that kswapd had moved on to higher-priority
cgroups while there were very unimportant processes remaining, lmkd
could be much more accurate about when to kill a process. This means
lmkd would run only in response to actual kswapd/direct reclaim
behavior, not because of arbitrary thresholds. This would unify how to
tune Android devices for memory consumption; the knobs in /proc/sys/vm
(primarily min_free_kbytes and extra_free_kbytes) would control both
kswapd *and* lmkd. I think this would also solve the "too aggressive
killing during transient spike" issue.

I'm working on an RFC of prioritized reclaim to follow (I hope)
sometime next week. I don't have a vmpressure patch prototyped yet,
since it depends on what the prioritized reclaim interface looks like.
Also, to be perfectly clear, I don't think my current approach is
necessarily the right one at all. All I have right now is a minimal
patch (against 3.18, hence the delay) to support memory cgroup
priorities: the interface makes no sense if you aren't familiar with
mm internals, I haven't thought through how this interacts with soft
limits, it doesn't make sense with cgroup hierarchies, etc. At this
stage, I'm mainly wondering if the broader community thinks
prioritized reclaim is a viable direction.

Thanks for any feedback you can provide.

Tim

On Fri, Feb 24, 2017 at 10:42 AM, Rom Lemarchand <romlem@google.com> wrote:
> +surenb
>
> On Fri, Feb 24, 2017 at 10:38 AM, Tim Murray <timmurray@google.com> wrote:
>>
>> Hi all, I've recently been looking at lowmemorykiller, userspace lmkd, and
>> memory cgroups on Android.
>>
>> First of all, no, an Android device will probably not function without a
>> kernel or userspace version of lowmemorykiller. Android userspace expects
>> that if there are many apps running in the background on a machine and the
>> foreground app allocates additional memory, something on the system will
>> kill background apps to free up more memory. If this doesn't happen, I
>> expect that at the very least you'll see page cache thrashing, and you'll
>> probably see the OOM killer run regularly, which has a tendency to cause
>> Android userspace to restart. To the best of my knowledge, no device has
>> shipped with a userspace lmkd.
>>
>> Second, yes, the current design and implementation of lowmemorykiller are
>> unsatisfactory. I now have some concrete evidence that the design of
>> lowmemorykiller is directly responsible for some very negative user-visible
>> behaviors (particularly the triggers for when to kill), so I'm currently
>> working on an overhaul to the Android memory model that would use mem
>> cgroups and userspace lmkd to make smarter decisions about reclaim vs
>> killing. Yes, this means that we would move to vmpressure (which will
>> require improvements to vmpressure). I can't give a firm ETA for this, as
>> it's still in the prototype phase, but the initial results are promising.
>>
>> On Fri, Feb 24, 2017 at 1:34 AM, Michal Hocko <mhocko@kernel.org> wrote:
>>>
>>> On Thu 23-02-17 21:36:00, Martijn Coenen wrote:
>>> > On Thu, Feb 23, 2017 at 9:24 PM, John Stultz <john.stultz@linaro.org>
>>> > wrote:
>>> [...]
>>> > > This is reportedly because while the mempressure notifiers provide a
>>> > > the signal to userspace, the work the deamon then has to do to look
>>> > > up
>>> > > per process memory usage, in order to figure out who is best to kill
>>> > > at that point was too costly and resulted in poor device performance.
>>> >
>>> > In particular, mempressure requires memory cgroups to function, and we
>>> > saw performance regressions due to the accounting done in mem cgroups.
>>> > At the time we didn't have enough time left to solve this before the
>>> > release, and we reverted back to kernel lmkd.
>>>
>>> I would be more than interested to hear details. We used to have some
>>> visible charge path performance footprint but this should be gone now.
>>>
>>> [...]
>>> > > It would be great however to get a discussion going here on what the
>>> > > ulmkd needs from the kernel in order to efficiently determine who
>>> > > best
>>> > > to kill, and how we might best implement that.
>>> >
>>> > The two main issues I think we need to address are:
>>> > 1) Getting the right granularity of events from the kernel; I once
>>> > tried to submit a patch upstream to address this:
>>> > https://lkml.org/lkml/2016/2/24/582
>>>
>>> Not only that, the implementation of tht vmpressure needs some serious
>>> rethinking as well. The current one can hit critical events
>>> unexpectedly. The calculation also doesn't consider slab reclaim
>>> sensibly.
>>>
>>> > 2) Find out where exactly the memory cgroup overhead is coming from,
>>> > and how to reduce it or work around it to acceptable levels for
>>> > Android. This was also on 3.10, and maybe this has long been fixed or
>>> > improved in more recent kernel versions.
>>>
>>> 3e32cb2e0a12 ("mm: memcontrol: lockless page counters") has improved
>>> situation a lot as all the charging is lockless since then (3.19).
>>> --
>>> Michal Hocko
>>> SUSE Labs
>>
>>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-02-22 12:01 [PATCH] staging, android: remove lowmemory killer from the tree Michal Hocko
  2017-02-23 20:24 ` John Stultz
@ 2017-03-09  9:15 ` Michal Hocko
  2017-03-09  9:30   ` Greg KH
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2017-03-09  9:15 UTC (permalink / raw)
  To: Greg KH
  Cc: Arve Hjønnevåg, Riley Andrews, devel, LKML, linux-mm,
	John Stultz, Todd Kjos, Martijn Coenen, Tim Murray,
	peter enderborg, Rom Lemarchand

Greg, do you see any obstacle to have this merged. The discussion so far
shown that a) vendors are not using the code as is b) there seems to be
an agreement that something else than we have in the kernel is really
needed.

On Wed 22-02-17 13:01:21, Michal Hocko wrote:
> From: Michal Hocko <mhocko@suse.com>
> 
> Lowmemory killer is sitting in the staging tree since 2008 without any
> serious interest for fixing issues brought up by the MM folks. The main
> objection is that the implementation is basically broken by design:
> 	- it hooks into slab shrinker API which is not suitable for this
> 	  purpose. lowmem_count implementation just shows this nicely.
> 	  There is no scaling based on the memory pressure and no
> 	  feedback to the generic shrinker infrastructure.
> 	  Moreover lowmem_scan is called way too often for the heavy
> 	  work it performs.
> 	- it is not reclaim context aware - no NUMA and/or memcg
> 	  awareness.
> 
> As the code stands right now it just adds a maintenance overhead when
> core MM changes have to update lowmemorykiller.c as well. It also seems
> that the alternative LMK implementation will be solely in the userspace
> so this code has no perspective it seems. The staging tree is supposed
> to be for a code which needs to be put in shape before it can be merged
> which is not the case here obviously.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  drivers/staging/android/Kconfig           |  10 --
>  drivers/staging/android/Makefile          |   1 -
>  drivers/staging/android/lowmemorykiller.c | 212 ------------------------------
>  include/linux/sched.h                     |   4 -
>  4 files changed, 227 deletions(-)
>  delete mode 100644 drivers/staging/android/lowmemorykiller.c
> 
> diff --git a/drivers/staging/android/Kconfig b/drivers/staging/android/Kconfig
> index 6c00d6f765c6..71a50b99caff 100644
> --- a/drivers/staging/android/Kconfig
> +++ b/drivers/staging/android/Kconfig
> @@ -14,16 +14,6 @@ config ASHMEM
>  	  It is, in theory, a good memory allocator for low-memory devices,
>  	  because it can discard shared memory units when under memory pressure.
>  
> -config ANDROID_LOW_MEMORY_KILLER
> -	bool "Android Low Memory Killer"
> -	---help---
> -	  Registers processes to be killed when low memory conditions, this is useful
> -	  as there is no particular swap space on android.
> -
> -	  The registered process will kill according to the priorities in android init
> -	  scripts (/init.rc), and it defines priority values with minimum free memory size
> -	  for each priority.
> -
>  source "drivers/staging/android/ion/Kconfig"
>  
>  endif # if ANDROID
> diff --git a/drivers/staging/android/Makefile b/drivers/staging/android/Makefile
> index 7ed1be798909..7cf1564a49a5 100644
> --- a/drivers/staging/android/Makefile
> +++ b/drivers/staging/android/Makefile
> @@ -3,4 +3,3 @@ ccflags-y += -I$(src)			# needed for trace events
>  obj-y					+= ion/
>  
>  obj-$(CONFIG_ASHMEM)			+= ashmem.o
> -obj-$(CONFIG_ANDROID_LOW_MEMORY_KILLER)	+= lowmemorykiller.o
> diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
> deleted file mode 100644
> index ec3b66561412..000000000000
> --- a/drivers/staging/android/lowmemorykiller.c
> +++ /dev/null
> @@ -1,212 +0,0 @@
> -/* drivers/misc/lowmemorykiller.c
> - *
> - * The lowmemorykiller driver lets user-space specify a set of memory thresholds
> - * where processes with a range of oom_score_adj values will get killed. Specify
> - * the minimum oom_score_adj values in
> - * /sys/module/lowmemorykiller/parameters/adj and the number of free pages in
> - * /sys/module/lowmemorykiller/parameters/minfree. Both files take a comma
> - * separated list of numbers in ascending order.
> - *
> - * For example, write "0,8" to /sys/module/lowmemorykiller/parameters/adj and
> - * "1024,4096" to /sys/module/lowmemorykiller/parameters/minfree to kill
> - * processes with a oom_score_adj value of 8 or higher when the free memory
> - * drops below 4096 pages and kill processes with a oom_score_adj value of 0 or
> - * higher when the free memory drops below 1024 pages.
> - *
> - * The driver considers memory used for caches to be free, but if a large
> - * percentage of the cached memory is locked this can be very inaccurate
> - * and processes may not get killed until the normal oom killer is triggered.
> - *
> - * Copyright (C) 2007-2008 Google, Inc.
> - *
> - * This software is licensed under the terms of the GNU General Public
> - * License version 2, as published by the Free Software Foundation, and
> - * may be copied, distributed, and modified under those terms.
> - *
> - * This program is distributed in the hope that it will be useful,
> - * but WITHOUT ANY WARRANTY; without even the implied warranty of
> - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> - * GNU General Public License for more details.
> - *
> - */
> -
> -#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> -
> -#include <linux/init.h>
> -#include <linux/moduleparam.h>
> -#include <linux/kernel.h>
> -#include <linux/mm.h>
> -#include <linux/oom.h>
> -#include <linux/sched.h>
> -#include <linux/swap.h>
> -#include <linux/rcupdate.h>
> -#include <linux/profile.h>
> -#include <linux/notifier.h>
> -
> -static u32 lowmem_debug_level = 1;
> -static short lowmem_adj[6] = {
> -	0,
> -	1,
> -	6,
> -	12,
> -};
> -
> -static int lowmem_adj_size = 4;
> -static int lowmem_minfree[6] = {
> -	3 * 512,	/* 6MB */
> -	2 * 1024,	/* 8MB */
> -	4 * 1024,	/* 16MB */
> -	16 * 1024,	/* 64MB */
> -};
> -
> -static int lowmem_minfree_size = 4;
> -
> -static unsigned long lowmem_deathpending_timeout;
> -
> -#define lowmem_print(level, x...)			\
> -	do {						\
> -		if (lowmem_debug_level >= (level))	\
> -			pr_info(x);			\
> -	} while (0)
> -
> -static unsigned long lowmem_count(struct shrinker *s,
> -				  struct shrink_control *sc)
> -{
> -	return global_node_page_state(NR_ACTIVE_ANON) +
> -		global_node_page_state(NR_ACTIVE_FILE) +
> -		global_node_page_state(NR_INACTIVE_ANON) +
> -		global_node_page_state(NR_INACTIVE_FILE);
> -}
> -
> -static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
> -{
> -	struct task_struct *tsk;
> -	struct task_struct *selected = NULL;
> -	unsigned long rem = 0;
> -	int tasksize;
> -	int i;
> -	short min_score_adj = OOM_SCORE_ADJ_MAX + 1;
> -	int minfree = 0;
> -	int selected_tasksize = 0;
> -	short selected_oom_score_adj;
> -	int array_size = ARRAY_SIZE(lowmem_adj);
> -	int other_free = global_page_state(NR_FREE_PAGES) - totalreserve_pages;
> -	int other_file = global_node_page_state(NR_FILE_PAGES) -
> -				global_node_page_state(NR_SHMEM) -
> -				total_swapcache_pages();
> -
> -	if (lowmem_adj_size < array_size)
> -		array_size = lowmem_adj_size;
> -	if (lowmem_minfree_size < array_size)
> -		array_size = lowmem_minfree_size;
> -	for (i = 0; i < array_size; i++) {
> -		minfree = lowmem_minfree[i];
> -		if (other_free < minfree && other_file < minfree) {
> -			min_score_adj = lowmem_adj[i];
> -			break;
> -		}
> -	}
> -
> -	lowmem_print(3, "lowmem_scan %lu, %x, ofree %d %d, ma %hd\n",
> -		     sc->nr_to_scan, sc->gfp_mask, other_free,
> -		     other_file, min_score_adj);
> -
> -	if (min_score_adj == OOM_SCORE_ADJ_MAX + 1) {
> -		lowmem_print(5, "lowmem_scan %lu, %x, return 0\n",
> -			     sc->nr_to_scan, sc->gfp_mask);
> -		return 0;
> -	}
> -
> -	selected_oom_score_adj = min_score_adj;
> -
> -	rcu_read_lock();
> -	for_each_process(tsk) {
> -		struct task_struct *p;
> -		short oom_score_adj;
> -
> -		if (tsk->flags & PF_KTHREAD)
> -			continue;
> -
> -		p = find_lock_task_mm(tsk);
> -		if (!p)
> -			continue;
> -
> -		if (task_lmk_waiting(p) &&
> -		    time_before_eq(jiffies, lowmem_deathpending_timeout)) {
> -			task_unlock(p);
> -			rcu_read_unlock();
> -			return 0;
> -		}
> -		oom_score_adj = p->signal->oom_score_adj;
> -		if (oom_score_adj < min_score_adj) {
> -			task_unlock(p);
> -			continue;
> -		}
> -		tasksize = get_mm_rss(p->mm);
> -		task_unlock(p);
> -		if (tasksize <= 0)
> -			continue;
> -		if (selected) {
> -			if (oom_score_adj < selected_oom_score_adj)
> -				continue;
> -			if (oom_score_adj == selected_oom_score_adj &&
> -			    tasksize <= selected_tasksize)
> -				continue;
> -		}
> -		selected = p;
> -		selected_tasksize = tasksize;
> -		selected_oom_score_adj = oom_score_adj;
> -		lowmem_print(2, "select '%s' (%d), adj %hd, size %d, to kill\n",
> -			     p->comm, p->pid, oom_score_adj, tasksize);
> -	}
> -	if (selected) {
> -		task_lock(selected);
> -		send_sig(SIGKILL, selected, 0);
> -		if (selected->mm)
> -			task_set_lmk_waiting(selected);
> -		task_unlock(selected);
> -		lowmem_print(1, "Killing '%s' (%d), adj %hd,\n"
> -				 "   to free %ldkB on behalf of '%s' (%d) because\n"
> -				 "   cache %ldkB is below limit %ldkB for oom_score_adj %hd\n"
> -				 "   Free memory is %ldkB above reserved\n",
> -			     selected->comm, selected->pid,
> -			     selected_oom_score_adj,
> -			     selected_tasksize * (long)(PAGE_SIZE / 1024),
> -			     current->comm, current->pid,
> -			     other_file * (long)(PAGE_SIZE / 1024),
> -			     minfree * (long)(PAGE_SIZE / 1024),
> -			     min_score_adj,
> -			     other_free * (long)(PAGE_SIZE / 1024));
> -		lowmem_deathpending_timeout = jiffies + HZ;
> -		rem += selected_tasksize;
> -	}
> -
> -	lowmem_print(4, "lowmem_scan %lu, %x, return %lu\n",
> -		     sc->nr_to_scan, sc->gfp_mask, rem);
> -	rcu_read_unlock();
> -	return rem;
> -}
> -
> -static struct shrinker lowmem_shrinker = {
> -	.scan_objects = lowmem_scan,
> -	.count_objects = lowmem_count,
> -	.seeks = DEFAULT_SEEKS * 16
> -};
> -
> -static int __init lowmem_init(void)
> -{
> -	register_shrinker(&lowmem_shrinker);
> -	return 0;
> -}
> -device_initcall(lowmem_init);
> -
> -/*
> - * not really modular, but the easiest way to keep compat with existing
> - * bootargs behaviour is to continue using module_param here.
> - */
> -module_param_named(cost, lowmem_shrinker.seeks, int, 0644);
> -module_param_array_named(adj, lowmem_adj, short, &lowmem_adj_size, 0644);
> -module_param_array_named(minfree, lowmem_minfree, uint, &lowmem_minfree_size,
> -			 0644);
> -module_param_named(debug_level, lowmem_debug_level, uint, 0644);
> -
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index e93594b88130..3cc6c650fa6a 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -2347,7 +2347,6 @@ static inline void memalloc_noio_restore(unsigned int flags)
>  #define PFA_NO_NEW_PRIVS 0	/* May not gain new privileges. */
>  #define PFA_SPREAD_PAGE  1      /* Spread page cache over cpuset */
>  #define PFA_SPREAD_SLAB  2      /* Spread some slab caches over cpuset */
> -#define PFA_LMK_WAITING  3      /* Lowmemorykiller is waiting */
>  
>  
>  #define TASK_PFA_TEST(name, func)					\
> @@ -2371,9 +2370,6 @@ TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
>  TASK_PFA_SET(SPREAD_SLAB, spread_slab)
>  TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
>  
> -TASK_PFA_TEST(LMK_WAITING, lmk_waiting)
> -TASK_PFA_SET(LMK_WAITING, lmk_waiting)
> -
>  /*
>   * task->jobctl flags
>   */
> -- 
> 2.11.0
> 

-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-03-09  9:15 ` Michal Hocko
@ 2017-03-09  9:30   ` Greg KH
  2017-03-09 10:00     ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: Greg KH @ 2017-03-09  9:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Arve Hjønnevåg, Riley Andrews, devel, LKML, linux-mm,
	John Stultz, Todd Kjos, Martijn Coenen, Tim Murray,
	peter enderborg, Rom Lemarchand

On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> Greg, do you see any obstacle to have this merged. The discussion so far
> shown that a) vendors are not using the code as is b) there seems to be
> an agreement that something else than we have in the kernel is really
> needed.

Well, some vendors are using the code as-is, just not Sony...

I think the ideas that Tim wrote about is the best way forward for this.
I'd prefer to leave the code in the kernel until that solution is
integrated, as dropping support entirely isn't very nice.

But, given that almost no Android system is running mainline at the
moment, I will queue this patch up for 4.12-rc1, which will give the
Google people a bit more of an incentive to get their solution
implemented and working and merged :)

Sound reasonable?  I haven't started to go through my patch queue for
4.12-rc1 stuff just yet, still digging through it for 4.11-final things
at the moment.  Give me a week or so to catch up.

thanks,

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-03-09  9:30   ` Greg KH
@ 2017-03-09 10:00     ` Michal Hocko
  2017-03-09 12:48       ` Greg KH
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2017-03-09 10:00 UTC (permalink / raw)
  To: Greg KH
  Cc: Arve Hjønnevåg, Riley Andrews, devel, LKML, linux-mm,
	John Stultz, Todd Kjos, Martijn Coenen, Tim Murray,
	peter enderborg, Rom Lemarchand

On Thu 09-03-17 10:30:28, Greg KH wrote:
> On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> > Greg, do you see any obstacle to have this merged. The discussion so far
> > shown that a) vendors are not using the code as is b) there seems to be
> > an agreement that something else than we have in the kernel is really
> > needed.
> 
> Well, some vendors are using the code as-is, just not Sony...
> 
> I think the ideas that Tim wrote about is the best way forward for this.
> I'd prefer to leave the code in the kernel until that solution is
> integrated, as dropping support entirely isn't very nice.
> 
> But, given that almost no Android system is running mainline at the
> moment, I will queue this patch up for 4.12-rc1, which will give the
> Google people a bit more of an incentive to get their solution
> implemented and working and merged :)
> 
> Sound reasonable?

sounds good to me.
-- 
Michal Hocko
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] staging, android: remove lowmemory killer from the tree
  2017-03-09 10:00     ` Michal Hocko
@ 2017-03-09 12:48       ` Greg KH
  0 siblings, 0 replies; 21+ messages in thread
From: Greg KH @ 2017-03-09 12:48 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Arve Hjønnevåg, Riley Andrews, devel, LKML, linux-mm,
	John Stultz, Todd Kjos, Martijn Coenen, Tim Murray,
	peter enderborg, Rom Lemarchand

On Thu, Mar 09, 2017 at 11:00:07AM +0100, Michal Hocko wrote:
> On Thu 09-03-17 10:30:28, Greg KH wrote:
> > On Thu, Mar 09, 2017 at 10:15:13AM +0100, Michal Hocko wrote:
> > > Greg, do you see any obstacle to have this merged. The discussion so far
> > > shown that a) vendors are not using the code as is b) there seems to be
> > > an agreement that something else than we have in the kernel is really
> > > needed.
> > 
> > Well, some vendors are using the code as-is, just not Sony...
> > 
> > I think the ideas that Tim wrote about is the best way forward for this.
> > I'd prefer to leave the code in the kernel until that solution is
> > integrated, as dropping support entirely isn't very nice.
> > 
> > But, given that almost no Android system is running mainline at the
> > moment, I will queue this patch up for 4.12-rc1, which will give the
> > Google people a bit more of an incentive to get their solution
> > implemented and working and merged :)
> > 
> > Sound reasonable?
> 
> sounds good to me.

Oh nevermind, might as well commit it now to my -next branch.  All now
deleted :)

greg k-h

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2017-03-09 12:48 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-22 12:01 [PATCH] staging, android: remove lowmemory killer from the tree Michal Hocko
2017-02-23 20:24 ` John Stultz
2017-02-23 20:28   ` Todd Kjos
2017-02-23 20:36   ` Martijn Coenen
2017-02-24  9:34     ` Michal Hocko
2017-02-24 18:38       ` Tim Murray
2017-02-24 18:42         ` Rom Lemarchand
2017-03-04  2:06           ` Tim Murray
2017-02-24 12:19     ` peter enderborg
2017-02-24 12:28       ` Michal Hocko
2017-02-24 13:16         ` peter enderborg
2017-02-24 14:11           ` Michal Hocko
2017-02-24 14:42             ` peter enderborg
2017-02-24 15:03               ` Michal Hocko
2017-02-24 15:40                 ` peter enderborg
2017-02-24 15:52                   ` Michal Hocko
2017-02-24  9:38   ` Michal Hocko
2017-03-09  9:15 ` Michal Hocko
2017-03-09  9:30   ` Greg KH
2017-03-09 10:00     ` Michal Hocko
2017-03-09 12:48       ` Greg KH

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).