* [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
@ 2016-02-16 9:37 Xishi Qiu
2016-02-16 17:38 ` Greg Kroah-Hartman
0 siblings, 1 reply; 10+ messages in thread
From: Xishi Qiu @ 2016-02-16 9:37 UTC (permalink / raw)
To: Greg Kroah-Hartman, arve, riandrews, devel, zhong jiang; +Cc: LKML, Linux MM
Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
But usually smart phones enable zram, so swap space actually use ram.
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
---
drivers/staging/android/lowmemorykiller.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index 8b5a4a8..718ab8e 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -139,7 +139,10 @@ static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
task_unlock(p);
continue;
}
- tasksize = get_mm_rss(p->mm);
+ tasksize = get_mm_rss(p->mm) +
+ get_mm_counter(p->mm, MM_SWAPENTS) +
+ atomic_long_read(&p->mm->nr_ptes) +
+ mm_nr_pmds(p->mm);
task_unlock(p);
if (tasksize <= 0)
continue;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-16 9:37 [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan() Xishi Qiu
@ 2016-02-16 17:38 ` Greg Kroah-Hartman
2016-02-17 0:35 ` David Rientjes
0 siblings, 1 reply; 10+ messages in thread
From: Greg Kroah-Hartman @ 2016-02-16 17:38 UTC (permalink / raw)
To: Xishi Qiu; +Cc: arve, riandrews, devel, zhong jiang, LKML, Linux MM
On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
> But usually smart phones enable zram, so swap space actually use ram.
Yes, but does that matter for this type of calculation? I need an ack
from the android team before I could ever take such a core change to
this code...
thanks,
greg k-h
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-16 17:38 ` Greg Kroah-Hartman
@ 2016-02-17 0:35 ` David Rientjes
2016-02-17 8:30 ` Xishi Qiu
` (3 more replies)
0 siblings, 4 replies; 10+ messages in thread
From: David Rientjes @ 2016-02-17 0:35 UTC (permalink / raw)
To: Greg Kroah-Hartman
Cc: Xishi Qiu, arve, riandrews, devel, zhong jiang, LKML, Linux MM
On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
> On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> > Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
> > But usually smart phones enable zram, so swap space actually use ram.
>
> Yes, but does that matter for this type of calculation? I need an ack
> from the android team before I could ever take such a core change to
> this code...
>
The calculation proposed in this patch is the same as the generic oom
killer, it's an estimate of the amount of memory that will be freed if it
is killed and can exit. This is better than simply get_mm_rss().
However, I think we seriously need to re-consider the implementation of
the lowmem killer entirely. It currently abuses the use of TIF_MEMDIE,
which should ideally only be set for one thread on the system since it
allows unbounded access to global memory reserves.
It also abuses the user-visible /proc/self/oom_score_adj tunable: this
tunable is used by the generic oom killer to bias or discount a proportion
of memory from a process's usage. This is the only supported semantic of
the tunable. The lowmem killer uses it as a strict prioritization, so any
process with oom_score_adj higher than another process is preferred for
kill, REGARDLESS of memory usage. This leads to priority inversion, the
user is unable to always define the same process to be killed by the
generic oom killer and the lowmem killer. This is what happens when a
tunable with a very clear and defined purpose is used for other reasons.
I'd seriously consider not accepting any additional hacks on top of this
code until the implementation is rewritten.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-17 0:35 ` David Rientjes
@ 2016-02-17 8:30 ` Xishi Qiu
2016-02-17 22:42 ` David Rientjes
2016-02-17 18:10 ` Michal Hocko
` (2 subsequent siblings)
3 siblings, 1 reply; 10+ messages in thread
From: Xishi Qiu @ 2016-02-17 8:30 UTC (permalink / raw)
To: David Rientjes
Cc: Greg Kroah-Hartman, arve, riandrews, devel, zhong jiang, LKML, Linux MM
On 2016/2/17 8:35, David Rientjes wrote:
> On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
>
>> On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
>>> Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
>>> But usually smart phones enable zram, so swap space actually use ram.
>>
>> Yes, but does that matter for this type of calculation? I need an ack
>> from the android team before I could ever take such a core change to
>> this code...
>>
>
> The calculation proposed in this patch is the same as the generic oom
> killer, it's an estimate of the amount of memory that will be freed if it
> is killed and can exit. This is better than simply get_mm_rss().
>
> However, I think we seriously need to re-consider the implementation of
> the lowmem killer entirely. It currently abuses the use of TIF_MEMDIE,
> which should ideally only be set for one thread on the system since it
> allows unbounded access to global memory reserves.
>
> It also abuses the user-visible /proc/self/oom_score_adj tunable: this
> tunable is used by the generic oom killer to bias or discount a proportion
> of memory from a process's usage. This is the only supported semantic of
> the tunable. The lowmem killer uses it as a strict prioritization, so any
> process with oom_score_adj higher than another process is preferred for
> kill, REGARDLESS of memory usage. This leads to priority inversion, the
> user is unable to always define the same process to be killed by the
> generic oom killer and the lowmem killer. This is what happens when a
> tunable with a very clear and defined purpose is used for other reasons.
>
> I'd seriously consider not accepting any additional hacks on top of this
> code until the implementation is rewritten.
>
Hi David,
Thanks for your advice.
I have a stupid question, what's the main difference between lmk and oom?
1) lmk is called when reclaim memory, and oom is called when alloc failed in slow path.
2) lmk has several lowmem thresholds and oom is not.
3) others?
Thanks,
Xishi Qiu
> .
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-17 0:35 ` David Rientjes
2016-02-17 8:30 ` Xishi Qiu
@ 2016-02-17 18:10 ` Michal Hocko
2016-02-18 6:51 ` Xishi Qiu
[not found] ` <CAF7GXvqr2dmc7CUcs_OmfYnEA9jE_Db4kGGG1HJyYYLhC6Bgew@mail.gmail.com>
3 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2016-02-17 18:10 UTC (permalink / raw)
To: David Rientjes
Cc: Greg Kroah-Hartman, Xishi Qiu, arve, riandrews, devel,
zhong jiang, LKML, Linux MM
On Tue 16-02-16 16:35:39, David Rientjes wrote:
> On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
>
> > On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> > > Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
> > > But usually smart phones enable zram, so swap space actually use ram.
> >
> > Yes, but does that matter for this type of calculation? I need an ack
> > from the android team before I could ever take such a core change to
> > this code...
> >
>
> The calculation proposed in this patch is the same as the generic oom
> killer, it's an estimate of the amount of memory that will be freed if it
> is killed and can exit. This is better than simply get_mm_rss().
>
> However, I think we seriously need to re-consider the implementation of
> the lowmem killer entirely. It currently abuses the use of TIF_MEMDIE,
> which should ideally only be set for one thread on the system since it
> allows unbounded access to global memory reserves.
>
> It also abuses the user-visible /proc/self/oom_score_adj tunable: this
> tunable is used by the generic oom killer to bias or discount a proportion
> of memory from a process's usage. This is the only supported semantic of
> the tunable. The lowmem killer uses it as a strict prioritization, so any
> process with oom_score_adj higher than another process is preferred for
> kill, REGARDLESS of memory usage. This leads to priority inversion, the
> user is unable to always define the same process to be killed by the
> generic oom killer and the lowmem killer. This is what happens when a
> tunable with a very clear and defined purpose is used for other reasons.
>
> I'd seriously consider not accepting any additional hacks on top of this
> code until the implementation is rewritten.
Fully agreed!
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-17 8:30 ` Xishi Qiu
@ 2016-02-17 22:42 ` David Rientjes
0 siblings, 0 replies; 10+ messages in thread
From: David Rientjes @ 2016-02-17 22:42 UTC (permalink / raw)
To: Xishi Qiu
Cc: Greg Kroah-Hartman, arve, riandrews, devel, zhong jiang, LKML, Linux MM
On Wed, 17 Feb 2016, Xishi Qiu wrote:
> Hi David,
>
> Thanks for your advice.
>
> I have a stupid question, what's the main difference between lmk and oom?
Hi Xishi, it's not a stupid question at all!
Low memory killer appears to be implemented as a generic shrinker that
iterates through the tasklist and tries to free memory before the generic
oom killer. It has two tunables, "adj" and "minfree": "minfree" describes
what class of processes are eligible based on how many free pages are left
on the system and "adj" defines that class by using oom_score_adj values.
So LMK is trying to free memory before all memory is depleted based on
heuristics for systems that load the driver whereas the generic oom killer
is called to kill a process when reclaim has failed to free any memory and
there's no forward progress.
> 1) lmk is called when reclaim memory, and oom is called when alloc failed in slow path.
Yeah, and I don't think LMK provides any sort of guarantee against all
memory being fully depleted before it can run, so it would probably be
best effort.
> 2) lmk has several lowmem thresholds and oom is not.
Right, and it abuses oom_score_adj, which is a generic oom killer tunable
to define priorities to kill at different levels of memory availability.
> 3) others?
>
LMK also abuses TIF_MEMDIE which is used by the generic oom killer to
allow a process to free memory. Since the system is out of memory when it
is called, a process often needs additional memory to even exit, so we set
TIF_MEMDIE to ignore zone watermarks in the page allocator. LMK should
not be using this, there should already be memory available for it to
allocate from.
To fix these issues with LMK, I think it should:
- send SIGKILL to terminate a process in lowmem situations, but not
set TIF_MEMDIE and implement its own way of determining when to kill
additional processes, and
- introduce its own tunable to define the priority of kill when it runs
rather than oom_score_adj, which is a proportion of memory to bias
against, not a priority at all.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-17 0:35 ` David Rientjes
2016-02-17 8:30 ` Xishi Qiu
2016-02-17 18:10 ` Michal Hocko
@ 2016-02-18 6:51 ` Xishi Qiu
2016-02-23 0:54 ` David Rientjes
[not found] ` <CAF7GXvqr2dmc7CUcs_OmfYnEA9jE_Db4kGGG1HJyYYLhC6Bgew@mail.gmail.com>
3 siblings, 1 reply; 10+ messages in thread
From: Xishi Qiu @ 2016-02-18 6:51 UTC (permalink / raw)
To: David Rientjes
Cc: Greg Kroah-Hartman, arve, riandrews, devel, zhong jiang, LKML, Linux MM
On 2016/2/17 8:35, David Rientjes wrote:
> On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
>
>> On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
>>> Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
>>> But usually smart phones enable zram, so swap space actually use ram.
>>
>> Yes, but does that matter for this type of calculation? I need an ack
>> from the android team before I could ever take such a core change to
>> this code...
>>
>
> The calculation proposed in this patch is the same as the generic oom
> killer, it's an estimate of the amount of memory that will be freed if it
> is killed and can exit. This is better than simply get_mm_rss().
>
> However, I think we seriously need to re-consider the implementation of
> the lowmem killer entirely. It currently abuses the use of TIF_MEMDIE,
> which should ideally only be set for one thread on the system since it
> allows unbounded access to global memory reserves.
>
Hi David,
Does somebody do the work of re-implementation of the lowmem killer entirely
now? Could you give me some details? e.g. when and how?
Here are another two questions.
1) lmk has several lowmem thresholds, it's "lowmem_minfree[]", and the value is
static definition, so is it reasonable for different memory size(e.g. 2G/3G/4G...)
of smart phones?
2) There are many adjustable arguments in /proc/sys/vm/, and the default value
maybe not benefit for smart phones, so any suggestions?
Thanks,
Xishi Qiu
> It also abuses the user-visible /proc/self/oom_score_adj tunable: this
> tunable is used by the generic oom killer to bias or discount a proportion
> of memory from a process's usage. This is the only supported semantic of
> the tunable. The lowmem killer uses it as a strict prioritization, so any
> process with oom_score_adj higher than another process is preferred for
> kill, REGARDLESS of memory usage. This leads to priority inversion, the
> user is unable to always define the same process to be killed by the
> generic oom killer and the lowmem killer. This is what happens when a
> tunable with a very clear and defined purpose is used for other reasons.
>
> I'd seriously consider not accepting any additional hacks on top of this
> code until the implementation is rewritten.
>
> .
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
[not found] ` <CAF7GXvqr2dmc7CUcs_OmfYnEA9jE_Db4kGGG1HJyYYLhC6Bgew@mail.gmail.com>
@ 2016-02-18 10:21 ` Xishi Qiu
2016-02-23 0:50 ` David Rientjes
0 siblings, 1 reply; 10+ messages in thread
From: Xishi Qiu @ 2016-02-18 10:21 UTC (permalink / raw)
To: Figo.zhang, David Rientjes
Cc: Greg Kroah-Hartman, arve, riandrews, devel, zhong jiang, LKML, Linux MM
On 2016/2/18 15:55, Figo.zhang wrote:
>
>
> 2016-02-17 8:35 GMT+08:00 David Rientjes <rientjes@google.com <mailto:rientjes@google.com>>:
>
> On Tue, 16 Feb 2016, Greg Kroah-Hartman wrote:
>
> > On Tue, Feb 16, 2016 at 05:37:05PM +0800, Xishi Qiu wrote:
> > > Currently tasksize in lowmem_scan() only calculate rss, and not include swap.
> > > But usually smart phones enable zram, so swap space actually use ram.
> >
> > Yes, but does that matter for this type of calculation? I need an ack
> > from the android team before I could ever take such a core change to
> > this code...
> >
>
> The calculation proposed in this patch is the same as the generic oom
> killer, it's an estimate of the amount of memory that will be freed if it
> is killed and can exit. This is better than simply get_mm_rss().
>
> However, I think we seriously need to re-consider the implementation of
> the lowmem killer entirely. It currently abuses the use of TIF_MEMDIE,
> which should ideally only be set for one thread on the system since it
> allows unbounded access to global memory reserves.
>
>
>
> i don't understand why it need wait 1 second:
>
Hi David,
How about kill more processes at one time?
Usually loading camera will alloc 300-500M memory immediately, so call lmk
repeatedly is a waste of time.
And can we reclaim memory at one time instead of reclaim-alloc-reclaim-alloc...
in this situation? e.g. use try_to_free_pages(), set nr_to_reclaim=300M
Thanks,
Xishi Qiu
> if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> time_before_eq(jiffies, lowmem_deathpending_timeout)) {
> task_unlock(p);
> rcu_read_unlock();
> return 0; <= why return rather than continue?
> }
>
> and it will retry and wait many CPU times if one task holding the TIF_MEMDI.
> shrink_slab_node()
> while()
> shrinker->scan_objects();
> lowmem_scan()
> if (test_tsk_thread_flag(p, TIF_MEMDIE) &&
> time_before_eq(jiffies, lowmem_deathpending_timeout))
>
>
>
>
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-18 10:21 ` Xishi Qiu
@ 2016-02-23 0:50 ` David Rientjes
0 siblings, 0 replies; 10+ messages in thread
From: David Rientjes @ 2016-02-23 0:50 UTC (permalink / raw)
To: Xishi Qiu
Cc: Figo.zhang, Greg Kroah-Hartman, arve, riandrews, devel,
zhong jiang, LKML, Linux MM
On Thu, 18 Feb 2016, Xishi Qiu wrote:
> How about kill more processes at one time?
>
> Usually loading camera will alloc 300-500M memory immediately, so call lmk
> repeatedly is a waste of time.
>
> And can we reclaim memory at one time instead of reclaim-alloc-reclaim-alloc...
> in this situation? e.g. use try_to_free_pages(), set nr_to_reclaim=300M
>
I don't use the lmk myself and it's never been used on my phone, so I
can't speak for the usecase. However, killing more than one process at a
time is generally a bad idea because it can allow processes to deplete
memory reserves which may be small on these systems. The lmk cannot be
considered a hotpath, so waiting for a process to exit and free its memory
for a small amount of time is generally not a bad trade-off when the
alternative is to kill many processes (perhaps unnecessarily), open
yourself up to livelock, and free memory that is potentially unneeded.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan()
2016-02-18 6:51 ` Xishi Qiu
@ 2016-02-23 0:54 ` David Rientjes
0 siblings, 0 replies; 10+ messages in thread
From: David Rientjes @ 2016-02-23 0:54 UTC (permalink / raw)
To: Xishi Qiu
Cc: Greg Kroah-Hartman, arve, riandrews, devel, zhong jiang, LKML, Linux MM
On Thu, 18 Feb 2016, Xishi Qiu wrote:
> Does somebody do the work of re-implementation of the lowmem killer entirely
> now? Could you give me some details? e.g. when and how?
>
I don't know of any plans or anybody working on reimplementing it to be
correct, I simply don't think we should pile more patches on top of an
already broken design.
> Here are another two questions.
> 1) lmk has several lowmem thresholds, it's "lowmem_minfree[]", and the value is
> static definition, so is it reasonable for different memory size(e.g. 2G/3G/4G...)
> of smart phones?
It looks like it is configurable from userspace and actually defaults to
6MB, 8MB, 16MB, and 64MB in the kernel. Reimplementing the lmk would have
to take this design decision into consideration and replace it with
something that would not cause existing userspace to break.
A sane implementation would be to do what vmpressure does in the kernel,
and that is to signal userspace when certain thresholds are met.
Userspace could then issue a SIGKILL to processes based on priority
(/proc/pid/oom_score_adj is world-readable by default) and the oom killer
will grant these processes access to memory reserves immediately if they
cannot allocate the memory needed to exit.
> 2) There are many adjustable arguments in /proc/sys/vm/, and the default value
> maybe not benefit for smart phones, so any suggestions?
>
I would assume that vm sysctls, like any other sysctls, would be tuned
with initscripts depending on the configuration, if necessary.
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2016-02-23 0:54 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-16 9:37 [PATCH] mm: add MM_SWAPENTS and page table when calculate tasksize in lowmem_scan() Xishi Qiu
2016-02-16 17:38 ` Greg Kroah-Hartman
2016-02-17 0:35 ` David Rientjes
2016-02-17 8:30 ` Xishi Qiu
2016-02-17 22:42 ` David Rientjes
2016-02-17 18:10 ` Michal Hocko
2016-02-18 6:51 ` Xishi Qiu
2016-02-23 0:54 ` David Rientjes
[not found] ` <CAF7GXvqr2dmc7CUcs_OmfYnEA9jE_Db4kGGG1HJyYYLhC6Bgew@mail.gmail.com>
2016-02-18 10:21 ` Xishi Qiu
2016-02-23 0:50 ` David Rientjes
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).