From: <Peter.Enderborg@sony.com>
To: <shakeelb@google.com>
Cc: <hannes@cmpxchg.org>, <guro@fb.com>, <mhocko@kernel.org>,
<linux-mm@kvack.org>, <akpm@linux-foundation.org>,
<cgroups@vger.kernel.org>, <rientjes@google.com>,
<linux-kernel@vger.kernel.org>, <surenb@google.com>,
<gthelen@google.com>, <dragoss@google.com>,
<padmapriyad@google.com>
Subject: Re: [RFC] memory reserve for userspace oom-killer
Date: Thu, 22 Apr 2021 05:38:13 +0000 [thread overview]
Message-ID: <6eaa4c24-c565-bc5d-dbca-b73c72569a16@sony.com> (raw)
In-Reply-To: <CALvZod5+5ycobmSt=NC3VJF4FRMFmBQEN7SQgipyTDbzHEbPUQ@mail.gmail.com>
On 4/21/21 9:18 PM, Shakeel Butt wrote:
> On Wed, Apr 21, 2021 at 11:46 AM <Peter.Enderborg@sony.com> wrote:
>> On 4/21/21 8:28 PM, Shakeel Butt wrote:
>>> On Wed, Apr 21, 2021 at 10:06 AM peter enderborg
>>> <peter.enderborg@sony.com> wrote:
>>>> On 4/20/21 3:44 AM, Shakeel Butt wrote:
>>> [...]
>>>> I think this is the wrong way to go.
>>> Which one? Are you talking about the kernel one? We already talked out
>>> of that. To decide to OOM, we need to look at a very diverse set of
>>> metrics and it seems like that would be very hard to do flexibly
>>> inside the kernel.
>> You dont need to decide to oom, but when oom occurs you
>> can take a proper action.
> No, we want the flexibility to decide when to oom-kill. Kernel is very
> conservative in triggering the oom-kill.
It wont do it for you. We use this code to solve that:
/*
* lowmemorykiller_oom
*
* Author: Peter Enderborg <peter.enderborg@sonymobile.com>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*/
/* add fake print format with original module name */
#define pr_fmt(fmt) "lowmemorykiller: " fmt
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/oom.h>
#include <trace/events/lmk.h>
#include "lowmemorykiller.h"
#include "lowmemorykiller_tng.h"
#include "lowmemorykiller_stats.h"
#include "lowmemorykiller_tasks.h"
/**
* lowmemorykiller_oom_notify - OOM notifier
* @self: notifier block struct
* @notused: not used
* @parm: returned - number of pages freed
*
* Return value:
* NOTIFY_OK
**/
static int lowmemorykiller_oom_notify(struct notifier_block *self,
unsigned long notused, void *param)
{
struct lmk_rb_watch *lrw;
unsigned long *nfreed = param;
lowmem_print(2, "oom notify event\n");
*nfreed = 0;
lmk_inc_stats(LMK_OOM_COUNT);
spin_lock_bh(&lmk_task_lock);
lrw = __lmk_task_first();
if (lrw) {
struct task_struct *selected = lrw->tsk;
struct lmk_death_pending_entry *ldpt;
if (!task_trylock_lmk(selected)) {
lmk_inc_stats(LMK_ERROR);
lowmem_print(1, "Failed to lock task.\n");
lmk_inc_stats(LMK_BUSY);
goto unlock_out;
}
get_task_struct(selected);
/* move to kill pending set */
ldpt = kmem_cache_alloc(lmk_dp_cache, GFP_ATOMIC);
/* if we fail to alloc we ignore the death pending list */
if (ldpt) {
ldpt->tsk = selected;
__lmk_death_pending_add(ldpt);
} else {
WARN_ON(1);
lmk_inc_stats(LMK_MEM_ERROR);
trace_lmk_sigkill(selected->pid, selected->comm,
LMK_TRACE_MEMERROR, *nfreed, 0);
}
if (!__lmk_task_remove(selected, lrw->key))
WARN_ON(1);
spin_unlock_bh(&lmk_task_lock);
*nfreed = get_task_rss(selected);
send_sig(SIGKILL, selected, 0);
LMK_TAG_TASK_DIE(selected);
trace_lmk_sigkill(selected->pid, selected->comm,
LMK_TRACE_OOMKILL, *nfreed,
0);
task_unlock(selected);
put_task_struct(selected);
lmk_inc_stats(LMK_OOM_KILL_COUNT);
goto out;
}
unlock_out:
spin_unlock_bh(&lmk_task_lock);
out:
return NOTIFY_OK;
}
static struct notifier_block lowmemorykiller_oom_nb = {
.notifier_call = lowmemorykiller_oom_notify
};
int __init lowmemorykiller_register_oom_notifier(void)
{
register_oom_notifier(&lowmemorykiller_oom_nb);
return 0;
}
So what needed is a function that knows the
priority. Here __lmk_task_first() that is from a rb-tree.
You can pick what ever priority you like. In our case it is a
android so it is a strictly oom_adj order in the tree.
I think you can do the same with a old lowmemmorykiller style with
a full task scan too.
> [...]
>>> Actually no. It is missing the flexibility to monitor metrics which a
>>> user care and based on which they decide to trigger oom-kill. Not sure
>>> how will watchdog replace psi/vmpressure? Userspace keeps petting the
>>> watchdog does not mean that system is not suffering.
>> The userspace should very much do what it do. But when it
>> does not do what it should do, including kick the WD. Then
>> the kernel kicks in and kill a pre defined process or as many
>> as needed until the monitoring can start to kick and have the
>> control.
>>
> Roman already suggested something similar (i.e. oom-killer core and
> extended and core watching extended) but completely in userspace. I
> don't see why we would want to do that in the kernel instead.
A watchdog in kernel will work if userspace is completely broken
or staved with low on memory.
>
>>> In addition oom priorities change dynamically and changing it in your
>>> system seems very hard. Cgroup awareness is missing too.
>> Why is that hard? Moving a object in a rb-tree is as good it get.
>>
> It is a group of objects. Anyways that is implementation detail.
>
> The message I got from this exchange is that we can have a watchdog
> (userspace or kernel) to further improve the reliability of userspace
> oom-killers.
next prev parent reply other threads:[~2021-04-22 5:39 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-04-20 1:44 [RFC] memory reserve for userspace oom-killer Shakeel Butt
2021-04-20 6:45 ` Michal Hocko
2021-04-20 16:04 ` Shakeel Butt
2021-04-21 7:16 ` Michal Hocko
2021-04-21 13:57 ` Shakeel Butt
2021-04-21 14:29 ` Michal Hocko
2021-04-22 12:33 ` [RFC PATCH] Android OOM helper proof of concept peter enderborg
2021-04-22 13:03 ` Michal Hocko
2021-05-05 0:37 ` [RFC] memory reserve for userspace oom-killer Shakeel Butt
2021-05-05 1:26 ` Suren Baghdasaryan
2021-05-05 2:45 ` Shakeel Butt
2021-05-05 2:59 ` Suren Baghdasaryan
2021-05-05 2:43 ` Hillf Danton
2021-04-20 19:17 ` Roman Gushchin
2021-04-20 19:36 ` Suren Baghdasaryan
2021-04-21 1:18 ` Shakeel Butt
2021-04-21 2:58 ` Roman Gushchin
2021-04-21 13:26 ` Shakeel Butt
2021-04-21 19:04 ` Roman Gushchin
2021-04-21 7:23 ` Michal Hocko
2021-04-21 14:13 ` Shakeel Butt
2021-04-21 17:05 ` peter enderborg
2021-04-21 18:28 ` Shakeel Butt
2021-04-21 18:46 ` Peter.Enderborg
2021-04-21 19:18 ` Shakeel Butt
2021-04-22 5:38 ` Peter.Enderborg [this message]
2021-04-22 14:27 ` Shakeel Butt
2021-04-22 15:41 ` Peter.Enderborg
2021-04-22 13:08 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6eaa4c24-c565-bc5d-dbca-b73c72569a16@sony.com \
--to=peter.enderborg@sony.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=dragoss@google.com \
--cc=gthelen@google.com \
--cc=guro@fb.com \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@kernel.org \
--cc=padmapriyad@google.com \
--cc=rientjes@google.com \
--cc=shakeelb@google.com \
--cc=surenb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).