linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: <Peter.Enderborg@sony.com>
To: <shakeelb@google.com>
Cc: <hannes@cmpxchg.org>, <guro@fb.com>, <mhocko@kernel.org>,
	<linux-mm@kvack.org>, <akpm@linux-foundation.org>,
	<cgroups@vger.kernel.org>, <rientjes@google.com>,
	<linux-kernel@vger.kernel.org>, <surenb@google.com>,
	<gthelen@google.com>, <dragoss@google.com>,
	<padmapriyad@google.com>
Subject: Re: [RFC] memory reserve for userspace oom-killer
Date: Thu, 22 Apr 2021 05:38:13 +0000	[thread overview]
Message-ID: <6eaa4c24-c565-bc5d-dbca-b73c72569a16@sony.com> (raw)
In-Reply-To: <CALvZod5+5ycobmSt=NC3VJF4FRMFmBQEN7SQgipyTDbzHEbPUQ@mail.gmail.com>

On 4/21/21 9:18 PM, Shakeel Butt wrote:
> On Wed, Apr 21, 2021 at 11:46 AM <Peter.Enderborg@sony.com> wrote:
>> On 4/21/21 8:28 PM, Shakeel Butt wrote:
>>> On Wed, Apr 21, 2021 at 10:06 AM peter enderborg
>>> <peter.enderborg@sony.com> wrote:
>>>> On 4/20/21 3:44 AM, Shakeel Butt wrote:
>>> [...]
>>>> I think this is the wrong way to go.
>>> Which one? Are you talking about the kernel one? We already talked out
>>> of that. To decide to OOM, we need to look at a very diverse set of
>>> metrics and it seems like that would be very hard to do flexibly
>>> inside the kernel.
>> You dont need to decide to oom, but when oom occurs you
>> can take a proper action.
> No, we want the flexibility to decide when to oom-kill. Kernel is very
> conservative in triggering the oom-kill.

It wont do it for you. We use this code to solve that:

/*
 *  lowmemorykiller_oom
 *
 *  Author: Peter Enderborg <peter.enderborg@sonymobile.com>
 *
 *  This program is free software; you can redistribute it and/or modify
 *  it under the terms of the GNU General Public License version 2 as
 *  published by the Free Software Foundation.
 */

/* add fake print format with original module name */
#define pr_fmt(fmt) "lowmemorykiller: " fmt

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/oom.h>

#include <trace/events/lmk.h>

#include "lowmemorykiller.h"
#include "lowmemorykiller_tng.h"
#include "lowmemorykiller_stats.h"
#include "lowmemorykiller_tasks.h"

/**
 * lowmemorykiller_oom_notify - OOM notifier
 * @self:    notifier block struct
 * @notused:    not used
 * @parm:    returned - number of pages freed
 *
 * Return value:
 *    NOTIFY_OK
 **/

static int lowmemorykiller_oom_notify(struct notifier_block *self,
                      unsigned long notused, void *param)
{
    struct lmk_rb_watch *lrw;
    unsigned long *nfreed = param;

    lowmem_print(2, "oom notify event\n");
    *nfreed = 0;
    lmk_inc_stats(LMK_OOM_COUNT);
    spin_lock_bh(&lmk_task_lock);
    lrw = __lmk_task_first();
    if (lrw) {
        struct task_struct *selected = lrw->tsk;
        struct lmk_death_pending_entry *ldpt;

        if (!task_trylock_lmk(selected)) {
            lmk_inc_stats(LMK_ERROR);
            lowmem_print(1, "Failed to lock task.\n");
            lmk_inc_stats(LMK_BUSY);
            goto unlock_out;
        }

        get_task_struct(selected);
        /* move to kill pending set */
        ldpt = kmem_cache_alloc(lmk_dp_cache, GFP_ATOMIC);
        /* if we fail to alloc we ignore the death pending list */
        if (ldpt) {
            ldpt->tsk = selected;
            __lmk_death_pending_add(ldpt);
        } else {
            WARN_ON(1);
            lmk_inc_stats(LMK_MEM_ERROR);
            trace_lmk_sigkill(selected->pid, selected->comm,
                      LMK_TRACE_MEMERROR, *nfreed, 0);
        }
        if (!__lmk_task_remove(selected, lrw->key))
            WARN_ON(1);

        spin_unlock_bh(&lmk_task_lock);
        *nfreed = get_task_rss(selected);
        send_sig(SIGKILL, selected, 0);

        LMK_TAG_TASK_DIE(selected);
        trace_lmk_sigkill(selected->pid, selected->comm,
                  LMK_TRACE_OOMKILL, *nfreed,
                  0);

        task_unlock(selected);
        put_task_struct(selected);
        lmk_inc_stats(LMK_OOM_KILL_COUNT);
        goto out;
    }
unlock_out:
    spin_unlock_bh(&lmk_task_lock);
out:
    return NOTIFY_OK;
}

static struct notifier_block lowmemorykiller_oom_nb = {
    .notifier_call = lowmemorykiller_oom_notify
};

int __init lowmemorykiller_register_oom_notifier(void)
{
    register_oom_notifier(&lowmemorykiller_oom_nb);
    return 0;
}


So what needed is a function that knows the
priority. Here __lmk_task_first() that is from a rb-tree.

You can pick what ever priority you like. In our case it is a
android so it is a strictly oom_adj order in the tree.

I think you can do the same with a old lowmemmorykiller style  with
a full task scan too.

> [...]
>>> Actually no. It is missing the flexibility to monitor metrics which a
>>> user care and based on which they decide to trigger oom-kill. Not sure
>>> how will watchdog replace psi/vmpressure? Userspace keeps petting the
>>> watchdog does not mean that system is not suffering.
>> The userspace should very much do what it do. But when it
>> does not do what it should do, including kick the WD. Then
>> the kernel kicks in and kill a pre defined process or as many
>> as needed until the monitoring can start to kick and have the
>> control.
>>
> Roman already suggested something similar (i.e. oom-killer core and
> extended and core watching extended) but completely in userspace. I
> don't see why we would want to do that in the kernel instead.

A watchdog in kernel will work if userspace is completely broken
or staved with low on memory.


>
>>> In addition oom priorities change dynamically and changing it in your
>>> system seems very hard. Cgroup awareness is missing too.
>> Why is that hard? Moving a object in a rb-tree is as good it get.
>>
> It is a group of objects. Anyways that is implementation detail.
>
> The message I got from this exchange is that we can have a watchdog
> (userspace or kernel) to further improve the reliability of userspace
> oom-killers.


  reply	other threads:[~2021-04-22  5:39 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-20  1:44 [RFC] memory reserve for userspace oom-killer Shakeel Butt
2021-04-20  6:45 ` Michal Hocko
2021-04-20 16:04   ` Shakeel Butt
2021-04-21  7:16     ` Michal Hocko
2021-04-21 13:57       ` Shakeel Butt
2021-04-21 14:29         ` Michal Hocko
2021-04-22 12:33           ` [RFC PATCH] Android OOM helper proof of concept peter enderborg
2021-04-22 13:03             ` Michal Hocko
2021-05-05  0:37           ` [RFC] memory reserve for userspace oom-killer Shakeel Butt
2021-05-05  1:26             ` Suren Baghdasaryan
2021-05-05  2:45               ` Shakeel Butt
2021-05-05  2:59                 ` Suren Baghdasaryan
2021-05-05  2:43             ` Hillf Danton
2021-04-20 19:17 ` Roman Gushchin
2021-04-20 19:36   ` Suren Baghdasaryan
2021-04-21  1:18   ` Shakeel Butt
2021-04-21  2:58     ` Roman Gushchin
2021-04-21 13:26       ` Shakeel Butt
2021-04-21 19:04         ` Roman Gushchin
2021-04-21  7:23     ` Michal Hocko
2021-04-21 14:13       ` Shakeel Butt
2021-04-21 17:05 ` peter enderborg
2021-04-21 18:28   ` Shakeel Butt
2021-04-21 18:46     ` Peter.Enderborg
2021-04-21 19:18       ` Shakeel Butt
2021-04-22  5:38         ` Peter.Enderborg [this message]
2021-04-22 14:27           ` Shakeel Butt
2021-04-22 15:41             ` Peter.Enderborg
2021-04-22 13:08   ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6eaa4c24-c565-bc5d-dbca-b73c72569a16@sony.com \
    --to=peter.enderborg@sony.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=dragoss@google.com \
    --cc=gthelen@google.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=padmapriyad@google.com \
    --cc=rientjes@google.com \
    --cc=shakeelb@google.com \
    --cc=surenb@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).