linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@kernel.org>
To: Edward Chron <echron@arista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	David Rientjes <rientjes@google.com>,
	Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>,
	Shakeel Butt <shakeelb@google.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Ivan Delalande <colona@arista.com>
Subject: Re: [PATCH] mm/oom: Add killed process selection information
Date: Fri, 9 Aug 2019 08:40:32 +0200
Message-ID: <20190809064032.GJ18351@dhcp22.suse.cz>
In-Reply-To: <CAM3twVS7tqcHmHqjzJqO5DEsxzLfBaYF0FjVP+Jjb1ZS4rA9qA@mail.gmail.com>

[Again, please do not top post - it makes a mess of any longer
discussion]

On Thu 08-08-19 15:15:12, Edward Chron wrote:
> In our experience far more (99.9%+) OOM events are not kernel issues,
> they're user task memory issues.
> Properly maintained Linux kernels only rarely have issues.
> So useful information about the killed task, displayed in a manner
> that can be quickly digested, is very helpful.
> But it turns out the totalpages parameter is also critical to make
> sense of what is shown.

We already do print that information (see mem_cgroup_print_oom_meminfo
resp. show_mem).

> So if we report the fooWidget task was using ~15% of memory (I know
> this is just an approximation but it is often an adequate metric) we
> often can tell just from that the number is larger than expected so we
> can start there.
> Even though the % is a ballpark number, if you are familiar with the
> tasks on your system and approximately how much memory you expect them
> to use you can often tell if memory usage is excessive.
> This is not always the case but it is a fair amount of the time.
> So the % of memory field is helpful. But we've found we need totalpages as well.
> The totalpages value affects the % of memory the task uses.

Is it too difficult to calculate that % from the data available in the
existing report? I would expect this to be quite a simple script,
which I would consider better than changing the kernel code.
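
A minimal sketch of such a script, in Python, assuming the report contains
a "<N> pages RAM" line from the memory info dump and a "Killed process ..."
line carrying anon-rss, file-rss and shmem-rss in kB. Both line formats,
the function name killed_task_percent and the 4 kB page size are
assumptions, not something stated in this thread; check them against the
log your kernel actually emits. For a memcg OOM the denominator would be
the cgroup limit rather than total RAM, which this sketch does not handle.

```python
import re

# Assumed log line formats (hypothetical; verify against your kernel's output):
#   "<N> pages RAM"
#   "... Killed process <pid> (<comm>) total-vm:<N>kB, anon-rss:<N>kB, "
#   "file-rss:<N>kB, shmem-rss:<N>kB"
PAGES_RAM = re.compile(r"(\d+) pages RAM")
KILLED = re.compile(
    r"Killed process (\d+) \((.+?)\).*?"
    r"anon-rss:(\d+)kB, file-rss:(\d+)kB, shmem-rss:(\d+)kB"
)


def killed_task_percent(log_text, page_size_kb=4):
    """Return (comm, percent of RAM used by the killed task), or None.

    page_size_kb is an assumption (4 kB pages); adjust for other systems.
    """
    ram = PAGES_RAM.search(log_text)
    killed = KILLED.search(log_text)
    if not ram or not killed:
        return None
    total_kb = int(ram.group(1)) * page_size_kb
    # Resident memory of the victim: anonymous + file-backed + shmem pages.
    rss_kb = sum(int(killed.group(i)) for i in (3, 4, 5))
    return killed.group(2), 100.0 * rss_kb / total_kb
```

Feeding it the text of a saved OOM report (for example from dmesg output)
gives the task name and its approximate share of memory.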

[...]
> The oom_score tells us how Linux calculated the score for the task,
> the oom_score_adj affects this so it is helpful to have that in
> conjunction with the oom_score.
> If the adjust is high it can tell us that the task was acting as a
> canary and so its oom_score is high even though its memory
> utilization can be modest or low.

I am sorry but I still do not get it. How are you going to use that
information without seeing other eligible tasks? oom_score is just a
normalized memory usage + potentially some heuristics (we gave a
discount to root processes until just recently). So this value only
makes sense to the kernel oom killer implementation. Note that the
equation might change in the future (that has happened several times
in the past) so looking at the value in isolation might be quite
misleading.
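
To illustrate why the score depends on both totalpages and oom_score_adj,
here is a simplified Python model of the badness heuristic, roughly as it
looked in kernels around the time of this thread. It is an approximation
for illustration only: the function and parameter names are hypothetical,
and the real equation lives in the kernel and may differ or change.

```python
OOM_SCORE_ADJ_MIN = -1000  # tasks with this adjustment are never selected


def badness_points(rss_pages, swap_pages, pgtable_pages,
                   oom_score_adj, totalpages):
    """Simplified model: memory footprint in pages plus a scaled adjustment."""
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0  # exempt from selection
    # Baseline: resident pages + swapped-out pages + page table pages.
    points = rss_pages + swap_pages + pgtable_pages
    # Each unit of oom_score_adj is worth 1/1000 of totalpages.
    points += oom_score_adj * (totalpages // 1000)
    # An eligible task always gets at least 1 point in this model.
    return points if points > 0 else 1


# Assumed example numbers, not taken from any real report:
total = 1_000_000
canary = badness_points(10_000, 0, 0, oom_score_adj=500, totalpages=total)
big_task = badness_points(150_000, 0, 0, oom_score_adj=0, totalpages=total)
print(canary, big_task)
```

In this model a small "canary" task with a large positive adjustment
outscores a much bigger unadjusted task, and the same adjustment weighs
differently when totalpages changes, which is why a single score is hard
to interpret without the other eligible tasks.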

I can see some point in printing oom_score_adj, though. Seeing biased -
one way or the other - tasks being selected might confirm the setting is
reasonable or otherwise (e.g. seeing tasks with negative scores will
give an indication that they might not be biased enough). Then you can
go and check the eligible tasks dump and see what happened. So this part
makes some sense to me.
-- 
Michal Hocko
SUSE Labs
