From: Michal Hocko <mhocko@kernel.org>
To: Pavel Machek <pavel@ucw.cz>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@osdl.org>,
	linux-mm@kvack.org, akpm@linux-foundation.org
Subject: Re: OOM killer not nearly agressive enough?
Date: Fri, 10 Jan 2020 07:31:47 +0100
Message-ID: <20200110063147.GB29802@dhcp22.suse.cz>
In-Reply-To: <20200109224845.GA1220@amd>

On Thu 09-01-20 23:48:45, Pavel Machek wrote:
> Hi!
> 
> > > > > Do we agree that OOM killer should have reacted way sooner?
> > > > 
> > > > This is impossible to answer without knowing what was going on at the
> > > > time. Was the system thrashing over page cache/swap? In other words, is
> > > > the system completely out of memory or refaulting the working set all
> > > > the time because it doesn't fit into memory?
> > > 
> > > Swap was full, so "completely out of memory", I guess. Chromium does
> > > that fairly often :-(.
> > 
> > The oom heuristic is based on the reclaim failure. If the reclaim makes
> > some progress then the oom killer is not hit. Have a look at
> > should_reclaim_retry for more details.
> 
> Thanks for the pointer.
> 
> I guess setting MAX_RECLAIM_RETRIES to 1 is not something you'd
> recommend? :-).

You can certainly play with that. I am not overly optimistic that it would
help though, because a symptom of a thrashing system is that we do not
even reach this point. Pages are simply recycled, but they evict
other parts of the hot working set. But I am only guessing what the
problem is in your case. Anyway, a lower MAX_RECLAIM_RETRIES would tend to
be more timing sensitive in general. If reclaim progress cannot be made
because of IO latencies or other resource depletion, then the OOM would be
declared too early. The current MAX_RECLAIM_RETRIES is not something we
have tuned in any sense. I remember that changing it didn't make much
difference unless the number was really high, which would be a
signal that the reclaim is not throttled very well.
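
To illustrate why lowering the limit may not help, here is a toy model of
the retry counter only. It is a rough sketch, not the kernel code: the real
should_reclaim_retry also looks at watermarks and the allocation order, and
the function name rounds_until_oom and the default of 16 are assumptions
made for this example.

```python
# Toy model of the "no progress" counter behind the OOM decision.
# Assumption: the limit is 16; check MAX_RECLAIM_RETRIES in your own tree.
MAX_RECLAIM_RETRIES = 16


def rounds_until_oom(progress_per_round, max_retries=MAX_RECLAIM_RETRIES):
    """Return the 1-based reclaim round at which this model gives up
    and would fall through to the OOM killer, or None if it never does.

    progress_per_round: iterable of pages reclaimed in each round
    (hypothetical input, not something the kernel exposes in this form).
    """
    no_progress_loops = 0
    for round_no, reclaimed in enumerate(progress_per_round, start=1):
        if reclaimed > 0:
            # Any progress at all resets the counter.
            no_progress_loops = 0
        else:
            no_progress_loops += 1
        if no_progress_loops > max_retries:
            return round_no
    return None


if __name__ == "__main__":
    # Reclaim never frees anything: the model gives up after the limit.
    print(rounds_until_oom([0] * 100))
    # Thrashing: every round recycles a few pages, so the counter keeps
    # resetting and the OOM path is never reached, whatever the limit is.
    print(rounds_until_oom([1] * 1000, max_retries=1))
```

In the second case the limit never matters, because each round counts as
progress even though the recycled pages belong to the hot working set.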

> > > PSI is completely different system, but I guess
> > > I should attempt to tweak the existing one first...
> > 
> > PSI is measuring the cost of the allocation (among other things) and
> > that can give you some idea on how much time is spent to get memory.
> > Userspace can implement a policy based on that and act. The kernel oom
> > killer is the last resort when there is really no memory to
> > allocate.
> 
> So what I'm seeing is a system that is unresponsive, easily for an hour.
> 
> Sometimes, I'm able to log in. When I could do that, the system was
> absurdly slow, like ps printing at more than 10 seconds per line.
> ps on my system takes 300msec; the estimate in the slow case would be 2000
> seconds, which is a slowdown by a factor of 6000x. That would be an X terminal
> opening in like two hours... that's not really usable.

It would be great to find out what the bottleneck is. Is the allocator
stuck in the memory reclaim? Waiting on some lock? Reclaiming pages
which are then stolen by other contending processes?
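
As a first data point, the PSI numbers mentioned earlier in the thread can
show how much time tasks spend stalled on memory. Below is a minimal sketch
of reading them from userspace, assuming a kernel built with PSI that
exposes /proc/pressure/memory. The helper names and the threshold of 50.0
are made up for illustration, not a recommended policy.

```python
# Minimal sketch: parse PSI memory pressure and apply a toy policy.
# Assumption: /proc/pressure/memory exists (kernel with PSI enabled).


def parse_psi(text):
    """Parse PSI text into {'some': {...}, 'full': {...}} with float values."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        # Each pair looks like "avg10=12.50" or "total=123456".
        result[kind] = {k: float(v) for k, v in (p.split("=", 1) for p in pairs)}
    return result


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read and parse the current memory pressure from the given file."""
    with open(path) as f:
        return parse_psi(f.read())


def under_heavy_pressure(psi, threshold=50.0):
    """Hypothetical policy: True when all non-idle tasks were stalled on
    memory for more than `threshold` percent of the last 10 seconds."""
    return psi.get("full", {}).get("avg10", 0.0) > threshold


if __name__ == "__main__":
    sample = (
        "some avg10=80.00 avg60=40.00 avg300=10.00 total=123456\n"
        "full avg10=62.50 avg60=30.00 avg300=8.00 total=100000\n"
    )
    print(under_heavy_pressure(parse_psi(sample)))
```

A userspace daemon could poll this and act (log, kill, or freeze something)
long before the kernel OOM killer would trigger.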

-- 
Michal Hocko
SUSE Labs
