From: Johannes Weiner <hannes@cmpxchg.org>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Artem S. Tashkinov" <aros@gmx.com>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
Date: Wed, 7 Aug 2019 17:34:43 -0400
Message-ID: <20190807213443.GA11227@cmpxchg.org>
In-Reply-To: <20190807140130.7418e783654a9c53e6b6cd1b@linux-foundation.org>

On Wed, Aug 07, 2019 at 02:01:30PM -0700, Andrew Morton wrote:
> On Wed, 7 Aug 2019 16:51:38 -0400 Johannes Weiner <hannes@cmpxchg.org> wrote:
> 
> > However, eb414681d5a0 ("psi: pressure stall information for CPU,
> > memory, and IO") introduced a memory pressure metric that
> > quantifies the share of wallclock time in which userspace waits on
> > reclaim, refaults, and swapins. By using absolute time, it encodes
> > all the above-mentioned variables of hardware capacity and
> > workload behavior. When memory pressure is 40%, it means that 40%
> > of the time the workload is stalled on memory, period. This is the
> > actual measure for the lack of forward progress that users
> > experience. It's also something they expect the kernel to manage
> > and remedy when forward progress becomes non-existent.
> > 
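
For reference, that metric is exported through /proc/pressure/memory.
A minimal sketch of reading it back (not part of the patch, just to
show the interface):

#include <stdio.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/pressure/memory", "r");

	if (!f) {
		perror("/proc/pressure/memory");
		return 1;
	}
	/*
	 * Two lines come back: "some" is the share of time in which
	 * at least one task was stalled on memory, "full" the share
	 * in which all non-idle tasks were stalled simultaneously.
	 */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}
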
> > To accomplish this, the patch implements a thrashing cutoff for
> > the OOM killer. If the kernel determines a sustained high level of
> > memory pressure, and thus a lack of forward progress in userspace,
> > it will trigger the OOM killer to reduce memory contention.
> > 
> > By default, the OOM killer will engage after 15 seconds of at
> > least 80% memory pressure. These values are tunable via the
> > sysctls vm.thrashing_oom_period and vm.thrashing_oom_level.
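
Assuming the patch were applied, those knobs would show up under
/proc/sys/vm/ like any other vm.* sysctl. A hypothetical sketch of
loosening the cutoff, say to 30 seconds at 90% pressure:

#include <stdio.h>

static int write_sysctl(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

int main(void)
{
	/* Hypothetical knobs: they only exist with the proposed
	 * patch applied, not in any mainline kernel. */
	if (write_sysctl("/proc/sys/vm/thrashing_oom_period", "30") ||
	    write_sysctl("/proc/sys/vm/thrashing_oom_level", "90"))
		perror("sysctl write");
	return 0;
}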
> 
> Could be implemented in userspace?
> </troll>

We do in fact do this with oomd.

But it requires a comprehensive cgroup setup, with complete memory and
IO isolation, to protect that daemon from the memory pressure and
excessive paging of the rest of the system (mlock doesn't really cut
it, because you potentially need to allocate quite a few proc dentries
and inodes just to walk the process tree and determine a kill target).
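
A hypothetical minimal version of that carve-out, giving the daemon a
cgroup2 memory floor so reclaim leaves it alone (the path and the 64M
value are made up for illustration):

#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	FILE *f;

	/* Hypothetical slice; assumes cgroup2 is mounted here. */
	mkdir("/sys/fs/cgroup/oomd.slice", 0755);

	/*
	 * memory.min guarantees the daemon 64M that global reclaim
	 * won't touch, keeping it responsive while everything else
	 * is thrashing.
	 */
	f = fopen("/sys/fs/cgroup/oomd.slice/memory.min", "w");
	if (!f)
		return 1;
	fprintf(f, "%d", 64 << 20);
	fclose(f);

	/* Move ourselves into the protected group. */
	f = fopen("/sys/fs/cgroup/oomd.slice/cgroup.procs", "w");
	if (!f)
		return 1;
	fprintf(f, "%d", (int)getpid());
	fclose(f);
	return 0;
}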

In a fleet, that works fine, since we need to maintain that cgroup
infra anyway. But for other users, that's a lot of stack for basic
"don't hang forever if I allocate too much memory" functionality.


