From: Michal Hocko <mhocko@kernel.org>
To: ndrw <ndrw.xf@redhazel.co.uk>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	"Artem S. Tashkinov" <aros@gmx.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
Date: Fri, 9 Aug 2019 10:57:48 +0200
Message-ID: <20190809085748.GN18351@dhcp22.suse.cz>
In-Reply-To: <08e5d007-a41a-e322-5631-b89978b9cc20@redhazel.co.uk>

On Thu 08-08-19 22:59:32, ndrw wrote:
> On 08/08/2019 19:59, Michal Hocko wrote:
> > Well, I am afraid that implementing anything like that in the kernel
> > will lead to many regressions and bug reports. People tend to have very
> > different opinions on when it is suitable to kill a potentially
> > important part of a workload just because memory gets low.
> 
> Are you proposing having a zero memory reserve or not having such an
> option at all? I'm fine with the current default (zero reserve/margin).

We already do have a reserve (min_free_kbytes). That gives kswapd some
room to perform reclaim in the background without obvious latencies to
allocating tasks (well, CPU will still be used, so there is still some
effect).

Kswapd tries to keep a balance: free memory stays low, but with enough
room to satisfy an immediate memory demand. Once kswapd can't keep up
with the memory demand we dive into direct reclaim, and that is where
people usually see latencies coming from.
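
A rough way to observe the hand-off from kswapd to direct reclaim
described above is to compare the scan counters in /proc/vmstat. The
sketch below assumes the pgscan_kswapd and pgscan_direct fields are
present (field names have varied between kernel versions); the function
name is made up for illustration, and the sample text stands in for the
real file.

```python
def direct_reclaim_share(vmstat_text):
    """Return the fraction of scanned pages that direct reclaim scanned."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # /proc/vmstat lines have the form "<name> <value>"
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    # Assumed field names; missing fields are treated as zero.
    kswapd = counters.get("pgscan_kswapd", 0)
    direct = counters.get("pgscan_direct", 0)
    total = kswapd + direct
    return direct / total if total else 0.0


# Illustrative sample, not real measurements.
sample = "nr_free_pages 20000\npgscan_kswapd 300\npgscan_direct 100\n"
share = direct_reclaim_share(sample)
print(f"direct reclaim share: {share:.2f}")
```

The counters are cumulative since boot, so to watch a running system one
would read /proc/vmstat twice and compare the differences rather than the
absolute values.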

The main problem here is that it is hard to tell from a single
allocation latency that we have a bigger problem. As already said, the
usual thrashing scenario doesn't show a problem during reclaim because
pages can be freed very efficiently. The problem is that they are
refaulted very quickly, so we are effectively rotating the working set
like crazy. Compare that to a normal used-once streaming IO workload,
which generates a lot of page cache that can be recycled at a similar
pace but without the working set getting freed. Free memory figures
will look very similar in both cases.
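
To illustrate the distinction, here is a minimal sketch that looks at
refault counters rather than free memory. It assumes /proc/vmstat
exposes fields whose names start with "workingset_refault" (a single
counter on older kernels, split per type on newer ones). The helper
names and sample values are hypothetical, and this is not something
proposed in this thread.

```python
def refault_count(vmstat_text):
    """Sum every counter whose name starts with 'workingset_refault'."""
    total = 0
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("workingset_refault"):
            total += int(parts[1])
    return total


def refaults_between(before_text, after_text):
    """Refaults that happened between two snapshots of /proc/vmstat."""
    return refault_count(after_text) - refault_count(before_text)


# Two illustrative snapshots: free memory barely moves, refaults jump.
before = "nr_free_pages 20000\nworkingset_refault 1000\n"
after = "nr_free_pages 20100\nworkingset_refault 6000\n"
delta = refaults_between(before, after)
print(f"refaults in interval: {delta}")
```

A streaming workload would show a similar nr_free_pages but a much
smaller refault delta. What counts as "too many" refaults depends on the
workload, so any threshold would have to be chosen per system.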

> I strongly prefer forcing OOM killer when the system is still running
> normally. Not just for preventing stalls: in my limited testing I found the
> OOM killer on a stalled system rather inaccurate, occasionally killing
> system services etc. I had much better experience with earlyoom.

Good that earlyoom works for you. All I am saying is that this is not
a generally applicable heuristic because we do care about a larger
variety of workloads. I should probably emphasise that the OOM killer is
there as a _last resort_ hand brake when something goes terribly wrong.
It operates at times when any user intervention would be really hard
because there are not enough resources left to act.

[...]
> > > > PSI is giving you a metric that tells you how much time you
> > > > spend on the memory reclaim. So you can start watching the system from
> > > > lower utilization already.
> 
> I've tested it on a system with 45GB of RAM, SSD, swap disabled (my
> intention was to approximate a worst-case scenario) and it didn't really
> detect the stall before it happened. I can see some activity after
> reaching ~42GB, the system remains fully responsive until it suddenly
> freezes and requires sysrq-f.

This is useful feedback! What was your workload? Which kernel version?
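
For anybody who wants to reproduce such a test, the PSI memory figures
are exported in /proc/pressure/memory on kernels built with PSI support.
Below is a minimal parsing sketch, assuming the usual two-line
"some"/"full" format with avg10, avg60, avg300 and total fields; the
function name and the sample values are illustrative only.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} dictionaries."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, values = fields[0], {}
        for item in fields[1:]:
            key, _, raw = item.partition("=")
            # "total" is a stall time in microseconds, the rest are percentages
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Illustrative sample, not taken from the system discussed above.
sample = (
    "some avg10=1.50 avg60=0.40 avg300=0.10 total=123456\n"
    "full avg10=0.75 avg60=0.20 avg300=0.05 total=65432\n"
)
psi = parse_psi(sample)
print(psi["full"]["avg10"])
```

Logging these values once a second while the workload ramps up would show
whether the pressure averages move at all before the freeze.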

-- 
Michal Hocko
SUSE Labs
