linux-kernel.vger.kernel.org archive mirror
From: Michal Hocko <mhocko@suse.com>
To: Alexey Avramov <hakavlad@inbox.lv>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	ValdikSS <iam@valdikss.org.ru>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	corbet@lwn.net, mcgrof@kernel.org, keescook@chromium.org,
	yzaikin@google.com, oleksandr@natalenko.name, kernel@xanmod.org,
	aros@gmx.com, hakavlad@gmail.com, Yu Zhao <yuzhao@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	hdanton@sina.com, riel@surriel.com,
	Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the working set
Date: Mon, 13 Dec 2021 10:07:28 +0100
Message-ID: <YbcNUEZ08lmbv0RM@dhcp22.suse.cz>
In-Reply-To: <20211213051521.21f02dd2@mail.inbox.lv>

On Mon 13-12-21 05:15:21, Alexey Avramov wrote:
> So, the problem described by Artem S. Tashkinov in 2019 is still easily 
> reproduced in 2021. The assurances of the maintainers that they consider 
> the thrashing and near-OOM stalls to be a serious problem are difficult to
> take seriously while they ignore the obvious solution: if reclaiming file 
> caches leads to thrashing, then you just need to prohibit deleting the file 
> cache. And allow the user to control its minimum amount.

These are rather strong claims. While this might sound like a very easy
solution/workaround, I have already tried to express my concerns [1].

Really, you should realize that such a knob would be carved in stone as
soon as we merge this, and we would need to support it forever! It is
really painful (if possible at all) to deprecate a tunable knob that can
no longer be supported because the underlying implementation doesn't
allow for it. So we would absolutely need to be sure this is the right
approach to the problem. I am not convinced about that though.

How does the admin know what limit should be set for a certain
workload? What if the workload characteristics change and the existing
setting is just too restrictive? What if the workload is thrashing over
something different than anon/file memory (e.g. any other cache that we
have or might have in the future)?

As you have pointed out, there were general recommendations to use user
space based oom killer solutions, which can be tuned for the specific
workload or used in an environment where the disruptive OOM killer
action is less of a problem because the workload can be restarted easily
without too much harm caused by the oom killer.
Please keep in mind that there are many more workloads that have
different requirements, and for them an oom killer invocation can be
much worse than slow progress due to ephemeral, peak or even longer
term thrashing or heavy refaults.

The kernel OOM killer acts as the last resort solution and therefore
stays really conservative. I do believe that integrating PSI metrics
into that decision is the right direction. It is not a trivial one
though.

Why is this a better approach than a simple limit? Well, for one, it is a
feedback based solution. The system knows it is thrashing and can estimate
how hard. It is not about a specific type of memory because we can
detect refaults on both file and anonymous memory (it can be extended
should there be a need for future types of caches or reclaimable
memory). Memory reclaim can work with that information and balance
different resources dynamically based on the available feedback. MM code
will not need to expose implementation details about how the reclaim
works and so we do not bind ourselves into longterm specifics.
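
To make the feedback idea a bit more concrete, here is a rough user space
sketch (not kernel code, and not a proposal for how an in-kernel
integration would look). It parses the PSI text format exported in
/proc/pressure/memory and checks the "full" avg10 stall percentage. The
helper names and the 10% threshold are made up for illustration only.

```python
# Illustrative sketch only: read PSI memory pressure and decide whether the
# system currently looks like it is thrashing. Function names and the
# threshold are hypothetical, not taken from any existing tool.

def parse_psi(text):
    """Parse PSI file contents into {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like "avg10=1.50" or "total=12345".
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def is_thrashing(psi, full_avg10_limit=10.0):
    """True if all non-idle tasks were stalled on memory for at least
    full_avg10_limit percent of the last 10 seconds (assumed threshold)."""
    return psi.get("full", {}).get("avg10", 0.0) >= full_avg10_limit


def read_memory_pressure(path="/proc/pressure/memory"):
    """Read the system-wide memory pressure file (needs a PSI-enabled kernel)."""
    with open(path) as f:
        return parse_psi(f.read())
```

On a kernel with PSI enabled, is_thrashing(read_memory_pressure()) gives a
signal derived from measured stall time rather than from the amount of any
particular type of memory, which is the point made above.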

See the difference?

If you can live with a premature and over-eager OOM killer policy, then
all fine. Use existing userspace solutions. If you want to work on an
in-kernel solution, please try to understand the complexity and the
historical experience with similar solutions first. It also helps to
understand that there are no simple solutions on the table. MM reclaim
code has evolved over many years. I strongly suspect we have already run
out of simple solutions. We also got burnt many times. Let's not repeat
those errors again.

[1] http://lkml.kernel.org/r/Ya3fG2rp+860Yb+t@dhcp22.suse.cz

-- 
Michal Hocko
SUSE Labs

Thread overview: 17+ messages
2021-11-30 11:16 [PATCH] mm/vmscan: add sysctl knobs for protecting the working set Alexey Avramov
2021-11-30 15:28 ` Luis Chamberlain
2021-11-30 18:56 ` Oleksandr Natalenko
2021-12-01 15:51   ` Alexey Avramov
2021-12-02 18:05 ` ValdikSS
2021-12-02 21:58   ` Andrew Morton
2021-12-03 11:59     ` Vlastimil Babka
2021-12-03 13:27       ` Alexey Avramov
2021-12-06  9:59         ` Michal Hocko
2022-01-09 22:59           ` Barry Song
2021-12-03 14:01     ` Oleksandr Natalenko
2021-12-12 20:15     ` Alexey Avramov
2021-12-13  9:06       ` Barry Song
2021-12-13  9:07       ` Michal Hocko [this message]
2021-12-13  8:38   ` Barry Song
2022-01-25  8:19     ` ValdikSS
2022-02-12  0:01       ` Barry Song
