From: Alexey Avramov <hakavlad@inbox.lv>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: ValdikSS <iam@valdikss.org.ru>,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	corbet@lwn.net, mcgrof@kernel.org, keescook@chromium.org,
	yzaikin@google.com, oleksandr@natalenko.name, kernel@xanmod.org,
	aros@gmx.com, hakavlad@gmail.com, Yu Zhao <yuzhao@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Michal Hocko <mhocko@suse.com>,
	hannes@cmpxchg.org, hdanton@sina.com, riel@surriel.com,
	Shakeel Butt <shakeelb@google.com>
Subject: Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the working set
Date: Mon, 13 Dec 2021 05:15:21 +0900
Message-ID: <20211213051521.21f02dd2@mail.inbox.lv>
In-Reply-To: <20211202135824.33d2421bf5116801cfa2040d@linux-foundation.org>

> I don't think that the limits should be "N bytes on the current node". 

It's not a problem to add _ratio knobs. How the tunables should look and 
what their default values should be can still be discussed. My task now is 
to prove that the problem exists and that the solution I have proposed is 
effective and correct.

> the various zones have different size as well.

I'll just point out the precedent: sc->file_is_tiny works the same way 
(per node) as the proposed sc->clean_below_min etc.

> We do already have a lot of sysctls for controlling these sort of
> things.  

There are many of them, but the ones most important for solving the 
problem are missing: those proposed in the patch.

> Was much work put into attempting to utilize the existing
> sysctls to overcome these issues?

Oh yes! This is all I have been doing for the last 4 years. At the end of 
2017, I was forced to write my own userspace OOM killer [1] to resist 
freezes (I didn't know then that earlyoom already existed).

In 2018, Facebook came on the scene with its oomd [2]:

> The traditional Linux OOM killer works fine in some cases, but in others 
> it kicks in too late, resulting in the system entering a livelock for an 
> indeterminate period.

Here we can assume that Facebook's engineers haven't found the kernel 
sysctl tunables that would satisfy them.

In 2019 LKML people could not offer Artem S. Tashkinov a simple solution to 
the problem he described [3]. In addition to the discussion of user-space 
solutions, two kernel-side solutions were proposed:

- A PSI-based solution was proposed by Johannes Weiner [4].
- Reserving a fixed (configurable) amount of RAM for caches and triggering 
  the OOM killer earlier, before most UI code is evicted from memory, was 
  suggested by ndrw [5]. This is what I propose to accept into mainline. 
  It is the right way to go.

None of the suggestions posted in that thread were accepted into 
mainline.

In 2019, at the same time, the Fedora Workstation group discussed [6]
Issue #98 "Better interactivity in low-memory situations".
As a result, it was decided to enable earlyoom by default in Fedora 
Workstation 32. No existing sysctl was found to be of much help.
It was also suggested to use swap on zram and to enable the cgroup-based 
uresourced daemon to protect the user session.

So, the problem described by Artem S. Tashkinov in 2019 is still easily 
reproduced in 2021. The assurances of the maintainers that they consider 
thrashing and near-OOM stalls to be serious problems are difficult to 
take seriously while they ignore the obvious solution: if reclaiming the 
file cache leads to thrashing, then you just need to prohibit evicting the 
file cache, and allow the user to control its minimum amount.
By the way, an implementation of this idea has been known [7] since 
2010 and was even used in Chrome OS.
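
To make the idea concrete, here is a minimal user-space sketch of the 
decision described above. It is not the code from the patch: struct 
node_state and its field are hypothetical names chosen for illustration, 
and treating a minimum of 0 as "protection disabled" is an assumption. 
Only the names clean_below_min and clean_min_kbytes echo the ones 
mentioned in this message.

```c
#include <stdbool.h>

/* Hypothetical toy model of one NUMA node (not a kernel structure). */
struct node_state {
    unsigned long clean_file_kbytes; /* clean file cache currently on the node */
};

/*
 * Returns true when the node's clean file cache is below the configured
 * minimum, meaning reclaim should leave clean file pages alone and make
 * progress elsewhere (ultimately by invoking the OOM killer).
 * Assumption: a minimum of 0 disables the protection.
 */
static bool clean_below_min(const struct node_state *node,
                            unsigned long clean_min_kbytes)
{
    if (clean_min_kbytes == 0)
        return false;
    return node->clean_file_kbytes < clean_min_kbytes;
}
```

With a minimum of 300000 kB, a node holding 250000 kB of clean file cache 
would be protected in this model, while one holding 400000 kB would not.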

Bonus: demo: https://youtu.be/ZrLqUWRodh4
Debian 11 in a VM, Linux 5.14 with the patch, no swap space, 
playing SuperTux while 1000 `tail /dev/zero` processes were started 
simultaneously:
1. No freezes with vm.clean_min_kbytes=300000: I/O pressure was close to 
   zero, memory pressure was moderate (70-80 some, 12-17 full), and all 
   tail processes were killed in about 2 minutes (0:06 - 2:14), which is 
   about 8 processes reaped by oom_reaper per second;
2. Complete UI freeze without the working set protection (since 3:40).
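
For anyone who wants to watch similar numbers while reproducing the test, 
the "some" and "full" figures are presumably the ones the kernel exposes 
through the PSI files such as /proc/pressure/memory, where each line has 
the form "some avg10=... avg60=... avg300=... total=...". The following is 
a rough sketch of parsing one such line; struct psi_line and 
parse_psi_line() are hypothetical helper names, not part of any existing 
tool, and which averaging window the demo displayed is not stated here.

```c
#include <stdio.h>

/* One parsed line of a PSI file (hypothetical helper type). */
struct psi_line {
    char kind[8];                /* "some" or "full" */
    double avg10;                /* % of time stalled, 10 s window */
    double avg60;                /* same, 60 s window */
    double avg300;               /* same, 300 s window */
    unsigned long long total_us; /* cumulative stall time in microseconds */
};

/* Parses a single PSI line. Returns 0 on success, -1 on malformed input. */
static int parse_psi_line(const char *line, struct psi_line *out)
{
    int n = sscanf(line, "%7s avg10=%lf avg60=%lf avg300=%lf total=%llu",
                   out->kind, &out->avg10, &out->avg60, &out->avg300,
                   &out->total_us);
    return n == 5 ? 0 : -1;
}
```

A caller would read /proc/pressure/memory line by line with fgets() and 
pass each line to parse_psi_line().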

[1] https://github.com/hakavlad/nohang
[2] https://engineering.fb.com/2018/07/19/production-engineering/oomd/
[3] https://lore.kernel.org/lkml/d9802b6a-949b-b327-c4a6-3dbca485ec20@gmx.com/
[4] https://lore.kernel.org/lkml/20190807205138.GA24222@cmpxchg.org/
[5] https://lore.kernel.org/lkml/806F5696-A8D6-481D-A82F-49DEC1F2B035@redhazel.co.uk/
[6] https://pagure.io/fedora-workstation/issue/98
[7] https://lore.kernel.org/lkml/20101028191523.GA14972@google.com/

