linux-kernel.vger.kernel.org archive mirror
* Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
@ 2019-08-04  9:23 Artem S. Tashkinov
  2019-08-05 12:13 ` Vlastimil Babka
                   ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Artem S. Tashkinov @ 2019-08-04  9:23 UTC (permalink / raw)
  To: linux-kernel

Hello,

There's this bug which has been bugging many people for many years
already and which is reproducible in less than a few minutes under the
latest and greatest kernel, 5.2.6. All the kernel parameters are set to
defaults.

Steps to reproduce:

1) Boot with mem=4G
2) Disable swap to make everything faster (sudo swapoff -a)
3) Launch a web browser, e.g. Chrome/Chromium and/or Firefox
4) Start opening tabs in either of them and watch your free RAM decrease
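
One way to watch the available memory shrink during step 4 is to poll
/proc/meminfo. The following is only a sketch in Python and not part of
the original report. It assumes the kernel exposes a "MemAvailable:"
line (value in kB); the helper name mem_available_kib is made up for
illustration.

```python
def mem_available_kib(meminfo_text):
    """Return the MemAvailable value (in kB) from /proc/meminfo text.

    Returns None if the field is missing (assumption: older kernels
    may not report it).
    """
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # Line looks like: "MemAvailable:     234567 kB"
            return int(line.split()[1])
    return None


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(mem_available_kib(f.read()), "kB available")
```

Running it in a loop (for example under watch) while opening tabs should
show the number dropping toward the point where the stall begins.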

Once you hit a situation when opening a new tab requires more RAM than
is currently available, the system will stall hard. You will barely be
able to move the mouse pointer. Your disk LED will be flashing
incessantly (I'm not entirely sure why). You will not be able to run new
applications or close currently running ones.

This little crisis may continue for minutes or even longer. I think
that's not how the system should behave in this situation. I believe
something must be done about that to avoid this stall.

I'm almost sure some sysctl parameters could be changed to avoid this
situation, but something tells me this should be done for everyone and
made the default: non-tech-savvy users will simply give up on Linux if
they ever end up in a situation like this, and they won't be keen, or
even able, to Google for solutions.


Best regards,
Artem

[parent not found: <20190805090514.5992-1-hdanton@sina.com>]
* Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
@ 2019-08-06  8:57 Johannes Buchner
  0 siblings, 0 replies; 48+ messages in thread
From: Johannes Buchner @ 2019-08-06  8:57 UTC (permalink / raw)
  To: linux-kernel


> On Mon, Aug 5, 2019 at 12:31 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>>
>> On Mon, Aug 05, 2019 at 02:13:16PM +0200, Vlastimil Babka wrote:
>> > On 8/4/19 11:23 AM, Artem S. Tashkinov wrote:
>> > > Hello,
>> > >
>> > > There's this bug which has been bugging many people for many years
>> > > already and which is reproducible in less than a few minutes under the
>> > > latest and greatest kernel, 5.2.6. All the kernel parameters are set to
>> > > defaults.
>> > >
>> > > Steps to reproduce:
>> > >
>> > > 1) Boot with mem=4G
>> > > 2) Disable swap to make everything faster (sudo swapoff -a)
>> > > 3) Launch a web browser, e.g. Chrome/Chromium and/or Firefox
>> > > 4) Start opening tabs in either of them and watch your free RAM decrease
>> > >
>> > > Once you hit a situation when opening a new tab requires more RAM than
>> > > is currently available, the system will stall hard. You will barely be
>> > > able to move the mouse pointer. Your disk LED will be flashing
>> > > incessantly (I'm not entirely sure why). You will not be able to run new
>> > > applications or close currently running ones.
>> >
>> > > This little crisis may continue for minutes or even longer. I think
>> > > that's not how the system should behave in this situation. I believe
>> > > something must be done about that to avoid this stall.
>> >
>> > Yeah that's a known problem, made worse by SSDs in fact, as they are able
>> > to keep refaulting the last remaining file pages fast enough, so there
>> > is still apparent progress in reclaim and OOM doesn't kick in.
>> >
>> > At this point, the likely solution will be probably based on pressure
>> > stall monitoring (PSI). I don't know how far we are from a built-in
>> > monitor with reasonable defaults for a desktop workload, so CCing
>> > relevant folks.
>>
>> Yes, psi was specifically developed to address this problem. Before
>> it, the kernel had to make all decisions based on relative event rates
>> but had no notion of time. Whereas to the user, time is clearly an
>> issue, and in fact makes all the difference. So psi quantifies the
>> time the workload spends executing vs. spinning its wheels.
>>
>> But choosing a universal cutoff for killing is not possible, since it
>> depends on the workload and the user's expectation: GUI and other
>> latency-sensitive applications care way before a compile job or video
>> encoding would care.
>>
>> Because of that, there are things like oomd and lmkd as mentioned, to
>> leave the exact policy decision to userspace.
>>
>> That being said, I think we should be able to provide a bare minimum
>> inside the kernel to avoid complete livelocks where the user does not
>> believe the machine would be able to recover without a reboot.
>>
>> The goal wouldn't be a glitch-free user experience - the kernel does
>> not know enough about the applications to even attempt that. It should
>> just not hang indefinitely. Maybe similar to the hung task detector.
>>
>> How about something like the below patch? With that, the kernel
>> catches excessive thrashing that happens before reclaim fails:
>>
>> [snip]
>>
>> +
>> +#define OOM_PRESSURE_LEVEL     80
>> +#define OOM_PRESSURE_PERIOD    (10 * NSEC_PER_SEC)
> 
> 80% of the last 10 seconds spent in full stall would definitely be a
> problem. If the system was already low on memory (which it probably
> is, or we would not be reclaiming so hard and registering such a big
> stall) then oom-killer would probably kill something before 8 seconds
> have passed. If my line of thinking is correct, then do we really
> benefit from such an additional protection mechanism? I might be wrong
> here because my experience is limited to embedded systems with
> relatively small amounts of memory.
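
For reference, the threshold discussed above can be approximated from
userspace by reading the PSI file for memory. This is a rough sketch
only, not the proposed kernel patch: it assumes /proc/pressure/memory
exists (PSI enabled) and uses the "full" avg10 value, which is a running
average over roughly 10 seconds rather than the exact accounting the
patch would do. The function names are hypothetical.

```python
def parse_psi(text):
    """Parse PSI text into {"some": {...}, "full": {...}} of floats.

    Expected line shape (assumption based on the PSI file format):
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0"
    """
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        kind, pairs = fields[0], fields[1:]
        result[kind] = {
            key: float(value)
            for key, value in (pair.split("=", 1) for pair in pairs)
        }
    return result


def full_stall_exceeds(text, level=80.0):
    """True if the 'full' avg10 stall percentage is at or above level."""
    return parse_psi(text).get("full", {}).get("avg10", 0.0) >= level


if __name__ == "__main__":
    with open("/proc/pressure/memory") as f:
        print("thrashing" if full_stall_exceeds(f.read()) else "ok")
```

The default of 80 mirrors OOM_PRESSURE_LEVEL from the quoted patch; a
userspace daemon would presumably pick a much lower value for a desktop.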

When one or more processes fight for memory, much of the time is spent
stalling. Would an acceptable alternative strategy be, instead of
killing a process, to stop processes in proportion to their stall time
and memory usage? By stop I mean delay their scheduling (akin to
kill -STOP/sleep/kill -CONT), or interleave the scheduling of
large-memory-using processes so they do not have to fight against each
other.
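
To make the kill -STOP/sleep/kill -CONT idea concrete, here is a
minimal userspace sketch in Python. It only shows the mechanism for a
single process; how long to hold each process (proportional to stall
time and memory usage) is left open, and the function name pause_for is
made up. The kill and sleep parameters exist only so the function can be
exercised without touching a real process.

```python
import os
import signal
import time


def pause_for(pid, seconds, kill=os.kill, sleep=time.sleep):
    """Stop process `pid`, wait `seconds`, then let it continue.

    SIGCONT is sent even if the wait is interrupted, so the process
    is not left stopped by accident.
    """
    kill(pid, signal.SIGSTOP)
    try:
        sleep(seconds)
    finally:
        kill(pid, signal.SIGCONT)
```

For example, pause_for(pid, 2.0) would hold one memory-hungry process
for two seconds so that another can make progress.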

Cheers,
       Johannes





* Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
@ 2019-08-06 19:43 Remi Gauvin
  0 siblings, 0 replies; 48+ messages in thread
From: Remi Gauvin @ 2019-08-06 19:43 UTC (permalink / raw)
  To: Linux Kernel Mailing List

Sorry, I don't have the original message to reply to. But for those
interested, I have found a solution to the kernel's complete inability
to allocate more memory when it needs to swap out.

Increase /proc/sys/vm/watermark_scale_factor from the default of 10 to 500.

It makes a huge difference, especially with swap on SSD: the kernel
will swap out gracefully to allocate more memory, and you can get a few
GB more memory in use before really noticing performance problems.
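
As a rough illustration of what that change means: per the kernel's
sysctl documentation, watermark_scale_factor is expressed in fractions
of 10,000 of memory, so 10 is 0.1% and 500 is 5%. The sketch below
simply applies that fraction to the 4G machine from the original report.
It is an approximation under stated assumptions, not the kernel's exact
computation: it ignores the floor derived from min_free_kbytes and the
per-zone accounting, and the function name is hypothetical.

```python
def watermark_gap_bytes(managed_bytes, scale_factor):
    """Approximate distance between reclaim watermarks, in bytes.

    Assumption: gap = managed memory * scale_factor / 10000.
    """
    return managed_bytes * scale_factor // 10000


if __name__ == "__main__":
    four_gib = 4 * 1024**3
    for factor in (10, 500):
        gap = watermark_gap_bytes(four_gib, factor)
        print(f"watermark_scale_factor={factor}: ~{gap / 2**20:.0f} MiB")
```

On 4 GiB this works out to about 4 MiB for the default versus about
205 MiB at 500, i.e. background reclaim starts much earlier.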


end of thread, other threads:[~2021-07-24 17:38 UTC | newest]

Thread overview: 48+ messages
2019-08-04  9:23 Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure Artem S. Tashkinov
2019-08-05 12:13 ` Vlastimil Babka
2019-08-05 13:31   ` Michal Hocko
2019-08-05 16:47     ` Suren Baghdasaryan
2019-08-05 18:55     ` Johannes Weiner
2019-08-06  9:29       ` Michal Hocko
2019-08-05 19:31   ` Johannes Weiner
2019-08-06  1:08     ` Suren Baghdasaryan
2019-08-06  9:36       ` Vlastimil Babka
2019-08-06 14:27         ` Johannes Weiner
2019-08-06 14:36           ` Michal Hocko
2019-08-06 16:27             ` Suren Baghdasaryan
2019-08-06 22:01               ` Johannes Weiner
2019-08-07  7:59                 ` Michal Hocko
2019-08-07 20:51                   ` Johannes Weiner
2019-08-07 21:01                     ` Andrew Morton
2019-08-07 21:34                       ` Johannes Weiner
2019-08-07 21:12                     ` Johannes Weiner
2019-08-08 11:48                     ` Michal Hocko
2019-08-08 15:10                       ` ndrw.xf
2019-08-08 16:32                         ` Michal Hocko
2019-08-08 17:57                           ` ndrw.xf
2019-08-08 18:59                             ` Michal Hocko
2019-08-08 21:59                               ` ndrw
2019-08-09  8:57                                 ` Michal Hocko
2019-08-09 10:09                                   ` ndrw
2019-08-09 10:50                                     ` Michal Hocko
2019-08-09 14:18                                       ` Pintu Agarwal
2019-08-10 12:34                                       ` ndrw
2019-08-12  8:24                                         ` Michal Hocko
2019-08-10 21:07                                   ` ndrw
2021-07-24 17:32                         ` Alexey Avramov
2019-08-08 14:47                     ` Vlastimil Babka
2019-08-08 17:27                       ` Johannes Weiner
2019-08-09 14:56                         ` Vlastimil Babka
2019-08-09 17:31                           ` Johannes Weiner
2019-08-13 13:47                             ` Vlastimil Babka
2019-08-06 21:43       ` James Courtier-Dutton
2019-08-06 19:00 ` Florian Weimer
2019-08-20  6:46 ` Daniel Drake
2019-08-21 21:42   ` James Courtier-Dutton
2019-08-29 12:29     ` Michal Hocko
2019-09-02 20:15     ` Pavel Machek
2019-08-23  1:54   ` ndrw
2019-08-23  2:14     ` Daniel Drake
     [not found] <20190805090514.5992-1-hdanton@sina.com>
2019-08-05 12:01 ` Artem S. Tashkinov
2019-08-06  8:57 Johannes Buchner
2019-08-06 19:43 Remi Gauvin
