From: Pintu Agarwal <>
To: Michal Hocko <>
Cc: ndrw <>,
	Johannes Weiner <>,
	Suren Baghdasaryan <>,
	Vlastimil Babka <>,
	"Artem S. Tashkinov" <>,
	Andrew Morton <>,
	LKML <>,
	linux-mm <>, Pintu Kumar <>
Subject: Re: Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure
Date: Fri, 9 Aug 2019 19:48:45 +0530	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>



This is an interesting topic for me, so I would like to join the conversation.
I will be glad to help here, either in testing PSI or in verifying some
scenarios and observations.

I have some experience working with low-memory embedded devices: RAM as
low as 128MB or 256MB, mostly less than 1GB, with or without
Display/DRM/Graphics support, along with ZRAM as swap space configured
at 25% of the RAM size.
The eMMC storage is also as small as 4GB or 8GB.

So, I have experienced these sluggishness, hang, and OOM-kill issues quite
a number of times, and I would like to share my experience and
observations here.

Recently, I have been exploring the PSI feature in my ARM
Qemu/Beagle-Bone environment, so I can share some feedback on that as
well.
System sluggishness can result from 4 types of pressure (especially on
smartphone devices):
* memory allocation pressure
* I/O pressure
* Scheduling pressure
* Network pressure
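Three of these four pressure sources already map onto PSI's per-resource
files, /proc/pressure/cpu, /proc/pressure/io and /proc/pressure/memory
(network has no PSI file). As a rough sketch, a monitor could parse those
lines like this (the field names are the real PSI format; the existence
check is only so the snippet degrades gracefully off-Linux):

```python
import os

def parse_psi_line(line):
    """Parse one PSI line, e.g.
    'some avg10=1.23 avg60=0.50 avg300=0.10 total=123456'.
    Averages are percentages; 'total' is cumulative stall time in us."""
    kind, *fields = line.split()
    values = {}
    for field in fields:
        key, _, val = field.partition("=")
        values[key] = int(val) if key == "total" else float(val)
    return kind, values

if os.path.exists("/proc/pressure/memory"):
    with open("/proc/pressure/memory") as f:
        for line in f:
            print(parse_psi_line(line.strip()))
```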

I think the topic of concern here is memory pressure, so I would like to
share some thoughts about it.

* In my opinion, memory pressure should be internal to the system and
not visible to end users.
* The pressure metrics can vary from system to system, so it is
difficult to apply a single policy.
* I guess this is the time to apply "Machine Learning" and "Artificial
Intelligence" to the system :)

* Memory pressure starts with how many times, and how quickly, the
system enters the allocation slow path.
  Thus, monitoring the slow path may give some clue about pressure
building up in the system. That is why I used to rely on a
slow-path counter.
  Too many slow-path entries right at the beginning indicate that the
system needs to be re-designed.
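The kernel already exports a counter close to this idea: the allocstall*
events in /proc/vmstat count direct-reclaim (slow-path) entries, split per
zone on recent kernels. A hypothetical monitor could sum and sample them
like this (the sampling interval is an assumption):

```python
def total_allocstall(vmstat_text):
    """Sum all allocstall* counters from /proc/vmstat content."""
    total = 0
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name.startswith("allocstall"):
            total += int(value)
    return total

def slowpath_rate(prev, curr, interval_s):
    """Slow-path entries per second between two samples."""
    return (curr - prev) / interval_s
```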

* The system should be kept from entering the slow path again and
again, thus avoiding pressure.
  If this happens, it is time to reclaim memory in large chunks
rather than small ones.
  Maybe it is time to think about a shrink_all_memory() knob in the kernel.
  It could run as bottom-half processing, possibly driven from cgroups.
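Newer kernels (v5.19+, so after this thread) grew a userspace-driven
variant of this idea: the cgroup v2 memory.reclaim file, which asks the
kernel to proactively reclaim a given number of bytes from a cgroup. A
sketch of chunked reclaim against it (the chunk size and target cgroup
are assumptions):

```python
import os

def proactive_reclaim(cgroup_path, nr_bytes, chunk=64 << 20):
    """Request reclaim of nr_bytes from a cgroup, in large chunks,
    via cgroup v2 memory.reclaim (kernel v5.19+)."""
    path = os.path.join(cgroup_path, "memory.reclaim")
    remaining = nr_bytes
    while remaining > 0:
        step = min(chunk, remaining)
        try:
            with open(path, "w") as f:
                f.write(str(step))  # kernel reclaims up to this many bytes
        except OSError:
            break  # reclaim target not met or interface unavailable
        remaining -= step
    return nr_bytes - remaining  # bytes we managed to request
```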

* Some experiments were done on this in the past. Interested people can check this paper:

* The system already behaves sluggishly even before it enters the oom-kill stage.
  Most of the time the oom stage is skipped, never occurs, or the
system just loops around it.
  Thus, some kind of oom monitoring may help to gather evidence.
  That is the reason I proposed something called an
oom-stall-counter: the system is entering oom, but the kill may not
actually happen.
  If this counter is updating, we assume the system has started
behaving sluggishly.

* An oom-kill-counter can also help determine how much killing is
happening in kernel space.
  Example: PSI pressure is building up but this counter is not updating...
  In any case, system daemons should be protected from being killed.
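Such an oom-kill-counter already exists since v4.13 as the oom_kill event
in /proc/vmstat. Combining it with a rising PSI total gives the signal
described above; the 100ms stall threshold below is an illustrative
assumption, not a kernel default:

```python
def oom_kills(vmstat_text):
    """Read the cumulative oom_kill counter from /proc/vmstat (v4.13+)."""
    for line in vmstat_text.splitlines():
        name, _, value = line.partition(" ")
        if name == "oom_kill":
            return int(value)
    return 0

def should_intervene(psi_total_delta_us, oom_kill_delta,
                     stall_threshold_us=100_000):
    """Pressure is rising but the kernel is not killing:
    time for userspace policy to step in."""
    return psi_total_delta_us > stall_threshold_us and oom_kill_delta == 0
```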

* Some killing policy should be left to user space, so a standard
system daemon (or kthread) should be designed along those lines.
  It should be configured dynamically based on the system and oom-score.
  From my previous experience in Tizen, we used something called the
resourced daemon for this.
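The core loop of such a daemon is essentially victim selection by
oom_score, minus an allow-list of protected daemons. A minimal sketch
(the PROTECTED set is a made-up example; a real daemon like Tizen's
resourced or Android's lmkd carries far more policy):

```python
import os

PROTECTED = {"init", "systemd", "resourced"}  # hypothetical allow-list

def pick_victim(proc="/proc"):
    """Return the pid (str) of the non-protected task with the
    highest /proc/<pid>/oom_score, or None."""
    best_pid, best_score = None, -1
    for pid in filter(str.isdigit, os.listdir(proc)):
        try:
            with open(os.path.join(proc, pid, "oom_score")) as f:
                score = int(f.read())
            with open(os.path.join(proc, pid, "comm")) as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # task exited while we were scanning
        if comm not in PROTECTED and score > best_score:
            best_pid, best_score = pid, score
    return best_pid
```

The daemon would then SIGKILL the returned pid and re-check pressure
before killing again.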

* Instead of a static policy, there should be something called a
"Dynamic Low Memory Manager" (DLMM) policy.
  That is, at every stage (slow-path, swapping, compaction failure,
reclaim failure, oom) some action can be taken.
  Earlier these events were triggered using vmpressure, but now that
can be replaced with PSI.
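PSI supports exactly this event-driven use since v5.2: userspace writes a
trigger spec to /proc/pressure/memory and poll()s the fd for POLLPRI. A
sketch (the 150ms/1s thresholds are illustrative, not recommendations):

```python
import select

def psi_trigger(kind, stall_us, window_us):
    """Build a PSI trigger spec, e.g. 'some 150000 1000000' =
    notify when tasks stall >=150ms within any 1s window."""
    assert kind in ("some", "full")
    return f"{kind} {stall_us} {window_us}"

def wait_for_memory_pressure(spec):
    """Register a PSI trigger (v5.2+) and block until it fires."""
    f = open("/proc/pressure/memory", "r+b", buffering=0)
    f.write(spec.encode() + b"\0")  # kernel expects the trailing NUL
    poller = select.poll()
    poller.register(f, select.POLLPRI)
    events = poller.poll()  # blocks until the trigger fires
    f.close()
    return events
```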

* Another major culprit for long-run sluggishness is system daemons
occupying all of the swap space and never releasing it.
  So even if applications are killed due to oom, it may not help much,
since the daemons will never be killed.
  Thus, I proposed something called "Dynamic Swappiness", where the
swappiness of daemons can be lowered dynamically while normal
applications keep higher values.
  In the past I have done several experiments on this; soon I will be
publishing a paper on it.
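With the cgroup v1 memory controller, this tuning is already possible per
cgroup via memory.swappiness, so a "dynamic swappiness" policy could be a
small loop over cgroups. A sketch (cgroup names and the 10/80 values are
assumptions for illustration, not measured recommendations):

```python
import os

def set_swappiness(cgroup_path, value):
    """Set per-cgroup swappiness (cgroup v1 memory controller, 0-100)."""
    with open(os.path.join(cgroup_path, "memory.swappiness"), "w") as f:
        f.write(str(value))

def apply_policy(base="/sys/fs/cgroup/memory", daemons=("system",),
                 daemon_swappiness=10, app_swappiness=80):
    """Daemons swap reluctantly; ordinary applications swap readily."""
    for name in os.listdir(base):
        path = os.path.join(base, name)
        if not os.path.isdir(path):
            continue
        value = daemon_swappiness if name in daemons else app_swappiness
        set_swappiness(path, value)
```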

* Maybe it would help our understanding to start from a very minimal
scale (just 64MB to 512MB RAM) with busybox.
  If we can tune that perfectly, then larger-scale systems will
automatically have no issues.

With respect to PSI, here are my observations:
* The PSI averaging windows (10s, 60s, 300s) are too long for an embedded system.
  I think these settings should be dynamic or user-configurable, or
there should be one more entry for 1s or less.
* PSI memory values are updated after the in-kernel oom-kill has
already happened, which means the sluggishness has already occurred.
  So, I have to utilize the "total" field and monitor the difference
manually: if the difference between the previous total and the next
total is more than 100ms and rising, then we suspect OOM.
* Currently, PSI values are system-wide. That is, after sluggishness
has occurred, it is difficult to tell which task caused it.
  So, I was thinking of adding a new entry to capture task details as well.
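The manual "total"-delta monitoring described above boils down to a few
lines; note the 100ms-per-sample threshold is the heuristic from this
email, not a kernel default:

```python
def stall_delta_ms(prev_total_us, curr_total_us):
    """Growth of the PSI 'total' stall counter between samples, in ms."""
    return (curr_total_us - prev_total_us) / 1000.0

def suspect_oom(prev_total_us, curr_total_us, threshold_ms=100):
    """Heuristic: more than ~100ms of new memory stall since the
    last sample suggests trouble is coming."""
    return stall_delta_ms(prev_total_us, curr_total_us) > threshold_ms
```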

These are some of my opinions; they may or may not be directly applicable.
Further brainstorming and discussion might be required.


Thread overview: 48+ messages
2019-08-04  9:23 Let's talk about the elephant in the room - the Linux kernel's inability to gracefully handle low memory pressure Artem S. Tashkinov
2019-08-05 12:13 ` Vlastimil Babka
2019-08-05 13:31   ` Michal Hocko
2019-08-05 16:47     ` Suren Baghdasaryan
2019-08-05 18:55     ` Johannes Weiner
2019-08-06  9:29       ` Michal Hocko
2019-08-05 19:31   ` Johannes Weiner
2019-08-06  1:08     ` Suren Baghdasaryan
2019-08-06  9:36       ` Vlastimil Babka
2019-08-06 14:27         ` Johannes Weiner
2019-08-06 14:36           ` Michal Hocko
2019-08-06 16:27             ` Suren Baghdasaryan
2019-08-06 22:01               ` Johannes Weiner
2019-08-07  7:59                 ` Michal Hocko
2019-08-07 20:51                   ` Johannes Weiner
2019-08-07 21:01                     ` Andrew Morton
2019-08-07 21:34                       ` Johannes Weiner
2019-08-07 21:12                     ` Johannes Weiner
2019-08-08 11:48                     ` Michal Hocko
2019-08-08 15:10                       ` ndrw.xf
2019-08-08 16:32                         ` Michal Hocko
2019-08-08 17:57                           ` ndrw.xf
2019-08-08 18:59                             ` Michal Hocko
2019-08-08 21:59                               ` ndrw
2019-08-09  8:57                                 ` Michal Hocko
2019-08-09 10:09                                   ` ndrw
2019-08-09 10:50                                     ` Michal Hocko
2019-08-09 14:18                                       ` Pintu Agarwal [this message]
2019-08-10 12:34                                       ` ndrw
2019-08-12  8:24                                         ` Michal Hocko
2019-08-10 21:07                                   ` ndrw
2021-07-24 17:32                         ` Alexey Avramov
2019-08-08 14:47                     ` Vlastimil Babka
2019-08-08 17:27                       ` Johannes Weiner
2019-08-09 14:56                         ` Vlastimil Babka
2019-08-09 17:31                           ` Johannes Weiner
2019-08-13 13:47                             ` Vlastimil Babka
2019-08-06 21:43       ` James Courtier-Dutton
2019-08-06 19:00 ` Florian Weimer
2019-08-20  6:46 ` Daniel Drake
2019-08-21 21:42   ` James Courtier-Dutton
2019-08-29 12:29     ` Michal Hocko
2019-09-02 20:15     ` Pavel Machek
2019-08-23  1:54   ` ndrw
2019-08-23  2:14     ` Daniel Drake
     [not found] <>
2019-08-05 12:01 ` Artem S. Tashkinov
2019-08-06  8:57 Johannes Buchner
2019-08-06 19:43 Remi Gauvin
