archive mirror
 help / color / mirror / Atom feed
From: "Enrico Weigelt, metux IT consult" <>
To: Shakeel Butt <>,
	"Eric W. Biederman" <>
Cc: Alexey Gladkov <>,
	Christian Brauner <>,
	LKML <>,
	Linux Containers <>,
	Linux Containers <>,
	Linux FS Devel <>,
	Linux MM <>,
	Andrew Morton <>,
	Johannes Weiner <>,
	Michal Hocko <>,
	Chris Down <>,
	Cgroups <>
Subject: Re: [PATCH v1] proc: Implement /proc/self/meminfo
Date: Mon, 21 Jun 2021 20:20:28 +0200	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On 19.06.21 01:38, Shakeel Butt wrote:

> Nowadays, I don't think MemAvailable giving "amount of memory that can
> be allocated without triggering swapping" is even roughly accurate.
> Actually IMO "without triggering swap" is not something an application
> should concern itself with where refaults from some swap types
> (zswap/swap-on-zram) are much faster than refaults from disk.

If we're talking about things like database workloads, there IMHO isn't
anything really better than doing measurements with the actual loads
and tuning incrementally.

But: what is the actual optimization goal, why an application might
want to know where swapping begins ? Computing performance ? Caching +
IO Latency or throughput ? Network traffic (e.g. w/ iscsi) ? Power
consumption ?

>> I do know that hiding the implementation details and providing userspace
>> with information it can directly use seems like the programming model
>> that needs to be explored.  Most programs should not care if they are in
>> a memory cgroup, etc.  Programs, load management systems, and even
>> balloon drivers have a legitimately interest in how much additional load
>> can be placed on a systems memory.

What kind of load exactly ? CPU ? disk IO ? network ?

> How much additional load can be placed on a system *until what*. I
> think we should focus more on the "until" part to make the problem
> more tractable.

ACK. The interesting question is what to do in that case.

An obvious move by an database system could be eg. filling only so much
caches as there's spare physical RAM, in order to avoid useless swapping
(since we'd potentiall produce more IO load when a cache is written
out to swap, instead of just discarding it)

But, this also depends ...

#1: the application doesn't know the actual performance of the swap
device, eg. the already mentioned zswap+friends, or some fast nvmem
for swap vs disk for storage.

#2: caches might also be implemented indirectly by mmap()ing the storage
file/device and so using the kernel's cache here. in that case, the
kernel would automatically discard the pages w/o going to swap. of
course that only works if the cache is nothing but copying pages from
storage into ram.

A completely different scenario would be load management on a cluster
like k8s. Here we usually care of cluster performance (dont care about
individual nodes so muck), but wanna prevent individual nodes from being
overloaded. Since we usually don't know much about the indivdual
workload, we probably don't have much other chance than contigous
monitoring and acting when a node is getting too busy - or trying to
balance when new workloads are started, on current system load (and
other metrics). In that case, I don't see where this new proc file
should be of much help.

> Second, is the reactive approach acceptable? Instead of an upfront
> number representing the room for growth, how about just grow and
> backoff when some event (oom or stall) which we want to avoid is about
> to happen? This is achievable today for oom and stall with PSI and
> memory.high and it avoids the hard problem of reliably estimating the
> reclaimable memory.

I tend to believe that for certain use cases it would be helpful if an
application gets notified if some of its pages are soon getting swapped
out due memory pressure. Then it could decide on its own which whether
it should drop certain caches in order to prevent swapping.


Hinweis: unverschlüsselte E-Mails können leicht abgehört und manipuliert
werden ! Für eine vertrauliche Kommunikation senden Sie bitte ihren
GPG/PGP-Schlüssel zu.
Enrico Weigelt, metux IT consult
Free software and Linux embedded engineering -- +49-151-27565287

      reply	other threads:[~2021-06-21 18:21 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-03 10:43 [PATCH v1] proc: Implement /proc/self/meminfo legion
2021-06-03 11:33 ` Michal Hocko
2021-06-03 11:33 ` Chris Down
2021-06-09  8:16   ` Enrico Weigelt, metux IT consult
2021-06-09 19:14     ` Eric W. Biederman
2021-06-09 20:31       ` Johannes Weiner
2021-06-09 20:56         ` Eric W. Biederman
2021-06-10  0:36           ` Daniel Walsh
2021-06-11 10:37         ` Enrico Weigelt, metux IT consult
2021-06-15 11:32 ` Christian Brauner
2021-06-15 12:47   ` Alexey Gladkov
2021-06-16  1:09     ` Shakeel Butt
2021-06-16 16:17       ` Eric W. Biederman
2021-06-18 17:03         ` Michal Hocko
2021-06-18 23:38         ` Shakeel Butt
2021-06-21 18:20           ` Enrico Weigelt, metux IT consult [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).