From: David Rientjes <>
To: Johannes Weiner <>,
	Michal Hocko <>,
	 Vladimir Davydov <>
Cc: Andrew Morton <>,
	 Shakeel Butt <>
Subject: Memcg stat for available memory
Date: Sun, 28 Jun 2020 15:15:25 -0700 (PDT)
Message-ID: <>

Hi everybody,

I'd like to discuss the feasibility of a stat similar to 
si_mem_available() but at memcg scope which would specify how much memory 
can be charged without I/O.

The si_mem_available() stat is based on heuristics so this does not 
provide an exact quantity that is actually available at any given time, 
but can otherwise provide userspace with some guidance on the amount of 
reclaimable memory.  See the description in 
Documentation/filesystems/proc.rst and its implementation.
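
For reference, the shape of the global heuristic can be sketched roughly as 
below.  This is a paraphrase from reading the kernel source, not a copy of 
it; the function and argument names are made up for illustration, all 
values are in pages, and the authoritative version is the kernel's own 
si_mem_available().

```python
def si_mem_available_sketch(free, totalreserve, active_file, inactive_file,
                            reclaimable, wmark_low):
    """Rough paraphrase of the global MemAvailable estimate (units: pages).

    All parameter names are hypothetical; they stand in for the kernel's
    free page count, total reserves, file LRU sizes, reclaimable slab (and
    similar) memory, and the sum of the zones' low watermarks.
    """
    # Free memory that is not held back as reserves.
    available = free - totalreserve

    # Page cache counts, minus the smaller of half of it or the low watermark.
    pagecache = active_file + inactive_file
    pagecache -= min(pagecache // 2, wmark_low)
    available += pagecache

    # Reclaimable kernel memory gets the same treatment.
    available += reclaimable - min(reclaimable // 2, wmark_low)

    # The estimate is never reported as negative.
    return max(available, 0)
```

The "half of it" terms are where the "/ 2" in a memcg-scope estimate would 
come from once reserves and watermarks are dropped.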

 [ Naturally, userspace would need to understand both the amount of memory 
   that is available for allocation and for charging, separately, on an 
   overcommitted system.  I assume this is trivial.  (Why don't we provide 
   MemAvailable in per-node meminfo?) ]

For such a stat at memcg scope, we can ignore totalreserves and 
watermarks.  We already have ~precise (modulo MEMCG_CHARGE_BATCH) data for 
both file pages and slab_reclaimable.

We can infer lazily free memory by doing

	file - (active_file + inactive_file)

(This is necessary because lazy free memory is anon but on the inactive 
 file lru and we can't infer lazy freeable memory through pglazyfree -
 pglazyfreed, they are event counters.)
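
As a minimal userspace sketch of that inference, assuming cgroup v2 and a 
memory.stat that exposes "file", "active_file" and "inactive_file" in 
bytes: the helper names below are hypothetical, and the expression is 
taken from this mail as written, so its sign and exact meaning should be 
checked against a real system before relying on it.

```python
def parse_memory_stat(text):
    """Parse memory.stat contents ("key value" per line) into a dict of ints."""
    stat = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            stat[parts[0]] = int(parts[1])
    return stat


def lazyfree_estimate(stat):
    """Evaluate file - (active_file + inactive_file) from a parsed memory.stat.

    This is the expression proposed in the mail, not a kernel-provided stat.
    """
    return stat["file"] - (stat["active_file"] + stat["inactive_file"])


# Example input with invented numbers, in the format memory.stat uses.
sample = "file 4096000\nactive_file 1024000\ninactive_file 2048000\n"
print(lazyfree_estimate(parse_memory_stat(sample)))
```

On a live system the text would come from reading the cgroup's memory.stat 
file; the counters are batched, so the result is only approximate.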

We can also infer the number of underlying compound pages that are on 
deferred split queues but have yet to be split with active_anon - anon (or
is this a bug? :)

So it *seems* like userspace can make a si_mem_available()-like 
calculation ("avail") by doing

	free = memory.high - memory.current
	lazyfree = file - (active_file + inactive_file)
	deferred = active_anon - anon

	avail = free + lazyfree + deferred +
		(active_file + inactive_file + slab_reclaimable) / 2
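
The calculation above could be sketched in userspace as follows.  This 
assumes the values have already been read from memory.high, 
memory.current and memory.stat as integers in bytes; memcg_avail() is a 
hypothetical helper name, and a memory.high of "max" (no limit) is not 
handled here and would need a policy of its own.

```python
def memcg_avail(stat, high, current):
    """Compute the "avail" estimate from the formulas in this mail.

    stat    -- dict of memory.stat counters in bytes (anon, file,
               active_anon, active_file, inactive_file, slab_reclaimable)
    high    -- memory.high in bytes (must be a number, not "max")
    current -- memory.current in bytes
    """
    free = high - current
    lazyfree = stat["file"] - (stat["active_file"] + stat["inactive_file"])
    deferred = stat["active_anon"] - stat["anon"]
    # Half of the file LRUs plus reclaimable slab is assumed to be reclaimable.
    reclaimable = (stat["active_file"] + stat["inactive_file"]
                   + stat["slab_reclaimable"])
    return free + lazyfree + deferred + reclaimable // 2


# Invented numbers, for illustration only.
example_stat = {
    "anon": 450, "file": 1000,
    "active_anon": 500, "active_file": 300, "inactive_file": 600,
    "slab_reclaimable": 100,
}
print(memcg_avail(example_stat, high=10000, current=4000))
```

Like the global stat, the result is a heuristic rather than a guarantee 
of how much can actually be charged.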

For userspace interested in knowing how much memory it can charge without 
incurring I/O (and assuming it has knowledge of available memory on an 
overcommitted system), it seems like:

 (a) it can derive the above avail amount that is at least similar to
     MemAvailable,

 (b) it can assume that all reclaim is considered equal so anything more
     than memory.high - memory.current is disruptive enough that it's a
     better heuristic than the above, or

 (c) the kernel provides an "avail" stat in memory.stat, based on the 
     above, that can evolve as the kernel implementation changes (how lazy 
     free memory impacts anon vs file lru stats, how deferred split memory 
     is handled, any future extensions for "easily reclaimable memory") and 
     that userspace can count on to the same degree it can count on 
     MemAvailable.

Any thoughts?
