Memcg stat for available memory

* Memcg stat for available memory
@ 2020-06-28 22:15 David Rientjes
  2020-07-02 15:22 ` Shakeel Butt
  0 siblings, 1 reply; 7+ messages in thread
From: David Rientjes @ 2020-06-28 22:15 UTC (permalink / raw)
  To: Johannes Weiner, Michal Hocko, Vladimir Davydov
  Cc: Andrew Morton, Shakeel Butt, cgroups, linux-mm

Hi everybody,

I'd like to discuss the feasibility of a stat similar to 
si_mem_available() but at memcg scope which would specify how much memory 
can be charged without I/O.

The si_mem_available() stat is based on heuristics so this does not 
provide an exact quantity that is actually available at any given time, 
but can otherwise provide userspace with some guidance on the amount of 
reclaimable memory.  See the description in 
Documentation/filesystems/proc.rst and its implementation.

 [ Naturally, userspace would need to understand both the amount of memory 
   that is available for allocation and for charging, separately, on an 
   overcommitted system.  I assume this is trivial.  (Why don't we provide 
   MemAvailable in per-node meminfo?) ]

For such a stat at memcg scope, we can ignore totalreserves and 
watermarks.  We already have ~precise (modulo MEMCG_CHARGE_BATCH) data for 
both file pages and slab_reclaimable.

We can infer lazily free memory by doing

	file - (active_file + inactive_file)

(This is necessary because lazy free memory is anon but on the inactive 
 file lru and we can't infer lazy freeable memory through pglazyfree -
 pglazyfreed, they are event counters.)

We can also infer the number of underlying compound pages that are on 
deferred split queues but have yet to be split with active_anon - anon (or
is this a bug? :)

So it *seems* like userspace can make a si_mem_available()-like 
calculation ("avail") by doing

	free = memory.high - memory.current
	lazyfree = file - (active_file + inactive_file)
	deferred = active_anon - anon

	avail = free + lazyfree + deferred +
		(active_file + inactive_file + slab_reclaimable) / 2

For userspace interested in knowing how much memory it can charge without 
incurring I/O (and assuming it has knowledge of available memory on an 
overcommitted system), it seems like:

 (a) it can derive the above avail amount that is at least similar to
     MemAvailable,

 (b) it can assume that all reclaim is considered equal so anything more
     than memory.high - memory.current is disruptive enough that it's a
     better heuristic than the above, or

 (c) the kernel provide an "avail" stat in memory.stat based on the above 
     and can evolve as the kernel implementation changes (how lazy free 
     memory impacts anon vs file lru stats, how deferred split memory is 
     handled, any future extensions for "easily reclaimable memory") that 
     userspace can count on to the same degree it can count on 
     MemAvailable.

Any thoughts?

^ permalink raw reply	[flat|nested] 7+ messages in thread