All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Rientjes <rientjes@google.com>
To: Chris Down <chris@chrisdown.name>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Yang Shi <shy828301@gmail.com>,  Michal Hocko <mhocko@kernel.org>,
	Shakeel Butt <shakeelb@google.com>,
	 Yang Shi <yang.shi@linux.alibaba.com>,
	Roman Gushchin <guro@fb.com>,  Greg Thelen <gthelen@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	 Vladimir Davydov <vdavydov.dev@gmail.com>,
	cgroups@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [patch] mm, memcg: provide a stat to describe reclaimable memory
Date: Fri, 17 Jul 2020 12:37:57 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.23.453.2007171226310.3398972@chino.kir.corp.google.com> (raw)
In-Reply-To: <20200717121750.GA367633@chrisdown.name>

On Fri, 17 Jul 2020, Chris Down wrote:

> > With the proposed anon_reclaimable, do you have any reliability concerns?
> > This would be the amount of lazy freeable memory and memory that can be
> > uncharged if compound pages from the deferred split queue are split under
> > memory pressure.  It seems to be a very precise value (as slab_reclaimable
> > already in memory.stat is), so I'm not sure why there is a reliability
> > concern.  Maybe you can elaborate?
> 
> Ability to reclaim a page is largely about context at the time of reclaim. For
> example, if you are running at the edge of swap, at a metric that truly
> describes "reclaimable memory" will contain vastly different numbers from one
> second to the next as cluster and page availability increases and decreases.
> We may also have to do things like look for youngness at reclaim time, so I'm
> not convinced metrics like this makes sense in the general case.
...
> Again, I'm curious why this can't be solved by artificial workingset
> pressurisation and monitoring. Generally, the most reliable reclaim metrics
> come from operating reclaim itself.
> 

Perhaps this is best discussed in the context I gave in the earlier 
thread: imagine a thp-backed heap of 64MB and then a malloc implementation 
doing MADV_DONTNEED over all but one page in every one of these 
pageblocks.

On a 4.3 kernel, for example, memory.current for the heap segment is now 
(64MB / 2MB) * 4KB = 128KB because we have synchronous splitting and 
uncharging of the underlying hugepage.  On a 4.15 kernel, for example, 
memory.current is still 64MB because the underlying hugepages are still 
charged to the memcg due to deferred split queues.

For any application that monitors this, pressurization is not going to 
help: the memory will be reclaimed under memcg pressure but we aren't 
facing that pressure yet.  Userspace could identify this as a memory leak 
unless we describe what anon memory is actually reclaimable in this 
context (including on systems without swap).  For any entity that uses 
this information to infer if new work can be scheduled in this memcg (the 
reason MemAvailable exists in /proc/meminfo at the system level), this is 
now dramatically skewed.

At worse, on a swapless system, this memory is seen from userspace as 
unreclaimable because it's charged anon.

Do you have other suggestions for how userspace can understand what anon 
is reclaimable in this context before encountering memory pressure?  If 
so, it may be a great alternative to this: I haven't been able to think of 
such a way other than an anon_reclaimable stat.


WARNING: multiple messages have this Message-ID (diff)
From: David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
To: Chris Down <chris-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>
Cc: Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Yang Shi <shy828301-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	Michal Hocko <mhocko-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
	Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Yang Shi
	<yang.shi-KPsoFbNs7GizrGE5bRqYAgC/G2K4zDHf@public.gmane.org>,
	Roman Gushchin <guro-b10kYP2dOMg@public.gmane.org>,
	Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Vladimir Davydov
	<vdavydov.dev-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org
Subject: Re: [patch] mm, memcg: provide a stat to describe reclaimable memory
Date: Fri, 17 Jul 2020 12:37:57 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.23.453.2007171226310.3398972@chino.kir.corp.google.com> (raw)
In-Reply-To: <20200717121750.GA367633-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>

On Fri, 17 Jul 2020, Chris Down wrote:

> > With the proposed anon_reclaimable, do you have any reliability concerns?
> > This would be the amount of lazy freeable memory and memory that can be
> > uncharged if compound pages from the deferred split queue are split under
> > memory pressure.  It seems to be a very precise value (as slab_reclaimable
> > already in memory.stat is), so I'm not sure why there is a reliability
> > concern.  Maybe you can elaborate?
> 
> Ability to reclaim a page is largely about context at the time of reclaim. For
> example, if you are running at the edge of swap, at a metric that truly
> describes "reclaimable memory" will contain vastly different numbers from one
> second to the next as cluster and page availability increases and decreases.
> We may also have to do things like look for youngness at reclaim time, so I'm
> not convinced metrics like this makes sense in the general case.
...
> Again, I'm curious why this can't be solved by artificial workingset
> pressurisation and monitoring. Generally, the most reliable reclaim metrics
> come from operating reclaim itself.
> 

Perhaps this is best discussed in the context I gave in the earlier 
thread: imagine a thp-backed heap of 64MB and then a malloc implementation 
doing MADV_DONTNEED over all but one page in every one of these 
pageblocks.

On a 4.3 kernel, for example, memory.current for the heap segment is now 
(64MB / 2MB) * 4KB = 128KB because we have synchronous splitting and 
uncharging of the underlying hugepage.  On a 4.15 kernel, for example, 
memory.current is still 64MB because the underlying hugepages are still 
charged to the memcg due to deferred split queues.

For any application that monitors this, pressurization is not going to 
help: the memory will be reclaimed under memcg pressure but we aren't 
facing that pressure yet.  Userspace could identify this as a memory leak 
unless we describe what anon memory is actually reclaimable in this 
context (including on systems without swap).  For any entity that uses 
this information to infer if new work can be scheduled in this memcg (the 
reason MemAvailable exists in /proc/meminfo at the system level), this is 
now dramatically skewed.

At worse, on a swapless system, this memory is seen from userspace as 
unreclaimable because it's charged anon.

Do you have other suggestions for how userspace can understand what anon 
is reclaimable in this context before encountering memory pressure?  If 
so, it may be a great alternative to this: I haven't been able to think of 
such a way other than an anon_reclaimable stat.

  reply	other threads:[~2020-07-17 19:38 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-15  3:18 [patch] mm, memcg: provide a stat to describe reclaimable memory David Rientjes
2020-07-15  3:18 ` David Rientjes
2020-07-15  7:00 ` David Rientjes
2020-07-15  7:00   ` David Rientjes
2020-07-15  7:15   ` SeongJae Park
2020-07-15  7:15     ` SeongJae Park
2020-07-15 17:33     ` David Rientjes
2020-07-15 17:33       ` David Rientjes
2020-07-16 20:58       ` [patch] mm, memcg: provide an anon_reclaimable stat David Rientjes
2020-07-16 20:58         ` David Rientjes
2020-07-16 21:07         ` Shakeel Butt
2020-07-16 21:07           ` Shakeel Butt
2020-07-16 21:28           ` David Rientjes
2020-07-16 21:28             ` David Rientjes
2020-07-17  1:37             ` Shakeel Butt
2020-07-17  1:37               ` Shakeel Butt
2020-07-17  8:34         ` Michal Hocko
2020-07-17  8:34           ` Michal Hocko
2020-07-17 14:39         ` Johannes Weiner
2020-07-17 14:39           ` Johannes Weiner
2020-07-15 13:10 ` [patch] mm, memcg: provide a stat to describe reclaimable memory Chris Down
2020-07-15 13:10   ` Chris Down
     [not found]   ` <20200715131048.GA176092-6Bi1550iOqEnzZ6mRAm98g@public.gmane.org>
2020-07-15 18:02     ` David Rientjes
2020-07-17 12:17       ` Chris Down
2020-07-17 12:17         ` Chris Down
2020-07-17 19:37         ` David Rientjes [this message]
2020-07-17 19:37           ` David Rientjes
2020-07-20  7:37           ` Michal Hocko
2020-07-20  7:37             ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=alpine.DEB.2.23.453.2007171226310.3398972@chino.kir.corp.google.com \
    --to=rientjes@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=chris@chrisdown.name \
    --cc=gthelen@google.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=vdavydov.dev@gmail.com \
    --cc=yang.shi@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.