From: David Rientjes <rientjes@google.com>
To: Chris Down <chris@chrisdown.name>
Cc: Andrew Morton <akpm@linux-foundation.org>, Yang Shi <shy828301@gmail.com>,
    Michal Hocko <mhocko@kernel.org>, Shakeel Butt <shakeelb@google.com>,
    Yang Shi <yang.shi@linux.alibaba.com>, Roman Gushchin <guro@fb.com>,
    Greg Thelen <gthelen@google.com>, Johannes Weiner <hannes@cmpxchg.org>,
    Vladimir Davydov <vdavydov.dev@gmail.com>,
    cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [patch] mm, memcg: provide a stat to describe reclaimable memory
Date: Fri, 17 Jul 2020 12:37:57 -0700 (PDT)
Message-ID: <alpine.DEB.2.23.453.2007171226310.3398972@chino.kir.corp.google.com>
In-Reply-To: <20200717121750.GA367633@chrisdown.name>

On Fri, 17 Jul 2020, Chris Down wrote:

> > With the proposed anon_reclaimable, do you have any reliability concerns?
> > This would be the amount of lazy freeable memory and memory that can be
> > uncharged if compound pages from the deferred split queue are split under
> > memory pressure. It seems to be a very precise value (as slab_reclaimable
> > already in memory.stat is), so I'm not sure why there is a reliability
> > concern. Maybe you can elaborate?
>
> Ability to reclaim a page is largely about context at the time of reclaim.
> For example, if you are running at the edge of swap, a metric that truly
> describes "reclaimable memory" will contain vastly different numbers from
> one second to the next as cluster and page availability increases and
> decreases. We may also have to do things like look for youngness at
> reclaim time, so I'm not convinced metrics like this make sense in the
> general case.

...

> Again, I'm curious why this can't be solved by artificial workingset
> pressurisation and monitoring. Generally, the most reliable reclaim
> metrics come from operating reclaim itself.
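(The "artificial workingset pressurisation" quoted above could be sketched roughly as below; the cgroup v2 path and the halving factor are illustrative assumptions, not something either participant specified.)

```shell
# Rough sketch of artificial workingset pressurisation via cgroup v2:
# briefly clamp memory.high below current usage, let reclaim operate,
# then restore the limit and observe how much usage actually dropped.
# CG defaults to a hypothetical group; override it for your setup.
CG="${CG:-/sys/fs/cgroup/mygroup}"

cur=$(cat "$CG/memory.current")
echo $((cur / 2)) > "$CG/memory.high"   # push reclaim toward half of usage
sleep 1                                 # give reclaim a moment to run
echo max > "$CG/memory.high"            # lift the artificial limit
echo "usage after pressure: $(cat "$CG/memory.current")"
```

How far memory.current falls under such a probe is the "metric from operating reclaim itself" that the quote refers to.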
Perhaps this is best discussed in the context I gave in the earlier thread:
imagine a thp-backed heap of 64MB and then a malloc implementation doing
MADV_DONTNEED over all but one page in every one of these pageblocks.

On a 4.3 kernel, for example, memory.current for the heap segment is now
(64MB / 2MB) * 4KB = 128KB because we have synchronous splitting and
uncharging of the underlying hugepage. On a 4.15 kernel, memory.current is
still 64MB because the underlying hugepages remain charged to the memcg due
to deferred split queues.

For any application that monitors this, pressurization is not going to help:
the memory will be reclaimed under memcg pressure, but we aren't facing that
pressure yet. Userspace could identify this as a memory leak unless we
describe what anon memory is actually reclaimable in this context (including
on systems without swap).

For any entity that uses this information to infer whether new work can be
scheduled in this memcg (the reason MemAvailable exists in /proc/meminfo at
the system level), this is now dramatically skewed. At worst, on a swapless
system, this memory is seen from userspace as unreclaimable because it's
charged as anon.

Do you have other suggestions for how userspace can understand what anon is
reclaimable in this context before encountering memory pressure? If so, it
may be a great alternative to this: I haven't been able to think of such a
way other than an anon_reclaimable stat.