linux-kernel.vger.kernel.org archive mirror
From: David Rientjes <rientjes@google.com>
To: Yang Shi <yang.shi@linux.alibaba.com>
Cc: ktkhai@virtuozzo.com, hannes@cmpxchg.org, mhocko@suse.com,
	kirill.shutemov@linux.intel.com, hughd@google.com,
	shakeelb@google.com, Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 0/3] Make deferred split shrinker memcg aware
Date: Wed, 29 May 2019 14:07:58 -0700 (PDT)	[thread overview]
Message-ID: <alpine.DEB.2.21.1905291402360.242480@chino.kir.corp.google.com> (raw)
In-Reply-To: <2e23bd8c-6120-5a86-9e9e-ab43b02ce150@linux.alibaba.com>

On Wed, 29 May 2019, Yang Shi wrote:

> > Right, we've also encountered this.  I talked to Kirill about it a week or
> > so ago, and the suggestion was to split all compound pages on the
> > deferred split queues in the presence of any memory pressure.
> > 
> > That breaks cgroup isolation and perhaps unfairly penalizes workloads that
> > are running attached to other memcg hierarchies that are not under
> > pressure because their compound pages are now split as a side effect.
> > There is a benefit to keeping these compound pages around while not under
> > memory pressure if all pages are subsequently mapped again.
> 
> Yes, I do agree. I tried other approaches too, and making the deferred split
> queue per memcg sounds like the optimal one.
> 

The approach we went with was to track the actual counts of compound 
pages on the deferred split queue for each pgdat for each memcg and then 
invoke the shrinker for memcg reclaim, which must also iterate over pages 
not charged to the hierarchy under reclaim.  That's suboptimal and was a 
stop-gap measure under time pressure: it's refreshing to see the optimal 
method being pursued, thanks!

> > I'm curious if your internal applications team is also asking for
> > statistics on how much memory can be freed if the deferred split queues
> > can be shrunk?  We have applications that monitor their own memory usage
> 
> No, but this reminds me. The THPs on the deferred split queue should be
> counted as available memory too.
> 

Right, and we have also seen this for users of MADV_FREE who see increases 
in both rss and memcg usage and don't realize that the memory is freed 
under pressure.  I'm thinking that we need some kind of MemAvailable for 
memcg hierarchies to be the authoritative source of what can be reclaimed 
under pressure.

> > through memcg stats or usage and proactively try to reduce that usage when
> > it is growing too large.  The deferred split queues have significantly
> > increased both memcg usage and rss when they've upgraded kernels.
> > 
> > How are your applications monitoring how much memory from deferred split
> > queues can be freed on memory pressure?  Any thoughts on providing it as a
> > memcg stat?
> 
> I don't think they have such a monitor. I saw that rss_huge was abnormal in
> the memcg stat even after the application was killed by the oom killer, so I
> realized the deferred split queue may play a role here.
> 

Exactly the same in my case :)  We were likely looking at the exact same 
issue at the same time.

> The memcg stat doesn't have counters for available memory like the global
> vmstat does. It may be better to have such statistics, or to extend
> reclaimable "slab" to shrinkable/reclaimable "memory".
> 

Have you considered following how NR_ANON_MAPPED is tracked for each pgdat 
and using that as an indicator of when to modify a memcg stat that tracks 
the amount of memory in compound pages on the deferred split queue?  I 
think this would be necessary for userspace to know what their true memory 
usage is.


Thread overview: 20+ messages
2019-05-28 12:44 [RFC PATCH 0/3] Make deferred split shrinker memcg aware Yang Shi
2019-05-28 12:44 ` [PATCH 1/3] mm: thp: make " Yang Shi
2019-05-28 14:42   ` Kirill Tkhai
2019-05-29  2:43     ` Yang Shi
2019-05-29  8:14       ` Kirill Tkhai
2019-05-29 11:25         ` Yang Shi
2019-06-10  8:23           ` Kirill Tkhai
2019-06-10 17:25             ` Yang Shi
2019-06-13  8:19               ` Kirill Tkhai
2019-06-13 17:53                 ` Yang Shi
2019-05-30 12:07   ` Kirill A. Shutemov
2019-05-30 13:29     ` Yang Shi
2019-05-28 12:44 ` [PATCH 2/3] mm: thp: remove THP destructor Yang Shi
2019-05-28 12:44 ` [PATCH 3/3] mm: shrinker: make shrinker not depend on memcg kmem Yang Shi
2019-05-30 12:08   ` Kirill A. Shutemov
2019-05-30 13:20     ` Yang Shi
2019-05-29  1:22 ` [RFC PATCH 0/3] Make deferred split shrinker memcg aware David Rientjes
2019-05-29  2:34   ` Yang Shi
2019-05-29 21:07     ` David Rientjes [this message]
2019-05-30  3:22       ` Yang Shi
