linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <shakeelb@google.com>
To: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>, Cgroups <cgroups@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>
Subject: Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
Date: Fri, 12 Apr 2019 13:50:02 -0700	[thread overview]
Message-ID: <CALvZod6jPxRb=ZEtErvZ4nJnObhQN05ECO9d_=HF0UWsCZExNw@mail.gmail.com> (raw)
In-Reply-To: <20190412201534.GB24377@tower.DHCP.thefacebook.com>

On Fri, Apr 12, 2019 at 1:16 PM Roman Gushchin <guro@fb.com> wrote:
>
> On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> > On Fri, Apr 12, 2019 at 8:15 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
> > >
> > > Right now, when somebody needs to know the recursive memory statistics
> > > and events of a cgroup subtree, they need to walk the entire subtree
> > > and sum up the counters manually.
> > >
> > > There are two issues with this:
> > >
> > > 1. When a cgroup gets deleted, its stats are lost. The state counters
> > > should all be 0 at that point, of course, but the events are not. When
> > > this happens, the event counters, which are supposed to be monotonic,
> > > can go backwards in the parent cgroups.
> > >
> >
> > We also faced this exact same issue as well and had the similar solution.
> >
> > > 2. During regular operation, we always have a certain number of lazily
> > > freed cgroups sitting around that have been deleted, have no tasks,
> > > but have a few cache pages remaining. These groups' statistics do not
> > > change until we eventually hit memory pressure, but somebody watching,
> > > say, memory.stat on an ancestor has to iterate those every time.
> > >
> > > This patch addresses both issues by introducing recursive counters at
> > > each level that are propagated from the write side when stats change.
> > >
> > > Upward propagation happens when the per-cpu caches spill over into the
> > > local atomic counter. This is the same thing we do during charge and
> > > uncharge, except that the latter uses atomic RMWs, which are more
> > > expensive; stat changes happen at around the same rate. In a sparse
> > > file test (page faults and reclaim at maximum CPU speed) with 5 cgroup
> > > nesting levels, perf shows __mod_memcg_page state at ~1%.
> > >
> >
> > (Unrelated to this patchset) I think there should also a way to get
> > the exact memcg stats. As the machines are getting bigger (more cpus
> > and larger basic page size) the accuracy of stats are getting worse.
> > Internally we have an additional interface memory.stat_exact for that.
> > However I am not sure in the upstream kernel will an additional
> > interface is better or something like /proc/sys/vm/stat_refresh which
> > sync all per-cpu stats.
>
> I was thinking about eventually consistent counters: sync them periodically
> from a worker thread. It should keep the cost of reading small, but
> should increase the accuracy. Will it work for you?

Worker thread based solution seems fine to me but Johannes said it
would be best to not traverse the whole tree every few seconds.

  reply	other threads:[~2019-04-12 20:50 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 15:15 [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness Johannes Weiner
2019-04-12 15:15 ` [PATCH 1/4] mm: memcontrol: make cgroup stats and events query API explicitly local Johannes Weiner
2019-04-12 15:15 ` [PATCH 2/4] mm: memcontrol: move stat/event counting functions out-of-line Johannes Weiner
2019-04-12 15:15 ` [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty Johannes Weiner
2019-04-12 19:55   ` Shakeel Butt
2019-04-12 20:10     ` Johannes Weiner
2019-04-12 20:38       ` Shakeel Butt
2019-04-12 20:15     ` Roman Gushchin
2019-04-12 20:50       ` Shakeel Butt [this message]
2019-04-12 15:15 ` [PATCH 4/4] mm: memcontrol: fix NUMA round-robin reclaim at intermediate level Johannes Weiner
2019-04-12 20:07 ` [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness Shakeel Butt
2019-04-12 22:04 ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALvZod6jPxRb=ZEtErvZ4nJnObhQN05ECO9d_=HF0UWsCZExNw@mail.gmail.com' \
    --to=shakeelb@google.com \
    --cc=Kernel-team@fb.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).