archive mirror
 help / color / mirror / Atom feed
From: Shakeel Butt <>
To: Johannes Weiner <>
Cc: Andrew Morton <>,
	Linux MM <>, Cgroups <>,
	LKML <>,, Roman Gushchin <>
Subject: Re: [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty
Date: Fri, 12 Apr 2019 13:38:07 -0700	[thread overview]
Message-ID: <> (raw)
In-Reply-To: <>

On Fri, Apr 12, 2019 at 1:10 PM Johannes Weiner <> wrote:
> On Fri, Apr 12, 2019 at 12:55:10PM -0700, Shakeel Butt wrote:
> > We also faced this exact same issue as well and had the similar solution.
> >
> > > Signed-off-by: Johannes Weiner <>
> >
> > Reviewed-by: Shakeel Butt <>
> Thanks for the review!
> > (Unrelated to this patchset) I think there should also a way to get
> > the exact memcg stats. As the machines are getting bigger (more cpus
> > and larger basic page size) the accuracy of stats are getting worse.
> > Internally we have an additional interface memory.stat_exact for that.
> > However I am not sure in the upstream kernel will an additional
> > interface is better or something like /proc/sys/vm/stat_refresh which
> > sync all per-cpu stats.
> I was talking to Roman about this earlier as well and he mentioned it
> would be nice to have periodic flushing of the per-cpu caches. The
> global vmstat has something similar. We might be able to hook into
> those workers, but it would likely require some smarts so we don't
> walk the entire cgroup tree every couple of seconds.
> We haven't had any actual problems with the per-cpu fuzziness, mainly
> because the cgroups of interest also grow in size as the machines get
> bigger, and so the relative error doesn't increase.

Yes, this is very machine size dependent. We see this issue more often
on larger machines.

> Are your requirements that the error dissipates over time (waiting for
> a threshold convergence somewhere?) or do you have automation that
> gets decisions wrong due to the error at any given point in time?

Not sure about the first one but we do have the second case. The node
controller does make decisions in an online way based on the stats.
Also we do periodically collect and store stats for all jobs across
the fleet. This data is processed (offline) and is used in a lot of
ways. The inaccuracy in the stats do affect all that analysis
particularly for small jobs.

  reply	other threads:[~2019-04-12 20:38 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-12 15:15 [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness Johannes Weiner
2019-04-12 15:15 ` [PATCH 1/4] mm: memcontrol: make cgroup stats and events query API explicitly local Johannes Weiner
2019-04-12 15:15 ` [PATCH 2/4] mm: memcontrol: move stat/event counting functions out-of-line Johannes Weiner
2019-04-12 15:15 ` [PATCH 3/4] mm: memcontrol: fix recursive statistics correctness & scalabilty Johannes Weiner
2019-04-12 19:55   ` Shakeel Butt
2019-04-12 20:10     ` Johannes Weiner
2019-04-12 20:38       ` Shakeel Butt [this message]
2019-04-12 20:15     ` Roman Gushchin
2019-04-12 20:50       ` Shakeel Butt
2019-04-12 15:15 ` [PATCH 4/4] mm: memcontrol: fix NUMA round-robin reclaim at intermediate level Johannes Weiner
2019-04-12 20:07 ` [PATCH 0/4] mm: memcontrol: memory.stat cost & correctness Shakeel Butt
2019-04-12 22:04 ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='' \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).