From: Johannes Weiner <hannes@cmpxchg.org>
To: Ying Han <yinghan@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Balbir Singh <bsingharora@gmail.com>,
	cgroups@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [patch 1/2] mm: memcg: per-memcg reclaim statistics
Date: Wed, 11 Jan 2012 01:30:20 +0100
Message-ID: <20120111003020.GD24386@cmpxchg.org>
In-Reply-To: <CALWz4izbTw4+7zbfiED9Lx=6RwiqxE11g5-fNRHTh=mcP=vQ2Q@mail.gmail.com>

On Tue, Jan 10, 2012 at 03:54:05PM -0800, Ying Han wrote:
> Thank you for the patch; the stats look reasonable to me.  A few
> questions below:
> 
> On Tue, Jan 10, 2012 at 7:02 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > With the single per-zone LRU gone and global reclaim scanning
> > individual memcgs, it's straight-forward to collect meaningful and
> > accurate per-memcg reclaim statistics.
> >
> > This adds the following items to memory.stat:
> 
> Some of the previous discussions, including posted patches, had
> similar stats in a memory.vmscan_stat API, which collects all the
> per-memcg vmscan stats.  I would like to understand why we add these
> to memory.stat instead, and do we plan to keep extending memory.stat
> with such vmstat-like stats?

I think they were put into a separate file mainly so that the file
could be written to in order to reset the statistics.  But in my
opinion it's trivial to calculate a delta from snapshots taken before
and after running a workload, so I didn't really like adding kernel
code for that.
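
Something along these lines is all the userspace it takes (a minimal
sketch, assuming the memory controller is mounted at
/sys/fs/cgroup/memory; the group name and workload script are made up
for illustration):

/*
 * Snapshot one memory.stat counter before and after a workload and
 * print the delta, instead of resetting counters in the kernel.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STAT_PATH "/sys/fs/cgroup/memory/mygroup/memory.stat"

static unsigned long long read_stat(const char *name)
{
        char key[64];
        unsigned long long val = 0, ret = 0;
        FILE *f = fopen(STAT_PATH, "r");

        if (!f)
                return 0;
        while (fscanf(f, "%63s %llu", key, &val) == 2) {
                if (!strcmp(key, name)) {
                        ret = val;
                        break;
                }
        }
        fclose(f);
        return ret;
}

int main(void)
{
        unsigned long long before = read_stat("pgreclaim");

        /* the workload script is hypothetical */
        if (system("./run-workload.sh"))
                fprintf(stderr, "workload did not exit cleanly\n");

        printf("pgreclaim delta: %llu\n",
               read_stat("pgreclaim") - before);
        return 0;
}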

Did you have another reason for a separate file in mind?

> > pgreclaim
> 
> Not sure if we want to keep this more consistent with /proc/vmstat;
> then it would be "pgsteal"?

The problem with that was that we didn't like to call pages stolen
when they were reclaimed from within the cgroup, so we had pgfree for
inner reclaim and pgsteal for outer reclaim, respectively.

I found it cleaner to just go with pgreclaim; it's unambiguous and
straightforward.  Outer reclaim is designated by the hierarchy_
prefix.
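
So a memory.stat dump would end up containing entries like the
following (the numbers are made up, only the naming matters):

pgreclaim 11520
pgscan 14336
kswapd_pgreclaim 0
kswapd_pgscan 0
hierarchy_pgreclaim 2048
hierarchy_pgscan 3072
hierarchy_kswapd_pgreclaim 0
hierarchy_kswapd_pgscan 0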

> > pgscan
> >
> >  Number of pages reclaimed/scanned from that memcg due to its own
> >  hard limit (or physical limit in case of the root memcg) by the
> >  allocating task.
> >
> > kswapd_pgreclaim
> > kswapd_pgscan
> 
> We have "pgscan_kswapd_*" in vmstat, so maybe:
> "pgsteal_kswapd"
> "pgscan_kswapd"
> 
> >  Reclaim activity from kswapd due to the memcg's own limit.  Only
> >  applicable to the root memcg for now since kswapd is only triggered
> >  by physical limits, but kswapd-style reclaim based on memcg hard
> >  limits is being developed.
> >
> > hierarchy_pgreclaim
> > hierarchy_pgscan
> > hierarchy_kswapd_pgreclaim
> > hierarchy_kswapd_pgscan
> 
> "pgsteal_hierarchy"
> "pgsteal_kswapd_hierarchy"
> ..
> 
> No strong opinion on the naming, but try to make it more consistent
> with the existing API.

I swear I tried, but the existing naming is pretty screwed up :(

For example, pgscan_direct_* and pgscan_kswapd_* allow you to compare
scan rates of direct reclaim vs. kswapd reclaim.  To get the total
number of pages scanned, you sum them up.

On the other hand, pgsteal_* does not differentiate between direct
reclaim and kswapd, so to get direct reclaim numbers, you add up the
pgsteal_* counters and subtract kswapd_steal (notice the lack of pg?),
which is in turn not available at zone granularity.
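
To illustrate the hoop-jumping, deriving direct reclaim from
/proc/vmstat today means roughly this (a sketch only; it relies on the
counter names as described above, which vary between kernel versions):

/*
 * Sum the per-zone pgsteal_* counters and subtract the global
 * kswapd_steal to arrive at direct reclaim.
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
        char key[64];
        unsigned long long val, pgsteal = 0, kswapd_steal = 0;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f)
                return 1;
        while (fscanf(f, "%63s %llu", key, &val) == 2) {
                if (!strncmp(key, "pgsteal_", 8))
                        pgsteal += val;
                else if (!strcmp(key, "kswapd_steal"))
                        kswapd_steal = val;
        }
        fclose(f);
        printf("direct reclaim: %llu pages\n", pgsteal - kswapd_steal);
        return 0;
}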

> > +#define MEM_CGROUP_EVENTS_KSWAPD 2
> > +#define MEM_CGROUP_EVENTS_HIERARCHY 4

These two act as namespaces; that's why I put hierarchy_ and kswapd_
at the beginning of the names.

Given that we have kswapd_steal, would you be okay with doing it like
this?  I mean, at least my naming conforms to ONE of the standards in
/proc/vmstat, right? ;-)

> > @@ -91,12 +91,23 @@ enum mem_cgroup_stat_index {
> >        MEM_CGROUP_STAT_NSTATS,
> >  };
> >
> > +#define MEM_CGROUP_EVENTS_KSWAPD 2
> > +#define MEM_CGROUP_EVENTS_HIERARCHY 4
> > +
> >  enum mem_cgroup_events_index {
> >        MEM_CGROUP_EVENTS_PGPGIN,       /* # of pages paged in */
> >        MEM_CGROUP_EVENTS_PGPGOUT,      /* # of pages paged out */
> >        MEM_CGROUP_EVENTS_COUNT,        /* # of pages paged in/out */
> >        MEM_CGROUP_EVENTS_PGFAULT,      /* # of page-faults */
> >        MEM_CGROUP_EVENTS_PGMAJFAULT,   /* # of major page-faults */
> > +       MEM_CGROUP_EVENTS_PGRECLAIM,
> > +       MEM_CGROUP_EVENTS_PGSCAN,
> > +       MEM_CGROUP_EVENTS_KSWAPD_PGRECLAIM,
> > +       MEM_CGROUP_EVENTS_KSWAPD_PGSCAN,
> > +       MEM_CGROUP_EVENTS_HIERARCHY_PGRECLAIM,
> > +       MEM_CGROUP_EVENTS_HIERARCHY_PGSCAN,
> > +       MEM_CGROUP_EVENTS_HIERARCHY_KSWAPD_PGRECLAIM,
> > +       MEM_CGROUP_EVENTS_HIERARCHY_KSWAPD_PGSCAN,
> 
> missing comment here?

As if the lines weren't long enough already ;-) I'll add some.

> >        MEM_CGROUP_EVENTS_NSTATS,
> >  };
> >  /*
> > @@ -889,6 +900,38 @@ static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
> >        return (memcg == root_mem_cgroup);
> >  }
> >
> > +/**
> > + * mem_cgroup_account_reclaim - update per-memcg reclaim statistics
> > + * @root: memcg that triggered reclaim
> > + * @memcg: memcg that is actually being scanned
> > + * @nr_reclaimed: number of pages reclaimed from @memcg
> > + * @nr_scanned: number of pages scanned from @memcg
> > + * @kswapd: whether reclaiming task is kswapd or allocator itself
> > + */
> > +void mem_cgroup_account_reclaim(struct mem_cgroup *root,
> > +                               struct mem_cgroup *memcg,
> > +                               unsigned long nr_reclaimed,
> > +                               unsigned long nr_scanned,
> > +                               bool kswapd)
> > +{
> > +       unsigned int offset = 0;
> > +
> > +       if (!root)
> > +               root = root_mem_cgroup;
> > +
> > +       if (kswapd)
> > +               offset += MEM_CGROUP_EVENTS_KSWAPD;
> > +       if (root != memcg)
> > +               offset += MEM_CGROUP_EVENTS_HIERARCHY;
> 
> Just to be clear, the root cgroup's hierarchy_* stats are always 0
> here?

That's correct; there can't be any hierarchical pressure on the
topmost parent.
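
In case the offset arithmetic isn't obvious from the snippet above,
this is the mapping I have in mind (illustration only; the enum is
laid out so that adding the offsets to the base index selects the
right counters):

/*
 * kswapd  hierarchy  offset  base index + offset selects
 * ------  ---------  ------  ------------------------------------------
 *   no       no        0     MEM_CGROUP_EVENTS_PGRECLAIM / _PGSCAN
 *   yes      no        2     ..._KSWAPD_PGRECLAIM / _KSWAPD_PGSCAN
 *   no       yes       4     ..._HIERARCHY_PGRECLAIM / _HIERARCHY_PGSCAN
 *   yes      yes       6     ..._HIERARCHY_KSWAPD_PGRECLAIM / _PGSCAN
 */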

> Also, we might want to consider renaming root here, to something
> like target?  The name root is easily confused with root_mem_cgroup.

It's the same naming scheme I used for the iterator functions
(mem_cgroup_iter() and friends), so if we change it, I'd like to
change it consistently.

Having target and memcg as parameters is even more confusing and
non-descriptive, IMO.

Other places use mem_over_limit, which is a bit better, but quite
long.

Any other ideas for great names for parameters that designate a
hierarchy root and a memcg in that hierarchy?
