From: Ying Han <yinghan@google.com>
To: Johannes Weiner <jweiner@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>,
	Balbir Singh <bsingharora@gmail.com>,
	Andrew Brestic <abrestic@google.com>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [patch] Revert "memcg: add memory.vmscan_stat"
Date: Wed, 31 Aug 2011 23:05:51 -0700
Message-ID: <CALWz4iyXbrgcrZEOsgvvW9mu6fr7Qwbn2d1FR_BVw6R_pMZPsQ@mail.gmail.com>
In-Reply-To: <20110830084245.GC13061@redhat.com>

On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner <jweiner@redhat.com> wrote:
> On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote:
>> On Tue, 30 Aug 2011 09:04:24 +0200
>> Johannes Weiner <jweiner@redhat.com> wrote:
>>
>> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote:
>> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s
>> > >   spin_lock(&memcg->scanstat.lock);
>> > >   __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec);
>> > >   spin_unlock(&memcg->scanstat.lock);
>> > > -
>> > > - memcg = rec->root;
>> > > - spin_lock(&memcg->scanstat.lock);
>> > > - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec);
>> > > - spin_unlock(&memcg->scanstat.lock);
>> > > + cgroup = memcg->css.cgroup;
>> > > + do {
>> > > +         spin_lock(&memcg->scanstat.lock);
>> > > +         __mem_cgroup_record_scanstat(
>> > > +                 memcg->scanstat.hierarchy_stats[context], rec);
>> > > +         spin_unlock(&memcg->scanstat.lock);
>> > > +         if (!cgroup->parent)
>> > > +                 break;
>> > > +         cgroup = cgroup->parent;
>> > > +         memcg = mem_cgroup_from_cont(cgroup);
>> > > + } while (memcg->use_hierarchy && memcg != rec->root);
>> >
>> > Okay, so this looks correct, but it sums up all parents after each
>> > memcg scanned, which could have a performance impact.  Usually,
>> > hierarchy statistics are only summed up when a user reads them.
>> >
>> Hmm. But sum-at-read doesn't work.
>>
>> Assume 3 cgroups in a hierarchy.
>>
>>         A
>>        /
>>       B
>>      /
>>     C
>>
>> C's scan contains 3 causes.
>>       C's scan caused by limit of A.
>>       C's scan caused by limit of B.
>>       C's scan caused by limit of C.
>>
>> If we make hierarchy sum at read, we think
>>       B's scan_stat = B's scan_stat + C's scan_stat
>> But to be precise, it is
>>
>>       B's scan_stat = B's scan_stat caused by B +
>>                       B's scan_stat caused by A +
>>                       C's scan_stat caused by C +
>>                       C's scan_stat caused by B +
>>                       C's scan_stat caused by A.
>>
>> In the original version,
>>       B's scan_stat = B's scan_stat caused by B +
>>                       C's scan_stat caused by B
>>
>> After this patch,
>>       B's scan_stat = B's scan_stat caused by B +
>>                       B's scan_stat caused by A +
>>                       C's scan_stat caused by C +
>>                       C's scan_stat caused by B +
>>                       C's scan_stat caused by A.
>>
>> Hmm...removing hierarchy part completely seems fine to me.
>
> I see.
>
> You want to look at A and see whether its limit was responsible for
> reclaim scans in any children.  IMO, that is asking the question
> backwards.  Instead, there is a cgroup under reclaim and one wants to
> find out the cause for that.  Not the other way round.
>
> In my original proposal I suggested differentiating reclaim caused by
> internal pressure (due to own limit) and reclaim caused by
> external/hierarchical pressure (due to limits from parents).
>
> If you want to find out why C is under reclaim, look at its reclaim
> statistics.  If the _limit numbers are high, C's limit is the problem.
> If the _hierarchical numbers are high, the problem is B, A, or
> physical memory, so you check B for _limit and _hierarchical as well,
> then move on to A.
>
> Implementing this would be as easy as passing not only the memcg to
> scan (victim) to the reclaim code, but also the memcg /causing/ the
> reclaim (root_mem):
>
>        root_mem == victim -> account to victim as _limit
>        root_mem != victim -> account to victim as _hierarchical
>
> This would make things much simpler and more natural, both the code
> and the way of tracking down a problem, IMO.

These are pretty much the stats I am currently using for debugging the
reclaim patches. For example:

scanned_pages_by_system 0
scanned_pages_by_system_under_hierarchy 50989

scanned_pages_by_limit 0
scanned_pages_by_limit_under_hierarchy 0

"_system" counts pages scanned under global reclaim, and "_limit"
counts pages scanned under per-memcg reclaim.
"_under_hierarchy" is appended when the memcg being scanned is not the
one that triggered the pressure.
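
As a rough sketch of how a scan could be sorted into these four
counters (this is not the actual debugging patch; the struct, enum and
function names are hypothetical, and it assumes global reclaim is
attributed to the root memcg as the trigger):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum scan_counter {
	SCANNED_BY_SYSTEM,
	SCANNED_BY_SYSTEM_UNDER_HIERARCHY,
	SCANNED_BY_LIMIT,
	SCANNED_BY_LIMIT_UNDER_HIERARCHY,
	NR_SCAN_COUNTERS,
};

/* Hypothetical stand-in for struct mem_cgroup; only what the sketch needs. */
struct memcg_sketch {
	const char *name;
	unsigned long scanned[NR_SCAN_COUNTERS];
};

/*
 * trigger: memcg whose pressure started the reclaim (assumed to be the
 *          root memcg when the reclaim is global).
 * victim:  memcg whose pages are being scanned.
 */
static enum scan_counter pick_counter(const struct memcg_sketch *trigger,
				      const struct memcg_sketch *victim,
				      bool global_reclaim)
{
	assert(trigger != NULL && victim != NULL);

	if (global_reclaim)
		return trigger == victim ? SCANNED_BY_SYSTEM
					 : SCANNED_BY_SYSTEM_UNDER_HIERARCHY;

	return trigger == victim ? SCANNED_BY_LIMIT
				 : SCANNED_BY_LIMIT_UNDER_HIERARCHY;
}

/* Account nr_pages scanned in victim to the counter chosen above. */
static void record_scan(const struct memcg_sketch *trigger,
			struct memcg_sketch *victim,
			bool global_reclaim, unsigned long nr_pages)
{
	victim->scanned[pick_counter(trigger, victim, global_reclaim)] += nr_pages;
}
```

The scan is always accounted to the victim only, so nothing has to be
propagated to the parents at scan time.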

So in the previous example:

>         A (root)
>        /
>       B
>      /
>     C

For cgroup C:
scanned_pages_by_system:
scanned_pages_by_system_under_hierarchy: # of pages scanned under
global memory pressure

scanned_pages_by_limit: # of pages scanned while C hits the limit
scanned_pages_by_limit_under_hierarchy: # of pages scanned while B
hits the limit
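
With counters like these, tracking down the source of pressure on C
could look roughly like the walk below. It is only an illustration
under simplifying assumptions: the struct and function names are made
up, and it treats any nonzero counter as significant.

```c
#include <stddef.h>

/* Hypothetical per-cgroup view of the two "_limit" counters. */
struct cg_sketch {
	const char *name;
	struct cg_sketch *parent;	/* NULL for the root */
	unsigned long by_limit;		/* scanned while this cgroup hit its own limit */
	unsigned long by_limit_under_hierarchy;	/* scanned while an ancestor hit its limit */
};

/*
 * Walk from cg towards the root and return the first cgroup that was
 * scanned because of its own limit.  Returns NULL if no limit in the
 * chain accounts for the scans, in which case the remaining suspect
 * is global memory pressure.
 */
static const struct cg_sketch *first_limit_hit(const struct cg_sketch *cg)
{
	while (cg != NULL) {
		if (cg->by_limit > 0)
			return cg;
		if (cg->by_limit_under_hierarchy == 0)
			return NULL;	/* no ancestor limit involved */
		cg = cg->parent;
	}
	return NULL;
}
```

Starting at C, this checks C's own limit first and only moves on to B
(and then A) when C's "_under_hierarchy" count says an ancestor was
responsible.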

--Ying

>
>> > I don't get why this has to be done completely differently from the
>> > way we usually do things, without any justification whatsoever.
>> >
>> > Why do you want to pass a recording structure down the reclaim stack?
>>
>> Just for reducing number of passed variables.
>
> It's still sitting at the bottom of the reclaim stack the whole time.
>
> With my proposal, you would only need to pass the extra root_mem
> pointer.
>
