From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753955Ab1IAGF6 (ORCPT ); Thu, 1 Sep 2011 02:05:58 -0400 Received: from smtp-out.google.com ([216.239.44.51]:57126 "EHLO smtp-out.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752937Ab1IAGF5 convert rfc822-to-8bit (ORCPT ); Thu, 1 Sep 2011 02:05:57 -0400 DomainKey-Signature: a=rsa-sha1; s=beta; d=google.com; c=nofws; q=dns; h=dkim-signature:mime-version:in-reply-to:references:date: message-id:subject:from:to:cc:content-type: content-transfer-encoding:x-system-of-record; b=kEGF65OozcP1i9eEhoPr+jXlu1I6kR/OLjN1nw190E6bfAOkSFzlpbL24UlVAwi7P bloIASHcheDrttzicKIWg== MIME-Version: 1.0 In-Reply-To: <20110830084245.GC13061@redhat.com> References: <20110722171540.74eb9aa7.kamezawa.hiroyu@jp.fujitsu.com> <20110808124333.GA31739@redhat.com> <20110809083345.46cbc8de.kamezawa.hiroyu@jp.fujitsu.com> <20110829155113.GA21661@redhat.com> <20110830101233.ae416284.kamezawa.hiroyu@jp.fujitsu.com> <20110830070424.GA13061@redhat.com> <20110830162050.f6c13c0c.kamezawa.hiroyu@jp.fujitsu.com> <20110830084245.GC13061@redhat.com> Date: Wed, 31 Aug 2011 23:05:51 -0700 Message-ID: Subject: Re: [patch] Revert "memcg: add memory.vmscan_stat" From: Ying Han To: Johannes Weiner Cc: KAMEZAWA Hiroyuki , Andrew Morton , Daisuke Nishimura , Balbir Singh , Andrew Brestic , Michal Hocko , linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8BIT X-System-Of-Record: true Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner wrote: > On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote: >> On Tue, 30 Aug 2011 09:04:24 +0200 >> Johannes Weiner wrote: >> >> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote: >> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s >> > >   spin_lock(&memcg->scanstat.lock); >> > >   __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec); >> > >   spin_unlock(&memcg->scanstat.lock); >> > > - >> > > - memcg = rec->root; >> > > - spin_lock(&memcg->scanstat.lock); >> > > - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], rec); >> > > - spin_unlock(&memcg->scanstat.lock); >> > > + cgroup = memcg->css.cgroup; >> > > + do { >> > > +         spin_lock(&memcg->scanstat.lock); >> > > +         __mem_cgroup_record_scanstat( >> > > +                 memcg->scanstat.hierarchy_stats[context], rec); >> > > +         spin_unlock(&memcg->scanstat.lock); >> > > +         if (!cgroup->parent) >> > > +                 break; >> > > +         cgroup = cgroup->parent; >> > > +         memcg = mem_cgroup_from_cont(cgroup); >> > > + } while (memcg->use_hierarchy && memcg != rec->root); >> > >> > Okay, so this looks correct, but it sums up all parents after each >> > memcg scanned, which could have a performance impact.  Usually, >> > hierarchy statistics are only summed up when a user reads them. >> > >> Hmm. But sum-at-read doesn't work. >> >> Assume 3 cgroups in a hierarchy. >> >>       A >>        / >>       B >>      / >>     C >> >> C's scan contains 3 causes. >>       C's scan caused by limit of A. >>       C's scan caused by limit of B. >>       C's scan caused by limit of C. >> >> If we make hierarchy sum at read, we think >>       B's scan_stat = B's scan_stat + C's scan_stat >> But in precice, this is >> >>       B's scan_stat = B's scan_stat caused by B + >>                       B's scan_stat caused by A + >>                       C's scan_stat caused by C + >>                       C's scan_stat caused by B + >>                       C's scan_stat caused by A. >> >> In orignal version. >>       B's scan_stat = B's scan_stat caused by B + >>                       C's scan_stat caused by B + >> >> After this patch, >>       B's scan_stat = B's scan_stat caused by B + >>                       B's scan_stat caused by A + >>                       C's scan_stat caused by C + >>                       C's scan_stat caused by B + >>                       C's scan_stat caused by A. >> >> Hmm...removing hierarchy part completely seems fine to me. > > I see. > > You want to look at A and see whether its limit was responsible for > reclaim scans in any children.  IMO, that is asking the question > backwards.  Instead, there is a cgroup under reclaim and one wants to > find out the cause for that.  Not the other way round. > > In my original proposal I suggested differentiating reclaim caused by > internal pressure (due to own limit) and reclaim caused by > external/hierarchical pressure (due to limits from parents). > > If you want to find out why C is under reclaim, look at its reclaim > statistics.  If the _limit numbers are high, C's limit is the problem. > If the _hierarchical numbers are high, the problem is B, A, or > physical memory, so you check B for _limit and _hierarchical as well, > then move on to A. > > Implementing this would be as easy as passing not only the memcg to > scan (victim) to the reclaim code, but also the memcg /causing/ the > reclaim (root_mem): > >        root_mem == victim -> account to victim as _limit >        root_mem != victim -> account to victim as _hierarchical > > This would make things much simpler and more natural, both the code > and the way of tracking down a problem, IMO. This is pretty much the stats I am currently using for debugging the reclaim patches. For example: scanned_pages_by_system 0 scanned_pages_by_system_under_hierarchy 50989 scanned_pages_by_limit 0 scanned_pages_by_limit_under_hierarchy 0 "_system" is count under global reclaim, and "_limit" is count under per-memcg reclaim. "_under_hiearchy" is set if memcg is not the one triggering pressure. So in the previous example: > A (root) > / > B > / > C For cgroup C: scanned_pages_by_system: scanned_pages_by_system_under_hierarchy: # of pages scanned under global memory pressure scanned_pages_by_limit: # of pages scanned while C hits the limit scanned_pages_by_limit_under_hierarchy: # of pages scanned while B hits the limit --Ying > >> > I don't get why this has to be done completely different from the way >> > we usually do things, without any justification, whatsoever. >> > >> > Why do you want to pass a recording structure down the reclaim stack? >> >> Just for reducing number of passed variables. > > It's still sitting on bottom of the reclaim stack the whole time. > > With my proposal, you would only need to pass the extra root_mem > pointer. > From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id E54596B016A for ; Thu, 1 Sep 2011 02:06:03 -0400 (EDT) Received: from hpaq1.eem.corp.google.com (hpaq1.eem.corp.google.com [172.25.149.1]) by smtp-out.google.com with ESMTP id p8165wIL017640 for ; Wed, 31 Aug 2011 23:05:58 -0700 Received: from qyk30 (qyk30.prod.google.com [10.241.83.158]) by hpaq1.eem.corp.google.com with ESMTP id p8165c8i006170 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Wed, 31 Aug 2011 23:05:56 -0700 Received: by qyk30 with SMTP id 30so1026553qyk.2 for ; Wed, 31 Aug 2011 23:05:52 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20110830084245.GC13061@redhat.com> References: <20110722171540.74eb9aa7.kamezawa.hiroyu@jp.fujitsu.com> <20110808124333.GA31739@redhat.com> <20110809083345.46cbc8de.kamezawa.hiroyu@jp.fujitsu.com> <20110829155113.GA21661@redhat.com> <20110830101233.ae416284.kamezawa.hiroyu@jp.fujitsu.com> <20110830070424.GA13061@redhat.com> <20110830162050.f6c13c0c.kamezawa.hiroyu@jp.fujitsu.com> <20110830084245.GC13061@redhat.com> Date: Wed, 31 Aug 2011 23:05:51 -0700 Message-ID: Subject: Re: [patch] Revert "memcg: add memory.vmscan_stat" From: Ying Han Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org List-ID: To: Johannes Weiner Cc: KAMEZAWA Hiroyuki , Andrew Morton , Daisuke Nishimura , Balbir Singh , Andrew Brestic , Michal Hocko , linux-mm@kvack.org, linux-kernel@vger.kernel.org On Tue, Aug 30, 2011 at 1:42 AM, Johannes Weiner wrote= : > On Tue, Aug 30, 2011 at 04:20:50PM +0900, KAMEZAWA Hiroyuki wrote: >> On Tue, 30 Aug 2011 09:04:24 +0200 >> Johannes Weiner wrote: >> >> > On Tue, Aug 30, 2011 at 10:12:33AM +0900, KAMEZAWA Hiroyuki wrote: >> > > @@ -1710,11 +1711,18 @@ static void mem_cgroup_record_scanstat(s >> > > =A0 spin_lock(&memcg->scanstat.lock); >> > > =A0 __mem_cgroup_record_scanstat(memcg->scanstat.stats[context], rec= ); >> > > =A0 spin_unlock(&memcg->scanstat.lock); >> > > - >> > > - memcg =3D rec->root; >> > > - spin_lock(&memcg->scanstat.lock); >> > > - __mem_cgroup_record_scanstat(memcg->scanstat.rootstats[context], r= ec); >> > > - spin_unlock(&memcg->scanstat.lock); >> > > + cgroup =3D memcg->css.cgroup; >> > > + do { >> > > + =A0 =A0 =A0 =A0 spin_lock(&memcg->scanstat.lock); >> > > + =A0 =A0 =A0 =A0 __mem_cgroup_record_scanstat( >> > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 memcg->scanstat.hierarchy_stats[co= ntext], rec); >> > > + =A0 =A0 =A0 =A0 spin_unlock(&memcg->scanstat.lock); >> > > + =A0 =A0 =A0 =A0 if (!cgroup->parent) >> > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 break; >> > > + =A0 =A0 =A0 =A0 cgroup =3D cgroup->parent; >> > > + =A0 =A0 =A0 =A0 memcg =3D mem_cgroup_from_cont(cgroup); >> > > + } while (memcg->use_hierarchy && memcg !=3D rec->root); >> > >> > Okay, so this looks correct, but it sums up all parents after each >> > memcg scanned, which could have a performance impact. =A0Usually, >> > hierarchy statistics are only summed up when a user reads them. >> > >> Hmm. But sum-at-read doesn't work. >> >> Assume 3 cgroups in a hierarchy. >> >> =A0 =A0 =A0 A >> =A0 =A0 =A0 =A0/ >> =A0 =A0 =A0 B >> =A0 =A0 =A0/ >> =A0 =A0 C >> >> C's scan contains 3 causes. >> =A0 =A0 =A0 C's scan caused by limit of A. >> =A0 =A0 =A0 C's scan caused by limit of B. >> =A0 =A0 =A0 C's scan caused by limit of C. >> >> If we make hierarchy sum at read, we think >> =A0 =A0 =A0 B's scan_stat =3D B's scan_stat + C's scan_stat >> But in precice, this is >> >> =A0 =A0 =A0 B's scan_stat =3D B's scan_stat caused by B + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 B's scan_stat caused by A + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by C + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by B + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by A. >> >> In orignal version. >> =A0 =A0 =A0 B's scan_stat =3D B's scan_stat caused by B + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by B + >> >> After this patch, >> =A0 =A0 =A0 B's scan_stat =3D B's scan_stat caused by B + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 B's scan_stat caused by A + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by C + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by B + >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 C's scan_stat caused by A. >> >> Hmm...removing hierarchy part completely seems fine to me. > > I see. > > You want to look at A and see whether its limit was responsible for > reclaim scans in any children. =A0IMO, that is asking the question > backwards. =A0Instead, there is a cgroup under reclaim and one wants to > find out the cause for that. =A0Not the other way round. > > In my original proposal I suggested differentiating reclaim caused by > internal pressure (due to own limit) and reclaim caused by > external/hierarchical pressure (due to limits from parents). > > If you want to find out why C is under reclaim, look at its reclaim > statistics. =A0If the _limit numbers are high, C's limit is the problem. > If the _hierarchical numbers are high, the problem is B, A, or > physical memory, so you check B for _limit and _hierarchical as well, > then move on to A. > > Implementing this would be as easy as passing not only the memcg to > scan (victim) to the reclaim code, but also the memcg /causing/ the > reclaim (root_mem): > > =A0 =A0 =A0 =A0root_mem =3D=3D victim -> account to victim as _limit > =A0 =A0 =A0 =A0root_mem !=3D victim -> account to victim as _hierarchical > > This would make things much simpler and more natural, both the code > and the way of tracking down a problem, IMO. This is pretty much the stats I am currently using for debugging the reclaim patches. For example: scanned_pages_by_system 0 scanned_pages_by_system_under_hierarchy 50989 scanned_pages_by_limit 0 scanned_pages_by_limit_under_hierarchy 0 "_system" is count under global reclaim, and "_limit" is count under per-memcg reclaim. "_under_hiearchy" is set if memcg is not the one triggering pressure. So in the previous example: > A (root) > / > B > / > C For cgroup C: scanned_pages_by_system: scanned_pages_by_system_under_hierarchy: # of pages scanned under global memory pressure scanned_pages_by_limit: # of pages scanned while C hits the limit scanned_pages_by_limit_under_hierarchy: # of pages scanned while B hits the limit --Ying > >> > I don't get why this has to be done completely different from the way >> > we usually do things, without any justification, whatsoever. >> > >> > Why do you want to pass a recording structure down the reclaim stack? >> >> Just for reducing number of passed variables. > > It's still sitting on bottom of the reclaim stack the whole time. > > With my proposal, you would only need to pass the extra root_mem > pointer. > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org