* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
@ 2009-05-15 17:45 ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-15 17:45 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, linux-kernel, Andrew Morton, KAMEZAWA Hiroyuki,
	nishimura, lizf, menage, KOSAKI Motohiro

Balbir Singh wrote:
> Feature: Remove the overhead associated with the root cgroup
>
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>
> This patch changes the memory cgroup and removes the overhead associated
> with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> can no longer set a memory hard limit in the root cgroup.
>
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well.
>
> Review comments highly appreciated
>
> Tests
>
> 1. Tested with allocate, touch and limit test case for a non-root cgroup
> 2. For the root cgroup tested performance impact with reaim
>
>
> 		+patch		mmtom-08-may-2009
> AIM9		1362.93		1338.17
> Dbase		17457.75	16021.58
> New Dbase	18070.18	16518.54
> Shared		9681.85		8882.11
> Compute		16197.79	15226.13
>
Hmm, at first impression, I'm not convinced by these numbers...
Does just avoiding list_add/del really make programs _10%_ faster?
Could you show the change in CPU cache-miss rate, if you can?
(And why does AIM9 go bad?)
Hmm, page_cgroup_zoneinfo() is accessed anyway, so the per-zone counter
is not the problem here.

Could you show your .config and environment?
If I trust the above numbers, it seems there are more optimization/
prefetch points in the usual path.

BTW, how does the performance change in child (non-default) groups?

> 3. Tested accounting in root cgroup to make sure it looks sane and
> correct.
>
Not sure, but the swap and shmem cases should be checked carefully.


> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
>
>  include/linux/page_cgroup.h |   10 ++++++++++
>  mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 36 insertions(+), 4 deletions(-)
>
>
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..8b85752 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT, /* page has been accounted for */
Reading the code, this PCG_ACCT should be named PCG_AcctLRU.

>  };
>
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(Acct, ACCT)
> +CLEARPCGFLAG(Acct, ACCT)
> +TESTPCGFLAG(Acct, ACCT)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9712ef7..18d2819 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
> @@ -196,6 +197,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }


> @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
I think setting/clearing the flag here adds a race condition, because
pc->flags is modified by
  pc->flags = pcg_default_flags[ctype] in commit_charge().
You have to change that line to something like

  SetPageCgroupCache(pc) or similar
  ...
  SetPageCgroupUsed(pc)

Then you can use set_bit() without lock_page_cgroup().
(Currently, pc->flags is modified only under lock_page_cgroup(), so
 non-atomic code is used.)

Regards,
-Kame



* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 17:45 ` KAMEZAWA Hiroyuki
@ 2009-05-15 18:16   ` Balbir Singh
  -1 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-15 18:16 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Andrew Morton, nishimura, lizf, menage,
	KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> Balbir Singh wrote:
> > Feature: Remove the overhead associated with the root cgroup
> >
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >
> > This patch changes the memory cgroup and removes the overhead associated
> > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > can no longer set a memory hard limit in the root cgroup.
> >
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well.
> >
> > Review comments highly appreciated
> >
> > Tests
> >
> > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > 2. For the root cgroup tested performance impact with reaim
> >
> >
> > 		+patch		mmtom-08-may-2009
> > AIM9		1362.93		1338.17
> > Dbase		17457.75	16021.58
> > New Dbase	18070.18	16518.54
> > Shared		9681.85		8882.11
> > Compute		16197.79	15226.13
> >
> Hmm, at first impression, I'm not convinced by these numbers...
> Does just avoiding list_add/del really make programs _10%_ faster?
> Could you show the change in CPU cache-miss rate, if you can?
> (And why does AIM9 go bad?)

OK... I'll try, but I am away travelling for 3 weeks :( You can try running
this as well.

> Hmm, page_cgroup_zoneinfo() is accessed anyway, so the per-zone counter
> is not the problem here.
> 
> Could you show your .config and environment?
> If I trust the above numbers, it seems there are more optimization/
> prefetch points in the usual path.
> 
> BTW, how does the performance change in child (non-default) groups?
> 

I've not seen the impact of that. I'll try.


> > 3. Tested accounting in root cgroup to make sure it looks sane and
> > correct.
> >
> Not sure, but the swap and shmem cases should be checked carefully.
> 
> 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> >
> >  include/linux/page_cgroup.h |   10 ++++++++++
> >  mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 36 insertions(+), 4 deletions(-)
> >
> >
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..8b85752 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ROOT, /* page belongs to root cgroup */
> > +	PCG_ACCT, /* page has been accounted for */
> Reading the code, this PCG_ACCT should be named PCG_AcctLRU.

OK

> 
> >  };
> >
> >  #define TESTPCGFLAG(uname, lname)			\
> > @@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> >
> > +SETPCGFLAG(Root, ROOT)
> > +CLEARPCGFLAG(Root, ROOT)
> > +TESTPCGFLAG(Root, ROOT)
> > +
> > +SETPCGFLAG(Acct, ACCT)
> > +CLEARPCGFLAG(Acct, ACCT)
> > +TESTPCGFLAG(Acct, ACCT)
> > +
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> >  	return page_to_nid(pc->page);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 9712ef7..18d2819 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
> > @@ -196,6 +197,10 @@ enum charge_type {
> >  #define PCGF_CACHE	(1UL << PCG_CACHE)
> >  #define PCGF_USED	(1UL << PCG_USED)
> >  #define PCGF_LOCK	(1UL << PCG_LOCK)
> > +/* Not used, but added here for completeness */
> > +#define PCGF_ROOT	(1UL << PCG_ROOT)
> > +#define PCGF_ACCT	(1UL << PCG_ACCT)
> > +
> >  static const unsigned long
> >  pcg_default_flags[NR_CHARGE_TYPE] = {
> >  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> > @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> >  	/* can happen while we handle swapcache. */
> > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> >  		return;
> >  	/*
> >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	mem = pc->mem_cgroup;
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +	ClearPageCgroupAcct(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_del_init(&pc->lru);
> >  	return;
> >  }
> 
> 
> > @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >  	 */
> >  	smp_rmb();
> > -	/* unused page is not rotated. */
> > -	if (!PageCgroupUsed(pc))
> > +	/* unused or root page is not rotated. */
> > +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +	SetPageCgroupAcct(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> I think setting/clearing the flag here adds a race condition, because
> pc->flags is modified by
>   pc->flags = pcg_default_flags[ctype] in commit_charge().
> You have to change that line to something like
> 
>   SetPageCgroupCache(pc) or similar
>   ...
>   SetPageCgroupUsed(pc)

Good point.

> 
> Then you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so
>  non-atomic code is used.)

OK. I wonder if we can set the _ACCT and _ROOT flags under
zone->lru_lock. I have not looked fully at the locks held in commit_charge,
but we could potentially do that. Need some more thinking.

-- 
	Balbir

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 17:45 ` KAMEZAWA Hiroyuki
@ 2009-05-17  4:15   ` Balbir Singh
  -1 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-17  4:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Andrew Morton, nishimura, lizf, menage,
	KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> I think setting/clearing the flag here adds a race condition, because
> pc->flags is modified by
>   pc->flags = pcg_default_flags[ctype] in commit_charge().
> You have to change that line to something like
> 
>   SetPageCgroupCache(pc) or similar
>   ...
>   SetPageCgroupUsed(pc)
> 
> Then you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so
>  non-atomic code is used.)
>

Here is the next version of the patch


Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup,
pcg_default_flags is now obsolete, but I've not removed it yet. It
provides some readability to help the code.

Tests:
1. Tested lightly; previous versions showed a good performance improvement (10%).

NOTE:
I haven't got the time right now to run oprofile and get detailed test results,
since I am in the middle of travel.

Please review the code for functional correctness, and if you can test
it, even better. I would like to push this in, especially if the percentage
performance difference I am seeing is reproducible elsewhere as well.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   12 ++++++++++++
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
 mm/page_cgroup.c            |    1 -
 3 files changed, 50 insertions(+), 5 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..ebdae9a 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(Acct, ACCT)
+CLEARPCGFLAG(Acct, ACCT)
+TESTPCGFLAG(Acct, ACCT)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9712ef7..35415fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
@@ -196,6 +197,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
+
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 09b73c5..6145ff6 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);
 

-- 
	Balbir

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
@ 2009-05-17  4:15   ` Balbir Singh
  0 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-17  4:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Andrew Morton, nishimura, lizf, menage,
	KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:

> I think set/clear flag here adds race condition....because pc->flags is
> modified by
>   pc->flags = pcg_default_flags[ctype] in commit_charge()
> you have to modify above lines to be
> 
>   SetPageCgroupCache(pc) or some..
>   ...
>   SetPageCgroupUsed(pc)
> 
> Then, you can use set_bit() without lock_page_cgroup().
> (Currently, pc->flags is modified only under lock_page_cgroup(), so,
>  non atomic code is used.)
>
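
To illustrate the concern in the quoted review comment, here is a minimal
userspace sketch, not kernel code. It assumes C11 atomics as a stand-in for
the kernel bit operations; struct fake_page_cgroup and the helper names are
hypothetical. It shows that a whole-word store drops a bit another path has
already set, while per-bit atomic ORs leave other bits alone.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit positions mirroring the PCG_* enum in the patch (userspace stand-ins). */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };

/* Hypothetical stand-in for struct page_cgroup; only the flags word matters. */
struct fake_page_cgroup {
	atomic_ulong flags;
};

/* Whole-word store: any bit set earlier by another path is discarded. */
static void commit_by_assignment(struct fake_page_cgroup *pc, unsigned long def)
{
	atomic_store(&pc->flags, def);
}

/* Per-bit atomic OR: only the requested bit changes, the rest survive. */
static void set_flag(struct fake_page_cgroup *pc, int bit)
{
	atomic_fetch_or(&pc->flags, 1UL << bit);
}

/* Read one bit of the flags word. */
static bool test_flag(struct fake_page_cgroup *pc, int bit)
{
	return (atomic_load(&pc->flags) >> bit) & 1UL;
}
```

If PCG_ACCT is set first and commit_by_assignment() is then called with the
CACHE and USED bits, PCG_ACCT is gone afterwards; replacing the assignment
with two set_flag() calls keeps it. In the kernel the corresponding
primitives are set_bit() and test_bit().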

Here is the next version of the patch


Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup;
pcg_default_flags is now obsolete, but I've not removed it yet, since it
helps the readability of the code.
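
The idea described above can be sketched in userspace roughly as follows.
This is only an illustration under stated assumptions: struct fake_pc,
struct zone_lru and the list helpers are hypothetical stand-ins for
page_cgroup, the per-zone info and the kernel list API. The per-zone counter
and the "accounted" flag are always updated, but pages of the root group
never touch the LRU list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list (stand-in for the kernel list API). */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_head(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_reinit(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct zone_lru { struct list_node head; long count; };          /* hypothetical */
struct fake_pc { struct list_node lru; bool root; bool acct; };  /* hypothetical */

/* Account the page; link it into the list only for non-root groups. */
static void lru_add(struct fake_pc *pc, struct zone_lru *z)
{
	z->count += 1;
	pc->acct = true;
	if (pc->root)
		return;
	list_add_head(&pc->lru, &z->head);
}

/* Undo lru_add(); a page that is neither accounted nor linked is skipped. */
static void lru_del(struct fake_pc *pc, struct zone_lru *z)
{
	if (!pc->acct && list_is_empty(&pc->lru))
		return;
	z->count -= 1;
	pc->acct = false;
	if (pc->root)
		return;
	list_del_reinit(&pc->lru);
}
```

Because a root page is never linked, list_empty() alone can no longer tell
whether it was added, which is why a separate "accounted" flag is needed in
this scheme.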

Tests:
1. Tested lightly; previous versions showed a good performance improvement of 10%.

NOTE:
I haven't got the time right now to run oprofile and get detailed test results,
since I am in the middle of travel.

Please review the code for functional correctness; if you can test
it, even better. I would like to push this in, especially if the %
performance difference I am seeing is reproducible elsewhere as well.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   12 ++++++++++++
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
 mm/page_cgroup.c            |    1 -
 3 files changed, 50 insertions(+), 5 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..ebdae9a 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(Acct, ACCT)
+CLEARPCGFLAG(Acct, ACCT)
+TESTPCGFLAG(Acct, ACCT)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9712ef7..35415fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
@@ -196,6 +197,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
+
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 09b73c5..6145ff6 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);
 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-15 18:16   ` Balbir Singh
@ 2009-05-18 10:11     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-18 10:11 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, linux-kernel, Andrew Morton, nishimura, lizf, menage,
	KOSAKI Motohiro

On Fri, 15 May 2009 23:46:39 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> 
> > Balbir Singh wrote:
> > > Feature: Remove the overhead associated with the root cgroup
> > >
> > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > >
> > > This patch changes the memory cgroup and removes the overhead associated
> > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > can
> > > no longer set a memory hard limit in the root cgroup.
> > >
> > > A new flag is used to track page_cgroup associated with the root cgroup
> > > pages. A new flag to track whether the page has been accounted or not
> > > has been added as well.
> > >
> > > > Review comments highly appreciated
> > >
> > > Tests
> > >
> > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > 2. For the root cgroup tested performance impact with reaim
> > >
> > >
> > > 		+patch		mmtom-08-may-2009
> > > AIM9		1362.93		1338.17
> > > Dbase		17457.75	16021.58
> > > New Dbase	18070.18	16518.54
> > > Shared		9681.85		8882.11
> > > Compute		16197.79	15226.13
> > >
> > Hmm, at first impression, I'm not convinced by the numbers...
> > Does just avoiding list_add/del make programs _10%_ faster?
> > Could you show changes in the CPU cache-miss rate if you can?
> > (And why does Aim9 go bad?)
> 
> OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> this as well
> 
I tested aim7 with the following config.

CPU: Xeon 3.1GHz/4Core x2 (8cpu)
Memory: 32G
HDD: usual SCSI disk (just 1 disk)
(try_to_free_pages() etc. will never be called.)

Multiuser config. # of tasks: 1100 (near the peak on my host)

10 runs.
rc6mm1 score(Jobs/min)
44009.1 44844.5 44691.1 43981.9 44992.6
44544.9 44179.1 44283.0 44442.9 45033.8  average=44500

+patch
44656.8 44270.8 44706.7 44106.1 44467.6
44585.3 44167.0 44756.7 44853.9 44249.4  average=44482

Dbase config. #of tasks 25
rc6mm1 score (jobs/min)
11022.7 11018.9 11037.9 11003.8 11087.5 
11145.2 11133.6 11068.3 11091.3 11106.6 average=11071

+patch
10888.0 10973.7 10913.9 11000.0 10984.9
10996.2 10969.9 10921.3 10921.3 11053.1 average=10962

Hmm, 1% improvement?
(I think this is a reasonable score for the effect of this patch)
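
For anyone who wants to re-check the arithmetic, a small sketch follows. The
arrays are copied from the runs listed above; mean() and rel_change_pct()
are hypothetical helper names, and taking rc6mm1 as the baseline is an
assumption. With the figures as posted, the two Dbase averages differ by
about 1% in magnitude and the two Multiuser averages by well under 0.1%.

```c
#include <stddef.h>

/* Scores (jobs/min) copied from the 10 runs listed above. */
static const double multiuser_base[10] = {
	44009.1, 44844.5, 44691.1, 43981.9, 44992.6,
	44544.9, 44179.1, 44283.0, 44442.9, 45033.8,
};
static const double multiuser_patch[10] = {
	44656.8, 44270.8, 44706.7, 44106.1, 44467.6,
	44585.3, 44167.0, 44756.7, 44853.9, 44249.4,
};
static const double dbase_base[10] = {
	11022.7, 11018.9, 11037.9, 11003.8, 11087.5,
	11145.2, 11133.6, 11068.3, 11091.3, 11106.6,
};
static const double dbase_patch[10] = {
	10888.0, 10973.7, 10913.9, 11000.0, 10984.9,
	10996.2, 10969.9, 10921.3, 10921.3, 11053.1,
};

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/*
 * Relative change of `patched` against `base`, in percent.
 * Positive means the patched score is higher, negative means lower.
 */
static double rel_change_pct(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

Calling rel_change_pct(mean(dbase_base, 10), mean(dbase_patch, 10)) gives the
Dbase difference; the sign shows which of the two kernels scored higher.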

Anyway, I'm worried about differences between my kernel config and yours.
Please enjoy your travel for now :)

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:11     ` KAMEZAWA Hiroyuki
@ 2009-05-18 10:45       ` Balbir Singh
  -1 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-18 10:45 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Andrew Morton, nishimura, lizf, menage,
	KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can
> > > > no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > > 		+patch		mmtom-08-may-2009
> > > > AIM9		1362.93		1338.17
> > > > Dbase		17457.75	16021.58
> > > > New Dbase	18070.18	16518.54
> > > > Shared		9681.85		8882.11
> > > > Compute		16197.79	15226.13
> > > >
> > > Hmm, at first impression, I'm not convinced by the numbers...
> > > Does just avoiding list_add/del make programs _10%_ faster?
> > > Could you show changes in the CPU cache-miss rate if you can?
> > > (And why does Aim9 go bad?)
> > 
> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > this as well
> > 
> tested aim7 with some config.
> 
> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> Memory: 32G
> HDD: Usual? Scsi disk (just 1 disk)
> (try_to_free_pages() etc...will never be called.)
> 
> Multiuser config. #of tasks 1100 (near to peak on my host)
> 
> 10runs.
> rc6mm1 score(Jobs/min)
> 44009.1 44844.5 44691.1 43981.9 44992.6
> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> 
> +patch
> 44656.8 44270.8 44706.7 44106.1 44467.6
> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> 
> Dbase config. #of tasks 25
> rc6mm1 score (jobs/min)
> 11022.7 11018.9 11037.9 11003.8 11087.5 
> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> 
> +patch
> 10888.0 10973.7 10913.9 11000.0 10984.9
> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> 
> Hmm, 1% improvement ?
> (I think this is reasonable score of the effect of this patch)
>

Thanks for the test. I have a 4 CPU system and I create 80 users; a
larger config shows a larger difference at my end. I think even 1% is
quite reasonable, as you mentioned. If the patch looks fine, should we
ask for larger testing by Andrew?
 
> Anyway, I'm afraid of difference between mine and your kernel config.
> plz enjoy your travel for now :)

Sorry, I did not send you my .config. Why do you think .config makes a
difference? I think loading AIM makes the difference, and I also made
one other change to the aim tests: I run with "sync" linked to
/bin/true, use tmpfs for the temporary partition, and use 20*number of
cpus for the number of users.

If required, I can still send out my .config to you.

-- 
	Balbir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:45       ` Balbir Singh
@ 2009-05-18 16:01         ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-05-18 16:01 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton,
	nishimura, lizf, menage, KOSAKI Motohiro

Balbir Singh wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18
> 19:11:07]:
>
>> On Fri, 15 May 2009 23:46:39 +0530
>> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>
>> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16
>> 02:45:03]:
>> >
>> > > Balbir Singh wrote:
>> > > > Feature: Remove the overhead associated with the root cgroup
>> > > >
>> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
>> > > >
>> > > > This patch changes the memory cgroup and removes the overhead
>> associated
>> > > > with LRU maintenance of all pages in the root cgroup. As a
>> side-effect, we
>> > > > can
>> > > > no longer set a memory hard limit in the root cgroup.
>> > > >
>> > > > A new flag is used to track page_cgroup associated with the root
>> cgroup
>> > > > pages. A new flag to track whether the page has been accounted or
>> not
>> > > > has been added as well.
>> > > >
>> > > > Review comments highly appreciated
>> > > >
>> > > > Tests
>> > > >
>> > > > 1. Tested with allocate, touch and limit test case for a non-root
>> cgroup
>> > > > 2. For the root cgroup tested performance impact with reaim
>> > > >
>> > > >
>> > > > 		+patch		mmtom-08-may-2009
>> > > > AIM9		1362.93		1338.17
>> > > > Dbase		17457.75	16021.58
>> > > > New Dbase	18070.18	16518.54
>> > > > Shared		9681.85		8882.11
>> > > > Compute		16197.79	15226.13
>> > > >
>> > > Hmm, at first impression, I'm not convinced by the numbers...
>> > > Does just avoiding list_add/del make programs _10%_ faster?
>> > > Could you show changes in the CPU cache-miss rate if you can?
>> > > (And why does Aim9 go bad?)
>> >
>> > OK... I'll try but I am away on travel for 3 weeks :( you can try and
>> run
>> > this as well
>> >
>> tested aim7 with some config.
>>
>> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
>> Memory: 32G
>> HDD: Usual? Scsi disk (just 1 disk)
>> (try_to_free_pages() etc...will never be called.)
>>
>> Multiuser config. #of tasks 1100 (near to peak on my host)
>>
>> 10runs.
>> rc6mm1 score(Jobs/min)
>> 44009.1 44844.5 44691.1 43981.9 44992.6
>> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
>>
>> +patch
>> 44656.8 44270.8 44706.7 44106.1 44467.6
>> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
>>
>> Dbase config. #of tasks 25
>> rc6mm1 score (jobs/min)
>> 11022.7 11018.9 11037.9 11003.8 11087.5
>> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
>>
>> +patch
>> 10888.0 10973.7 10913.9 11000.0 10984.9
>> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
>>
>> Hmm, 1% improvement ?
>> (I think this is reasonable score of the effect of this patch)
>>
>
> Thanks for the test, I have a 4 CPU system and I create 80 users,
> larger config shows larger difference at my end.
Sorry, the above Dbase test was on 54 threads. I'll try 20*8=160 threads.

> I think even 1% is
> quite reasonable as you mentioned. If the patch looks fine, should we
> ask for larger testing by Andrew?
>
Hmm, as you like. My interest right now is the bugfix for swap leaking.
Because this change adds a big special case, we need a lot of testing anyway.
And please show the _environment_ where the benchmarks run.
BTW, I wonder whether we can have more improvements in this special case...

>> Anyway, I'm afraid of difference between mine and your kernel config.
>> plz enjoy your travel for now :)
>
> Sorry, I did not send you my .config, why do you think .config makes a
> difference?
I wanted to know what kind of DEBUG/TRACE config is on, and some others.

> I think loading AIM makes the difference and I also made
> one other change to the aim tests. I run with "sync" linked to
> /bin/true and use tmpfs for temporary partition and 20*number of cpus
> for number of users.
>
Is that the usual way of using AIM? (Sorry, I'm not sure.)
It seems to defeat AIM7's purpose of "measuring typical workload"...

> If required, I can still send out my .config to you.
>
If you can, please (just for my interest ;)

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 16:01         ` KAMEZAWA Hiroyuki
  (?)
@ 2009-05-19 13:18         ` Balbir Singh
  -1 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-19 13:18 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: linux-mm, linux-kernel, Andrew Morton, nishimura, lizf, menage,
	KOSAKI Motohiro

[-- Attachment #1: Type: text/plain, Size: 4556 bytes --]

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-19 01:01:00]:

> Balbir Singh wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18
> > 19:11:07]:
> >
> >> On Fri, 15 May 2009 23:46:39 +0530
> >> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >>
> >> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16
> >> 02:45:03]:
> >> >
> >> > > Balbir Singh wrote:
> >> > > > Feature: Remove the overhead associated with the root cgroup
> >> > > >
> >> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> >> > > >
> >> > > > This patch changes the memory cgroup and removes the overhead
> >> > > > associated with LRU maintenance of all pages in the root cgroup.
> >> > > > As a side-effect, we can no longer set a memory hard limit in
> >> > > > the root cgroup.
> >> > > >
> >> > > > A new flag is used to track page_cgroup associated with the root
> >> > > > cgroup pages. A new flag to track whether the page has been
> >> > > > accounted or not has been added as well.
> >> > > >
> >> > > > Review comments highly appreciated
> >> > > >
> >> > > > Tests
> >> > > >
> >> > > > 1. Tested with allocate, touch and limit test case for a non-root
> >> > > >    cgroup
> >> > > > 2. For the root cgroup tested performance impact with reaim
> >> > > >
> >> > > >
> >> > > > 		+patch		mmtom-08-may-2009
> >> > > > AIM9		1362.93		1338.17
> >> > > > Dbase		17457.75	16021.58
> >> > > > New Dbase	18070.18	16518.54
> >> > > > Shared		9681.85		8882.11
> >> > > > Compute		16197.79	15226.13
> >> > > >
> >> > > Hmm, at first impression, I'm not convinced by the numbers...
> >> > > Just avoiding list_add/del makes programs _10%_ faster?
> >> > > Could you show the change in CPU cache-miss rate if you can?
> >> > > (And why does AIM9 get worse?)
> >> >
> >> > OK... I'll try but I am away on travel for 3 weeks :( you can try and
> >> > run this as well
> >> >
> >> tested aim7 with some config.
> >>
> >> CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> >> Memory: 32G
> >> HDD: Usual? Scsi disk (just 1 disk)
> >> (try_to_free_pages() etc...will never be called.)
> >>
> >> Multiuser config. #of tasks 1100 (near to peak on my host)
> >>
> >> 10runs.
> >> rc6mm1 score(Jobs/min)
> >> 44009.1 44844.5 44691.1 43981.9 44992.6
> >> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> >>
> >> +patch
> >> 44656.8 44270.8 44706.7 44106.1 44467.6
> >> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> >>
> >> Dbase config. #of tasks 25
> >> rc6mm1 score (jobs/min)
> >> 11022.7 11018.9 11037.9 11003.8 11087.5
> >> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> >>
> >> +patch
> >> 10888.0 10973.7 10913.9 11000.0 10984.9
> >> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> >>
> >> Hmm, 1% improvement ?
> >> (I think this is reasonable score of the effect of this patch)
> >>
> >
> > Thanks for the test, I have a 4 CPU system and I create 80 users,
> > larger config shows larger difference at my end.
> Sorry, above Dbase test was on 54 threads. I'll try 20*8=160 threads.
>

cool! Thanks
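
For reference, the averages quoted above can be rechecked with a short
script. This is only a sketch: the scores are copied from the quoted runs,
while the dictionary layout and the helper name relative_change are
illustrative choices, not part of AIM7 or of the patch. It prints the mean,
the sample standard deviation and the signed percent change of +patch
relative to rc6mm1, so the size of the difference can be compared against
the run-to-run spread.

```python
from statistics import mean, stdev

# AIM7 scores (jobs/min) copied from the quoted runs above.
runs = {
    "Multiuser": {
        "rc6mm1": [44009.1, 44844.5, 44691.1, 43981.9, 44992.6,
                   44544.9, 44179.1, 44283.0, 44442.9, 45033.8],
        "+patch": [44656.8, 44270.8, 44706.7, 44106.1, 44467.6,
                   44585.3, 44167.0, 44756.7, 44853.9, 44249.4],
    },
    "Dbase": {
        "rc6mm1": [11022.7, 11018.9, 11037.9, 11003.8, 11087.5,
                   11145.2, 11133.6, 11068.3, 11091.3, 11106.6],
        "+patch": [10888.0, 10973.7, 10913.9, 11000.0, 10984.9,
                   10996.2, 10969.9, 10921.3, 10921.3, 11053.1],
    },
}


def relative_change(base, patched):
    """Percent change of the patched mean relative to the base mean."""
    return (mean(patched) - mean(base)) / mean(base) * 100.0


for name, scores in runs.items():
    base, patched = scores["rc6mm1"], scores["+patch"]
    print(f"{name:10s} "
          f"base={mean(base):9.1f} (sd {stdev(base):6.1f})  "
          f"patched={mean(patched):9.1f} (sd {stdev(patched):6.1f})  "
          f"change={relative_change(base, patched):+.2f}%")
```

Ten runs per configuration is a small sample, so a change that is smaller
than the standard deviation is hard to tell apart from noise.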
 
> > I think even 1% is
> > quite reasonable as you mentioned. If the patch looks fine, should we
> > ask for larger testing by Andrew?
> >
> Hmm, as you like. My interest is bugfix for swap leaking now.

I've seen that too. I think it has been going on for a long time, and I am
afraid it is hurting features like soft limit, but bug fixing is
important. Hopefully we'll have a good solution soon.

> Because this change adds a big special case, we need a lot of testing anyway.
> And please show the _environment_ where the benchmarks run.
> BTW, I wonder whether we can have more improvements in this special case...
> 
> >> Anyway, I'm afraid of difference between mine and your kernel config.
> >> plz enjoy your travel for now :)
> >
> > Sorry, I did not send you my .config, why do you think .config makes a
> > difference?
> I wanted to know which DEBUG/TRACE options are on, and some others.
> 
> > I think loading AIM makes the difference and I also made
> > one other change to the aim tests. I run with "sync" linked to
> > /bin/true and use tmpfs for temporary partition and 20*number of cpus
> > for number of users.
> >
> > Is this the usual way to run AIM? (Sorry, I'm not sure.)
> It seems to break AIM7's purpose of "measuring typical workload"...
> 

No, it is not, but sync has a large overhead, so I use /bin/true. I
can try without it and report back.
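
A rough sketch of that setup, for anyone who wants to reproduce it. The
helper names and the PATH-based stub are assumptions made for illustration;
the mail only says that "sync" is linked to /bin/true and that the user
count is 20 times the number of CPUs. Mounting tmpfs for the temporary
partition needs root and is left out here.

```python
import os
import tempfile


def users_for(ncpus, per_cpu=20):
    """Number of AIM users under the 20 * number-of-CPUs rule."""
    return per_cpu * ncpus


def make_sync_stub(directory):
    """Create a 'sync' symlink to /bin/true in directory and return its path.

    Assumption: the benchmark looks up 'sync' through PATH, so putting this
    directory first turns every sync call into a no-op.
    """
    stub = os.path.join(directory, "sync")
    os.symlink("/bin/true", stub)
    return stub


if __name__ == "__main__":
    ncpus = os.cpu_count() or 1
    stub_dir = tempfile.mkdtemp(prefix="aim-nosync-")
    make_sync_stub(stub_dir)
    # Prepend the stub directory so child processes pick up the no-op sync.
    os.environ["PATH"] = stub_dir + os.pathsep + os.environ.get("PATH", "")
    print(f"users={users_for(ncpus)} sync stub in {stub_dir}")
```

With 4 CPUs this gives the 80 users mentioned earlier in the thread, and
with 8 CPUs the 160 threads KAMEZAWA plans to try.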


> > If required, I can still send out my .config to you.
> >
> > If you can, please. (Just out of interest ;)
>

Attached, please see 
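
Since the question was which DEBUG/TRACE options are on, a small filter
over the config text may save some reading. This is a minimal sketch: the
function name enabled_options is made up for illustration, and the sample
input is a handful of lines copied from the attached config. Matching is by
substring on the option name, so capability flags such as
CONFIG_HAVE_ARCH_TRACEHOOK are listed along with real debug switches.

```python
import re


def enabled_options(config_text, pattern=r"DEBUG|TRACE"):
    """Return {name: value} for set CONFIG_ options whose name matches pattern."""
    matcher = re.compile(pattern)
    found = {}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and are skipped.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name.startswith("CONFIG_") and matcher.search(name):
            found[name] = value
    return found


# A few lines taken from the attached config, used here as sample input.
sample = """\
CONFIG_CGROUP_DEBUG=y
# CONFIG_RCU_TRACE is not set
CONFIG_TRACEPOINTS=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
# CONFIG_PM_DEBUG is not set
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_HZ=250
"""

for name, value in sorted(enabled_options(sample).items()):
    print(f"{name}={value}")
```

To run it over the whole attachment, read the saved file and pass its
contents in place of sample.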

-- 
	Balbir

[-- Attachment #2: config-2.6.30-rc4-mm1 --]
[-- Type: text/plain, Size: 54827 bytes --]

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.30-rc4-mm1
# Wed May 13 17:51:31 2009
#
CONFIG_64BIT=y
# CONFIG_X86_32 is not set
CONFIG_X86_64=y
CONFIG_X86=y
CONFIG_OUTPUT_FORMAT="elf64-x86-64"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/x86_64_defconfig"
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_FAST_CMPXCHG_LOCAL=y
CONFIG_MMU=y
CONFIG_ZONE_DMA=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_BUG_RELATIVE_POINTERS=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
# CONFIG_RWSEM_XCHGADD_ALGORITHM is not set
CONFIG_ARCH_HAS_CPU_IDLE_WAIT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_DEFAULT_IDLE=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_HAVE_DYNAMIC_PER_CPU_AREA=y
CONFIG_HAVE_CPUMASK_OF_CPU_MAP=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ZONE_DMA32=y
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_AUDIT_ARCH=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_USE_GENERIC_SMP_HELPERS=y
CONFIG_X86_64_SMP=y
CONFIG_X86_HT=y
CONFIG_X86_TRAMPOLINE=y
# CONFIG_KTIME_SCALAR is not set
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_TASKSTATS=y
CONFIG_TASK_DELAY_ACCT=y
CONFIG_TASK_XACCT=y
CONFIG_TASK_IO_ACCOUNTING=y
# CONFIG_AUDIT is not set

#
# RCU Subsystem
#
# CONFIG_CLASSIC_RCU is not set
CONFIG_TREE_RCU=y
# CONFIG_PREEMPT_RCU is not set
# CONFIG_RCU_TRACE is not set
CONFIG_RCU_FANOUT=64
# CONFIG_RCU_FANOUT_EXACT is not set
# CONFIG_TREE_RCU_TRACE is not set
# CONFIG_PREEMPT_RCU_TRACE is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_GROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_USER_SCHED is not set
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUPS=y
CONFIG_CGROUP_DEBUG=y
CONFIG_CGROUP_NS=y
CONFIG_CGROUP_FREEZER=y
CONFIG_CGROUP_DEVICE=y
CONFIG_CPUSETS=y
CONFIG_PROC_PID_CPUSET=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_RESOURCE_COUNTERS=y
CONFIG_CGROUP_MEM_RES_CTLR=y
CONFIG_CGROUP_MEM_RES_CTLR_SWAP=y
CONFIG_MM_OWNER=y
CONFIG_SYSFS_DEPRECATED=y
CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_RELAY=y
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_NET_NS is not set
CONFIG_BLK_DEV_INITRD=y
CONFIG_INITRAMFS_SOURCE=""
CONFIG_RD_GZIP=y
CONFIG_RD_BZIP2=y
CONFIG_RD_LZMA=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
CONFIG_ANON_INODES=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_AIO=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_PCI_QUIRKS=y
# CONFIG_STRIP_ASM_SYMS is not set
CONFIG_COMPAT_BRK=y
# CONFIG_SLAB_ALLOCATOR is not set
# CONFIG_SLUB_ALLOCATOR is not set
CONFIG_SLQB_ALLOCATOR=y
CONFIG_SLQB=y
# CONFIG_SLOB is not set
CONFIG_PROFILING=y
CONFIG_TRACEPOINTS=y
CONFIG_MARKERS=y
CONFIG_OPROFILE=m
CONFIG_OPROFILE_IBS=y
CONFIG_HAVE_OPROFILE=y
CONFIG_KPROBES=y
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS=y
CONFIG_KRETPROBES=y
CONFIG_HAVE_IOREMAP_PROT=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
CONFIG_HAVE_DMA_API_DEBUG=y
# CONFIG_SLOW_WORK is not set
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
CONFIG_MODULE_FORCE_LOAD=y
CONFIG_MODULE_UNLOAD=y
CONFIG_MODULE_FORCE_UNLOAD=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_STOP_MACHINE=y
CONFIG_UTRACE=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
# CONFIG_IOSCHED_AS is not set
CONFIG_IOSCHED_DEADLINE=y
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_FREEZER=y

#
# Processor type and features
#
# CONFIG_NO_HZ is not set
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
CONFIG_SMP=y
CONFIG_X86_X2APIC=y
# CONFIG_SPARSE_IRQ is not set
CONFIG_X86_MPPARSE=y
CONFIG_X86_EXTENDED_PLATFORM=y
# CONFIG_X86_VSMP is not set
# CONFIG_X86_UV is not set
CONFIG_SCHED_OMIT_FRAME_POINTER=y
# CONFIG_PARAVIRT_GUEST is not set
CONFIG_MEMTEST=y
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMII is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUMM is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MK8 is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MEFFICEON is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MGEODEGX1 is not set
# CONFIG_MGEODE_LX is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
# CONFIG_MVIAC7 is not set
# CONFIG_MPSC is not set
# CONFIG_MCORE2 is not set
CONFIG_GENERIC_CPU=y
CONFIG_X86_CPU=y
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_INTERNODE_CACHE_BYTES=64
CONFIG_X86_CMPXCHG=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_TSC=y
CONFIG_X86_CMPXCHG64=y
CONFIG_X86_CMOV=y
CONFIG_X86_MINIMUM_CPU_FAMILY=64
CONFIG_X86_DEBUGCTLMSR=y
CONFIG_CPU_SUP_INTEL=y
CONFIG_CPU_SUP_AMD=y
CONFIG_CPU_SUP_CENTAUR=y
CONFIG_X86_DS=y
CONFIG_X86_PTRACE_BTS=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_DMI=y
CONFIG_GART_IOMMU=y
# CONFIG_CALGARY_IOMMU is not set
CONFIG_AMD_IOMMU=y
# CONFIG_AMD_IOMMU_STATS is not set
CONFIG_SWIOTLB=y
CONFIG_IOMMU_HELPER=y
CONFIG_IOMMU_API=y
# CONFIG_MAXSMP is not set
CONFIG_NR_CPUS=32
CONFIG_SCHED_SMT=y
CONFIG_SCHED_MC=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_X86_LOCAL_APIC=y
CONFIG_X86_IO_APIC=y
# CONFIG_X86_REROUTE_FOR_BROKEN_BOOT_IRQS is not set
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
# CONFIG_X86_CPU_DEBUG is not set
CONFIG_ARCH_PHYS_ADDR_T_64BIT=y
CONFIG_DIRECT_GBPAGES=y
CONFIG_NUMA=y
# CONFIG_K8_NUMA is not set
CONFIG_X86_64_ACPI_NUMA=y
CONFIG_NODES_SPAN_OTHER_NODES=y
CONFIG_NUMA_EMU=y
CONFIG_NODES_SHIFT=6
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_MEMORY_PROBE=y
CONFIG_ILLEGAL_POINTER_VALUE=0xdead000000000000
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_MEMORY_PRESENT=y
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_MEMORY_HOTREMOVE=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_MIGRATION=y
CONFIG_PHYS_ADDR_T_64BIT=y
CONFIG_ZONE_DMA_FLAG=1
CONFIG_BOUNCE=y
CONFIG_VIRT_TO_BUS=y
CONFIG_UNEVICTABLE_LRU=y
CONFIG_HAVE_MLOCK=y
CONFIG_HAVE_MLOCKED_PAGE_BIT=y
CONFIG_MMU_NOTIFIER=y
CONFIG_KSM=m
# CONFIG_X86_CHECK_BIOS_CORRUPTION is not set
CONFIG_X86_RESERVE_LOW_64K=y
CONFIG_MTRR=y
CONFIG_MTRR_SANITIZER=y
CONFIG_MTRR_SANITIZER_ENABLE_DEFAULT=0
CONFIG_MTRR_SANITIZER_SPARE_REG_NR_DEFAULT=1
CONFIG_X86_PAT=y
# CONFIG_EFI is not set
CONFIG_SECCOMP=y
# CONFIG_CC_STACKPROTECTOR is not set
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
# CONFIG_SCHED_HRTICK is not set
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PHYSICAL_START=0x200000
# CONFIG_RELOCATABLE is not set
CONFIG_PHYSICAL_ALIGN=0x200000
CONFIG_HOTPLUG_CPU=y
CONFIG_COMPAT_VDSO=y
# CONFIG_CMDLINE_BOOL is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y

#
# Power management and ACPI options
#
CONFIG_PM=y
# CONFIG_PM_DEBUG is not set
CONFIG_PM_SLEEP_SMP=y
CONFIG_PM_SLEEP=y
CONFIG_SUSPEND=y
CONFIG_SUSPEND_FREEZER=y
# CONFIG_HIBERNATION is not set
CONFIG_ACPI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_PROCFS=y
CONFIG_ACPI_PROCFS_POWER=y
CONFIG_ACPI_SYSFS_POWER=y
CONFIG_ACPI_PROC_EVENT=y
CONFIG_ACPI_AC=y
CONFIG_ACPI_BATTERY=y
CONFIG_ACPI_BUTTON=y
CONFIG_ACPI_FAN=y
CONFIG_ACPI_DOCK=y
CONFIG_ACPI_PROCESSOR=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_THERMAL=y
CONFIG_ACPI_NUMA=y
# CONFIG_ACPI_CUSTOM_DSDT is not set
CONFIG_ACPI_BLACKLIST_YEAR=0
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_PCI_SLOT is not set
CONFIG_X86_PM_TIMER=y
CONFIG_ACPI_CONTAINER=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
# CONFIG_ACPI_SBS is not set

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_TABLE=y
CONFIG_CPU_FREQ_DEBUG=y
CONFIG_CPU_FREQ_STAT=y
CONFIG_CPU_FREQ_STAT_DETAILS=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_POWERSAVE=y
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y

#
# CPUFreq processor drivers
#
CONFIG_X86_ACPI_CPUFREQ=y
# CONFIG_X86_POWERNOW_K8 is not set
# CONFIG_X86_SPEEDSTEP_CENTRINO is not set
# CONFIG_X86_P4_CLOCKMOD is not set

#
# shared options
#
# CONFIG_X86_SPEEDSTEP_LIB is not set
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_GOV_LADDER=y

#
# Memory power savings
#
# CONFIG_I7300_IDLE is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_MMCONFIG=y
CONFIG_PCI_DOMAINS=y
# CONFIG_DMAR is not set
CONFIG_INTR_REMAP=y
CONFIG_PCIEPORTBUS=y
CONFIG_PCIEAER=y
CONFIG_PCIEASPM=y
# CONFIG_PCIEASPM_DEBUG is not set
CONFIG_ARCH_SUPPORTS_MSI=y
CONFIG_PCI_MSI=y
CONFIG_PCI_LEGACY=y
# CONFIG_PCI_DEBUG is not set
# CONFIG_PCI_STUB is not set
# CONFIG_HT_IRQ is not set
# CONFIG_PCI_IOV is not set
CONFIG_ISA_DMA_API=y
CONFIG_K8_NB=y
# CONFIG_PCCARD is not set
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
# CONFIG_HAVE_AOUT is not set
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
CONFIG_IA32_AOUT=y
CONFIG_COMPAT=y
CONFIG_COMPAT_FOR_U64_ALIGNMENT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
# CONFIG_PACKET_MMAP is not set
CONFIG_UNIX=y
# CONFIG_NET_KEY is not set
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
CONFIG_IP_PNP=y
CONFIG_IP_PNP_DHCP=y
# CONFIG_IP_PNP_BOOTP is not set
# CONFIG_IP_PNP_RARP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
CONFIG_INET_TUNNEL=y
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
CONFIG_INET_DIAG=y
CONFIG_INET_TCP_DIAG=y
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
CONFIG_IPV6=y
# CONFIG_IPV6_PRIVACY is not set
# CONFIG_IPV6_ROUTER_PREF is not set
# CONFIG_IPV6_OPTIMISTIC_DAD is not set
# CONFIG_INET6_AH is not set
# CONFIG_INET6_ESP is not set
# CONFIG_INET6_IPCOMP is not set
# CONFIG_IPV6_MIP6 is not set
# CONFIG_INET6_XFRM_TUNNEL is not set
# CONFIG_INET6_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET6_XFRM_MODE_TUNNEL is not set
# CONFIG_INET6_XFRM_MODE_BEET is not set
# CONFIG_INET6_XFRM_MODE_ROUTEOPTIMIZATION is not set
CONFIG_IPV6_SIT=y
CONFIG_IPV6_NDISC_NODETYPE=y
# CONFIG_IPV6_TUNNEL is not set
# CONFIG_IPV6_MULTIPLE_TABLES is not set
# CONFIG_IPV6_MROUTE is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
CONFIG_STP=m
CONFIG_BRIDGE=m
# CONFIG_NET_DSA is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
CONFIG_LLC=m
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_PHONET is not set
CONFIG_NET_SCHED=y

#
# Queueing/Scheduling
#
CONFIG_NET_SCH_CBQ=m
# CONFIG_NET_SCH_HTB is not set
# CONFIG_NET_SCH_HFSC is not set
# CONFIG_NET_SCH_PRIO is not set
# CONFIG_NET_SCH_MULTIQ is not set
# CONFIG_NET_SCH_RED is not set
# CONFIG_NET_SCH_SFQ is not set
# CONFIG_NET_SCH_TEQL is not set
# CONFIG_NET_SCH_TBF is not set
# CONFIG_NET_SCH_GRED is not set
# CONFIG_NET_SCH_DSMARK is not set
# CONFIG_NET_SCH_NETEM is not set
# CONFIG_NET_SCH_DRR is not set

#
# Classification
#
CONFIG_NET_CLS=y
# CONFIG_NET_CLS_BASIC is not set
# CONFIG_NET_CLS_TCINDEX is not set
# CONFIG_NET_CLS_ROUTE4 is not set
# CONFIG_NET_CLS_FW is not set
# CONFIG_NET_CLS_U32 is not set
# CONFIG_NET_CLS_RSVP is not set
# CONFIG_NET_CLS_RSVP6 is not set
# CONFIG_NET_CLS_FLOW is not set
CONFIG_NET_CLS_CGROUP=y
# CONFIG_NET_EMATCH is not set
# CONFIG_NET_CLS_ACT is not set
CONFIG_NET_SCH_FIFO=y
# CONFIG_DCB is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_NET_TCPPROBE is not set
# CONFIG_NET_DROP_MONITOR is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set
CONFIG_WIRELESS=y
# CONFIG_CFG80211 is not set
# CONFIG_WIRELESS_OLD_REGULATORY is not set
# CONFIG_WIRELESS_EXT is not set
# CONFIG_LIB80211 is not set
# CONFIG_MAC80211 is not set
CONFIG_MAC80211_DEFAULT_PS_VALUE=0
# CONFIG_WIMAX is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
CONFIG_FIRMWARE_IN_KERNEL=y
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=y
CONFIG_PROC_EVENTS=y
# CONFIG_MTD is not set
# CONFIG_PARPORT is not set
CONFIG_PNP=y
CONFIG_PNP_DEBUG_MESSAGES=y

#
# Protocols
#
CONFIG_PNPACPI=y
CONFIG_BLK_DEV=y
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_UB is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_COUNT=16
CONFIG_BLK_DEV_RAM_SIZE=4096
# CONFIG_BLK_DEV_XIP is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_VIRTIO_BLK is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_MISC_DEVICES=y
# CONFIG_IBM_ASM is not set
# CONFIG_PHANTOM is not set
# CONFIG_SGI_IOC4 is not set
# CONFIG_TIFM_CORE is not set
# CONFIG_ICS932S401 is not set
# CONFIG_ENCLOSURE_SERVICES is not set
# CONFIG_HP_ILO is not set
# CONFIG_ISL29003 is not set
# CONFIG_C2PORT is not set

#
# EEPROM support
#
# CONFIG_EEPROM_AT24 is not set
# CONFIG_EEPROM_LEGACY is not set
# CONFIG_EEPROM_MAX6875 is not set
# CONFIG_EEPROM_93CX6 is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_XFER_MODE=y
CONFIG_IDE_TIMINGS=y
CONFIG_IDE_ATAPI=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_IDE_GD=y
CONFIG_IDE_GD_ATA=y
# CONFIG_IDE_GD_ATAPI is not set
CONFIG_BLK_DEV_IDECD=y
CONFIG_BLK_DEV_IDECD_VERBOSE_ERRORS=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEACPI=y
# CONFIG_IDE_TASK_IOCTL is not set
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
CONFIG_IDE_GENERIC=y
# CONFIG_BLK_DEV_PLATFORM is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_IDEPNP is not set
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_GENERIC is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_RZ1000 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_BLK_DEV_ATIIXP=y
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_IT8172 is not set
# CONFIG_BLK_DEV_IT8213 is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
CONFIG_BLK_DEV_PDC202XX_NEW=y
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
CONFIG_RAID_ATTRS=m
CONFIG_SCSI=y
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
CONFIG_SCSI_NETLINK=y
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=y
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
CONFIG_BLK_DEV_SR=y
# CONFIG_BLK_DEV_SR_VENDOR is not set
CONFIG_CHR_DEV_SG=y
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
CONFIG_SCSI_SPI_ATTRS=y
CONFIG_SCSI_FC_ATTRS=y
# CONFIG_SCSI_ISCSI_ATTRS is not set
CONFIG_SCSI_SAS_ATTRS=m
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_SCSI_CXGB3_ISCSI is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
CONFIG_SCSI_AIC79XX=y
CONFIG_AIC79XX_CMDS_PER_DEVICE=32
CONFIG_AIC79XX_RESET_DELAY_MS=4000
# CONFIG_AIC79XX_DEBUG_ENABLE is not set
CONFIG_AIC79XX_DEBUG_MASK=0
# CONFIG_AIC79XX_REG_PRETTY_PRINT is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_DPT_I2O is not set
# CONFIG_SCSI_ADVANSYS is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_MPT2SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_BUSLOGIC is not set
# CONFIG_LIBFC is not set
# CONFIG_LIBFCOE is not set
# CONFIG_FCOE is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_EATA is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_GDTH is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
# CONFIG_SCSI_INIA100 is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_IPR is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_DH is not set
# CONFIG_SCSI_OSD_INITIATOR is not set
CONFIG_ATA=m
# CONFIG_ATA_NONSTANDARD is not set
CONFIG_ATA_ACPI=y
CONFIG_SATA_PMP=y
CONFIG_SATA_AHCI=m
# CONFIG_SATA_SIL24 is not set
CONFIG_ATA_SFF=y
CONFIG_SATA_SVW=m
CONFIG_ATA_PIIX=m
# CONFIG_SATA_MV is not set
CONFIG_SATA_NV=m
# CONFIG_PDC_ADMA is not set
# CONFIG_SATA_QSTOR is not set
# CONFIG_SATA_PROMISE is not set
# CONFIG_SATA_SX4 is not set
CONFIG_SATA_SIL=m
# CONFIG_SATA_SIS is not set
# CONFIG_SATA_ULI is not set
CONFIG_SATA_VIA=m
# CONFIG_SATA_VITESSE is not set
# CONFIG_SATA_INIC162X is not set
# CONFIG_PATA_ACPI is not set
# CONFIG_PATA_ALI is not set
# CONFIG_PATA_AMD is not set
# CONFIG_PATA_ARTOP is not set
# CONFIG_PATA_ATIIXP is not set
# CONFIG_PATA_CMD640_PCI is not set
# CONFIG_PATA_CMD64X is not set
# CONFIG_PATA_CS5520 is not set
# CONFIG_PATA_CS5530 is not set
# CONFIG_PATA_CYPRESS is not set
# CONFIG_PATA_EFAR is not set
# CONFIG_ATA_GENERIC is not set
# CONFIG_PATA_HPT366 is not set
# CONFIG_PATA_HPT37X is not set
# CONFIG_PATA_HPT3X2N is not set
# CONFIG_PATA_HPT3X3 is not set
# CONFIG_PATA_IT821X is not set
# CONFIG_PATA_IT8213 is not set
# CONFIG_PATA_JMICRON is not set
# CONFIG_PATA_TRIFLEX is not set
# CONFIG_PATA_MARVELL is not set
# CONFIG_PATA_MPIIX is not set
# CONFIG_PATA_OLDPIIX is not set
# CONFIG_PATA_NETCELL is not set
# CONFIG_PATA_NINJA32 is not set
# CONFIG_PATA_NS87410 is not set
# CONFIG_PATA_NS87415 is not set
# CONFIG_PATA_OPTI is not set
# CONFIG_PATA_OPTIDMA is not set
# CONFIG_PATA_PDC_OLD is not set
# CONFIG_PATA_RADISYS is not set
# CONFIG_PATA_RZ1000 is not set
# CONFIG_PATA_SC1200 is not set
# CONFIG_PATA_SERVERWORKS is not set
# CONFIG_PATA_PDC2027X is not set
# CONFIG_PATA_SIL680 is not set
# CONFIG_PATA_SIS is not set
# CONFIG_PATA_VIA is not set
# CONFIG_PATA_WINBOND is not set
# CONFIG_PATA_SCH is not set
CONFIG_MD=y
CONFIG_BLK_DEV_MD=m
CONFIG_MD_LINEAR=m
CONFIG_MD_RAID0=m
CONFIG_MD_RAID1=m
# CONFIG_MD_RAID10 is not set
CONFIG_MD_RAID456=m
CONFIG_MD_RAID6_PQ=m
CONFIG_MD_MULTIPATH=m
# CONFIG_MD_FAULTY is not set
CONFIG_BLK_DEV_DM=y
# CONFIG_DM_DEBUG is not set
# CONFIG_DM_CRYPT is not set
# CONFIG_DM_SNAPSHOT is not set
# CONFIG_DM_MIRROR is not set
# CONFIG_DM_ZERO is not set
# CONFIG_DM_MULTIPATH is not set
# CONFIG_DM_DELAY is not set
# CONFIG_DM_UEVENT is not set
CONFIG_FUSION=y
CONFIG_FUSION_SPI=m
CONFIG_FUSION_FC=m
CONFIG_FUSION_SAS=m
CONFIG_FUSION_MAX_SGE=128
CONFIG_FUSION_CTL=m
# CONFIG_FUSION_LOGGING is not set

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
# CONFIG_MACINTOSH_DRIVERS is not set
CONFIG_NETDEVICES=y
CONFIG_COMPAT_NET_DEV_OPS=y
CONFIG_DUMMY=m
CONFIG_BONDING=m
# CONFIG_MACVLAN is not set
CONFIG_EQUALIZER=m
CONFIG_TUN=y
CONFIG_VETH=m
# CONFIG_NET_SB1000 is not set
# CONFIG_ARCNET is not set
CONFIG_PHYLIB=y

#
# MII PHY device drivers
#
# CONFIG_MARVELL_PHY is not set
# CONFIG_DAVICOM_PHY is not set
# CONFIG_QSEMI_PHY is not set
# CONFIG_LXT_PHY is not set
# CONFIG_CICADA_PHY is not set
# CONFIG_VITESSE_PHY is not set
# CONFIG_SMSC_PHY is not set
# CONFIG_BROADCOM_PHY is not set
# CONFIG_ICPLUS_PHY is not set
# CONFIG_REALTEK_PHY is not set
# CONFIG_NATIONAL_PHY is not set
# CONFIG_STE10XP is not set
# CONFIG_LSI_ET1011C_PHY is not set
# CONFIG_FIXED_PHY is not set
# CONFIG_MDIO_BITBANG is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=y
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
CONFIG_NET_VENDOR_3COM=y
CONFIG_VORTEX=y
# CONFIG_TYPHOON is not set
# CONFIG_ETHOC is not set
# CONFIG_DNET is not set
CONFIG_NET_TULIP=y
# CONFIG_DE2104X is not set
CONFIG_TULIP=y
# CONFIG_TULIP_MWI is not set
# CONFIG_TULIP_MMIO is not set
# CONFIG_TULIP_NAPI is not set
# CONFIG_DE4X5 is not set
# CONFIG_WINBOND_840 is not set
# CONFIG_DM9102 is not set
# CONFIG_ULI526X is not set
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_IBM_NEW_EMAC_NO_FLOW_CTRL is not set
# CONFIG_IBM_NEW_EMAC_MAL_CLR_ICINTSTAT is not set
# CONFIG_IBM_NEW_EMAC_MAL_COMMON_ERR is not set
CONFIG_NET_PCI=y
# CONFIG_PCNET32 is not set
CONFIG_AMD8111_ETH=y
# CONFIG_ADAPTEC_STARFIRE is not set
CONFIG_B44=y
CONFIG_B44_PCI_AUTOSELECT=y
CONFIG_B44_PCICORE_AUTOSELECT=y
CONFIG_B44_PCI=y
CONFIG_FORCEDETH=y
# CONFIG_FORCEDETH_NAPI is not set
CONFIG_E100=y
# CONFIG_FEALNX is not set
# CONFIG_NATSEMI is not set
# CONFIG_NE2K_PCI is not set
CONFIG_8139CP=y
CONFIG_8139TOO=y
# CONFIG_8139TOO_PIO is not set
# CONFIG_8139TOO_TUNE_TWISTER is not set
# CONFIG_8139TOO_8129 is not set
# CONFIG_8139_OLD_RX_RESET is not set
# CONFIG_R6040 is not set
# CONFIG_SIS900 is not set
# CONFIG_EPIC100 is not set
# CONFIG_SMSC9420 is not set
# CONFIG_SUNDANCE is not set
# CONFIG_TLAN is not set
# CONFIG_VIA_RHINE is not set
# CONFIG_SC92031 is not set
# CONFIG_ATL2 is not set
CONFIG_NETDEV_1000=y
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
CONFIG_E1000=y
CONFIG_E1000E=y
# CONFIG_IP1000 is not set
# CONFIG_IGB is not set
# CONFIG_IGBVF is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SIS190 is not set
# CONFIG_SKGE is not set
# CONFIG_SKY2 is not set
# CONFIG_VIA_VELOCITY is not set
CONFIG_TIGON3=y
CONFIG_BNX2=y
# CONFIG_QLA3XXX is not set
# CONFIG_ATL1 is not set
# CONFIG_ATL1E is not set
# CONFIG_ATL1C is not set
# CONFIG_JME is not set
CONFIG_NETDEV_10000=y
# CONFIG_CHELSIO_T1 is not set
CONFIG_CHELSIO_T3_DEPENDS=y
# CONFIG_CHELSIO_T3 is not set
# CONFIG_ENIC is not set
# CONFIG_IXGBE is not set
# CONFIG_IXGB is not set
CONFIG_S2IO=m
# CONFIG_VXGE is not set
# CONFIG_MYRI10GE is not set
# CONFIG_NETXEN_NIC is not set
# CONFIG_NIU is not set
# CONFIG_MLX4_EN is not set
# CONFIG_MLX4_CORE is not set
# CONFIG_TEHUTI is not set
# CONFIG_BNX2X is not set
# CONFIG_QLGE is not set
# CONFIG_SFC is not set
# CONFIG_BE2NET is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set

#
# Enable WiMAX (Networking options) to see the WiMAX drivers
#

#
# USB Network Adapters
#
# CONFIG_USB_CATC is not set
# CONFIG_USB_KAWETH is not set
# CONFIG_USB_PEGASUS is not set
# CONFIG_USB_RTL8150 is not set
# CONFIG_USB_USBNET is not set
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
CONFIG_NETCONSOLE=y
# CONFIG_NETCONSOLE_DYNAMIC is not set
CONFIG_NETPOLL=y
# CONFIG_NETPOLL_TRAP is not set
CONFIG_NET_POLL_CONTROLLER=y
CONFIG_VIRTIO_NET=m
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
CONFIG_INPUT_FF_MEMLESS=m
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=y
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
CONFIG_KEYBOARD_ATKBD=y
# CONFIG_KEYBOARD_SUNKBD is not set
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
CONFIG_MOUSE_PS2=y
CONFIG_MOUSE_PS2_ALPS=y
CONFIG_MOUSE_PS2_LOGIPS2PP=y
CONFIG_MOUSE_PS2_SYNAPTICS=y
CONFIG_MOUSE_PS2_LIFEBOOK=y
CONFIG_MOUSE_PS2_TRACKPOINT=y
# CONFIG_MOUSE_PS2_ELANTECH is not set
# CONFIG_MOUSE_PS2_TOUCHKIT is not set
# CONFIG_MOUSE_SERIAL is not set
# CONFIG_MOUSE_APPLETOUCH is not set
# CONFIG_MOUSE_BCM5974 is not set
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
# CONFIG_INPUT_MISC is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
CONFIG_SERIO_I8042=y
# CONFIG_SERIO_SERPORT is not set
# CONFIG_SERIO_CT82C710 is not set
# CONFIG_SERIO_PCIPS2 is not set
CONFIG_SERIO_LIBPS2=y
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_DEVKMEM=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#
CONFIG_SERIAL_8250=y
CONFIG_SERIAL_8250_CONSOLE=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_SERIAL_8250_PCI=y
CONFIG_SERIAL_8250_PNP=y
CONFIG_SERIAL_8250_NR_UARTS=4
CONFIG_SERIAL_8250_RUNTIME_UARTS=4
# CONFIG_SERIAL_8250_EXTENDED is not set

#
# Non-8250 serial port support
#
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
# CONFIG_DEVPTS_MULTIPLE_INSTANCES is not set
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
CONFIG_HVC_DRIVER=y
CONFIG_VIRTIO_CONSOLE=m
CONFIG_IPMI_HANDLER=m
# CONFIG_IPMI_PANIC_EVENT is not set
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_SI=m
CONFIG_IPMI_WATCHDOG=m
CONFIG_IPMI_POWEROFF=m
CONFIG_HW_RANDOM=y
# CONFIG_HW_RANDOM_TIMERIOMEM is not set
CONFIG_HW_RANDOM_INTEL=y
CONFIG_HW_RANDOM_AMD=y
# CONFIG_HW_RANDOM_VIRTIO is not set
# CONFIG_NVRAM is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_MWAVE is not set
# CONFIG_PC8736x_GPIO is not set
CONFIG_RAW_DRIVER=y
CONFIG_MAX_RAW_DEVS=256
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HANGCHECK_TIMER is not set
CONFIG_TCG_TPM=y
CONFIG_TCG_TIS=y
# CONFIG_TCG_NSC is not set
# CONFIG_TCG_ATMEL is not set
# CONFIG_TCG_INFINEON is not set
# CONFIG_TELCLOCK is not set
CONFIG_DEVPORT=y
CONFIG_I2C=y
CONFIG_I2C_BOARDINFO=y
# CONFIG_I2C_CHARDEV is not set
CONFIG_I2C_HELPER_AUTO=y
CONFIG_I2C_ALGOBIT=y

#
# I2C Hardware Bus support
#

#
# PC SMBus host controller drivers
#
# CONFIG_I2C_ALI1535 is not set
# CONFIG_I2C_ALI1563 is not set
# CONFIG_I2C_ALI15X3 is not set
# CONFIG_I2C_AMD756 is not set
# CONFIG_I2C_AMD8111 is not set
# CONFIG_I2C_I801 is not set
# CONFIG_I2C_ISCH is not set
# CONFIG_I2C_PIIX4 is not set
# CONFIG_I2C_NFORCE2 is not set
# CONFIG_I2C_SIS5595 is not set
# CONFIG_I2C_SIS630 is not set
# CONFIG_I2C_SIS96X is not set
# CONFIG_I2C_VIA is not set
# CONFIG_I2C_VIAPRO is not set

#
# I2C system bus drivers (mostly embedded / system-on-chip)
#
# CONFIG_I2C_OCORES is not set
# CONFIG_I2C_SIMTEC is not set

#
# External I2C/SMBus adapter drivers
#
# CONFIG_I2C_PARPORT_LIGHT is not set
# CONFIG_I2C_TAOS_EVM is not set
# CONFIG_I2C_TINY_USB is not set

#
# Graphics adapter I2C/DDC channel drivers
#
# CONFIG_I2C_VOODOO3 is not set

#
# Other I2C/SMBus bus drivers
#
# CONFIG_I2C_PCA_PLATFORM is not set
# CONFIG_I2C_STUB is not set

#
# Miscellaneous I2C Chip support
#
# CONFIG_DS1682 is not set
# CONFIG_SENSORS_PCF8574 is not set
# CONFIG_PCF8575 is not set
# CONFIG_SENSORS_PCA9539 is not set
# CONFIG_SENSORS_TSL2550 is not set
# CONFIG_I2C_DEBUG_CORE is not set
# CONFIG_I2C_DEBUG_ALGO is not set
# CONFIG_I2C_DEBUG_BUS is not set
# CONFIG_I2C_DEBUG_CHIP is not set
# CONFIG_SPI is not set

#
# PPS support
#
# CONFIG_PPS is not set
CONFIG_ARCH_WANT_OPTIONAL_GPIOLIB=y
# CONFIG_GPIOLIB is not set
# CONFIG_W1 is not set
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
# CONFIG_PDA_POWER is not set
# CONFIG_BATTERY_DS2760 is not set
# CONFIG_BATTERY_BQ27x00 is not set
CONFIG_HWMON=m
# CONFIG_HWMON_VID is not set
# CONFIG_SENSORS_ABITUGURU is not set
# CONFIG_SENSORS_ABITUGURU3 is not set
# CONFIG_SENSORS_AD7414 is not set
# CONFIG_SENSORS_AD7418 is not set
# CONFIG_SENSORS_ADM1021 is not set
# CONFIG_SENSORS_ADM1025 is not set
# CONFIG_SENSORS_ADM1026 is not set
# CONFIG_SENSORS_ADM1029 is not set
# CONFIG_SENSORS_ADM1031 is not set
# CONFIG_SENSORS_ADM9240 is not set
# CONFIG_SENSORS_ADT7462 is not set
# CONFIG_SENSORS_ADT7470 is not set
# CONFIG_SENSORS_ADT7473 is not set
# CONFIG_SENSORS_ADT7475 is not set
# CONFIG_SENSORS_K8TEMP is not set
# CONFIG_SENSORS_ASB100 is not set
# CONFIG_SENSORS_ATK0110 is not set
# CONFIG_SENSORS_ATXP1 is not set
# CONFIG_SENSORS_DS1621 is not set
# CONFIG_SENSORS_I5K_AMB is not set
# CONFIG_SENSORS_F71805F is not set
# CONFIG_SENSORS_F71882FG is not set
# CONFIG_SENSORS_F75375S is not set
# CONFIG_SENSORS_FSCHER is not set
# CONFIG_SENSORS_FSCPOS is not set
# CONFIG_SENSORS_FSCHMD is not set
# CONFIG_SENSORS_G760A is not set
# CONFIG_SENSORS_GL518SM is not set
# CONFIG_SENSORS_GL520SM is not set
CONFIG_SENSORS_CORETEMP=m
# CONFIG_SENSORS_IBMAEM is not set
# CONFIG_SENSORS_IBMPEX is not set
# CONFIG_SENSORS_IT87 is not set
# CONFIG_SENSORS_LM63 is not set
# CONFIG_SENSORS_LM75 is not set
# CONFIG_SENSORS_LM77 is not set
# CONFIG_SENSORS_LM78 is not set
# CONFIG_SENSORS_LM80 is not set
# CONFIG_SENSORS_LM83 is not set
# CONFIG_SENSORS_LM85 is not set
# CONFIG_SENSORS_LM87 is not set
# CONFIG_SENSORS_LM90 is not set
# CONFIG_SENSORS_LM92 is not set
# CONFIG_SENSORS_LM93 is not set
# CONFIG_SENSORS_LTC4215 is not set
# CONFIG_SENSORS_LTC4245 is not set
# CONFIG_SENSORS_LM95241 is not set
# CONFIG_SENSORS_MAX1619 is not set
# CONFIG_SENSORS_MAX6650 is not set
# CONFIG_SENSORS_PC87360 is not set
# CONFIG_SENSORS_PC87427 is not set
# CONFIG_SENSORS_PCF8591 is not set
# CONFIG_SENSORS_SIS5595 is not set
# CONFIG_SENSORS_DME1737 is not set
# CONFIG_SENSORS_SMSC47M1 is not set
# CONFIG_SENSORS_SMSC47M192 is not set
# CONFIG_SENSORS_SMSC47B397 is not set
# CONFIG_SENSORS_ADS7828 is not set
# CONFIG_SENSORS_THMC50 is not set
# CONFIG_SENSORS_VIA686A is not set
# CONFIG_SENSORS_VT1211 is not set
# CONFIG_SENSORS_VT8231 is not set
# CONFIG_SENSORS_W83781D is not set
# CONFIG_SENSORS_W83791D is not set
# CONFIG_SENSORS_W83792D is not set
# CONFIG_SENSORS_W83793 is not set
# CONFIG_SENSORS_W83L785TS is not set
# CONFIG_SENSORS_W83L786NG is not set
# CONFIG_SENSORS_W83627HF is not set
# CONFIG_SENSORS_W83627EHF is not set
# CONFIG_SENSORS_HDAPS is not set
# CONFIG_SENSORS_LIS3LV02D is not set
# CONFIG_SENSORS_APPLESMC is not set
# CONFIG_HWMON_DEBUG_CHIP is not set
CONFIG_THERMAL=y
# CONFIG_WATCHDOG is not set
CONFIG_SSB_POSSIBLE=y

#
# Sonics Silicon Backplane
#
CONFIG_SSB=y
CONFIG_SSB_SPROM=y
CONFIG_SSB_PCIHOST_POSSIBLE=y
CONFIG_SSB_PCIHOST=y
# CONFIG_SSB_B43_PCI_BRIDGE is not set
# CONFIG_SSB_DEBUG is not set
CONFIG_SSB_DRIVER_PCICORE_POSSIBLE=y
CONFIG_SSB_DRIVER_PCICORE=y

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set
# CONFIG_TWL4030_CORE is not set
# CONFIG_MFD_TMIO is not set
# CONFIG_PMIC_DA903X is not set
# CONFIG_MFD_WM8400 is not set
# CONFIG_MFD_WM8350_I2C is not set
# CONFIG_MFD_PCF50633 is not set
# CONFIG_REGULATOR is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
# CONFIG_VIDEO_DEV is not set
# CONFIG_DVB_CORE is not set
# CONFIG_VIDEO_MEDIA is not set

#
# Multimedia drivers
#
CONFIG_DAB=y
# CONFIG_USB_DABUSB is not set

#
# Graphics support
#
CONFIG_AGP=y
CONFIG_AGP_AMD64=y
CONFIG_AGP_INTEL=y
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_VIA is not set
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
CONFIG_FB_DDC=y
CONFIG_FB_BOOT_VESA_SUPPORT=y
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
CONFIG_FB_MODE_HELPERS=y
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_CYBER2000 is not set
# CONFIG_FB_ARC is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_VGA16 is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_VESA is not set
# CONFIG_FB_N411 is not set
# CONFIG_FB_HGA is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_LE80578 is not set
CONFIG_FB_INTEL=y
# CONFIG_FB_INTEL_DEBUG is not set
CONFIG_FB_INTEL_I2C=y
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
# CONFIG_FB_ATY is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_VIA is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_GEODE is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_FB_METRONOME is not set
# CONFIG_FB_MB862XX is not set
# CONFIG_FB_BROADSHEET is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
CONFIG_VGA_CONSOLE=y
CONFIG_VGACON_SOFT_SCROLLBACK=y
CONFIG_VGACON_SOFT_SCROLLBACK_SIZE=256
CONFIG_DUMMY_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE is not set
CONFIG_LOGO=y
CONFIG_LOGO_LINUX_MONO=y
CONFIG_LOGO_LINUX_VGA16=y
CONFIG_LOGO_LINUX_CLUT224=y
CONFIG_SOUND=y
CONFIG_SOUND_OSS_CORE=y
# CONFIG_SND is not set
CONFIG_SOUND_PRIME=y
# CONFIG_SOUND_OSS is not set
CONFIG_HID_SUPPORT=y
CONFIG_HID=y
CONFIG_HID_DEBUG=y
# CONFIG_HIDRAW is not set

#
# USB Input Devices
#
CONFIG_USB_HID=m
# CONFIG_HID_PID is not set
# CONFIG_USB_HIDDEV is not set

#
# Special HID drivers
#
CONFIG_HID_A4TECH=m
CONFIG_HID_APPLE=m
CONFIG_HID_BELKIN=m
CONFIG_HID_CHERRY=m
CONFIG_HID_CHICONY=m
CONFIG_HID_CYPRESS=m
# CONFIG_DRAGONRISE_FF is not set
CONFIG_HID_EZKEY=m
CONFIG_HID_KYE=m
CONFIG_HID_GYRATION=m
CONFIG_HID_KENSINGTON=m
CONFIG_HID_LOGITECH=m
# CONFIG_LOGITECH_FF is not set
# CONFIG_LOGIRUMBLEPAD2_FF is not set
CONFIG_HID_MICROSOFT=m
CONFIG_HID_MONTEREY=m
CONFIG_HID_NTRIG=m
CONFIG_HID_PANTHERLORD=m
# CONFIG_PANTHERLORD_FF is not set
CONFIG_HID_PETALYNX=m
CONFIG_HID_SAMSUNG=m
CONFIG_HID_SONY=m
CONFIG_HID_SUNPLUS=m
# CONFIG_GREENASIA_FF is not set
CONFIG_HID_TOPSEED=m
# CONFIG_THRUSTMASTER_FF is not set
# CONFIG_ZEROPLUS_FF is not set
CONFIG_USB_SUPPORT=y
CONFIG_USB_ARCH_HAS_HCD=y
CONFIG_USB_ARCH_HAS_OHCI=y
CONFIG_USB_ARCH_HAS_EHCI=y
CONFIG_USB=m
# CONFIG_USB_DEBUG is not set
# CONFIG_USB_ANNOUNCE_NEW_DEVICES is not set

#
# Miscellaneous USB options
#
CONFIG_USB_DEVICE_CLASS=y
# CONFIG_USB_DYNAMIC_MINORS is not set
# CONFIG_USB_SUSPEND is not set
# CONFIG_USB_OTG is not set
CONFIG_USB_MON=m
# CONFIG_USB_WUSB is not set
# CONFIG_USB_WUSB_CBAF is not set

#
# USB Host Controller Drivers
#
# CONFIG_USB_C67X00_HCD is not set
# CONFIG_USB_XHCI_HCD is not set
CONFIG_USB_EHCI_HCD=m
# CONFIG_USB_EHCI_ROOT_HUB_TT is not set
# CONFIG_USB_EHCI_TT_NEWSCHED is not set
# CONFIG_USB_OXU210HP_HCD is not set
# CONFIG_USB_ISP116X_HCD is not set
# CONFIG_USB_ISP1760_HCD is not set
CONFIG_USB_OHCI_HCD=m
# CONFIG_USB_OHCI_HCD_SSB is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_DESC is not set
# CONFIG_USB_OHCI_BIG_ENDIAN_MMIO is not set
CONFIG_USB_OHCI_LITTLE_ENDIAN=y
CONFIG_USB_UHCI_HCD=m
# CONFIG_USB_SL811_HCD is not set
# CONFIG_USB_R8A66597_HCD is not set
# CONFIG_USB_WHCI_HCD is not set
# CONFIG_USB_HWA_HCD is not set

#
# Enable Host or Gadget support to see Inventra options
#

#
# USB Device Class drivers
#
# CONFIG_USB_ACM is not set
# CONFIG_USB_PRINTER is not set
# CONFIG_USB_WDM is not set
# CONFIG_USB_TMC is not set

#
# NOTE: USB_STORAGE depends on SCSI but BLK_DEV_SD may
#

#
# also be needed; see USB_STORAGE Help for more info
#
CONFIG_USB_STORAGE=m
# CONFIG_USB_STORAGE_DEBUG is not set
# CONFIG_USB_STORAGE_DATAFAB is not set
# CONFIG_USB_STORAGE_FREECOM is not set
# CONFIG_USB_STORAGE_ISD200 is not set
# CONFIG_USB_STORAGE_USBAT is not set
# CONFIG_USB_STORAGE_SDDR09 is not set
# CONFIG_USB_STORAGE_SDDR55 is not set
# CONFIG_USB_STORAGE_JUMPSHOT is not set
# CONFIG_USB_STORAGE_ALAUDA is not set
# CONFIG_USB_STORAGE_ONETOUCH is not set
# CONFIG_USB_STORAGE_KARMA is not set
# CONFIG_USB_STORAGE_CYPRESS_ATACB is not set
# CONFIG_USB_LIBUSUAL is not set

#
# USB Imaging devices
#
# CONFIG_USB_MDC800 is not set
# CONFIG_USB_MICROTEK is not set

#
# USB port drivers
#
# CONFIG_USB_SERIAL is not set

#
# USB Miscellaneous drivers
#
# CONFIG_USB_EMI62 is not set
# CONFIG_USB_EMI26 is not set
# CONFIG_USB_ADUTUX is not set
# CONFIG_USB_SEVSEG is not set
# CONFIG_USB_RIO500 is not set
# CONFIG_USB_LEGOTOWER is not set
# CONFIG_USB_LCD is not set
# CONFIG_USB_BERRY_CHARGE is not set
# CONFIG_USB_LED is not set
# CONFIG_USB_CYPRESS_CY7C63 is not set
# CONFIG_USB_CYTHERM is not set
# CONFIG_USB_IDMOUSE is not set
# CONFIG_USB_FTDI_ELAN is not set
# CONFIG_USB_APPLEDISPLAY is not set
# CONFIG_USB_SISUSBVGA is not set
# CONFIG_USB_LD is not set
# CONFIG_USB_TRANCEVIBRATOR is not set
# CONFIG_USB_IOWARRIOR is not set
# CONFIG_USB_ISIGHTFW is not set
# CONFIG_USB_VST is not set
# CONFIG_USB_GADGET is not set

#
# OTG and related infrastructure
#
# CONFIG_NOP_USB_XCEIV is not set
# CONFIG_UWB is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_EDAC is not set
CONFIG_RTC_LIB=m
CONFIG_RTC_CLASS=m

#
# RTC interfaces
#
CONFIG_RTC_INTF_SYSFS=y
CONFIG_RTC_INTF_PROC=y
CONFIG_RTC_INTF_DEV=y
# CONFIG_RTC_INTF_DEV_UIE_EMUL is not set
# CONFIG_RTC_DRV_TEST is not set

#
# I2C RTC drivers
#
# CONFIG_RTC_DRV_DS1307 is not set
# CONFIG_RTC_DRV_DS1374 is not set
# CONFIG_RTC_DRV_DS1672 is not set
# CONFIG_RTC_DRV_DS1685 is not set
# CONFIG_RTC_DRV_MAX6900 is not set
# CONFIG_RTC_DRV_RS5C372 is not set
# CONFIG_RTC_DRV_ISL1208 is not set
# CONFIG_RTC_DRV_X1205 is not set
# CONFIG_RTC_DRV_PCF8563 is not set
# CONFIG_RTC_DRV_PCF8583 is not set
# CONFIG_RTC_DRV_M41T80 is not set
# CONFIG_RTC_DRV_S35390A is not set
# CONFIG_RTC_DRV_FM3130 is not set
# CONFIG_RTC_DRV_RX8581 is not set
# CONFIG_RTC_DRV_RX8025 is not set

#
# SPI RTC drivers
#

#
# Platform RTC drivers
#
CONFIG_RTC_DRV_CMOS=m
# CONFIG_RTC_DRV_DS1286 is not set
# CONFIG_RTC_DRV_DS1511 is not set
# CONFIG_RTC_DRV_DS1553 is not set
# CONFIG_RTC_DRV_DS1742 is not set
# CONFIG_RTC_DRV_STK17TA8 is not set
# CONFIG_RTC_DRV_M48T86 is not set
# CONFIG_RTC_DRV_M48T35 is not set
# CONFIG_RTC_DRV_M48T59 is not set
# CONFIG_RTC_DRV_MSM6242 is not set
# CONFIG_RTC_DRV_BQ4802 is not set
# CONFIG_RTC_DRV_RP5C01 is not set
# CONFIG_RTC_DRV_V3020 is not set

#
# on-CPU RTC drivers
#
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
CONFIG_UIO=m
# CONFIG_UIO_CIF is not set
# CONFIG_UIO_PDRV is not set
# CONFIG_UIO_PDRV_GENIRQ is not set
# CONFIG_UIO_SMX is not set
# CONFIG_UIO_AEC is not set
# CONFIG_UIO_SERCOS3 is not set
# CONFIG_STAGING is not set
CONFIG_X86_PLATFORM_DEVICES=y
# CONFIG_ASUS_LAPTOP is not set
# CONFIG_THINKPAD_ACPI is not set
CONFIG_INTEL_MENLOW=m
# CONFIG_EEEPC_LAPTOP is not set
# CONFIG_ACPI_WMI is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set

#
# Firmware Drivers
#
# CONFIG_EDD is not set
CONFIG_FIRMWARE_MEMMAP=y
# CONFIG_DELL_RBU is not set
# CONFIG_DCDBAS is not set
CONFIG_DMIID=y
# CONFIG_ISCSI_IBFT_FIND is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
# CONFIG_EXT2_FS_SECURITY is not set
# CONFIG_EXT2_FS_XIP is not set
CONFIG_EXT3_FS=y
# CONFIG_EXT3_DEFAULTS_TO_ORDERED is not set
CONFIG_EXT3_FS_XATTR=y
CONFIG_EXT3_FS_POSIX_ACL=y
# CONFIG_EXT3_FS_SECURITY is not set
# CONFIG_EXT4_FS is not set
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FS_MBCACHE=y
CONFIG_REISERFS_FS=y
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
CONFIG_REISERFS_FS_XATTR=y
CONFIG_REISERFS_FS_POSIX_ACL=y
# CONFIG_REISERFS_FS_SECURITY is not set
# CONFIG_REISER4_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
CONFIG_FILE_LOCKING=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
# CONFIG_BTRFS_FS is not set
CONFIG_FSNOTIFY=y
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# Caches
#
# CONFIG_FSCACHE is not set

#
# CD-ROM/DVD Filesystems
#
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
CONFIG_VFAT_FS=y
CONFIG_FAT_DEFAULT_CODEPAGE=437
CONFIG_FAT_DEFAULT_IOCHARSET="iso8859-1"
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_VMCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
# CONFIG_CONFIGFS_FS is not set
CONFIG_MISC_FILESYSTEMS=y
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_SQUASHFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_NILFS2_FS is not set
CONFIG_NETWORK_FILESYSTEMS=y
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_V3_ACL is not set
# CONFIG_NFS_V4 is not set
CONFIG_ROOT_NFS=y
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
# CONFIG_NFSD_V3_ACL is not set
# CONFIG_NFSD_V4 is not set
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
CONFIG_EXPORTFS=y
CONFIG_NFS_COMMON=y
CONFIG_SUNRPC=y
# CONFIG_RPCSEC_GSS_KRB5 is not set
# CONFIG_RPCSEC_GSS_SPKM3 is not set
# CONFIG_SMB_FS is not set
# CONFIG_CIFS is not set
# CONFIG_NCP_FS is not set
# CONFIG_CODA_FS is not set
# CONFIG_AFS_FS is not set

#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
CONFIG_NLS=y
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ASCII=y
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
CONFIG_NLS_ISO8859_15=y
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
# CONFIG_ENABLE_MUST_CHECK is not set
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
CONFIG_UNUSED_SYMBOLS=y
CONFIG_DEBUG_FS=y
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_BOOTPARAM_HUNG_TASK_PANIC is not set
CONFIG_BOOTPARAM_HUNG_TASK_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
CONFIG_SCHEDSTATS=y
CONFIG_TIMER_STATS=y
# CONFIG_DEBUG_OBJECTS is not set
CONFIG_SLQB_DEBUG=y
# CONFIG_SLQB_DEBUG_ON is not set
# CONFIG_SLQB_SYSFS is not set
# CONFIG_DEBUG_RT_MUTEXES is not set
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
CONFIG_LOCK_STAT=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
CONFIG_DEBUG_INFO=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VIRTUAL is not set
# CONFIG_DEBUG_WRITECOUNT is not set
CONFIG_DEBUG_MEMORY_INIT=y
# CONFIG_DEBUG_LIST is not set
# CONFIG_DEBUG_SG is not set
# CONFIG_DEBUG_NOTIFIERS is not set
CONFIG_ARCH_WANT_FRAME_POINTERS=y
CONFIG_FRAME_POINTER=y
# CONFIG_DEBUG_SYNCHRO_TEST is not set
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_RCU_CPU_STALL_DETECTOR is not set
# CONFIG_KPROBES_SANITY_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_DEBUG_BLOCK_EXT_DEVT is not set
# CONFIG_LKDTM is not set
# CONFIG_FAULT_INJECTION is not set
# CONFIG_LATENCYTOP is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_HW_BRANCH_TRACER=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_RING_BUFFER=y
CONFIG_EVENT_TRACING=y
CONFIG_TRACING=y
CONFIG_TRACING_SUPPORT=y
# CONFIG_FTRACE is not set
# CONFIG_PROVIDE_OHCI1394_DMA_INIT is not set
# CONFIG_DYNAMIC_DEBUG is not set
# CONFIG_DMA_API_DEBUG is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
# CONFIG_STRICT_DEVMEM is not set
CONFIG_X86_VERBOSE_BOOTUP=y
CONFIG_EARLY_PRINTK=y
# CONFIG_EARLY_PRINTK_DBGP is not set
CONFIG_DEBUG_STACKOVERFLOW=y
# CONFIG_DEBUG_STACK_USAGE is not set
# CONFIG_DEBUG_PER_CPU_MAPS is not set
# CONFIG_X86_PTDUMP is not set
# CONFIG_DEBUG_RODATA is not set
# CONFIG_DEBUG_NX_TEST is not set
# CONFIG_IOMMU_DEBUG is not set
CONFIG_X86_DS_SELFTEST=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y
CONFIG_IO_DELAY_TYPE_0X80=0
CONFIG_IO_DELAY_TYPE_0XED=1
CONFIG_IO_DELAY_TYPE_UDELAY=2
CONFIG_IO_DELAY_TYPE_NONE=3
CONFIG_IO_DELAY_0X80=y
# CONFIG_IO_DELAY_0XED is not set
# CONFIG_IO_DELAY_UDELAY is not set
# CONFIG_IO_DELAY_NONE is not set
CONFIG_DEFAULT_IO_DELAY_TYPE=0
# CONFIG_DEBUG_BOOT_PARAMS is not set
# CONFIG_CPA_DEBUG is not set
# CONFIG_OPTIMIZE_INLINING is not set

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
CONFIG_SECURITYFS=y
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
# CONFIG_IMA is not set
CONFIG_XOR_BLOCKS=m
CONFIG_ASYNC_CORE=m
CONFIG_ASYNC_MEMCPY=m
CONFIG_ASYNC_XOR=m
CONFIG_CRYPTO=y

#
# Crypto core or helper
#
# CONFIG_CRYPTO_FIPS is not set
CONFIG_CRYPTO_ALGAPI=y
CONFIG_CRYPTO_ALGAPI2=y
CONFIG_CRYPTO_AEAD2=y
CONFIG_CRYPTO_BLKCIPHER2=y
CONFIG_CRYPTO_HASH=y
CONFIG_CRYPTO_HASH2=y
CONFIG_CRYPTO_RNG2=y
CONFIG_CRYPTO_PCOMP=y
CONFIG_CRYPTO_MANAGER=y
CONFIG_CRYPTO_MANAGER2=y
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
CONFIG_CRYPTO_WORKQUEUE=y
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
# CONFIG_CRYPTO_CBC is not set
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
CONFIG_CRYPTO_HMAC=y
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_CRC32C_INTEL is not set
# CONFIG_CRYPTO_MD4 is not set
CONFIG_CRYPTO_MD5=y
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
CONFIG_CRYPTO_SHA1=y
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
# CONFIG_CRYPTO_AES is not set
# CONFIG_CRYPTO_AES_X86_64 is not set
# CONFIG_CRYPTO_AES_NI_INTEL is not set
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SALSA20_X86_64 is not set
# CONFIG_CRYPTO_SEED is not set
# CONFIG_CRYPTO_SERPENT is not set
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set
# CONFIG_CRYPTO_TWOFISH_X86_64 is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_ZLIB is not set
# CONFIG_CRYPTO_LZO is not set

#
# Random Number Generation
#
# CONFIG_CRYPTO_ANSI_CPRNG is not set
CONFIG_CRYPTO_HW=y
# CONFIG_CRYPTO_DEV_PADLOCK is not set
# CONFIG_CRYPTO_DEV_HIFN_795X is not set
CONFIG_HAVE_KVM=y
CONFIG_HAVE_KVM_IRQCHIP=y
CONFIG_VIRTUALIZATION=y
CONFIG_KVM=m
CONFIG_KVM_INTEL=m
CONFIG_KVM_AMD=m
# CONFIG_KVM_TRACE is not set
CONFIG_VIRTIO=m
CONFIG_VIRTIO_RING=m
CONFIG_VIRTIO_PCI=m
CONFIG_VIRTIO_BALLOON=m
CONFIG_BINARY_PRINTF=y

#
# Library routines
#
CONFIG_BITREVERSE=y
CONFIG_GENERIC_FIND_FIRST_BIT=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_FIND_LAST_BIT=y
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC_T10DIF=y
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_ZLIB_INFLATE=y
CONFIG_DECOMPRESS_GZIP=y
CONFIG_DECOMPRESS_BZIP2=y
CONFIG_DECOMPRESS_LZMA=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_NLATTR=y


* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-18 10:11     ` KAMEZAWA Hiroyuki
  (?)
  (?)
@ 2009-05-31 23:51     ` Balbir Singh
  2009-06-01 23:57       ` KAMEZAWA Hiroyuki
  -1 siblings, 1 reply; 41+ messages in thread
From: Balbir Singh @ 2009-05-31 23:51 UTC (permalink / raw)
  To: Andrew Morton, KAMEZAWA Hiroyuki
  Cc: linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:

> On Fri, 15 May 2009 23:46:39 +0530
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > Balbir Singh wrote:
> > > > Feature: Remove the overhead associated with the root cgroup
> > > >
> > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > >
> > > > This patch changes the memory cgroup and removes the overhead associated
> > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > can
> > > > no longer set a memory hard limit in the root cgroup.
> > > >
> > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > pages. A new flag to track whether the page has been accounted or not
> > > > has been added as well.
> > > >
> > > > Review comments highly appreciated
> > > >
> > > > Tests
> > > >
> > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > 2. For the root cgroup tested performance impact with reaim
> > > >
> > > >
> > > > 		+patch		mmtom-08-may-2009
> > > > AIM9		1362.93		1338.17
> > > > Dbase		17457.75	16021.58
> > > > New Dbase	18070.18	16518.54
> > > > Shared		9681.85		8882.11
> > > > Compute		16197.79	15226.13
> > > >
> > > Hmm, at first impression, I'm not convinced by the numbers...
> > > Does just avoiding list_add/del make programs _10%_ faster?
> > > Could you show the change in CPU cache-miss rate if you can?
> > > (And why does AIM9 go bad?)
> > 
> > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > this as well
> > 
> I tested aim7 with some configs.
> 
> CPU: Xeon 3.1GHz, 4 cores x2 (8 CPUs)
> Memory: 32G
> HDD: ordinary SCSI disk (just 1 disk)
> (try_to_free_pages() etc. will never be called.)
> 
> Multiuser config, number of tasks: 1100 (near the peak on my host)
> 
> 10 runs.
> rc6mm1 score(Jobs/min)
> 44009.1 44844.5 44691.1 43981.9 44992.6
> 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> 
> +patch
> 44656.8 44270.8 44706.7 44106.1 44467.6
> 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> 
> Dbase config. #of tasks 25
> rc6mm1 score (jobs/min)
> 11022.7 11018.9 11037.9 11003.8 11087.5 
> 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> 
> +patch
> 10888.0 10973.7 10913.9 11000.0 10984.9
> 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> 
> Hmm, a 1% improvement?
> (I think this is a reasonable score for the effect of this patch.)
> 
> Anyway, I'm worried about differences between my kernel config and yours.
> Please enjoy your travel for now :)
>
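
The averages quoted above can be rechecked with a short userspace program. This is only a sketch for redoing the arithmetic: the helper names (`mean`, `pct_change`, `report`) and the array names are made up for illustration, and the scores are copied from the runs quoted above.

```c
#include <stddef.h>
#include <stdio.h>

/* Arithmetic mean of n samples. */
double mean(const double *v, size_t n)
{
	double sum = 0.0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change of `patched` against `base`, in percent. */
double pct_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}

/* Jobs/min scores copied from the quoted mail (10 runs each). */
const double multi_base[10] = {
	44009.1, 44844.5, 44691.1, 43981.9, 44992.6,
	44544.9, 44179.1, 44283.0, 44442.9, 45033.8,
};
const double multi_patched[10] = {
	44656.8, 44270.8, 44706.7, 44106.1, 44467.6,
	44585.3, 44167.0, 44756.7, 44853.9, 44249.4,
};
const double dbase_base[10] = {
	11022.7, 11018.9, 11037.9, 11003.8, 11087.5,
	11145.2, 11133.6, 11068.3, 11091.3, 11106.6,
};
const double dbase_patched[10] = {
	10888.0, 10973.7, 10913.9, 11000.0, 10984.9,
	10996.2, 10969.9, 10921.3, 10921.3, 11053.1,
};

/* Print one summary line for a workload. */
void report(const char *name, const double *base,
	    const double *patched, size_t n)
{
	double b = mean(base, n);
	double p = mean(patched, n);

	printf("%-10s base=%.2f patched=%.2f change=%+.2f%%\n",
	       name, b, p, pct_change(b, p));
}
```

Calling `report("Multiuser", multi_base, multi_patched, 10)` and `report("Dbase", dbase_base, dbase_patched, 10)` from a `main()` reproduces the quoted averages. With the columns as labelled in the mail, the "+patch" average comes out roughly 0.04% lower for Multiuser and roughly 1% lower for Dbase.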


Hi, Andrew,

Could you please pick up these patches for testing? Kamezawa-San, I am
assuming that you are OK with these patches going to -mm for testing?

Would you like me to resend the patches?

Balbir 



* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-17  4:15   ` Balbir Singh
@ 2009-06-01  4:25     ` Daisuke Nishimura
  -1 siblings, 0 replies; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-01  4:25 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton, lizf,
	menage, KOSAKI Motohiro, Daisuke Nishimura

I'm sorry for my very late reply.

I've been working on the stale swap cache problem for a long time as you know :)

On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> 
> > I think setting/clearing the flag here adds a race condition, because
> > pc->flags is modified by
> >   pc->flags = pcg_default_flags[ctype] in commit_charge()
> > You have to change the above line to be
> > 
> >   SetPageCgroupCache(pc) or something similar..
> >   ...
> >   SetPageCgroupUsed(pc)
> > 
> > Then, you can use set_bit() without lock_page_cgroup().
> > (Currently, pc->flags is modified only under lock_page_cgroup(), so
> >  non-atomic code is used.)
> >
> 
> Here is the next version of the patch
> 
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
I agree with the idea itself.

> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> pcg_default_flags is now obsolete, but I've not removed it yet. It
> provides some readability to help the code.
> 
> Tests:
> 1. Tested lightly, previous versions showed good performance improvement 10%.
> 
You should test the current version :)
And I think you should test this patch under global memory pressure too,
to check that it doesn't cause a bug or a counter underflow/overflow, etc.
memcg's LRU handling of SwapCache is different from the usual one.

> NOTE:
> I haven't got the time right now to run oprofile and get detailed test results,
> since I am in the middle of travel.
> 
> Please review the code for functional correctness and if you can test
> it even better. I would like to push this in, especially if the %
> performance difference I am seeing is reproducible elsewhere as well.
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |   12 ++++++++++++
>  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..ebdae9a 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT, /* page has been accounted for */
>  };
>  
Those new flags are protected by zone->lru_lock, right?
If so, please add a comment saying so.
Also, I'm not sure why you need two flags. Isn't PCG_ROOT enough?

>  #define TESTPCGFLAG(uname, lname)			\
> @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(Acct, ACCT)
> +CLEARPCGFLAG(Acct, ACCT)
> +TESTPCGFLAG(Acct, ACCT)
>  
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 9712ef7..35415fc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
> @@ -196,6 +197,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcct(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
Shouldn't we set PCG_LOCK ?
unlock_page_cgroup() will be called after this.

Moreover, IIUC, pc->flags is not cleared when a page is freed or allocated, so
if a page is reused, pc->flags still holds the old value. Now that the flags are
only set bit by bit, a stale PCG_CACHE bit, at least, would affect the decision
made in mem_cgroup_charge_statistics().
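
To make the stale-flag concern concrete, here is a minimal user-space sketch,
not kernel code. The names pc_model, commit_assign(), commit_set_bits() and
uncharge() are hypothetical, and C11 atomic_fetch_or()/atomic_fetch_and() only
stand in for the kernel's set_bit()/clear_bit(). Under those assumptions it
shows that a whole-word assignment discards bits left by the previous user of
a page, while set-only updates leave an old PCG_CACHE bit in place:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers mirror the enum in the quoted patch; the rest is a model. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ROOT, PCG_ACCT };

struct pc_model {
	atomic_ulong flags;	/* stands in for page_cgroup->flags */
};

static void set_flag(struct pc_model *pc, int bit)
{
	atomic_fetch_or(&pc->flags, 1UL << bit);
}

static void clear_flag(struct pc_model *pc, int bit)
{
	atomic_fetch_and(&pc->flags, ~(1UL << bit));
}

static bool test_flag(struct pc_model *pc, int bit)
{
	return (atomic_load(&pc->flags) >> bit) & 1UL;
}

/* Old style: one store replaces every bit, so stale bits disappear. */
static void commit_assign(struct pc_model *pc, bool cache)
{
	unsigned long v = (1UL << PCG_USED) | (1UL << PCG_LOCK);

	if (cache)
		v |= 1UL << PCG_CACHE;
	atomic_store(&pc->flags, v);
}

/* New style: bits are only ever set here, never reset. */
static void commit_set_bits(struct pc_model *pc, bool cache)
{
	if (cache)
		set_flag(pc, PCG_CACHE);
	set_flag(pc, PCG_USED);
}

/* Uncharge as in the quoted hunk: USED is cleared, CACHE is left alone. */
static void uncharge(struct pc_model *pc)
{
	clear_flag(pc, PCG_USED);
}
```

Charging a modeled page as cache, uncharging it, and charging it again as
non-cache leaves PCG_CACHE set with commit_set_bits() but not with
commit_assign(). Clearing the flag word (or at least PCG_CACHE) at uncharge
time would be one way to close the gap in this model.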

> @@ -1521,6 +1547,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> +	if (mem == root_mem_cgroup)
> +		ClearPageCgroupRoot(pc);
>  	/*
>  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
> @@ -2038,6 +2066,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
This is a nitpick, but I'd prefer not to show *.limit_in_bytes if we cannot write to them.


Thanks,
Daisuke Nishimura.

> @@ -2504,6 +2536,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2532,6 +2565,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index 09b73c5..6145ff6 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
>  
>  #endif
>  
> -
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  
>  static DEFINE_MUTEX(swap_cgroup_mutex);
>  
> 
> -- 
> 	Balbir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-06-01  4:25     ` Daisuke Nishimura
@ 2009-06-01  5:01       ` Daisuke Nishimura
  -1 siblings, 0 replies; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-01  5:01 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton, lizf,
	menage, KOSAKI Motohiro, Daisuke Nishimura

> > @@ -1114,9 +1125,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >  
> >  	mem_cgroup_charge_statistics(mem, pc, true);
> >  
> Shouldn't we set PCG_LOCK ?
> unlock_page_cgroup() will be called after this.
> 
Ah, lock_page_cgroup() has already set it.
Please ignore this comment.

Sorry for the noise.
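
The point about the lock bit can be sketched in user space as follows. This
is an illustration only: model_lock(), model_unlock(), model_commit() and
model_is_locked() are hypothetical names, and C11 atomics stand in for the
kernel's bit spinlock and set_bit(). Because the commit step only ORs new bits
into the flag word, the PCG_LOCK bit taken earlier survives until the unlock:

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { PCG_LOCK, PCG_CACHE, PCG_USED };

/* Spin until this caller is the one that flips PCG_LOCK from 0 to 1. */
static void model_lock(atomic_ulong *flags)
{
	while (atomic_fetch_or_explicit(flags, 1UL << PCG_LOCK,
					memory_order_acquire) & (1UL << PCG_LOCK))
		;	/* busy-wait; a real lock would relax the CPU here */
}

static void model_unlock(atomic_ulong *flags)
{
	atomic_fetch_and_explicit(flags, ~(1UL << PCG_LOCK),
				  memory_order_release);
}

/* Commit step: touch only the bits being set; PCG_LOCK is left as it is. */
static void model_commit(atomic_ulong *flags, bool cache)
{
	unsigned long bits = 1UL << PCG_USED;

	if (cache)
		bits |= 1UL << PCG_CACHE;
	atomic_fetch_or(flags, bits);
}

static bool model_is_locked(atomic_ulong *flags)
{
	return atomic_load(flags) & (1UL << PCG_LOCK);
}
```

With the earlier whole-word assignment, the lock bit had to be part of the
stored value (hence PCGF_LOCK in pcg_default_flags); with per-bit updates that
is no longer needed.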

Daisuke Nishimura.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-06-01  4:25     ` Daisuke Nishimura
@ 2009-06-01  5:49       ` Balbir Singh
  -1 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-06-01  5:49 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, linux-mm, linux-kernel, Andrew Morton, lizf,
	menage, KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-01 13:25:05]:

> I'm sorry for my very late reply.
> 
> I've been working on the stale swap cache problem for a long time as you know :)
> 
> On Sun, 17 May 2009 12:15:43 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > 
> > > > I think setting/clearing the flag here adds a race condition, because
> > > > pc->flags is modified by
> > > >   pc->flags = pcg_default_flags[ctype] in commit_charge().
> > > > You have to change the above line to
> > > > 
> > > >   SetPageCgroupCache(pc) or similar
> > > >   ...
> > > >   SetPageCgroupUsed(pc)
> > > > 
> > > > Then you can use set_bit() without lock_page_cgroup().
> > > > (Currently, pc->flags is modified only under lock_page_cgroup(), so
> > > >  non-atomic code is used.)
> > >
> > 
> > Here is the next version of the patch
> > 
> > 
> > Feature: Remove the overhead associated with the root cgroup
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > This patch changes the memory cgroup and removes the overhead associated
> > with accounting all pages in the root cgroup. As a side-effect, we can
> > no longer set a memory hard limit in the root cgroup.
> > 
> I agree with the idea itself.
>

Thanks!
 
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well. Flags are now set atomically for page_cgroup,
> > pcg_default_flags is now obsolete, but I've not removed it yet. It
> > provides some readability to help the code.
> > 
> > Tests:
> > 1. Tested lightly, previous versions showed good performance improvement 10%.
> > 
> You should test the current version :)
> And I think you should test this patch under global memory pressure too,
> to check that it doesn't cause a bug or a counter underflow/overflow, etc.
> memcg's LRU handling of SwapCache is different from the usual one.
> 

OK, I've tested it using my stress tool, but I'll modify the tool to cover
some of the things you've pointed out.

> > NOTE:
> > I haven't got the time right now to run oprofile and get detailed test results,
> > since I am in the middle of travel.
> > 
> > Please review the code for functional correctness and if you can test
> > it even better. I would like to push this in, especially if the %
> > performance difference I am seeing is reproducible elsewhere as well.
> > 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> > 
> >  include/linux/page_cgroup.h |   12 ++++++++++++
> >  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 50 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..ebdae9a 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ROOT, /* page belongs to root cgroup */
> > +	PCG_ACCT, /* page has been accounted for */
> >  };
> >  
> Those new flags are protected by zone->lru_lock, right?
> If so, please add a comment saying so.
> Also, I'm not sure why you need two flags. Isn't PCG_ROOT enough?
>

Nope, the LRU accounting is independent of charge/uncharge, so it needs
its own flag.
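
One reading of why a single flag is not enough, sketched as a user-space
model. Everything here (pc_model, zstat, charge_root(), uncharge_root(),
add_lru(), del_lru() and its use_acct switch) is hypothetical and only mimics
the shape of the quoted hunks. It relies on the ordering described in the
patch's own comment, where a page is uncharged before it is finally taken off
the LRU:

```c
#include <stdbool.h>

struct pc_model {
	bool root;	/* PCG_ROOT: set at charge, cleared at uncharge */
	bool acct;	/* PCG_ACCT: set at LRU add, cleared at LRU del */
	bool on_list;	/* false corresponds to list_empty(&pc->lru) */
};

static long zstat;	/* stands in for MEM_CGROUP_ZSTAT(mz, lru) */

static void charge_root(struct pc_model *pc)
{
	pc->root = true;
}

static void uncharge_root(struct pc_model *pc)
{
	pc->root = false;
}

static void add_lru(struct pc_model *pc)
{
	zstat += 1;
	pc->acct = true;
	if (pc->root)
		return;		/* root pages skip the per-cgroup list */
	pc->on_list = true;
}

/* use_acct picks between the patch's check and a ROOT-only variant. */
static void del_lru(struct pc_model *pc, bool use_acct)
{
	bool counted = use_acct ? pc->acct : pc->root;

	if (!counted && !pc->on_list)
		return;		/* treated as "never added to the LRU" */
	zstat -= 1;
	pc->acct = false;
	pc->on_list = false;
}
```

In this model, the sequence charge, add, uncharge, delete brings zstat back to
zero when del_lru() consults the ACCT flag. With the ROOT-only variant the
delete returns early, because ROOT was already cleared at uncharge and a root
page is never on the list, so the counter stays one too high.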
 
-- 
	Balbir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [RFC] Low overhead patches for the memory cgroup controller (v2)
  2009-05-31 23:51     ` Balbir Singh
@ 2009-06-01 23:57       ` KAMEZAWA Hiroyuki
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  0 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-01 23:57 UTC (permalink / raw)
  To: balbir; +Cc: Andrew Morton, linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

On Mon, 1 Jun 2009 07:51:21 +0800
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-18 19:11:07]:
> 
> > On Fri, 15 May 2009 23:46:39 +0530
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-05-16 02:45:03]:
> > > 
> > > > Balbir Singh wrote:
> > > > > Feature: Remove the overhead associated with the root cgroup
> > > > >
> > > > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > > >
> > > > > This patch changes the memory cgroup and removes the overhead associated
> > > > > with LRU maintenance of all pages in the root cgroup. As a side-effect, we
> > > > > can
> > > > > no longer set a memory hard limit in the root cgroup.
> > > > >
> > > > > A new flag is used to track page_cgroup associated with the root cgroup
> > > > > pages. A new flag to track whether the page has been accounted or not
> > > > > has been added as well.
> > > > >
> > > > > > Review comments highly appreciated
> > > > >
> > > > > Tests
> > > > >
> > > > > 1. Tested with allocate, touch and limit test case for a non-root cgroup
> > > > > 2. For the root cgroup tested performance impact with reaim
> > > > >
> > > > >
> > > > > 		+patch		mmtom-08-may-2009
> > > > > AIM9		1362.93		1338.17
> > > > > Dbase		17457.75	16021.58
> > > > > New Dbase	18070.18	16518.54
> > > > > Shared		9681.85		8882.11
> > > > > Compute		16197.79	15226.13
> > > > >
> > > > > Hmm, at first impression, I'm not convinced by the numbers...
> > > > > Does just avoiding list_add/del make programs _10%_ faster?
> > > > > Could you show the change in CPU cache-miss rate if you can?
> > > > > (And why does AIM9 go bad?)
> > > 
> > > OK... I'll try but I am away on travel for 3 weeks :( you can try and run
> > > this as well
> > > 
> > tested aim7 with some config.
> > 
> > CPU: Xeon 3.1GHz/4Core x2 (8cpu)
> > Memory: 32G
> > HDD: Usual? Scsi disk (just 1 disk)
> > (try_to_free_pages() etc...will never be called.)
> > 
> > Multiuser config. #of tasks 1100 (near to peak on my host)
> > 
> > 10runs.
> > rc6mm1 score(Jobs/min)
> > 44009.1 44844.5 44691.1 43981.9 44992.6
> > 44544.9 44179.1 44283.0 44442.9 45033.8  average=44500
> > 
> > +patch
> > 44656.8 44270.8 44706.7 44106.1 44467.6
> > 44585.3 44167.0 44756.7 44853.9 44249.4  average=44482
> > 
> > Dbase config. #of tasks 25
> > rc6mm1 score (jobs/min)
> > 11022.7 11018.9 11037.9 11003.8 11087.5 
> > 11145.2 11133.6 11068.3 11091.3 11106.6 average=11071
> > 
> > +patch
> > 10888.0 10973.7 10913.9 11000.0 10984.9
> > 10996.2 10969.9 10921.3 10921.3 11053.1 average=10962
> > 
> > Hmm, 1% improvement ?
> > (I think this is reasonable score of the effect of this patch)
> > 
> > Anyway, I'm afraid of difference between mine and your kernel config.
> > plz enjoy your travel for now :)
> >
> 
> 
> Hi, Andrew,
> 
> Could you please pick up these patches for testing. Kamezawa-San, I am
> assuming that you are OK with these patches going to -mm for testing?
> 
o.k. but..

> Would you like me to resend the patches?
> 
It's been two weeks since the original post, and several bug fixes have been
merged since then. Could you post again? (It also seems Nishimura-san posted
some comments.)
Of course, I'll test again.

Thanks,
-Kame


> Balbir 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Low overhead patches for the memory cgroup controller (v3)
  2009-06-01 23:57       ` KAMEZAWA Hiroyuki
@ 2009-06-05  5:31         ` Balbir Singh
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
                             ` (3 more replies)
  0 siblings, 4 replies; 41+ messages in thread
From: Balbir Singh @ 2009-06-05  5:31 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

Here is the new version of the patch with the RFC dropped. Andrew,
Kame, could you please take a look. I am just about to fly out to get
back home tomorrow, so there might be some silence, unless I get to
the next WiFi enabled airport.


From: Balbir Singh <balbir@linux.vnet.ibm.com>

Changelog v3 -> v2

1. Rebase to mmotm 2nd June 2009
2. Test with some of the test cases recommended by Daisuke-San

Changelog v2 -> v1
1. Fix and implement review comments.

Feature: Remove the overhead associated with the root cgroup

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup,
pcg_default_flags is now obsolete, but I've not removed it yet. It
provides some readability to help the code.

Tests Results:

Obtained by

1. Using tmpfs for mounting filesystem
2. Changing sync to be /bin/true (so that sync is not the bottleneck)
3. Used -s #cpus*40 -e #cpus*40

Reaim
		withoutpatch	patch
AIM9		9532.48		9807.59
dbase		19344.60	19285.71
new_dbase	20101.65	20163.13
shared		11827.77	11886.65
compute		17317.38	17420.05

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   12 ++++++++++++
 mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
 mm/page_cgroup.c            |    1 -
 3 files changed, 50 insertions(+), 5 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..41cc16c 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(AcctLru, ACCT_LRU)
+CLEARPCGFLAG(AcctLru, ACCT_LRU)
+TESTPCGFLAG(AcctLru, ACCT_LRU)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index a83e039..9561d10 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -197,6 +198,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcctLru(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcctLru(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
+
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index ecc3918..4406a9c 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);

-- 
	Balbir

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
@ 2009-06-05  5:51           ` KAMEZAWA Hiroyuki
  2009-06-05  9:33             ` Balbir Singh
  2009-06-05  6:05           ` Daisuke Nishimura
                             ` (2 subsequent siblings)
  3 siblings, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-05  5:51 UTC (permalink / raw)
  To: balbir; +Cc: Andrew Morton, linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

On Fri, 5 Jun 2009 13:31:07 +0800
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Here is the new version of the patch with the RFC dropped. Andrew,
> Kame, could you please take a look. I am just about to fly out to get
> back home tomorrow, so there might be some silence, unless I get to
> the next WiFi enabled airport.
> 
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Fix and implement review comments.
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> pcg_default_flags is now obsolete, but I've not removed it yet. It
> provides some readability to help the code.
> 
> Tests Results:
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 

A few comments.


> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |   12 ++++++++++++
>  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..41cc16c 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(AcctLru, ACCT_LRU)
> +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> +TESTPCGFLAG(AcctLru, ACCT_LRU)
>  
I prefer AcctLRU to AcctLru. Throughout the kernel it is written LRU or lru,
never Lru.

>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a83e039..9561d10 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -197,6 +198,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */

Could you delete pcg_default_flags? It is of no use after this patch.


> @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
I wonder whether this condition is valid.

IMHO, the whole check here should be

==
	if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup)
		return;
	mz = page_cgroup_zoneinfo(pc);
	mem = pc->mem_cgroup;
	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
	ClearPageCgroupAcctLru(pc);
	if (PageCgroupRoot(pc))
		return;
	VM_BUG_ON(list_empty(&pc->lru));
	list_del_init(&pc->lru);
	return;
==

I'm sorry if there is a case
   (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru))
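
The flow suggested above can be sketched in user space as follows. This is
only a model, not kernel code: struct pc_model, struct zone_model and the
model_* helpers are hypothetical stand-ins for page_cgroup, the per-zone
state and the LRU hooks, and the !pc->mem_cgroup and PageCgroupUsed() checks
are left out. It assumes that the accounting flag alone decides whether
there is anything to undo, and that root pages are counted but never linked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_node(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_model {
	bool acct_lru;		/* models PCG_ACCT_LRU */
	bool root;		/* models PCG_ROOT */
	struct list_node lru;
};

/* Hypothetical stand-in for the per-zone memcg state. */
struct zone_model {
	long count;		/* models MEM_CGROUP_ZSTAT(mz, lru) */
	struct list_node list;	/* models mz->lists[lru] */
};

static void model_add_lru(struct zone_model *mz, struct pc_model *pc)
{
	mz->count += 1;
	pc->acct_lru = true;
	if (pc->root)
		return;		/* root pages are counted but never linked */
	list_add_node(&pc->lru, &mz->list);
}

static void model_del_lru(struct zone_model *mz, struct pc_model *pc)
{
	if (!pc->acct_lru)
		return;		/* not accounted: nothing to undo */
	mz->count -= 1;
	pc->acct_lru = false;
	if (pc->root)
		return;
	assert(!list_is_empty(&pc->lru));	/* stands in for VM_BUG_ON() */
	list_del_init_node(&pc->lru);
}
```

In this model a second model_del_lru() on the same page is a no-op for both
root and non-root pages, which is the property the list_empty() test used to
provide for non-root pages only.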


>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
My concern here is that there is a racy window in which pc->flags shows
  PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup.

So the order of the code here should be
==
	if (mem == root_mem_cgroup)
		SetPageCgroupRoot(pc);
	pc->mem_cgroup = mem;
	smp_wmb();
	switch (ctype) {
	case....
	}
	// Used bit is set at last.
==

But I wonder whether it would be better to use
==
static inline int page_cgroup_is_under_root(struct page_cgroup *pc)
{
	return pc->mem_cgroup == root_mem_cgroup;
}
==
I'm not sure why the PageCgroupRoot() "bit" is necessary.
Could you clarify the benefit of the Root flag?



> @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> +	if (mem == root_mem_cgroup)
> +		ClearPageCgroupRoot(pc);
>  	/*
>  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
> @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index ecc3918..4406a9c 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
>  
>  #endif
>  
> -
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  
>  static DEFINE_MUTEX(swap_cgroup_mutex);
> 
Unnecessary diff here.

Thanks,
-Kame

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
@ 2009-06-05  6:05           ` Daisuke Nishimura
  2009-06-05  9:47             ` Balbir Singh
  2009-06-05  6:43           ` Daisuke Nishimura
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  3 siblings, 1 reply; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-05  6:05 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.

> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
I think we need ClearPageCgroupCache() here.
Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
A page can be reused, but we don't clear PCG_CACHE when a page is freed or allocated.
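
A small user-space model may make the stale-bit concern concrete. It is a
sketch under assumptions: struct pc_flags_model, the two commit_* helpers and
the simplified charge_kind enum are hypothetical, and plain (non-atomic) bit
operations replace the kernel's set_bit()/clear_bit(). The only behaviour
taken from the patch is that commit sets bits one by one instead of
overwriting pc->flags, and that uncharge clears only PCG_USED.

```c
#include <assert.h>
#include <stdbool.h>

enum { PCG_LOCK, PCG_CACHE, PCG_USED };			/* bit numbers */
enum charge_kind { CHARGE_CACHE, CHARGE_MAPPED };	/* simplified charge types */

/* Hypothetical stand-in for the flags word of struct page_cgroup. */
struct pc_flags_model { unsigned long flags; };

static bool test_flag(const struct pc_flags_model *pc, int bit)
{
	assert(bit >= PCG_LOCK && bit <= PCG_USED);
	return (pc->flags >> bit) & 1UL;
}

/* v3 behaviour: bits are only ever set at commit time. */
static void commit_set_only(struct pc_flags_model *pc, enum charge_kind kind)
{
	if (kind == CHARGE_CACHE)
		pc->flags |= 1UL << PCG_CACHE;
	pc->flags |= 1UL << PCG_USED;
}

/* Suggested behaviour: a mapped charge also clears a stale cache bit. */
static void commit_with_clear(struct pc_flags_model *pc, enum charge_kind kind)
{
	if (kind == CHARGE_CACHE)
		pc->flags |= 1UL << PCG_CACHE;
	else
		pc->flags &= ~(1UL << PCG_CACHE);
	pc->flags |= 1UL << PCG_USED;
}

/* Uncharge clears only PCG_USED; every other bit survives page reuse. */
static void uncharge(struct pc_flags_model *pc)
{
	pc->flags &= ~(1UL << PCG_USED);
}
```

Charging as cache, uncharging, and then charging the same object as mapped
leaves PCG_CACHE set with commit_set_only() and clear with
commit_with_clear(). The old "pc->flags = pcg_default_flags[ctype]"
assignment avoided this because it overwrote the whole word.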

> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
I think you should set PCG_ROOT before setting PCG_USED.
IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.
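
The ordering requirement can be illustrated with C11 atomics in user space.
This is an analogy only: release/acquire stands in for the smp_wmb()/smp_rmb()
pairing, and it is not the kernel's memory model. struct memcg_model,
struct pc_order_model, the MODEL_* masks and both helpers are hypothetical
names introduced for the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_USED	(1UL << 0)	/* models PCG_USED */
#define MODEL_ROOT	(1UL << 1)	/* models PCG_ROOT */

struct memcg_model { int id; };

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_order_model {
	struct memcg_model *mem_cgroup;
	atomic_ulong flags;
};

/* Writer: publish the owner and ROOT first, then USED with release ordering. */
static void commit_ordered(struct pc_order_model *pc, struct memcg_model *mem,
			   const struct memcg_model *root)
{
	assert(mem != NULL);
	pc->mem_cgroup = mem;
	if (mem == root)
		atomic_fetch_or_explicit(&pc->flags, MODEL_ROOT,
					 memory_order_relaxed);
	/* Release: a reader that observes USED also observes the writes above. */
	atomic_fetch_or_explicit(&pc->flags, MODEL_USED, memory_order_release);
}

/* Reader: true only if the page is in use and belongs to the root group. */
static bool used_and_root(struct pc_order_model *pc)
{
	unsigned long f = atomic_load_explicit(&pc->flags, memory_order_acquire);

	if (!(f & MODEL_USED))
		return false;
	return (f & MODEL_ROOT) != 0;
}
```

Because USED is the last bit published, a reader that sees it set can rely on
ROOT and mem_cgroup already being in their final state. Setting ROOT after
USED, as in v3, would leave a window where a root page looks like a non-root
one.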

>  	mem_cgroup_charge_statistics(mem, pc, true);
>  


Thanks,
Daisuke Nishimura.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
  2009-06-05  6:05           ` Daisuke Nishimura
@ 2009-06-05  6:43           ` Daisuke Nishimura
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  3 siblings, 0 replies; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-05  6:43 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

On Fri, 5 Jun 2009 13:31:07 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Here is the new version of the patch with the RFC dropped. Andrew,
> Kame, could you please take a look. I am just about to fly out to get
> back home tomorrow, so there might be some silence, unless I get to
> the next WiFi enabled airport.
> 
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Fix and implement review comments.
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag is used to track page_cgroup associated with the root cgroup
> pages. A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> pcg_default_flags is now obsolete, but I've not removed it yet. It
> provides some readability to help the code.
> 
> Tests Results:
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |   12 ++++++++++++
>  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
>  mm/page_cgroup.c            |    1 -
>  3 files changed, 50 insertions(+), 5 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..41cc16c 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,8 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ROOT, /* page belongs to root cgroup */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(Root, ROOT)
> +CLEARPCGFLAG(Root, ROOT)
> +TESTPCGFLAG(Root, ROOT)
> +
> +SETPCGFLAG(AcctLru, ACCT_LRU)
> +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> +TESTPCGFLAG(AcctLru, ACCT_LRU)
>  
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index a83e039..9561d10 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -197,6 +198,10 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> +/* Not used, but added here for completeness */
> +#define PCGF_ROOT	(1UL << PCG_ROOT)
> +#define PCGF_ACCT	(1UL << PCG_ACCT)
> +
>  static const unsigned long
>  pcg_default_flags[NR_CHARGE_TYPE] = {
>  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
>  		return;
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  	mz = page_cgroup_zoneinfo(pc);
>  	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	ClearPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcctLru(pc);
> +	if (PageCgroupRoot(pc))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	if (mem == root_mem_cgroup)
> +		SetPageCgroupRoot(pc);
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
> @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
>  	mem_cgroup_charge_statistics(mem, pc, false);
>  
>  	ClearPageCgroupUsed(pc);
> +	if (mem == root_mem_cgroup)
> +		ClearPageCgroupRoot(pc);

If we clear PCG_ROOT here, I think we cannot trust PageCgroupRoot() in mem_cgroup_del_lru_list().
And if we never clear it on the free path, we should clear it in commit_charge when mem != root_mem_cgroup.

Thanks,
Daisuke Nishimura.

>  	/*
>  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
>  	 * freed from LRU. This is safe because uncharged page is expected not
> @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> index ecc3918..4406a9c 100644
> --- a/mm/page_cgroup.c
> +++ b/mm/page_cgroup.c
> @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
>  
>  #endif
>  
> -
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  
>  static DEFINE_MUTEX(swap_cgroup_mutex);
> 
> -- 
> 	Balbir

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  5:51           ` KAMEZAWA Hiroyuki
@ 2009-06-05  9:33             ` Balbir Singh
  2009-06-08  0:20               ` Daisuke Nishimura
  0 siblings, 1 reply; 41+ messages in thread
From: Balbir Singh @ 2009-06-05  9:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Andrew Morton, linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

* KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-05 14:51:41]:

> On Fri, 5 Jun 2009 13:31:07 +0800
> Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
> > Here is the new version of the patch with the RFC dropped. Andrew,
> > Kame, could you please take a look. I am just about to fly out to get
> > back home tomorrow, so there might be some silence, unless I get to
> > the next WiFi enabled airport.
> > 
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > Changelog v3 -> v2
> > 
> > 1. Rebase to mmotm 2nd June 2009
> > 2. Test with some of the test cases recommended by Daisuke-San
> > 
> > Changelog v2 -> v1
> > 1. Fix and implement review comments.
> > 
> > Feature: Remove the overhead associated with the root cgroup
> > 
> > This patch changes the memory cgroup and removes the overhead associated
> > with accounting all pages in the root cgroup. As a side-effect, we can
> > no longer set a memory hard limit in the root cgroup.
> > 
> > A new flag is used to track page_cgroup associated with the root cgroup
> > pages. A new flag to track whether the page has been accounted or not
> > has been added as well. Flags are now set atomically for page_cgroup,
> > pcg_default_flags is now obsolete, but I've not removed it yet. It
> > provides some readability to help the code.
> > 
> > Tests Results:
> > 
> > Obtained by
> > 
> > 1. Using tmpfs for mounting filesystem
> > 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> > 3. Used -s #cpus*40 -e #cpus*40
> > 
> > Reaim
> > 		withoutpatch	patch
> > AIM9		9532.48		9807.59
> > dbase		19344.60	19285.71
> > new_dbase	20101.65	20163.13
> > shared		11827.77	11886.65
> > compute		17317.38	17420.05
> > 
> 
> A few comments.
> 
> 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> > 
> >  include/linux/page_cgroup.h |   12 ++++++++++++
> >  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
> >  mm/page_cgroup.c            |    1 -
> >  3 files changed, 50 insertions(+), 5 deletions(-)
> > 
> > 
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..41cc16c 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,8 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ROOT, /* page belongs to root cgroup */
> > +	PCG_ACCT_LRU, /* page has been accounted for */
> >  };
> >  
> >  #define TESTPCGFLAG(uname, lname)			\
> > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
> >  
> >  /* Cache flag is set only once (at allocation) */
> >  TESTPCGFLAG(Cache, CACHE)
> > +SETPCGFLAG(Cache, CACHE)
> >  
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> > +SETPCGFLAG(Used, USED)
> > +
> > +SETPCGFLAG(Root, ROOT)
> > +CLEARPCGFLAG(Root, ROOT)
> > +TESTPCGFLAG(Root, ROOT)
> > +
> > +SETPCGFLAG(AcctLru, ACCT_LRU)
> > +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> > +TESTPCGFLAG(AcctLru, ACCT_LRU)
> >  
> I prefer AcctLRU rather than AcctLru. LRU is LRU or lru and not Lru through
> the kernel.

OK, I'll make that change. I agree LRU is better.

> 
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index a83e039..9561d10 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >  
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >  
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> > @@ -197,6 +198,10 @@ enum charge_type {
> >  #define PCGF_CACHE	(1UL << PCG_CACHE)
> >  #define PCGF_USED	(1UL << PCG_USED)
> >  #define PCGF_LOCK	(1UL << PCG_LOCK)
> > +/* Not used, but added here for completeness */
> > +#define PCGF_ROOT	(1UL << PCG_ROOT)
> > +#define PCGF_ACCT	(1UL << PCG_ACCT)
> > +
> >  static const unsigned long
> >  pcg_default_flags[NR_CHARGE_TYPE] = {
> >  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> 
> Could you delete this default_flags ? This is of no use after this patch.
>

Yes, I mentioned in the comment that they are for readability of the
code. I can remove them if required.
 
> 
> > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> >  	/* can happen while we handle swapcache. */
> > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> >  		return;
> I wonder whether this condition is valid or not.
> 
> IMHO, the whole check here should be
> 
> ==
> 	if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup)
> 		return;
> 	mz = page_cgroup_zoneinfo(pc);
> 	mem = pc->mem_cgroup;
> 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> 	ClearPageCgroupAcctLru(pc);
> 	if (PageCgroupRoot(pc))
> 		return;
> 	VM_BUG_ON(list_empty(&pc->lru));
> 	list_del_init(&pc->lru);
> 	return;

We needed this check because

1. With PageCgroupRoot() set, list_empty() always returns true for
root cgroup pages
2. For non-root pages, it won't

The check is enhanced to say: don't go by list_empty() alone, look to
see if this is root.

I think we can change the condition and stop relying on list_empty()
for the check. I agree.


> ==
> 
> I'm sorry if there is a case
>    (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru))
>

There should not be. I think list_empty() was used to indicate that the
page was already unaccounted, so explicit flags should work fine.
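
The flag-based check discussed above can be sketched in user space as
follows. This is only a minimal model under stated assumptions: struct
pc_model, the list helpers and model_add_lru()/model_del_lru() are
hypothetical stand-ins for struct page_cgroup, the kernel list API and
the memcg LRU functions, and assert() stands in for VM_BUG_ON(). It is
not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct list_head. */
struct list_node { struct list_node *prev, *next; };

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_model {
	bool acct_lru;		/* models the PCG_ACCT_LRU bit */
	bool is_root;		/* models "page belongs to the root cgroup" */
	struct list_node lru;	/* models pc->lru */
};

static void node_init(struct list_node *n) { n->prev = n->next = n; }
static bool node_empty(const struct list_node *n) { return n->next == n; }

static void node_add(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/* Every page is counted; only non-root pages are linked on the list. */
static void model_add_lru(struct pc_model *pc, struct list_node *head,
			  long *zstat)
{
	*zstat += 1;
	pc->acct_lru = true;
	if (pc->is_root)
		return;
	node_add(&pc->lru, head);
}

/* Returns true when accounting was undone; relies on the flag alone. */
static bool model_del_lru(struct pc_model *pc, long *zstat)
{
	if (!pc->acct_lru)
		return false;
	*zstat -= 1;
	pc->acct_lru = false;
	if (pc->is_root)
		return true;
	assert(!node_empty(&pc->lru));	/* stands in for VM_BUG_ON() */
	node_del_init(&pc->lru);
	return true;
}
```

In this model a second delete on the same page is a no-op for both root
and non-root pages, which is the property list_empty() used to provide
only for linked pages.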
 
> 
> >  	/*
> >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	mem = pc->mem_cgroup;
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +	ClearPageCgroupAcctLru(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_del_init(&pc->lru);
> >  	return;
> >  }
> > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >  	 */
> >  	smp_rmb();
> > -	/* unused page is not rotated. */
> > -	if (!PageCgroupUsed(pc))
> > +	/* unused or root page is not rotated. */
> > +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >  
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +	SetPageCgroupAcctLru(pc);
> > +	if (PageCgroupRoot(pc))
> > +		return;
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> >  
> > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >  
> >  	mem_cgroup_charge_statistics(mem, pc, true);
> >  
> My concern here is there will be a racy moment that pc->flag shows
>   PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup.
> 
> Then, the order of the code here should be
> ==
> 	if (mem == root_mem_cgroup)
> 		SetPageCgroupRoot(pc);
> 	pc->mem_cgroup = mem;
> 	smp_wmb();
> 	switch (ctype) {
> 	case....
> 	}
> 	/* Used bit is set last. */
> ==
> 
> But I wonder whether it's better to use
> ==
> static inline int page_cgroup_is_under_root(struct page_cgroup *pc)
> {
> 	return pc->mem_cgroup == root_mem_cgroup;
> }
> ==
> I'm not sure why the PageCgroupRoot() "bit" is necessary.
> Could you clarify the benefit of the Root flag?

The Root flag was used for accounting, but I think we can start
removing it now.
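
The pointer comparison suggested above, as an alternative to a
dedicated Root bit, might look like the user-space sketch below. The
types struct memcg_model and struct pc_owner_model, the global
root_model and the helper names are assumptions made for illustration;
they only mimic struct mem_cgroup, struct page_cgroup and
root_mem_cgroup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mem_cgroup and struct page_cgroup. */
struct memcg_model { int id; };
struct pc_owner_model { struct memcg_model *mem_cgroup; };

/* Plays the role of root_mem_cgroup in this model. */
static struct memcg_model *root_model;

static void set_root_model(struct memcg_model *root)
{
	assert(root != NULL);
	root_model = root;
}

/*
 * No flag bit is needed: ownership is derived from the pointer.
 * The NULL guard (an addition of this sketch) keeps an uncharged pc
 * from matching an unset root.
 */
static inline bool pc_is_under_root(const struct pc_owner_model *pc)
{
	return pc->mem_cgroup != NULL && pc->mem_cgroup == root_model;
}
```

Because the answer is computed from pc->mem_cgroup, there is no second
piece of state that can be observed out of order with the Used bit,
which is the race the review raises for a separate Root flag.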

> 
> 
> 
> > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> >  	mem_cgroup_charge_statistics(mem, pc, false);
> >  
> >  	ClearPageCgroupUsed(pc);
> > +	if (mem == root_mem_cgroup)
> > +		ClearPageCgroupRoot(pc);
> >  	/*
> >  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
> >  	 * freed from LRU. This is safe because uncharged page is expected not
> > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> >  	name = MEMFILE_ATTR(cft->private);
> >  	switch (name) {
> >  	case RES_LIMIT:
> > +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >  		/* This function does all necessary parse...reuse it */
> >  		ret = res_counter_memparse_write_strategy(buffer, &val);
> >  		if (ret)
> > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	if (cont->parent == NULL) {
> >  		enable_swap_cgroup();
> >  		parent = NULL;
> > +		root_mem_cgroup = mem;
> >  	} else {
> >  		parent = mem_cgroup_from_cont(cont->parent);
> >  		mem->use_hierarchy = parent->use_hierarchy;
> > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	return &mem->css;
> >  free_out:
> >  	__mem_cgroup_free(mem);
> > +	root_mem_cgroup = NULL;
> >  	return ERR_PTR(error);
> >  }
> >  
> > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > index ecc3918..4406a9c 100644
> > --- a/mm/page_cgroup.c
> > +++ b/mm/page_cgroup.c
> > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
> >  
> >  #endif
> >  
> > -
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  
> >  static DEFINE_MUTEX(swap_cgroup_mutex);
> > 
> Unnecessary diff here.
>

Yes, I'll add back the space.

Thanks for the review 

-- 
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  6:05           ` Daisuke Nishimura
@ 2009-06-05  9:47             ` Balbir Singh
  2009-06-08  0:03               ` Daisuke Nishimura
  0 siblings, 1 reply; 41+ messages in thread
From: Balbir Singh @ 2009-06-05  9:47 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro

* nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-05 15:05:27]:

> Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.
> 
> > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		SetPageCgroupUsed(pc);
> I think we need ClearPageCgroupCache() here.
> Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
> A page can be reused, but we don't clear PCG_CACHE on free/alloc of page.

Yes, I know, I think it is best to set pc->flags to 0 before setting
the bits. Thanks!

> 
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	if (mem == root_mem_cgroup)
> > +		SetPageCgroupRoot(pc);
> >  
> I think you should set PCG_ROOT before setting PCG_USED.
> IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.

Kame pointed to something similar, I am going to remove PCG_ROOT in
the next version.

-- 
	Balbir


* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  9:47             ` Balbir Singh
@ 2009-06-08  0:03               ` Daisuke Nishimura
  0 siblings, 0 replies; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  0:03 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

On Fri, 5 Jun 2009 17:47:21 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * nishimura@mxp.nes.nec.co.jp <nishimura@mxp.nes.nec.co.jp> [2009-06-05 15:05:27]:
> 
> > Hmm.. I can't see any practical changes from v2 except for PCG_ACCT -> PCG_ACCT_LRU.
> > 
> > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> > >  		css_put(&mem->css);
> > >  		return;
> > >  	}
> > > +
> > >  	pc->mem_cgroup = mem;
> > >  	smp_wmb();
> > > -	pc->flags = pcg_default_flags[ctype];
> > > +	switch (ctype) {
> > > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > > +		SetPageCgroupCache(pc);
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > > +		SetPageCgroupUsed(pc);
> > I think we need ClearPageCgroupCache() here.
> > Otherwise, we cannot trust PageCgroupCache() in mem_cgroup_charge_statistics().
> > A page can be reused, but we don't clear PCG_CACHE on free/alloc of page.
> 
> Yes, I know, I think it is best to set pc->flags to 0 before setting
> the bits. Thanks!
> 
I don't think clearing pc->flags is a good idea.
It can break the PCG_ACCT_LRU bit.
Calling ClearPageCgroupCache() before SetPageCgroupUsed() in the
CHARGE_TYPE_MAPPED case would be enough.
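
The difference between the two options can be seen in a small
user-space sketch. It is a model under assumptions: the bit numbers are
illustrative, PCG_BIT() and both commit_mapped_*() helpers are
hypothetical, plain integer operations replace the kernel's atomic bit
operations, and the assert() encodes the model's assumption that the
page_cgroup is not yet marked Used.

```c
#include <assert.h>

/* Illustrative bit numbers only; the real enum lives in page_cgroup.h. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };
#define PCG_BIT(n) (1UL << (n))

/* Option A: zero everything, then set the bits for a mapped page. */
static unsigned long commit_mapped_by_zeroing(unsigned long flags)
{
	assert(!(flags & PCG_BIT(PCG_USED)));	/* model: not charged yet */
	flags = 0;				/* also wipes PCG_ACCT_LRU */
	flags |= PCG_BIT(PCG_USED);
	return flags;
}

/* Option B: clear only the stale Cache bit, then set Used. */
static unsigned long commit_mapped_by_bitops(unsigned long flags)
{
	assert(!(flags & PCG_BIT(PCG_USED)));	/* model: not charged yet */
	flags &= ~PCG_BIT(PCG_CACHE);
	flags |= PCG_BIT(PCG_USED);
	return flags;
}
```

Starting from a reused page_cgroup that still carries a stale Cache bit
and a live AcctLRU bit, option A loses the AcctLRU bit while option B
keeps it and still drops the stale Cache bit.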

Thanks,
Daisuke Nishimura.

> > 
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > > +	if (mem == root_mem_cgroup)
> > > +		SetPageCgroupRoot(pc);
> > >  
> > I think you should set PCG_ROOT before setting PCG_USED.
> > IIUC, PCG_ROOT bit must be visible already when PCG_USED is set.
> 
> Kame pointed to something similar, I am going to remove PCG_ROOT in
> the next version.
> 
> -- 
> 	Balbir


* Re: Low overhead patches for the memory cgroup controller (v3)
  2009-06-05  9:33             ` Balbir Singh
@ 2009-06-08  0:20               ` Daisuke Nishimura
  0 siblings, 0 replies; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-08  0:20 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

On Fri, 5 Jun 2009 17:33:54 +0800, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> [2009-06-05 14:51:41]:
> 
> > On Fri, 5 Jun 2009 13:31:07 +0800
> > Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > 
> > > Here is the new version of the patch with the RFC dropped. Andrew,
> > > Kame, could you please take a look. I am just about to fly out to get
> > > back home tomorrow, so there might be some silence, unless I get to
> > > the next WiFi enabled airport.
> > > 
> > > 
> > > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > 
> > > Changelog v3 -> v2
> > > 
> > > 1. Rebase to mmotm 2nd June 2009
> > > 2. Test with some of the test cases recommended by Daisuke-San
> > > 
> > > Changelog v2 -> v1
> > > 1. Fix and implement review comments.
> > > 
> > > Feature: Remove the overhead associated with the root cgroup
> > > 
> > > This patch changes the memory cgroup and removes the overhead associated
> > > with accounting all pages in the root cgroup. As a side-effect, we can
> > > no longer set a memory hard limit in the root cgroup.
> > > 
> > > A new flag is used to track page_cgroup associated with the root cgroup
> > > pages. A new flag to track whether the page has been accounted or not
> > > has been added as well. Flags are now set atomically for page_cgroup,
> > > pcg_default_flags is now obsolete, but I've not removed it yet. It
> > > provides some readability to help the code.
> > > 
> > > Tests Results:
> > > 
> > > Obtained by
> > > 
> > > 1. Using tmpfs for mounting filesystem
> > > 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> > > 3. Used -s #cpus*40 -e #cpus*40
> > > 
> > > Reaim
> > > 		withoutpatch	patch
> > > AIM9		9532.48		9807.59
> > > dbase		19344.60	19285.71
> > > new_dbase	20101.65	20163.13
> > > shared		11827.77	11886.65
> > > compute		17317.38	17420.05
> > > 
> > 
> > A few comments.
> > 
> > 
> > > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > > ---
> > > 
> > >  include/linux/page_cgroup.h |   12 ++++++++++++
> > >  mm/memcontrol.c             |   42 ++++++++++++++++++++++++++++++++++++++----
> > >  mm/page_cgroup.c            |    1 -
> > >  3 files changed, 50 insertions(+), 5 deletions(-)
> > > 
> > > 
> > > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > > index 7339c7b..41cc16c 100644
> > > --- a/include/linux/page_cgroup.h
> > > +++ b/include/linux/page_cgroup.h
> > > @@ -26,6 +26,8 @@ enum {
> > >  	PCG_LOCK,  /* page cgroup is locked */
> > >  	PCG_CACHE, /* charged as cache */
> > >  	PCG_USED, /* this object is in use. */
> > > +	PCG_ROOT, /* page belongs to root cgroup */
> > > +	PCG_ACCT_LRU, /* page has been accounted for */
> > >  };
> > >  
> > >  #define TESTPCGFLAG(uname, lname)			\
> > > @@ -42,9 +44,19 @@ static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
> > >  
> > >  /* Cache flag is set only once (at allocation) */
> > >  TESTPCGFLAG(Cache, CACHE)
> > > +SETPCGFLAG(Cache, CACHE)
> > >  
> > >  TESTPCGFLAG(Used, USED)
> > >  CLEARPCGFLAG(Used, USED)
> > > +SETPCGFLAG(Used, USED)
> > > +
> > > +SETPCGFLAG(Root, ROOT)
> > > +CLEARPCGFLAG(Root, ROOT)
> > > +TESTPCGFLAG(Root, ROOT)
> > > +
> > > +SETPCGFLAG(AcctLru, ACCT_LRU)
> > > +CLEARPCGFLAG(AcctLru, ACCT_LRU)
> > > +TESTPCGFLAG(AcctLru, ACCT_LRU)
> > >  
> > I prefer AcctLRU rather than AcctLru. LRU is written LRU or lru, not Lru,
> > throughout the kernel.
> 
> OK, I'll make that change. I agree LRU is better.
> 
> > 
> > >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> > >  {
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index a83e039..9561d10 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -43,6 +43,7 @@
> > >  
> > >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> > >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> > >  
> > >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> > > @@ -197,6 +198,10 @@ enum charge_type {
> > >  #define PCGF_CACHE	(1UL << PCG_CACHE)
> > >  #define PCGF_USED	(1UL << PCG_USED)
> > >  #define PCGF_LOCK	(1UL << PCG_LOCK)
> > > +/* Not used, but added here for completeness */
> > > +#define PCGF_ROOT	(1UL << PCG_ROOT)
> > > +#define PCGF_ACCT	(1UL << PCG_ACCT)
> > > +
> > >  static const unsigned long
> > >  pcg_default_flags[NR_CHARGE_TYPE] = {
> > >  	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> > 
> > Could you delete this default_flags ? This is of no use after this patch.
> >
> 
> Yes, I mentioned in the comment that they are for readability of the
> code. I can remove them if required.
>  
> > 
> > > @@ -375,7 +380,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> > >  		return;
> > >  	pc = lookup_page_cgroup(page);
> > >  	/* can happen while we handle swapcache. */
> > > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > > +	if ((!PageCgroupAcctLru(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
> > >  		return;
> > I wonder whether this condition is valid or not.
> > 
> > IMHO, the whole check here should be
> > 
> > ==
> > 	if (!PageCgroupAcctLru(pc) || !pc->mem_cgroup)
I think checking !pc->mem_cgroup would also be redundant; it can be
changed to VM_BUG_ON().
And wouldn't "if (!TestClearPageCgroupAcctLRU(pc))" be better? We can then
remove ClearPageCgroupAcctLRU().

> > 		return;
> > 	mz = page_cgroup_zoneinfo(pc);
> > 	mem = pc->mem_cgroup;
> > 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > 	ClearPageCgroupAcctLru(pc);
> > 	if (PageCgroupRoot(pc))
> > 		return;
> > 	VM_BUG_ON(list_empty(&pc->lru));
> > 	list_del_init(&pc->lru);
> > 	return;
> 
> We needed this check because
> 
> 1. After PageCgroupRoot(), list_empty() will always return true for
> root cgroup
> 2. For non root, it won't
> 
> The check is enhanced to say, don't go by list_empty(), look to see if
> this is root.
> 
> I think we can change the condition and stop relying on list_empty()
> for the check. I agree.
> 
> 
> > ==
> > 
> > I'm sorry if there is a case
> >    (PageCgroupAcctLru(pc) && !PageCgroupRoot(pc) && list_empty(&pc->lru))
> >
> 
> Should not be, I think the list_empty() was used to indicated already
> unaccounted, so explicit flags should work fine.
>  
> > 
> > >  	/*
> > >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> > > @@ -384,6 +389,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> > >  	mz = page_cgroup_zoneinfo(pc);
> > >  	mem = pc->mem_cgroup;
Can you delete this obsolete line?

> > >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > > +	ClearPageCgroupAcctLru(pc);
> > > +	if (PageCgroupRoot(pc))
> > > +		return;
> > >  	list_del_init(&pc->lru);
> > >  	return;
> > >  }
> > > @@ -407,8 +415,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> > >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> > >  	 */
> > >  	smp_rmb();
> > > -	/* unused page is not rotated. */
> > > -	if (!PageCgroupUsed(pc))
> > > +	/* unused or root page is not rotated. */
> > > +	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
> > >  		return;
> > >  	mz = page_cgroup_zoneinfo(pc);
> > >  	list_move(&pc->lru, &mz->lists[lru]);
> > > @@ -432,6 +440,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> > >  
> > >  	mz = page_cgroup_zoneinfo(pc);
> > >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > > +	SetPageCgroupAcctLru(pc);
> > > +	if (PageCgroupRoot(pc))
> > > +		return;
> > >  	list_add(&pc->lru, &mz->lists[lru]);
> > >  }
> > >  
Can you add "VM_BUG_ON(PageCgroupAcctLRU(pc))" in mem_cgroup_add_lru_list() ?
And you should change "list_empty(&pc->lru)" in mem_cgroup_lru_add_after_commit_swapcache()
to "!PageCgroupAcctLRU(pc)".
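
A user-space sketch of the two suggestions (test-and-clear on delete,
and a sanity check on add) is given below. All names are hypothetical:
test_and_clear_flag() only imitates what a TestClearPageCgroupAcctLRU()
helper would do, the bit numbers are illustrative, and assert() stands
in for VM_BUG_ON(). The kernel would build such a helper on its atomic
test_and_clear_bit(); this model uses plain integer operations and is
not atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit numbers only. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };

/* Non-atomic model of a test-and-clear flag helper. */
static bool test_and_clear_flag(unsigned long *flags, int nr)
{
	unsigned long mask = 1UL << nr;
	bool was_set = (*flags & mask) != 0;

	*flags &= ~mask;
	return was_set;
}

/* Add path: the page must not already be accounted. */
static void model_add(unsigned long *flags, long *zstat)
{
	assert(!(*flags & (1UL << PCG_ACCT_LRU)));	/* VM_BUG_ON() stand-in */
	*flags |= 1UL << PCG_ACCT_LRU;
	*zstat += 1;
}

/* Delete path: one call both tests and clears, so no separate Clear. */
static void model_del(unsigned long *flags, long *zstat)
{
	if (!test_and_clear_flag(flags, PCG_ACCT_LRU))
		return;
	*zstat -= 1;
}
```

With this shape a repeated delete leaves the counter untouched, and a
repeated add without an intervening delete trips the assertion.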


Thanks,
Daisuke Nishimura.

> > > @@ -1107,9 +1118,24 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> > >  		css_put(&mem->css);
> > >  		return;
> > >  	}
> > > +
> > >  	pc->mem_cgroup = mem;
> > >  	smp_wmb();
> > > -	pc->flags = pcg_default_flags[ctype];
> > > +	switch (ctype) {
> > > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > > +		SetPageCgroupCache(pc);
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > > +		SetPageCgroupUsed(pc);
> > > +		break;
> > > +	default:
> > > +		break;
> > > +	}
> > > +
> > > +	if (mem == root_mem_cgroup)
> > > +		SetPageCgroupRoot(pc);
> > >  
> > >  	mem_cgroup_charge_statistics(mem, pc, true);
> > >  
> > My concern here is there will be a racy moment that pc->flag shows
> >   PageCgroupUsed(pc) && !PageCgroupRoot(pc) even if pc->mem_cgroup == root_mem_cgroup.
> > 
> > Then, the order of the code here should be
> > ==
> > 	if (mem == root_mem_cgroup)
> > 		SetPageCgroupRoot(pc);
> > 	pc->mem_cgroup = mem;
> > 	smp_wmb();
> > 	switch (ctype) {
> > 	case....
> > 	}
> > 	/* Used bit is set last. */
> > ==
> > 
> > But I wonder whether it's better to use
> > ==
> > static inline int page_cgroup_is_under_root(struct page_cgroup *pc)
> > {
> > 	return pc->mem_cgroup == root_mem_cgroup;
> > }
> > ==
> > I'm not sure why the PageCgroupRoot() "bit" is necessary.
> > Could you clarify the benefit of the Root flag?
> 
> The Root flags was used for accounting, but I think we can start
> removing it now.
> 
> > 
> > 
> > 
> > > @@ -1515,6 +1541,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
> > >  	mem_cgroup_charge_statistics(mem, pc, false);
> > >  
> > >  	ClearPageCgroupUsed(pc);
> > > +	if (mem == root_mem_cgroup)
> > > +		ClearPageCgroupRoot(pc);
> > >  	/*
> > >  	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
> > >  	 * freed from LRU. This is safe because uncharged page is expected not
> > > @@ -2036,6 +2064,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> > >  	name = MEMFILE_ATTR(cft->private);
> > >  	switch (name) {
> > >  	case RES_LIMIT:
> > > +		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
> > > +			ret = -EINVAL;
> > > +			break;
> > > +		}
> > >  		/* This function does all necessary parse...reuse it */
> > >  		ret = res_counter_memparse_write_strategy(buffer, &val);
> > >  		if (ret)
> > > @@ -2502,6 +2534,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  	if (cont->parent == NULL) {
> > >  		enable_swap_cgroup();
> > >  		parent = NULL;
> > > +		root_mem_cgroup = mem;
> > >  	} else {
> > >  		parent = mem_cgroup_from_cont(cont->parent);
> > >  		mem->use_hierarchy = parent->use_hierarchy;
> > > @@ -2530,6 +2563,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> > >  	return &mem->css;
> > >  free_out:
> > >  	__mem_cgroup_free(mem);
> > > +	root_mem_cgroup = NULL;
> > >  	return ERR_PTR(error);
> > >  }
> > >  
> > > diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
> > > index ecc3918..4406a9c 100644
> > > --- a/mm/page_cgroup.c
> > > +++ b/mm/page_cgroup.c
> > > @@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
> > >  
> > >  #endif
> > >  
> > > -
> > >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> > >  
> > >  static DEFINE_MUTEX(swap_cgroup_mutex);
> > > 
> > Unnecessary diff here.
> >
> 
> Yes, I'll add back the space.
> 
> Thanks for the review 
> 
> -- 
> 	Balbir


* Low overhead patches for the memory cgroup controller (v4)
  2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
                             ` (2 preceding siblings ...)
  2009-06-05  6:43           ` Daisuke Nishimura
@ 2009-06-14 18:37           ` Balbir Singh
  2009-06-15  2:04             ` KAMEZAWA Hiroyuki
  2009-06-15  2:18             ` Daisuke Nishimura
  3 siblings, 2 replies; 41+ messages in thread
From: Balbir Singh @ 2009-06-14 18:37 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki, Andrew Morton
  Cc: linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

Here is v4 of the patches, please review and comment

Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

Changelog v4 -> v3
1. Rebase to mmotm 9th June 2009
2. Remove PageCgroupRoot; the account LRU flag now indicates that
   we do only accounting and no reclaim.
3. pcg_default_flags is used again since PCGF_ROOT is gone;
   PCGF_ACCT_LRU is set only in mem_cgroup_add_lru_list
4. More LRU functions are aware of PageCgroupAcctLRU

Changelog v3 -> v2

1. Rebase to mmotm 2nd June 2009
2. Test with some of the test cases recommended by Daisuke-San

Changelog v2 -> v1
1. Rebase to latest mmotm

This patch changes the memory cgroup and removes the overhead associated
with accounting all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.
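
The side-effect on the root limit can be illustrated with a small
user-space sketch. It assumes hypothetical names throughout: struct
memcg_limit_model, root_limit_model, model_is_root() and
model_write_limit() only mimic the RES_LIMIT branch of the write
handler and are not the kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup with a hard limit. */
struct memcg_limit_model { unsigned long long limit; };

/* Plays the role of root_mem_cgroup in this model. */
static struct memcg_limit_model *root_limit_model;

static bool model_is_root(const struct memcg_limit_model *m)
{
	return m == root_limit_model;
}

/* Mimics the RES_LIMIT case: root is rejected, others are updated. */
static int model_write_limit(struct memcg_limit_model *m,
			     unsigned long long val)
{
	assert(m != NULL);
	if (model_is_root(m))
		return -EINVAL;	/* can't set a limit on root */
	m->limit = val;
	return 0;
}
```

In the model, a write to the root group fails with -EINVAL and leaves
its limit unchanged, while a child group accepts the new value.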

A new flag to track whether the page has been accounted or not
has been added as well. Flags are now set atomically for page_cgroup.

Tests:

Results (for v2)

Obtained by

1. Using tmpfs for mounting filesystem
2. Changing sync to be /bin/true (so that sync is not the bottleneck)
3. Used -s #cpus*40 -e #cpus*40

Reaim
		withoutpatch	patch
AIM9		9532.48		9807.59
dbase		19344.60	19285.71
new_dbase	20101.65	20163.13
shared		11827.77	11886.65
compute		17317.38	17420.05

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |    5 ++++
 mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
 2 files changed, 54 insertions(+), 10 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..57c4d50 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,7 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 
+SETPCGFLAG(AcctLRU, ACCT_LRU)
+CLEARPCGFLAG(AcctLRU, ACCT_LRU)
+TESTPCGFLAG(AcctLRU, ACCT_LRU)
+
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
 	return page_to_nid(pc->page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 6ceb6f2..399d416 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
 static void mem_cgroup_put(struct mem_cgroup *mem);
 static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
 
+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
 					 struct page_cgroup *pc,
 					 bool charge)
@@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	mem = pc->mem_cgroup;
+	if (!mem)
+		return;
+	if (mem_cgroup_is_root(mem)) {
+		if (!PageCgroupAcctLRU(pc))
+			return;
+	} else if (list_empty(&pc->lru))
 		return;
+
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
 	 * removed from global LRU.
 	 */
 	mz = page_cgroup_zoneinfo(pc);
-	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	if (PageCgroupAcctLRU(pc)) {
+		ClearPageCgroupAcctLRU(pc);
+		return;
+	}
 	list_del_init(&pc->lru);
 	return;
 }
@@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	if (mem_cgroup_is_root(pc->mem_cgroup)) {
+		SetPageCgroupAcctLRU(pc);
+		return;
+	}
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
  * it again. This function is only used to charge SwapCache. It's done under
  * lock_page and expected that zone->lru_lock is never held.
  */
-static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
+							struct page_cgroup *pc)
 {
 	unsigned long flags;
 	struct zone *zone = page_zone(page);
-	struct page_cgroup *pc = lookup_page_cgroup(page);
 
+	if (!pc->mem_cgroup ||
+		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
+		return;
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/*
 	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
@@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
 	spin_unlock_irqrestore(&zone->lru_lock, flags);
 }
 
-static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
+static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
+							struct page_cgroup *pc)
 {
 	unsigned long flags;
 	struct zone *zone = page_zone(page);
-	struct page_cgroup *pc = lookup_page_cgroup(page);
 
+	if (!pc->mem_cgroup ||
+		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
+		return;
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/* link when the page is linked to LRU but page_cgroup isn't */
 	if (PageLRU(page) && list_empty(&pc->lru))
@@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
 void mem_cgroup_move_lists(struct page *page,
 			   enum lru_list from, enum lru_list to)
 {
+	struct page_cgroup *pc = lookup_page_cgroup(page);
 	if (mem_cgroup_disabled())
 		return;
+	smp_rmb();
+	if (!pc->mem_cgroup ||
+		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
+		return;
 	mem_cgroup_del_lru_list(page, from);
 	mem_cgroup_add_lru_list(page, to);
 }
@@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
 	pc->flags = pcg_default_flags[ctype];
@@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
 	if (!ptr)
 		return;
 	pc = lookup_page_cgroup(page);
-	mem_cgroup_lru_del_before_commit_swapcache(page);
+	smp_rmb();
+	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
 	__mem_cgroup_commit_charge(ptr, pc, ctype);
-	mem_cgroup_lru_add_after_commit_swapcache(page);
+	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
 	/*
 	 * Now swap is on-memory. This means this page may be
 	 * counted both as mem and swap....double count.
@@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 

-- 
	Balbir


* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
@ 2009-06-15  2:04             ` KAMEZAWA Hiroyuki
  2009-06-15  2:18             ` Daisuke Nishimura
  1 sibling, 0 replies; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-15  2:04 UTC (permalink / raw)
  To: balbir; +Cc: Andrew Morton, linux-mm, nishimura, lizf, menage, KOSAKI Motohiro

On Mon, 15 Jun 2009 00:07:40 +0530
Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> Here is v4 of the patches, please review and comment
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> changelog v4 -> v3
> 1. Rebase to mmotm 9th june 2009
> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>    we do only accounting and no reclaim.
> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
> 4. More LRU functions are aware of PageCgroupAcctLRU
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Rebase to latest mmotm
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> 
> Tests:
> 
> Results (for v2)
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 

Hmm, how much overhead does this patch add for non-root cgroups?
It seems to be getting better in general, but I have a few suggestions.


> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |    5 ++++
>  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 54 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..57c4d50 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6ceb6f2..399d416 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>  
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
>  					 struct page_cgroup *pc,
>  					 bool charge)
> @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	mem = pc->mem_cgroup;
> +	if (!mem)
> +		return;
> +	if (mem_cgroup_is_root(mem)) {
> +		if (!PageCgroupAcctLRU(pc))
> +			return;
> +	} else if (list_empty(&pc->lru))
>  		return;
> +
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (PageCgroupAcctLRU(pc)) {
> +		ClearPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_del_init(&pc->lru);
>  	return;
>  }
Looking through the whole code, PageCgroupAcctLRU() is meaningful only when
pc->mem_cgroup == root_mem_cgroup.  Right?

I wonder whether making PageCgroupAcctLRU() always meaningful and removing all
the !list_empty(&pc->lru) checks is the way to go.

If we do so, this function can be written as

==
	if (!PageCgroupAcctLRU(pc))
		return;
	mem = pc->mem_cgroup;
	mz = page_cgroup_zoneinfo(pc);
	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
	ClearPageCgroupAcctLRU(pc);
	/* We don't maintain LRU for root cgroup. Global LRU works for us. */
	if (!mem_cgroup_is_root(mem))
		list_del_init(&pc->lru);
==
This seems much more straightforward.
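
For illustration only, the rule suggested above can be modeled in userspace C.
This is a minimal sketch under stated assumptions, not kernel code: struct
group, struct page_desc, the hand-rolled list helpers and the single lru_count
field are hypothetical stand-ins for mem_cgroup, page_cgroup, list_head and
MEM_CGROUP_ZSTAT. It only shows that once the flag is always meaningful, the
delete path needs no list_empty() test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct list_node { struct list_node *prev, *next; };

struct group {
	long lru_count;              /* plays the role of MEM_CGROUP_ZSTAT() */
	struct list_node lru_head;   /* per-group LRU list */
};

struct page_desc {
	struct group *grp;           /* plays the role of pc->mem_cgroup */
	bool acct_lru;               /* plays the role of PCG_ACCT_LRU */
	struct list_node lru;        /* plays the role of pc->lru */
};

static struct group *root_group;     /* plays the role of root_mem_cgroup */

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_insert(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_remove_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void group_init(struct group *g)
{
	g->lru_count = 0;
	list_init(&g->lru_head);
}

static void page_init(struct page_desc *p, struct group *g)
{
	p->grp = g;
	p->acct_lru = false;
	list_init(&p->lru);
}

/* Account the page; only non-root groups keep a private LRU list. */
static void add_lru(struct page_desc *p)
{
	assert(!p->acct_lru);
	p->grp->lru_count += 1;
	p->acct_lru = true;
	if (p->grp != root_group)
		list_insert(&p->lru, &p->grp->lru_head);
}

/* The flag alone decides whether there is anything to undo. */
static void del_lru(struct page_desc *p)
{
	if (!p->acct_lru)
		return;
	p->grp->lru_count -= 1;
	p->acct_lru = false;
	if (p->grp != root_group)
		list_remove_init(&p->lru);
}
```

In this model, calling del_lru() twice on the same page is harmless, and a
root page is counted without ever being linked. Locking is deliberately left
out of the sketch.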

> @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> +		SetPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
With the rule above (mine), this becomes:
	SetPageCgroupAcctLRU(pc);
	if (!mem_cgroup_is_root(pc->mem_cgroup))
		list_add(&pc->lru, &mz->lists[lru]);

> @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>   * it again. This function is only used to charge SwapCache. It's done under
>   * lock_page and expected that zone->lru_lock is never held.
>   */
> -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
The PageCgroupAcctLRU() check is done without holding zone->lru_lock, so testing
the flag here is racy. Considering how "pagevec" works, this race window tends to be large.


>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/*
>  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
>  
> -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;

The same comment as above.


>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
>  	if (PageLRU(page) && list_empty(&pc->lru))
> @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  void mem_cgroup_move_lists(struct page *page,
>  			   enum lru_list from, enum lru_list to)
>  {
> +	struct page_cgroup *pc = lookup_page_cgroup(page);
>  	if (mem_cgroup_disabled())
>  		return;
> +	smp_rmb();
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	mem_cgroup_del_lru_list(page, from);
>  	mem_cgroup_add_lru_list(page, to);
>  }
Here, too.


> @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
>  	pc->flags = pcg_default_flags[ctype];
> @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	if (!ptr)
>  		return;
>  	pc = lookup_page_cgroup(page);
> -	mem_cgroup_lru_del_before_commit_swapcache(page);
> +	smp_rmb();
> +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
>  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> -	mem_cgroup_lru_add_after_commit_swapcache(page);
> +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);

Why this change? When you add a memory barrier, please add a comment explaining it.
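
As an illustration of the kind of comment being asked for, here is a minimal
userspace sketch using C11 fences. It is an assumption-laden analogy, not the
kernel code: the release and acquire fences stand in for smp_wmb() and
smp_rmb(), and struct group, struct page_desc, publish() and lookup() are
hypothetical names. Each barrier carries a comment that names its partner and
states what it orders.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct group { int id; };            /* hypothetical stand-in for mem_cgroup */

struct page_desc {
	struct group *grp;           /* plain field, published through 'used' */
	atomic_bool used;            /* plays the role of the Used bit */
};

static void page_init(struct page_desc *p)
{
	p->grp = NULL;
	atomic_init(&p->used, false);
}

static void publish(struct page_desc *p, struct group *g)
{
	p->grp = g;
	/*
	 * Write barrier: pairs with the acquire fence in lookup().
	 * Orders the store to p->grp before the store to p->used, so a
	 * reader that sees used == true also sees the new grp.
	 */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->used, true, memory_order_relaxed);
}

static struct group *lookup(struct page_desc *p)
{
	if (!atomic_load_explicit(&p->used, memory_order_relaxed))
		return NULL;
	/*
	 * Read barrier: pairs with the release fence in publish().
	 * Orders the load of p->used before the load of p->grp.
	 */
	atomic_thread_fence(memory_order_acquire);
	return p->grp;
}
```

The point of the sketch is the commenting style: a barrier with no stated
partner is hard to review, because the reader cannot tell which pair of
accesses it is supposed to order.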


>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}

Could you add a corresponding update to Documentation in the next post?


>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  

Could you start a new thread with the next post? Once I read this and it changes
from unread to read, it ends up buried deep in the old mail tree ;)


Regards,
-Kame


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
  2009-06-15  2:04             ` KAMEZAWA Hiroyuki
@ 2009-06-15  2:18             ` Daisuke Nishimura
  2009-06-15  2:23               ` KAMEZAWA Hiroyuki
  2009-06-15  3:00               ` Balbir Singh
  1 sibling, 2 replies; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  2:18 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Here is v4 of the patches, please review and comment
> 
> Feature: Remove the overhead associated with the root cgroup
> 
> From: Balbir Singh <balbir@linux.vnet.ibm.com>
> 
> changelog v4 -> v3
> 1. Rebase to mmotm 9th june 2009
> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>    we do only accounting and no reclaim.
Hmm, I prefer the previous version's meaning of PCG_ACCT_LRU. It can be
used to remove the annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.

> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
outside of zone->lru_lock.

IMHO, the most complicated case is a SwapCache page which has been read ahead by
a *different* cpu from the cpu doing do_swap_page(). Such SwapCache pages can be
on a pagevec and be drained to the LRU asymmetrically with do_swap_page().
Well, yes, it would be safe just because PCGF_ACCT_LRU would not be set
if PCGF_USED has not been set, but I don't think it's a good idea to touch
PCGF_ACCT_LRU outside of zone->lru_lock anyway.
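
To make the accounting concern concrete: if the flag is tested and cleared in
one atomic step, a page that reaches the delete path twice is only un-accounted
once. The following is a minimal userspace sketch with C11 atomics, offered as
an assumption-laden model rather than kernel code. FLAG_ACCT_LRU, struct
page_desc, set_flag(), test_and_clear_flag() and lru_count are hypothetical
names, and the sketch does not model zone->lru_lock itself.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { FLAG_ACCT_LRU = 3 };          /* hypothetical bit index */

struct page_desc {
	atomic_ulong flags;          /* plays the role of pc->flags */
};

/*
 * Plays the role of a per-zone LRU statistic. It is a plain variable
 * here; in the kernel such a counter would be protected by a lock.
 */
static long lru_count;

static void set_flag(struct page_desc *p, int bit)
{
	atomic_fetch_or(&p->flags, 1UL << bit);
}

/* Clear the bit and report whether it was set, as one atomic step. */
static bool test_and_clear_flag(struct page_desc *p, int bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_fetch_and(&p->flags, ~mask);

	return (old & mask) != 0;
}

static void del_lru(struct page_desc *p)
{
	/* Only the caller that actually cleared the bit adjusts the count. */
	if (!test_and_clear_flag(p, FLAG_ACCT_LRU))
		return;
	lru_count -= 1;
}
```

A separate "test, then clear" sequence would leave a window in which two
callers both observe the bit set; folding the two into one operation closes
that window for the flag, although the counter still needs its own protection.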


Wouldn't a patch like the one below work for you?
It was lightly tested under global memory pressure (without memcg memory pressure)
on a small machine (though it has been modified slightly since that test).

===
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
---
 include/linux/page_cgroup.h |   13 ++++++++++
 mm/memcontrol.c             |   54 +++++++++++++++++++++++++++++++-----------
 2 files changed, 53 insertions(+), 14 deletions(-)

diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..debd8ba 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,7 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ACCT_LRU, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -40,11 +41,23 @@ static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
 static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
 	{ clear_bit(PCG_##lname, &pc->flags);  }
 
+#define TESTCLEARPCGFLAG(uname, lname)			\
+static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
+	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
+
 /* Cache flag is set only once (at allocation) */
 TESTPCGFLAG(Cache, CACHE)
+CLEARPCGFLAG(Cache, CACHE)
+SETPCGFLAG(Cache, CACHE)
 
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
+SETPCGFLAG(Used, USED)
+
+SETPCGFLAG(AcctLRU, ACCT_LRU)
+CLEARPCGFLAG(AcctLRU, ACCT_LRU)
+TESTPCGFLAG(AcctLRU, ACCT_LRU)
+TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
 
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index dbece65..820f3e6 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -200,13 +201,8 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
-static const unsigned long
-pcg_default_flags[NR_CHARGE_TYPE] = {
-	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
-	PCGF_USED | PCGF_LOCK, /* Anon */
-	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* Shmem */
-	0, /* FORCE */
-};
+/* Not used, but added here for completeness */
+#define PCGF_ACCT_LRU	(1UL << PCG_ACCT_LRU)
 
 /* for encoding cft->private value on file */
 #define _MEM			(0)
@@ -354,6 +350,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 	return ret;
 }
 
+static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
+{
+	return (mem == root_mem_cgroup);
+}
+
 /*
  * Following LRU functions are allowed to be used without PCG_LOCK.
  * Operations are called by routine of global LRU independently from memcg.
@@ -371,22 +372,24 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
 void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 {
 	struct page_cgroup *pc;
-	struct mem_cgroup *mem;
 	struct mem_cgroup_per_zone *mz;
 
 	if (mem_cgroup_disabled())
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if (!TestClearPageCgroupAcctLRU(pc))
 		return;
+	VM_BUG_ON(!pc->mem_cgroup);
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
 	 * removed from global LRU.
 	 */
 	mz = page_cgroup_zoneinfo(pc);
-	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	if (mem_cgroup_is_root(pc->mem_cgroup))
+		return;
+	VM_BUG_ON(list_empty(&pc->lru));
 	list_del_init(&pc->lru);
 	return;
 }
@@ -410,8 +413,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -425,6 +428,7 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 	if (mem_cgroup_disabled())
 		return;
 	pc = lookup_page_cgroup(page);
+	VM_BUG_ON(PageCgroupAcctLRU(pc));
 	/*
 	 * Used bit is set without atomic ops but after smp_wmb().
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
@@ -435,6 +439,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcctLRU(pc);
+	if (mem_cgroup_is_root(pc->mem_cgroup))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -469,7 +476,7 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
 
 	spin_lock_irqsave(&zone->lru_lock, flags);
 	/* link when the page is linked to LRU but page_cgroup isn't */
-	if (PageLRU(page) && list_empty(&pc->lru))
+	if (PageLRU(page) && !PageCgroupAcctLRU(pc))
 		mem_cgroup_add_lru_list(page, page_lru(page));
 	spin_unlock_irqrestore(&zone->lru_lock, flags);
 }
@@ -1106,9 +1113,22 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
-	pc->flags = pcg_default_flags[ctype];
+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		ClearPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}
 
 	mem_cgroup_charge_statistics(mem, pc, true);
 
@@ -2047,6 +2067,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2513,6 +2537,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2541,6 +2566,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
 
===


Thanks,
Daisuke Nishimura.

> 4. More LRU functions are aware of PageCgroupAcctLRU
> 
> Changelog v3 -> v2
> 
> 1. Rebase to mmotm 2nd June 2009
> 2. Test with some of the test cases recommended by Daisuke-San
> 
> Changelog v2 -> v1
> 1. Rebase to latest mmotm
> 
> This patch changes the memory cgroup and removes the overhead associated
> with accounting all pages in the root cgroup. As a side-effect, we can
> no longer set a memory hard limit in the root cgroup.
> 
> A new flag to track whether the page has been accounted or not
> has been added as well. Flags are now set atomically for page_cgroup,
> 
> Tests:
> 
> Results (for v2)
> 
> Obtained by
> 
> 1. Using tmpfs for mounting filesystem
> 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> 3. Used -s #cpus*40 -e #cpus*40
> 
> Reaim
> 		withoutpatch	patch
> AIM9		9532.48		9807.59
> dbase		19344.60	19285.71
> new_dbase	20101.65	20163.13
> shared		11827.77	11886.65
> compute		17317.38	17420.05
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  include/linux/page_cgroup.h |    5 ++++
>  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 54 insertions(+), 10 deletions(-)
> 
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..57c4d50 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
>  
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
>  	return page_to_nid(pc->page);
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 6ceb6f2..399d416 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
>  static void mem_cgroup_put(struct mem_cgroup *mem);
>  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
>  
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
>  					 struct page_cgroup *pc,
>  					 bool charge)
> @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	mem = pc->mem_cgroup;
> +	if (!mem)
> +		return;
> +	if (mem_cgroup_is_root(mem)) {
> +		if (!PageCgroupAcctLRU(pc))
> +			return;
> +	} else if (list_empty(&pc->lru))
>  		return;
> +
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (PageCgroupAcctLRU(pc)) {
> +		ClearPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> +		SetPageCgroupAcctLRU(pc);
> +		return;
> +	}
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>   * it again. This function is only used to charge SwapCache. It's done under
>   * lock_page and expected that zone->lru_lock is never held.
>   */
> -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/*
>  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
>  
> -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> +							struct page_cgroup *pc)
>  {
>  	unsigned long flags;
>  	struct zone *zone = page_zone(page);
> -	struct page_cgroup *pc = lookup_page_cgroup(page);
>  
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
>  	if (PageLRU(page) && list_empty(&pc->lru))
> @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  void mem_cgroup_move_lists(struct page *page,
>  			   enum lru_list from, enum lru_list to)
>  {
> +	struct page_cgroup *pc = lookup_page_cgroup(page);
>  	if (mem_cgroup_disabled())
>  		return;
> +	smp_rmb();
> +	if (!pc->mem_cgroup ||
> +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> +		return;
>  	mem_cgroup_del_lru_list(page, from);
>  	mem_cgroup_add_lru_list(page, to);
>  }
> @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
>  	pc->flags = pcg_default_flags[ctype];
> @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
>  	if (!ptr)
>  		return;
>  	pc = lookup_page_cgroup(page);
> -	mem_cgroup_lru_del_before_commit_swapcache(page);
> +	smp_rmb();
> +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
>  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> -	mem_cgroup_lru_add_after_commit_swapcache(page);
> +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
>  	/*
>  	 * Now swap is on-memory. This means this page may be
>  	 * counted both as mem and swap....double count.
> @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
> 
> -- 
> 	Balbir


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:18             ` Daisuke Nishimura
@ 2009-06-15  2:23               ` KAMEZAWA Hiroyuki
  2009-06-15  2:44                 ` Balbir Singh
  2009-06-15  3:00               ` Balbir Singh
  1 sibling, 1 reply; 41+ messages in thread
From: KAMEZAWA Hiroyuki @ 2009-06-15  2:23 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: balbir, Andrew Morton, linux-mm, lizf, menage, KOSAKI Motohiro

On Mon, 15 Jun 2009 11:18:17 +0900
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:

> On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> > Here is v4 of the patches, please review and comment
> > 
> > Feature: Remove the overhead associated with the root cgroup
> > 
> > From: Balbir Singh <balbir@linux.vnet.ibm.com>
> > 
> > changelog v4 -> v3
> > 1. Rebase to mmotm 9th june 2009
> > 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
> >    we do only accounting and no reclaim.
> hmm, I prefer the previous version of PCG_ACCT_LRU meaning. It can be
> used to remove annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.
> 
> > 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
> >    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
> It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
> outside of zone->lru_lock.
> 
> IMHO, the most complicated case is a SwapCache which has been read ahead by
> a *different* cpu from the cpu doing do_swap_page(). Those SwapCache can be
> on page_vec and be drained to LRU asymmetrically with do_swap_page().
> Well, yes it would be safe just because PCGF_ACCT_LRU would not be set
> if PCGF_USED has not been set, but I don't think it's a good idea to touch
> PCGF_ACCT_LRU outside of zone->lru_lock anyway.
> 
> 
> Doesn't a patch like below work for you ?
> Lightly tested under global memory pressure(w/o memcg's memory pressure)
> on a small machine(just a bit modified from then though).
> 
This patch includes almost all what I want ;)

Thanks,
-Kame


> ===
> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
> ---
>  include/linux/page_cgroup.h |   13 ++++++++++
>  mm/memcontrol.c             |   54 +++++++++++++++++++++++++++++++-----------
>  2 files changed, 53 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> index 7339c7b..debd8ba 100644
> --- a/include/linux/page_cgroup.h
> +++ b/include/linux/page_cgroup.h
> @@ -26,6 +26,7 @@ enum {
>  	PCG_LOCK,  /* page cgroup is locked */
>  	PCG_CACHE, /* charged as cache */
>  	PCG_USED, /* this object is in use. */
> +	PCG_ACCT_LRU, /* page has been accounted for */
>  };
>  
>  #define TESTPCGFLAG(uname, lname)			\
> @@ -40,11 +41,23 @@ static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
>  static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
>  	{ clear_bit(PCG_##lname, &pc->flags);  }
>  
> +#define TESTCLEARPCGFLAG(uname, lname)			\
> +static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
> +	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
> +
>  /* Cache flag is set only once (at allocation) */
>  TESTPCGFLAG(Cache, CACHE)
> +CLEARPCGFLAG(Cache, CACHE)
> +SETPCGFLAG(Cache, CACHE)
>  
>  TESTPCGFLAG(Used, USED)
>  CLEARPCGFLAG(Used, USED)
> +SETPCGFLAG(Used, USED)
> +
> +SETPCGFLAG(AcctLRU, ACCT_LRU)
> +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> +TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU)
>  
>  static inline int page_cgroup_nid(struct page_cgroup *pc)
>  {
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index dbece65..820f3e6 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -43,6 +43,7 @@
>  
>  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
>  #define MEM_CGROUP_RECLAIM_RETRIES	5
> +struct mem_cgroup *root_mem_cgroup __read_mostly;
>  
>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
>  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> @@ -200,13 +201,8 @@ enum charge_type {
>  #define PCGF_CACHE	(1UL << PCG_CACHE)
>  #define PCGF_USED	(1UL << PCG_USED)
>  #define PCGF_LOCK	(1UL << PCG_LOCK)
> -static const unsigned long
> -pcg_default_flags[NR_CHARGE_TYPE] = {
> -	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
> -	PCGF_USED | PCGF_LOCK, /* Anon */
> -	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* Shmem */
> -	0, /* FORCE */
> -};
> +/* Not used, but added here for completeness */
> +#define PCGF_ACCT_LRU	(1UL << PCG_ACCT_LRU)
>  
>  /* for encoding cft->private value on file */
>  #define _MEM			(0)
> @@ -354,6 +350,11 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
>  	return ret;
>  }
>  
> +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> +{
> +	return (mem == root_mem_cgroup);
> +}
> +
>  /*
>   * Following LRU functions are allowed to be used without PCG_LOCK.
>   * Operations are called by routine of global LRU independently from memcg.
> @@ -371,22 +372,24 @@ static int mem_cgroup_walk_tree(struct mem_cgroup *root, void *data,
>  void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
>  {
>  	struct page_cgroup *pc;
> -	struct mem_cgroup *mem;
>  	struct mem_cgroup_per_zone *mz;
>  
>  	if (mem_cgroup_disabled())
>  		return;
>  	pc = lookup_page_cgroup(page);
>  	/* can happen while we handle swapcache. */
> -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> +	if (!TestClearPageCgroupAcctLRU(pc))
>  		return;
> +	VM_BUG_ON(!pc->mem_cgroup);
>  	/*
>  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
>  	 * removed from global LRU.
>  	 */
>  	mz = page_cgroup_zoneinfo(pc);
> -	mem = pc->mem_cgroup;
>  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> +	if (mem_cgroup_is_root(pc->mem_cgroup))
> +		return;
> +	VM_BUG_ON(list_empty(&pc->lru));
>  	list_del_init(&pc->lru);
>  	return;
>  }
> @@ -410,8 +413,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
>  	 */
>  	smp_rmb();
> -	/* unused page is not rotated. */
> -	if (!PageCgroupUsed(pc))
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
>  		return;
>  	mz = page_cgroup_zoneinfo(pc);
>  	list_move(&pc->lru, &mz->lists[lru]);
> @@ -425,6 +428,7 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  	if (mem_cgroup_disabled())
>  		return;
>  	pc = lookup_page_cgroup(page);
> +	VM_BUG_ON(PageCgroupAcctLRU(pc));
>  	/*
>  	 * Used bit is set without atomic ops but after smp_wmb().
>  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> @@ -435,6 +439,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
>  
>  	mz = page_cgroup_zoneinfo(pc);
>  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> +	SetPageCgroupAcctLRU(pc);
> +	if (mem_cgroup_is_root(pc->mem_cgroup))
> +		return;
>  	list_add(&pc->lru, &mz->lists[lru]);
>  }
>  
> @@ -469,7 +476,7 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
>  
>  	spin_lock_irqsave(&zone->lru_lock, flags);
>  	/* link when the page is linked to LRU but page_cgroup isn't */
> -	if (PageLRU(page) && list_empty(&pc->lru))
> +	if (PageLRU(page) && !PageCgroupAcctLRU(pc))
>  		mem_cgroup_add_lru_list(page, page_lru(page));
>  	spin_unlock_irqrestore(&zone->lru_lock, flags);
>  }
> @@ -1106,9 +1113,22 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
>  		css_put(&mem->css);
>  		return;
>  	}
> +
>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		ClearPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
>  
>  	mem_cgroup_charge_statistics(mem, pc, true);
>  
> @@ -2047,6 +2067,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
>  	name = MEMFILE_ATTR(cft->private);
>  	switch (name) {
>  	case RES_LIMIT:
> +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> +			ret = -EINVAL;
> +			break;
> +		}
>  		/* This function does all necessary parse...reuse it */
>  		ret = res_counter_memparse_write_strategy(buffer, &val);
>  		if (ret)
> @@ -2513,6 +2537,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	if (cont->parent == NULL) {
>  		enable_swap_cgroup();
>  		parent = NULL;
> +		root_mem_cgroup = mem;
>  	} else {
>  		parent = mem_cgroup_from_cont(cont->parent);
>  		mem->use_hierarchy = parent->use_hierarchy;
> @@ -2541,6 +2566,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
>  	return &mem->css;
>  free_out:
>  	__mem_cgroup_free(mem);
> +	root_mem_cgroup = NULL;
>  	return ERR_PTR(error);
>  }
>  
>  
> ===
> 
> 
> Thanks,
> Daisuke Nishimura.
> 
> > 4. More LRU functions are aware of PageCgroupAcctLRU
> > 
> > Changelog v3 -> v2
> > 
> > 1. Rebase to mmotm 2nd June 2009
> > 2. Test with some of the test cases recommended by Daisuke-San
> > 
> > Changelog v2 -> v1
> > 1. Rebase to latest mmotm
> > 
> > This patch changes the memory cgroup and removes the overhead associated
> > with accounting all pages in the root cgroup. As a side-effect, we can
> > no longer set a memory hard limit in the root cgroup.
> > 
> > A new flag to track whether the page has been accounted or not
> > has been added as well. Flags are now set atomically for page_cgroup,
> > 
> > Tests:
> > 
> > Results (for v2)
> > 
> > Obtained by
> > 
> > 1. Using tmpfs for mounting filesystem
> > 2. Changing sync to be /bin/true (so that sync is not the bottleneck)
> > 3. Used -s #cpus*40 -e #cpus*40
> > 
> > Reaim
> > 		withoutpatch	patch
> > AIM9		9532.48		9807.59
> > dbase		19344.60	19285.71
> > new_dbase	20101.65	20163.13
> > shared		11827.77	11886.65
> > compute		17317.38	17420.05
> > 
> > Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> > ---
> > 
> >  include/linux/page_cgroup.h |    5 ++++
> >  mm/memcontrol.c             |   59 ++++++++++++++++++++++++++++++++++++-------
> >  2 files changed, 54 insertions(+), 10 deletions(-)
> > 
> > 
> > diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
> > index 7339c7b..57c4d50 100644
> > --- a/include/linux/page_cgroup.h
> > +++ b/include/linux/page_cgroup.h
> > @@ -26,6 +26,7 @@ enum {
> >  	PCG_LOCK,  /* page cgroup is locked */
> >  	PCG_CACHE, /* charged as cache */
> >  	PCG_USED, /* this object is in use. */
> > +	PCG_ACCT_LRU, /* page has been accounted for */
> >  };
> >  
> >  #define TESTPCGFLAG(uname, lname)			\
> > @@ -46,6 +47,10 @@ TESTPCGFLAG(Cache, CACHE)
> >  TESTPCGFLAG(Used, USED)
> >  CLEARPCGFLAG(Used, USED)
> >  
> > +SETPCGFLAG(AcctLRU, ACCT_LRU)
> > +CLEARPCGFLAG(AcctLRU, ACCT_LRU)
> > +TESTPCGFLAG(AcctLRU, ACCT_LRU)
> > +
> >  static inline int page_cgroup_nid(struct page_cgroup *pc)
> >  {
> >  	return page_to_nid(pc->page);
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 6ceb6f2..399d416 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -43,6 +43,7 @@
> >  
> >  struct cgroup_subsys mem_cgroup_subsys __read_mostly;
> >  #define MEM_CGROUP_RECLAIM_RETRIES	5
> > +struct mem_cgroup *root_mem_cgroup __read_mostly;
> >  
> >  #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
> >  /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
> > @@ -219,6 +220,11 @@ static void mem_cgroup_get(struct mem_cgroup *mem);
> >  static void mem_cgroup_put(struct mem_cgroup *mem);
> >  static struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *mem);
> >  
> > +static inline bool mem_cgroup_is_root(struct mem_cgroup *mem)
> > +{
> > +	return (mem == root_mem_cgroup);
> > +}
> > +
> >  static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
> >  					 struct page_cgroup *pc,
> >  					 bool charge)
> > @@ -378,15 +384,25 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> >  	/* can happen while we handle swapcache. */
> > -	if (list_empty(&pc->lru) || !pc->mem_cgroup)
> > +	mem = pc->mem_cgroup;
> > +	if (!mem)
> > +		return;
> > +	if (mem_cgroup_is_root(mem)) {
> > +		if (!PageCgroupAcctLRU(pc))
> > +			return;
> > +	} else if (list_empty(&pc->lru))
> >  		return;
> > +
> >  	/*
> >  	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
> >  	 * removed from global LRU.
> >  	 */
> >  	mz = page_cgroup_zoneinfo(pc);
> > -	mem = pc->mem_cgroup;
> >  	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
> > +	if (PageCgroupAcctLRU(pc)) {
> > +		ClearPageCgroupAcctLRU(pc);
> > +		return;
> > +	}
> >  	list_del_init(&pc->lru);
> >  	return;
> >  }
> > @@ -410,8 +426,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
> >  	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> >  	 */
> >  	smp_rmb();
> > -	/* unused page is not rotated. */
> > -	if (!PageCgroupUsed(pc))
> > +	/* unused or root page is not rotated. */
> > +	if (!PageCgroupUsed(pc) || PageCgroupAcctLRU(pc))
> >  		return;
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	list_move(&pc->lru, &mz->lists[lru]);
> > @@ -435,6 +451,10 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >  
> >  	mz = page_cgroup_zoneinfo(pc);
> >  	MEM_CGROUP_ZSTAT(mz, lru) += 1;
> > +	if (mem_cgroup_is_root(pc->mem_cgroup)) {
> > +		SetPageCgroupAcctLRU(pc);
> > +		return;
> > +	}
> >  	list_add(&pc->lru, &mz->lists[lru]);
> >  }
> >  
> > @@ -445,12 +465,15 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
> >   * it again. This function is only used to charge SwapCache. It's done under
> >   * lock_page and expected that zone->lru_lock is never held.
> >   */
> > -static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> > +static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page,
> > +							struct page_cgroup *pc)
> >  {
> >  	unsigned long flags;
> >  	struct zone *zone = page_zone(page);
> > -	struct page_cgroup *pc = lookup_page_cgroup(page);
> >  
> > +	if (!pc->mem_cgroup ||
> > +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> > +		return;
> >  	spin_lock_irqsave(&zone->lru_lock, flags);
> >  	/*
> >  	 * Forget old LRU when this page_cgroup is *not* used. This Used bit
> > @@ -461,12 +484,15 @@ static void mem_cgroup_lru_del_before_commit_swapcache(struct page *page)
> >  	spin_unlock_irqrestore(&zone->lru_lock, flags);
> >  }
> >  
> > -static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> > +static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page,
> > +							struct page_cgroup *pc)
> >  {
> >  	unsigned long flags;
> >  	struct zone *zone = page_zone(page);
> > -	struct page_cgroup *pc = lookup_page_cgroup(page);
> >  
> > +	if (!pc->mem_cgroup ||
> > +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> > +		return;
> >  	spin_lock_irqsave(&zone->lru_lock, flags);
> >  	/* link when the page is linked to LRU but page_cgroup isn't */
> >  	if (PageLRU(page) && list_empty(&pc->lru))
> > @@ -478,8 +504,13 @@ static void mem_cgroup_lru_add_after_commit_swapcache(struct page *page)
> >  void mem_cgroup_move_lists(struct page *page,
> >  			   enum lru_list from, enum lru_list to)
> >  {
> > +	struct page_cgroup *pc = lookup_page_cgroup(page);
> >  	if (mem_cgroup_disabled())
> >  		return;
> > +	smp_rmb();
> > +	if (!pc->mem_cgroup ||
> > +		(!PageCgroupAcctLRU(pc) && mem_cgroup_is_root(pc->mem_cgroup)))
> > +		return;
> >  	mem_cgroup_del_lru_list(page, from);
> >  	mem_cgroup_add_lru_list(page, to);
> >  }
> > @@ -1114,6 +1145,7 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
> >  		css_put(&mem->css);
> >  		return;
> >  	}
> > +
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> >  	pc->flags = pcg_default_flags[ctype];
> > @@ -1418,9 +1450,10 @@ __mem_cgroup_commit_charge_swapin(struct page *page, struct mem_cgroup *ptr,
> >  	if (!ptr)
> >  		return;
> >  	pc = lookup_page_cgroup(page);
> > -	mem_cgroup_lru_del_before_commit_swapcache(page);
> > +	smp_rmb();
> > +	mem_cgroup_lru_del_before_commit_swapcache(page, pc);
> >  	__mem_cgroup_commit_charge(ptr, pc, ctype);
> > -	mem_cgroup_lru_add_after_commit_swapcache(page);
> > +	mem_cgroup_lru_add_after_commit_swapcache(page, pc);
> >  	/*
> >  	 * Now swap is on-memory. This means this page may be
> >  	 * counted both as mem and swap....double count.
> > @@ -2055,6 +2088,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
> >  	name = MEMFILE_ATTR(cft->private);
> >  	switch (name) {
> >  	case RES_LIMIT:
> > +		if (mem_cgroup_is_root(memcg)) { /* Can't set limit on root */
> > +			ret = -EINVAL;
> > +			break;
> > +		}
> >  		/* This function does all necessary parse...reuse it */
> >  		ret = res_counter_memparse_write_strategy(buffer, &val);
> >  		if (ret)
> > @@ -2521,6 +2558,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	if (cont->parent == NULL) {
> >  		enable_swap_cgroup();
> >  		parent = NULL;
> > +		root_mem_cgroup = mem;
> >  	} else {
> >  		parent = mem_cgroup_from_cont(cont->parent);
> >  		mem->use_hierarchy = parent->use_hierarchy;
> > @@ -2549,6 +2587,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
> >  	return &mem->css;
> >  free_out:
> >  	__mem_cgroup_free(mem);
> > +	root_mem_cgroup = NULL;
> >  	return ERR_PTR(error);
> >  }
> >  
> > 
> > -- 
> > 	Balbir
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:23               ` KAMEZAWA Hiroyuki
@ 2009-06-15  2:44                 ` Balbir Singh
  0 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-06-15  2:44 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Daisuke Nishimura, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro

KAMEZAWA Hiroyuki wrote:
> On Mon, 15 Jun 2009 11:18:17 +0900
> Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> wrote:
> 
>> On Mon, 15 Jun 2009 00:07:40 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>> Here is v4 of the patches, please review and comment
>>>
>>> Feature: Remove the overhead associated with the root cgroup
>>>
>>> From: Balbir Singh <balbir@linux.vnet.ibm.com>
>>>
>>> changelog v4 -> v3
>>> 1. Rebase to mmotm 9th june 2009
>>> 2. Remove PageCgroupRoot, we have account LRU flags to indicate that
>>>    we do only accounting and no reclaim.
>> hmm, I prefer the previous version of PCG_ACCT_LRU meaning. It can be
>> used to remove annoying list_empty(&pc->lru) and !pc->mem_cgroup checks.
>>
>>> 3. pcg_default_flags has been used again, since PCGF_ROOT is gone,
>>>    we set PCGF_ACCT_LRU only in mem_cgroup_add_lru_list
>> It might be safe, but I don't think it's a good idea to touch PCGF_ACCT_LRU
>> outside of zone->lru_lock.
>>
>> IMHO, the most complicated case is a SwapCache which has been read ahead by
>> a *different* cpu from the cpu doing do_swap_page(). Those SwapCache can be
>> on page_vec and be drained to LRU asymmetrically with do_swap_page().
>> Well, yes it would be safe just because PCGF_ACCT_LRU would not be set
>> if PCGF_USED has not been set, but I don't think it's a good idea to touch
>> PCGF_ACCT_LRU outside of zone->lru_lock anyway.
>>
>>
>> Doesn't a patch like below work for you ?
>> Lightly tested under global memory pressure(w/o memcg's memory pressure)
>> on a small machine(just a bit modified from then though).
>>

OK, so you prefer the older meaning and implementation. The code seems fine to me,
and I like the removal of the list_empty() checks that you and Kame have proposed.
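
The older PCG_ACCT_LRU meaning discussed above (the flag marks every
page_cgroup that has been counted on a memcg LRU, and a test-and-clear takes
the place of the list_empty() and !pc->mem_cgroup checks) can be sketched in
userspace C roughly as follows. This is only an illustration under stated
assumptions: pc_sketch, zone_sketch, sketch_add_lru() and sketch_del_lru() are
hypothetical stand-ins rather than the kernel structures or functions, the
is_root field stands in for mem_cgroup_is_root(pc->mem_cgroup), and the bit
operations here are not atomic, whereas the quoted patch uses
test_and_clear_bit().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bit; stands in for PCG_ACCT_LRU. */
#define SKETCH_ACCT_LRU (1UL << 0)

struct list_node { struct list_node *prev, *next; };

/* Hypothetical stand-in for struct page_cgroup. */
struct pc_sketch {
	unsigned long flags;
	bool is_root;           /* assumption: replaces mem_cgroup_is_root() */
	struct list_node lru;
};

/* Hypothetical stand-in for one per-zone LRU list and its counter. */
struct zone_sketch {
	long count;             /* plays the role of MEM_CGROUP_ZSTAT(mz, lru) */
	struct list_node head;
};

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_node(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Count every page, but link only non-root pages onto the list. */
static void sketch_add_lru(struct zone_sketch *z, struct pc_sketch *pc)
{
	assert(!(pc->flags & SKETCH_ACCT_LRU)); /* mirrors the VM_BUG_ON */
	z->count += 1;
	pc->flags |= SKETCH_ACCT_LRU;
	if (pc->is_root)
		return;
	list_add_node(&pc->lru, &z->head);
}

/* The flag alone decides whether there is anything to undo. */
static void sketch_del_lru(struct zone_sketch *z, struct pc_sketch *pc)
{
	bool was_set = (pc->flags & SKETCH_ACCT_LRU) != 0;

	pc->flags &= ~SKETCH_ACCT_LRU;  /* non-atomic test-and-clear */
	if (!was_set)
		return;
	z->count -= 1;
	if (pc->is_root)
		return;
	assert(!list_is_empty(&pc->lru));
	list_del_node(&pc->lru);
}
```

In this sketch a root page is counted but never linked, so a second
sketch_del_lru() call on the same page is a no-op because the flag is already
clear, and no list_empty() test is needed to detect that.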


-- 
	Balbir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  2:18             ` Daisuke Nishimura
  2009-06-15  2:23               ` KAMEZAWA Hiroyuki
@ 2009-06-15  3:00               ` Balbir Singh
  2009-06-15  3:09                 ` Daisuke Nishimura
  1 sibling, 1 reply; 41+ messages in thread
From: Balbir Singh @ 2009-06-15  3:00 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro

Daisuke Nishimura wrote:

>  	pc->mem_cgroup = mem;
>  	smp_wmb();
> -	pc->flags = pcg_default_flags[ctype];

pc->flags needs to be reset here, otherwise we run the risk of carrying over
older bits. I'll merge your changes and test.


-- 
	Balbir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:00               ` Balbir Singh
@ 2009-06-15  3:09                 ` Daisuke Nishimura
  2009-06-15  3:22                   ` Balbir Singh
  0 siblings, 1 reply; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  3:09 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Daisuke Nishimura wrote:
> 
> >  	pc->mem_cgroup = mem;
> >  	smp_wmb();
> > -	pc->flags = pcg_default_flags[ctype];
> 
> pc->flags needs to be reset here, otherwise we have the danger the carrying over
> older bits. I'll merge your changes and test.
> 
hmm, why ?

I do in my patch:

+	switch (ctype) {
+	case MEM_CGROUP_CHARGE_TYPE_CACHE:
+	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
+		SetPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
+		ClearPageCgroupCache(pc);
+		SetPageCgroupUsed(pc);
+		break;
+	default:
+		break;
+	}

So, all the necessary flags are set and all the unnecessary ones are cleared, right ?


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:09                 ` Daisuke Nishimura
@ 2009-06-15  3:22                   ` Balbir Singh
  2009-06-15  3:46                     ` Daisuke Nishimura
  0 siblings, 1 reply; 41+ messages in thread
From: Balbir Singh @ 2009-06-15  3:22 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro

Daisuke Nishimura wrote:
> On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Daisuke Nishimura wrote:
>>
>>>  	pc->mem_cgroup = mem;
>>>  	smp_wmb();
>>> -	pc->flags = pcg_default_flags[ctype];
>> pc->flags needs to be reset here, otherwise we have the danger the carrying over
>> older bits. I'll merge your changes and test.
>>
> hmm, why ?
> 
> I do in my patch:
> 
> +	switch (ctype) {
> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> +		SetPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> +		ClearPageCgroupCache(pc);
> +		SetPageCgroupUsed(pc);
> +		break;
> +	default:
> +		break;
> +	}
> 

Yes, I did that in the older code. What I was suggesting was just an additional
step: resetting pc->flags, so that if we add new flags in the future we don't end
up with a long list of initializations and clearings, and so that a forgotten
clear does not cause problems when the page_cgroup is reused. My message was
confusing; it should have said that resetting pc->flags provides protection
against any future addition of flags.

I am testing your patch, which is a modified version of v3 with your changes,
and would like to carry your Signed-off-by in it when I post v5. Is that OK?

-- 
	Balbir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:22                   ` Balbir Singh
@ 2009-06-15  3:46                     ` Daisuke Nishimura
  2009-06-15  4:22                       ` Balbir Singh
  0 siblings, 1 reply; 41+ messages in thread
From: Daisuke Nishimura @ 2009-06-15  3:46 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro, Daisuke Nishimura

On Mon, 15 Jun 2009 08:52:56 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> Daisuke Nishimura wrote:
> > On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> >> Daisuke Nishimura wrote:
> >>
> >>>  	pc->mem_cgroup = mem;
> >>>  	smp_wmb();
> >>> -	pc->flags = pcg_default_flags[ctype];
> >> pc->flags needs to be reset here, otherwise we have the danger the carrying over
> >> older bits. I'll merge your changes and test.
> >>
> > hmm, why ?
> > 
> > I do in my patch:
> > 
> > +	switch (ctype) {
> > +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
> > +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
> > +		SetPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
> > +		ClearPageCgroupCache(pc);
> > +		SetPageCgroupUsed(pc);
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > 
> 
> Yes, I did that in the older code, what I was suggesting was just an additional
> step to ensure that in the future if we add new flags, we don't end up with a
> long list of initializations and clearing or if we forget to clear pc->flags and
> reuse the page_cgroup, it might be a problem. My message was confusing, it
> should have been resetting the pc->flags will provide protection for any future
> addition of flags.
> 
O.K. I see your point.

But we shouldn't touch PCG_ACCT_LRU flag here. IIUC, that's why we abandon
pcg_default_flags[]. Please take care of it.
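
The difference between the two styles can be shown with a small userspace
sketch. The bit numbers follow the enum in the quoted patch; everything else
(BIT_OF, has_bit, the sketch_ctype enum and both commit_* helpers) is
hypothetical and only illustrates the point, it is not the kernel code.
Assigning a whole word, as pcg_default_flags[ctype] did, wipes PCG_ACCT_LRU,
while the per-bit switch leaves it alone.

```c
/* Bit numbers as in the quoted enum; the helpers below are hypothetical. */
enum { PCG_LOCK, PCG_CACHE, PCG_USED, PCG_ACCT_LRU };

#define BIT_OF(nr) (1UL << (nr))

enum sketch_ctype { SKETCH_CACHE, SKETCH_MAPPED, SKETCH_SHMEM };

static int has_bit(unsigned long flags, int nr)
{
	return (flags & BIT_OF(nr)) != 0;
}

/*
 * Old style: the whole word is overwritten with a per-type default,
 * so any bit not in the default (PCG_ACCT_LRU here) is lost.
 */
static unsigned long commit_by_assignment(unsigned long flags,
					  enum sketch_ctype type)
{
	(void)flags;    /* the previous value is discarded on purpose */
	if (type == SKETCH_MAPPED)
		return BIT_OF(PCG_USED) | BIT_OF(PCG_LOCK);
	return BIT_OF(PCG_CACHE) | BIT_OF(PCG_USED) | BIT_OF(PCG_LOCK);
}

/*
 * New style: only PCG_CACHE and PCG_USED are touched, so PCG_LOCK
 * and PCG_ACCT_LRU keep whatever value they had.
 */
static unsigned long commit_by_bits(unsigned long flags,
				    enum sketch_ctype type)
{
	switch (type) {
	case SKETCH_CACHE:
	case SKETCH_SHMEM:
		flags |= BIT_OF(PCG_CACHE) | BIT_OF(PCG_USED);
		break;
	case SKETCH_MAPPED:
		flags &= ~BIT_OF(PCG_CACHE);
		flags |= BIT_OF(PCG_USED);
		break;
	}
	return flags;
}
```

Starting from a word with PCG_LOCK and PCG_ACCT_LRU set, the first helper
returns a value without PCG_ACCT_LRU and the second keeps it, which is the
case that matters when the page is already counted on the LRU at commit time.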

> I am testing your patch which is the modified version of v3 with your changes
> and have your signed-off-by in it as well as I post v5. Is that OK?
> 
Sure :)


Thanks,
Daisuke Nishimura.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Low overhead patches for the memory cgroup controller (v4)
  2009-06-15  3:46                     ` Daisuke Nishimura
@ 2009-06-15  4:22                       ` Balbir Singh
  0 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-06-15  4:22 UTC (permalink / raw)
  To: Daisuke Nishimura
  Cc: KAMEZAWA Hiroyuki, Andrew Morton, linux-mm, lizf, menage,
	KOSAKI Motohiro

Daisuke Nishimura wrote:
> On Mon, 15 Jun 2009 08:52:56 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>> Daisuke Nishimura wrote:
>>> On Mon, 15 Jun 2009 08:30:06 +0530, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
>>>> Daisuke Nishimura wrote:
>>>>
>>>>>  	pc->mem_cgroup = mem;
>>>>>  	smp_wmb();
>>>>> -	pc->flags = pcg_default_flags[ctype];
>>>> pc->flags needs to be reset here, otherwise we have the danger the carrying over
>>>> older bits. I'll merge your changes and test.
>>>>
>>> hmm, why ?
>>>
>>> I do in my patch:
>>>
>>> +	switch (ctype) {
>>> +	case MEM_CGROUP_CHARGE_TYPE_CACHE:
>>> +	case MEM_CGROUP_CHARGE_TYPE_SHMEM:
>>> +		SetPageCgroupCache(pc);
>>> +		SetPageCgroupUsed(pc);
>>> +		break;
>>> +	case MEM_CGROUP_CHARGE_TYPE_MAPPED:
>>> +		ClearPageCgroupCache(pc);
>>> +		SetPageCgroupUsed(pc);
>>> +		break;
>>> +	default:
>>> +		break;
>>> +	}
>>>
>> Yes, I did that in the older code, what I was suggesting was just an additional
>> step to ensure that in the future if we add new flags, we don't end up with a
>> long list of initializations and clearing or if we forget to clear pc->flags and
>> reuse the page_cgroup, it might be a problem. My message was confusing, it
>> should have been resetting the pc->flags will provide protection for any future
>> addition of flags.
>>
> O.K. I see your point.
> 
> But we shouldn't touch PCG_ACCT_LRU flag here. IIUC, that's why we abandon
> pcg_default_flags[]. Please take care of it.
> 

I am keeping the pc->flags assignment removed, as in the earlier patch, but it is
something to keep in mind as we review further changes to the flags field.

>> I am testing your patch which is the modified version of v3 with your changes
>> and have your signed-off-by in it as well as I post v5. Is that OK?
>>
> Sure :)
> 

Just sending it out, now, Thanks!

-- 
	Balbir


^ permalink raw reply	[flat|nested] 41+ messages in thread

* [RFC] Low overhead patches for the memory cgroup controller (v2)
@ 2009-05-15 15:18 ` Balbir Singh
  0 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-15 15:18 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, KAMEZAWA Hiroyuki, nishimura, lizf,
	menage, KOSAKI Motohiro

Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch changes the memory cgroup and removes the overhead associated
with LRU maintenance of all pages in the root cgroup. As a side-effect, we can
no longer set a memory hard limit in the root cgroup.

A new flag is used to track page_cgroup associated with the root cgroup
pages. A new flag to track whether the page has been accounted or not
has been added as well.

Review comments highly appreciated

Tests

1. Tested with allocate, touch and limit test case for a non-root cgroup
2. For the root cgroup tested performance impact with reaim


		+patch		mmotm-08-may-2009
AIM9		1362.93		1338.17
Dbase		17457.75	16021.58
New Dbase	18070.18	16518.54
Shared		9681.85		8882.11
Compute		16197.79	15226.13

3. Tested accounting in root cgroup to make sure it looks sane and
correct.
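
The relative change behind the reaim table above can be worked out with a
short program. This is a sketch only: the numbers are copied from the table,
the names reaim_row, percent_gain() and print_gains() are hypothetical, and it
assumes that a higher reaim figure means better throughput.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the table: workload name, patched result, baseline result. */
struct reaim_row {
	const char *name;
	double patched;
	double baseline;
};

/* Values copied from the "+patch" and baseline columns of the table. */
static const struct reaim_row reaim_v2[] = {
	{ "AIM9",       1362.93,  1338.17 },
	{ "Dbase",     17457.75, 16021.58 },
	{ "New Dbase", 18070.18, 16518.54 },
	{ "Shared",     9681.85,  8882.11 },
	{ "Compute",   16197.79, 15226.13 },
};

/* Relative change of the patched kernel over the baseline, in percent. */
static double percent_gain(double patched, double baseline)
{
	assert(baseline > 0.0);
	return (patched - baseline) / baseline * 100.0;
}

/* Print one line per workload with its percentage change. */
void print_gains(void)
{
	size_t i;

	for (i = 0; i < sizeof(reaim_v2) / sizeof(reaim_v2[0]); i++)
		printf("%-10s %+.2f%%\n", reaim_v2[i].name,
		       percent_gain(reaim_v2[i].patched,
				    reaim_v2[i].baseline));
}
```

With these inputs AIM9 changes by a little under 2%, while the other four
workloads come out between roughly 6% and 9.5%.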

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/page_cgroup.h |   10 ++++++++++
 mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
 mm/page_cgroup.c            |    1 -
 3 files changed, 36 insertions(+), 4 deletions(-)


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..8b85752 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(Acct, ACCT)
+CLEARPCGFLAG(Acct, ACCT)
+TESTPCGFLAG(Acct, ACCT)
+
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
 	return page_to_nid(pc->page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9712ef7..18d2819 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 1 */
@@ -196,6 +197,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1114,10 +1125,14 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
 	pc->flags = pcg_default_flags[ctype];
 
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
+
 	mem_cgroup_charge_statistics(mem, pc, true);
 
 	unlock_page_cgroup(pc);
@@ -1521,6 +1536,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2038,6 +2055,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2504,6 +2525,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2532,6 +2554,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 09b73c5..6145ff6 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);

-- 
	Balbir

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [RFC] Low overhead patches for the memory cgroup controller (v2)
@ 2009-05-15 15:18 ` Balbir Singh
  0 siblings, 0 replies; 41+ messages in thread
From: Balbir Singh @ 2009-05-15 15:18 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, Andrew Morton, KAMEZAWA Hiroyuki, nishimura, lizf,
	menage, KOSAKI Motohiro

Feature: Remove the overhead associated with the root cgroup

From: Balbir Singh <balbir@linux.vnet.ibm.com>

This patch removes the overhead of LRU maintenance for pages charged to the
root memory cgroup: such pages are still counted in the per-zone statistics
but are no longer linked into the memcg LRU lists. As a side effect, a hard
memory limit can no longer be set on the root cgroup.

Two new page_cgroup flags are added: PCG_ROOT marks a page_cgroup that
belongs to the root cgroup, and PCG_ACCT tracks whether the page has been
accounted.

Review comments highly appreciated.

Tests

1. Tested with the allocate, touch and limit test cases for a non-root cgroup
2. Measured the performance impact on the root cgroup with reaim


		+patch		mmtom-08-may-2009
AIM9		1362.93		1338.17
Dbase		17457.75	16021.58
New Dbase	18070.18	16518.54
Shared		9681.85		8882.11
Compute		16197.79	15226.13

3. Tested accounting in root cgroup to make sure it looks sane and
correct.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---
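To make the reaim table in the changelog easier to read, here is a small
sketch that turns the two columns into a relative change per workload. The
numbers are copied from the table; the helper names (rel_change_pct,
print_reaim_deltas) are made up for this note, and it assumes that a higher
score is better, which the mail does not state explicitly.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the reaim table: workload name plus both scores. */
struct reaim_row {
	const char *name;
	double patched;		/* "+patch" column */
	double baseline;	/* "mmtom-08-may-2009" column */
};

/* Scores copied verbatim from the changelog table. */
static const struct reaim_row reaim_rows[] = {
	{ "AIM9",       1362.93,  1338.17 },
	{ "Dbase",     17457.75, 16021.58 },
	{ "New Dbase", 18070.18, 16518.54 },
	{ "Shared",     9681.85,  8882.11 },
	{ "Compute",   16197.79, 15226.13 },
};

/* Relative change of the patched score over the baseline, in percent. */
double rel_change_pct(double patched, double baseline)
{
	assert(baseline != 0.0);
	return (patched - baseline) / baseline * 100.0;
}

/* Print one line per workload with its relative change. */
void print_reaim_deltas(void)
{
	size_t i;

	for (i = 0; i < sizeof(reaim_rows) / sizeof(reaim_rows[0]); i++)
		printf("%-10s %+6.2f%%\n", reaim_rows[i].name,
		       rel_change_pct(reaim_rows[i].patched,
				      reaim_rows[i].baseline));
}
```

Calling print_reaim_deltas() from a main() prints the per-workload change.
By this calculation AIM9 moves by a little under 2%, while the other four
workloads move by roughly 6% to 9.5%.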

 include/linux/page_cgroup.h |   10 ++++++++++
 mm/memcontrol.c             |   29 ++++++++++++++++++++++++++---
 mm/page_cgroup.c            |    1 -
 3 files changed, 36 insertions(+), 4 deletions(-)
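
For reviewers who want the idea without the kernel context, the following is
a user-space model of the add/del paths in the diff below. It is only a
sketch: the model_* names, the tiny list helpers and the single counter are
stand-ins invented for this note, and pc->mem_cgroup, locking and memory
barriers are left out. It shows that a root page bumps the per-zone count
and gets PCG_ACCT set but is never linked, so the delete path has to rely
on PCG_ACCT rather than on list_empty() alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag bits named after the ones in page_cgroup.h (model only). */
enum { PCG_USED, PCG_ROOT, PCG_ACCT };
#define PCGF(bit)	(1UL << (bit))

/* Minimal circular list, standing in for the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_insert(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void list_remove_init(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct model_pc { unsigned long flags; struct list_node lru; };
struct model_zone { long count; struct list_node list; };

/* Commit a charge: mark the page used, and tag it if it belongs to root. */
void model_charge(struct model_pc *pc, bool to_root)
{
	pc->flags = PCGF(PCG_USED);
	if (to_root)
		pc->flags |= PCGF(PCG_ROOT);
	list_init(&pc->lru);
}

/* Add path: always count and mark accounted, link only non-root pages. */
void model_add_lru(struct model_pc *pc, struct model_zone *mz)
{
	mz->count += 1;
	pc->flags |= PCGF(PCG_ACCT);
	if (pc->flags & PCGF(PCG_ROOT))
		return;
	list_insert(&pc->lru, &mz->list);
}

/* Delete path: skip only pages that are neither accounted nor linked. */
void model_del_lru(struct model_pc *pc, struct model_zone *mz)
{
	if (!(pc->flags & PCGF(PCG_ACCT)) && list_is_empty(&pc->lru))
		return;
	mz->count -= 1;
	pc->flags &= ~PCGF(PCG_ACCT);
	if (pc->flags & PCGF(PCG_ROOT))
		return;
	list_remove_init(&pc->lru);
}
```

In this model a second model_del_lru() on the same root page is a no-op,
because PCG_ACCT is already clear and the page was never on a list.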


diff --git a/include/linux/page_cgroup.h b/include/linux/page_cgroup.h
index 7339c7b..8b85752 100644
--- a/include/linux/page_cgroup.h
+++ b/include/linux/page_cgroup.h
@@ -26,6 +26,8 @@ enum {
 	PCG_LOCK,  /* page cgroup is locked */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
+	PCG_ROOT, /* page belongs to root cgroup */
+	PCG_ACCT, /* page has been accounted for */
 };
 
 #define TESTPCGFLAG(uname, lname)			\
@@ -46,6 +48,14 @@ TESTPCGFLAG(Cache, CACHE)
 TESTPCGFLAG(Used, USED)
 CLEARPCGFLAG(Used, USED)
 
+SETPCGFLAG(Root, ROOT)
+CLEARPCGFLAG(Root, ROOT)
+TESTPCGFLAG(Root, ROOT)
+
+SETPCGFLAG(Acct, ACCT)
+CLEARPCGFLAG(Acct, ACCT)
+TESTPCGFLAG(Acct, ACCT)
+
 static inline int page_cgroup_nid(struct page_cgroup *pc)
 {
 	return page_to_nid(pc->page);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9712ef7..18d2819 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -43,6 +43,7 @@
 
 struct cgroup_subsys mem_cgroup_subsys __read_mostly;
 #define MEM_CGROUP_RECLAIM_RETRIES	5
+struct mem_cgroup *root_mem_cgroup __read_mostly;
 
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 /* Turned on only when memory cgroup is enabled && really_do_swap_account = 0 */
@@ -196,6 +197,10 @@ enum charge_type {
 #define PCGF_CACHE	(1UL << PCG_CACHE)
 #define PCGF_USED	(1UL << PCG_USED)
 #define PCGF_LOCK	(1UL << PCG_LOCK)
+/* Not used, but added here for completeness */
+#define PCGF_ROOT	(1UL << PCG_ROOT)
+#define PCGF_ACCT	(1UL << PCG_ACCT)
+
 static const unsigned long
 pcg_default_flags[NR_CHARGE_TYPE] = {
 	PCGF_CACHE | PCGF_USED | PCGF_LOCK, /* File Cache */
@@ -420,7 +425,7 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 		return;
 	pc = lookup_page_cgroup(page);
 	/* can happen while we handle swapcache. */
-	if (list_empty(&pc->lru) || !pc->mem_cgroup)
+	if ((!PageCgroupAcct(pc) && list_empty(&pc->lru)) || !pc->mem_cgroup)
 		return;
 	/*
 	 * We don't check PCG_USED bit. It's cleared when the "page" is finally
@@ -429,6 +434,9 @@ void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru)
 	mz = page_cgroup_zoneinfo(pc);
 	mem = pc->mem_cgroup;
 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	ClearPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_del_init(&pc->lru);
 	return;
 }
@@ -452,8 +460,8 @@ void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
 	 */
 	smp_rmb();
-	/* unused page is not rotated. */
-	if (!PageCgroupUsed(pc))
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || PageCgroupRoot(pc))
 		return;
 	mz = page_cgroup_zoneinfo(pc);
 	list_move(&pc->lru, &mz->lists[lru]);
@@ -477,6 +485,9 @@ void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru)
 
 	mz = page_cgroup_zoneinfo(pc);
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
+	SetPageCgroupAcct(pc);
+	if (PageCgroupRoot(pc))
+		return;
 	list_add(&pc->lru, &mz->lists[lru]);
 }
 
@@ -1114,10 +1125,14 @@ static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
 		css_put(&mem->css);
 		return;
 	}
+
 	pc->mem_cgroup = mem;
 	smp_wmb();
 	pc->flags = pcg_default_flags[ctype];
 
+	if (mem == root_mem_cgroup)
+		SetPageCgroupRoot(pc);
+
 	mem_cgroup_charge_statistics(mem, pc, true);
 
 	unlock_page_cgroup(pc);
@@ -1521,6 +1536,8 @@ __mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype)
 	mem_cgroup_charge_statistics(mem, pc, false);
 
 	ClearPageCgroupUsed(pc);
+	if (mem == root_mem_cgroup)
+		ClearPageCgroupRoot(pc);
 	/*
 	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
 	 * freed from LRU. This is safe because uncharged page is expected not
@@ -2038,6 +2055,10 @@ static int mem_cgroup_write(struct cgroup *cont, struct cftype *cft,
 	name = MEMFILE_ATTR(cft->private);
 	switch (name) {
 	case RES_LIMIT:
+		if (memcg == root_mem_cgroup) { /* Can't set limit on root */
+			ret = -EINVAL;
+			break;
+		}
 		/* This function does all necessary parse...reuse it */
 		ret = res_counter_memparse_write_strategy(buffer, &val);
 		if (ret)
@@ -2504,6 +2525,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	if (cont->parent == NULL) {
 		enable_swap_cgroup();
 		parent = NULL;
+		root_mem_cgroup = mem;
 	} else {
 		parent = mem_cgroup_from_cont(cont->parent);
 		mem->use_hierarchy = parent->use_hierarchy;
@@ -2532,6 +2554,7 @@ mem_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cont)
 	return &mem->css;
 free_out:
 	__mem_cgroup_free(mem);
+	root_mem_cgroup = NULL;
 	return ERR_PTR(error);
 }
 
diff --git a/mm/page_cgroup.c b/mm/page_cgroup.c
index 09b73c5..6145ff6 100644
--- a/mm/page_cgroup.c
+++ b/mm/page_cgroup.c
@@ -276,7 +276,6 @@ void __meminit pgdat_page_cgroup_init(struct pglist_data *pgdat)
 
 #endif
 
-
 #ifdef CONFIG_CGROUP_MEM_RES_CTLR_SWAP
 
 static DEFINE_MUTEX(swap_cgroup_mutex);

-- 
	Balbir



end of thread, other threads:[~2009-06-15  4:22 UTC | newest]

Thread overview: 41+ messages
2009-05-15 17:45 [RFC] Low overhead patches for the memory cgroup controller (v2) KAMEZAWA Hiroyuki
2009-05-15 17:45 ` KAMEZAWA Hiroyuki
2009-05-15 18:16 ` Balbir Singh
2009-05-15 18:16   ` Balbir Singh
2009-05-18 10:11   ` KAMEZAWA Hiroyuki
2009-05-18 10:11     ` KAMEZAWA Hiroyuki
2009-05-18 10:45     ` Balbir Singh
2009-05-18 10:45       ` Balbir Singh
2009-05-18 16:01       ` KAMEZAWA Hiroyuki
2009-05-18 16:01         ` KAMEZAWA Hiroyuki
2009-05-19 13:18         ` Balbir Singh
2009-05-31 23:51     ` Balbir Singh
2009-06-01 23:57       ` KAMEZAWA Hiroyuki
2009-06-05  5:31         ` Low overhead patches for the memory cgroup controller (v3) Balbir Singh
2009-06-05  5:51           ` KAMEZAWA Hiroyuki
2009-06-05  9:33             ` Balbir Singh
2009-06-08  0:20               ` Daisuke Nishimura
2009-06-05  6:05           ` Daisuke Nishimura
2009-06-05  9:47             ` Balbir Singh
2009-06-08  0:03               ` Daisuke Nishimura
2009-06-05  6:43           ` Daisuke Nishimura
2009-06-14 18:37           ` Low overhead patches for the memory cgroup controller (v4) Balbir Singh
2009-06-15  2:04             ` KAMEZAWA Hiroyuki
2009-06-15  2:18             ` Daisuke Nishimura
2009-06-15  2:23               ` KAMEZAWA Hiroyuki
2009-06-15  2:44                 ` Balbir Singh
2009-06-15  3:00               ` Balbir Singh
2009-06-15  3:09                 ` Daisuke Nishimura
2009-06-15  3:22                   ` Balbir Singh
2009-06-15  3:46                     ` Daisuke Nishimura
2009-06-15  4:22                       ` Balbir Singh
2009-05-17  4:15 ` [RFC] Low overhead patches for the memory cgroup controller (v2) Balbir Singh
2009-05-17  4:15   ` Balbir Singh
2009-06-01  4:25   ` Daisuke Nishimura
2009-06-01  4:25     ` Daisuke Nishimura
2009-06-01  5:01     ` Daisuke Nishimura
2009-06-01  5:01       ` Daisuke Nishimura
2009-06-01  5:49     ` Balbir Singh
2009-06-01  5:49       ` Balbir Singh
  -- strict thread matches above, loose matches on Subject: below --
2009-05-15 15:18 Balbir Singh
2009-05-15 15:18 ` Balbir Singh
