linux-kernel.vger.kernel.org archive mirror
* [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
@ 2012-07-19 23:34 Tim Chen
  2012-07-20  3:19 ` Kamezawa Hiroyuki
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Tim Chen @ 2012-07-19 23:34 UTC (permalink / raw)
  To: Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki, Minchan Kim,
	Johannes Weiner
  Cc: Kirill A. Shutemov, andi.kleen, linux-mm, linux-kernel

Hi,

I noticed that in a multi-process parallel file-reading benchmark I ran on
an 8-socket machine, throughput slowed down by a factor of 8 when I ran
the benchmark within a cgroup container.  I traced the problem to the
following code path (see below) when we are trying to reclaim memory
from the file cache.  The res_counter_uncharge function is called on every
page that is reclaimed and creates heavy lock contention.  The patch
below allows the reclaimed pages to be uncharged from the resource
counter in batch and recovers the regression.

Tim

     40.67%           usemem  [kernel.kallsyms]                   [k] _raw_spin_lock
                      |
                      --- _raw_spin_lock
                         |
                         |--92.61%-- res_counter_uncharge
                         |          |
                         |          |--100.00%-- __mem_cgroup_uncharge_common
                         |          |          |
                         |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
                         |          |          |          __remove_mapping
                         |          |          |          shrink_page_list
                         |          |          |          shrink_inactive_list
                         |          |          |          shrink_mem_cgroup_zone
                         |          |          |          shrink_zone
                         |          |          |          do_try_to_free_pages
                         |          |          |          try_to_free_pages
                         |          |          |          __alloc_pages_nodemask
                         |          |          |          alloc_pages_current


---
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 33dc256..aac5672 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 	cond_resched();
 
+	mem_cgroup_uncharge_start();
 	while (!list_empty(page_list)) {
 		enum page_references references;
 		struct address_space *mapping;
@@ -1026,6 +1027,7 @@ keep_lumpy:
 
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
+	mem_cgroup_uncharge_end();
 	*ret_nr_dirty += nr_dirty;
 	*ret_nr_writeback += nr_writeback;
 	return nr_reclaimed;
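
For illustration, a minimal user-space C sketch of the idea behind the patch;
all names and numbers below are made up for the sketch and this is not the
kernel implementation.  Without batching, every reclaimed page takes the
shared counter lock once (the contention in the profile above); with a
start/end bracket around the loop, uncharges accumulate per task and the
shared lock is taken once per shrink_page_list() pass.

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long charged_pages = 1024 * 1024;   /* stands in for res_counter usage */

static __thread long batch_pages;          /* stands in for the per-task batch */
static __thread int batching;

static void uncharge_one_page(void)        /* old behaviour: lock per page */
{
	pthread_mutex_lock(&counter_lock);
	charged_pages--;
	pthread_mutex_unlock(&counter_lock);
}

static void uncharge_start(void)           /* models mem_cgroup_uncharge_start() */
{
	batching = 1;
	batch_pages = 0;
}

static void uncharge_page_batched(void)    /* called once per reclaimed page */
{
	if (batching)
		batch_pages++;             /* no shared lock taken here */
	else
		uncharge_one_page();
}

static void uncharge_end(void)             /* models mem_cgroup_uncharge_end() */
{
	batching = 0;
	if (!batch_pages)
		return;
	pthread_mutex_lock(&counter_lock); /* shared lock taken once per batch */
	charged_pages -= batch_pages;
	pthread_mutex_unlock(&counter_lock);
	batch_pages = 0;
}

int main(void)
{
	uncharge_start();
	for (int i = 0; i < 32; i++)       /* stands in for the page_list loop */
		uncharge_page_batched();
	uncharge_end();
	printf("remaining charge: %ld pages\n", charged_pages);
	return 0;
}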



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-19 23:34 [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list Tim Chen
@ 2012-07-20  3:19 ` Kamezawa Hiroyuki
  2012-07-20  4:25   ` Minchan Kim
  2012-07-20 16:38   ` Tim Chen
  2012-07-20  6:27 ` Johannes Weiner
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Kamezawa Hiroyuki @ 2012-07-20  3:19 UTC (permalink / raw)
  To: Tim Chen
  Cc: Andrew Morton, Mel Gorman, Minchan Kim, Johannes Weiner,
	Kirill A. Shutemov, andi.kleen, linux-mm, linux-kernel

(2012/07/20 8:34), Tim Chen wrote:
> Hi,
>
> I noticed that in a multi-process parallel file-reading benchmark I ran on
> an 8-socket machine, throughput slowed down by a factor of 8 when I ran
> the benchmark within a cgroup container.  I traced the problem to the
> following code path (see below) when we are trying to reclaim memory
> from the file cache.  The res_counter_uncharge function is called on every
> page that is reclaimed and creates heavy lock contention.  The patch
> below allows the reclaimed pages to be uncharged from the resource
> counter in batch and recovers the regression.
>
> Tim
>
>       40.67%           usemem  [kernel.kallsyms]                   [k] _raw_spin_lock
>                        |
>                        --- _raw_spin_lock
>                           |
>                           |--92.61%-- res_counter_uncharge
>                           |          |
>                           |          |--100.00%-- __mem_cgroup_uncharge_common
>                           |          |          |
>                           |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
>                           |          |          |          __remove_mapping
>                           |          |          |          shrink_page_list
>                           |          |          |          shrink_inactive_list
>                           |          |          |          shrink_mem_cgroup_zone
>                           |          |          |          shrink_zone
>                           |          |          |          do_try_to_free_pages
>                           |          |          |          try_to_free_pages
>                           |          |          |          __alloc_pages_nodemask
>                           |          |          |          alloc_pages_current
>
>

Thank you very much !!

When I added batching, I didn't touch the page-reclaim path because it delays
res_counter_uncharge() and makes more threads run into page reclaim.
But judging from the profile above, batching seems required.

And because of the current per-zone-per-memcg LRU design, batching
works very well: all the LRU pages that shrink_page_list() scans belong to
the same memcg.

BTW, it would be better to show how much this improves things in the patch
description.


> ---
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 33dc256..aac5672 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>
>   	cond_resched();
>
> +	mem_cgroup_uncharge_start();
>   	while (!list_empty(page_list)) {
>   		enum page_references references;
>   		struct address_space *mapping;
> @@ -1026,6 +1027,7 @@ keep_lumpy:
>
>   	list_splice(&ret_pages, page_list);
>   	count_vm_events(PGACTIVATE, pgactivate);
> +	mem_cgroup_uncharge_end();

I guess placing mem_cgroup_uncharge_end() just after the loop would look better.

Anyway,
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

But please state how much this improves things in the patch description.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-20  3:19 ` Kamezawa Hiroyuki
@ 2012-07-20  4:25   ` Minchan Kim
  2012-07-20 16:38   ` Tim Chen
  1 sibling, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2012-07-20  4:25 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Tim Chen, Andrew Morton, Mel Gorman, Johannes Weiner,
	Kirill A. Shutemov, andi.kleen, linux-mm, linux-kernel

On Fri, Jul 20, 2012 at 12:19:20PM +0900, Kamezawa Hiroyuki wrote:
> (2012/07/20 8:34), Tim Chen wrote:
> >Hi,
> >
> >I noticed that in a multi-process parallel file-reading benchmark I ran on
> >an 8-socket machine, throughput slowed down by a factor of 8 when I ran
> >the benchmark within a cgroup container.  I traced the problem to the
> >following code path (see below) when we are trying to reclaim memory
> >from the file cache.  The res_counter_uncharge function is called on every
> >page that is reclaimed and creates heavy lock contention.  The patch
> >below allows the reclaimed pages to be uncharged from the resource
> >counter in batch and recovers the regression.
> >
> >Tim
> >
> >      40.67%           usemem  [kernel.kallsyms]                   [k] _raw_spin_lock
> >                       |
> >                       --- _raw_spin_lock
> >                          |
> >                          |--92.61%-- res_counter_uncharge
> >                          |          |
> >                          |          |--100.00%-- __mem_cgroup_uncharge_common
> >                          |          |          |
> >                          |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
> >                          |          |          |          __remove_mapping
> >                          |          |          |          shrink_page_list
> >                          |          |          |          shrink_inactive_list
> >                          |          |          |          shrink_mem_cgroup_zone
> >                          |          |          |          shrink_zone
> >                          |          |          |          do_try_to_free_pages
> >                          |          |          |          try_to_free_pages
> >                          |          |          |          __alloc_pages_nodemask
> >                          |          |          |          alloc_pages_current
> >
> >
> 
> Thank you very much !!
> 
> When I added batching, I didn't touch the page-reclaim path because it delays
> res_counter_uncharge() and makes more threads run into page reclaim.

Is that really a problem? It's the same situation as global reclaim.
In the short term you might be right, but in the long term batched freeing
might prevent us from entering the reclaim path as often, because we end up
with more free pages than strictly necessary.  And we can reduce the lock
overhead.  If it turns out to be a real problem, maybe we need to look at
global reclaim, too.

> But judging from the profile above, batching seems required.
> 
> And because of the current per-zone-per-memcg LRU design, batching
> works very well: all the LRU pages that shrink_page_list() scans belong to
> the same memcg.

Yes. That's the more important point!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-19 23:34 [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list Tim Chen
  2012-07-20  3:19 ` Kamezawa Hiroyuki
@ 2012-07-20  6:27 ` Johannes Weiner
  2012-07-20 11:19 ` Kirill A. Shutemov
  2012-07-20 13:53 ` Michal Hocko
  3 siblings, 0 replies; 11+ messages in thread
From: Johannes Weiner @ 2012-07-20  6:27 UTC (permalink / raw)
  To: Tim Chen
  Cc: Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki, Minchan Kim,
	Kirill A. Shutemov, andi.kleen, linux-mm, linux-kernel

On Thu, Jul 19, 2012 at 04:34:26PM -0700, Tim Chen wrote:
> Hi,
> 
> I noticed that in a multi-process parallel file-reading benchmark I ran on
> an 8-socket machine, throughput slowed down by a factor of 8 when I ran
> the benchmark within a cgroup container.  I traced the problem to the
> following code path (see below) when we are trying to reclaim memory
> from the file cache.  The res_counter_uncharge function is called on every
> page that is reclaimed and creates heavy lock contention.  The patch
> below allows the reclaimed pages to be uncharged from the resource
> counter in batch and recovers the regression.
> 
> Tim
> 
>      40.67%           usemem  [kernel.kallsyms]                   [k] _raw_spin_lock
>                       |
>                       --- _raw_spin_lock
>                          |
>                          |--92.61%-- res_counter_uncharge
>                          |          |
>                          |          |--100.00%-- __mem_cgroup_uncharge_common
>                          |          |          |
>                          |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
>                          |          |          |          __remove_mapping
>                          |          |          |          shrink_page_list
>                          |          |          |          shrink_inactive_list
>                          |          |          |          shrink_mem_cgroup_zone
>                          |          |          |          shrink_zone
>                          |          |          |          do_try_to_free_pages
>                          |          |          |          try_to_free_pages
>                          |          |          |          __alloc_pages_nodemask
>                          |          |          |          alloc_pages_current
> 
> 
> ---
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>

Good one.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-19 23:34 [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list Tim Chen
  2012-07-20  3:19 ` Kamezawa Hiroyuki
  2012-07-20  6:27 ` Johannes Weiner
@ 2012-07-20 11:19 ` Kirill A. Shutemov
  2012-07-20 13:53 ` Michal Hocko
  3 siblings, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-20 11:19 UTC (permalink / raw)
  To: Tim Chen
  Cc: Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki, Minchan Kim,
	Johannes Weiner, andi.kleen, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2049 bytes --]

On Thu, Jul 19, 2012 at 04:34:26PM -0700, Tim Chen wrote:
> Hi,
> 
> I noticed that in a multi-process parallel file-reading benchmark I ran on
> an 8-socket machine, throughput slowed down by a factor of 8 when I ran
> the benchmark within a cgroup container.  I traced the problem to the
> following code path (see below) when we are trying to reclaim memory
> from the file cache.  The res_counter_uncharge function is called on every
> page that is reclaimed and creates heavy lock contention.  The patch
> below allows the reclaimed pages to be uncharged from the resource
> counter in batch and recovers the regression.
> 
> Tim
> 
>      40.67%           usemem  [kernel.kallsyms]                   [k] _raw_spin_lock
>                       |
>                       --- _raw_spin_lock
>                          |
>                          |--92.61%-- res_counter_uncharge
>                          |          |
>                          |          |--100.00%-- __mem_cgroup_uncharge_common
>                          |          |          |
>                          |          |          |--100.00%-- mem_cgroup_uncharge_cache_page
>                          |          |          |          __remove_mapping
>                          |          |          |          shrink_page_list
>                          |          |          |          shrink_inactive_list
>                          |          |          |          shrink_mem_cgroup_zone
>                          |          |          |          shrink_zone
>                          |          |          |          do_try_to_free_pages
>                          |          |          |          try_to_free_pages
>                          |          |          |          __alloc_pages_nodemask
>                          |          |          |          alloc_pages_current
> 
> 
> ---
> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-19 23:34 [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list Tim Chen
                   ` (2 preceding siblings ...)
  2012-07-20 11:19 ` Kirill A. Shutemov
@ 2012-07-20 13:53 ` Michal Hocko
  2012-07-20 14:16   ` Johannes Weiner
  3 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2012-07-20 13:53 UTC (permalink / raw)
  To: Tim Chen
  Cc: Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki, Minchan Kim,
	Johannes Weiner, Kirill A. Shutemov, andi.kleen, linux-mm,
	linux-kernel

On Thu 19-07-12 16:34:26, Tim Chen wrote:
[...]
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 33dc256..aac5672 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  
>  	cond_resched();
>  
> +	mem_cgroup_uncharge_start();
>  	while (!list_empty(page_list)) {
>  		enum page_references references;
>  		struct address_space *mapping;

Is this safe? We have a scheduling point a few lines below. What prevents
the task from being moved while we are in the middle of the batch?

> @@ -1026,6 +1027,7 @@ keep_lumpy:
>  
>  	list_splice(&ret_pages, page_list);
>  	count_vm_events(PGACTIVATE, pgactivate);
> +	mem_cgroup_uncharge_end();
>  	*ret_nr_dirty += nr_dirty;
>  	*ret_nr_writeback += nr_writeback;
>  	return nr_reclaimed;
> 
> 

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-20 13:53 ` Michal Hocko
@ 2012-07-20 14:16   ` Johannes Weiner
  2012-07-20 14:38     ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2012-07-20 14:16 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tim Chen, Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki,
	Minchan Kim, Kirill A. Shutemov, andi.kleen, linux-mm,
	linux-kernel

On Fri, Jul 20, 2012 at 03:53:29PM +0200, Michal Hocko wrote:
> On Thu 19-07-12 16:34:26, Tim Chen wrote:
> [...]
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 33dc256..aac5672 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >  
> >  	cond_resched();
> >  
> > +	mem_cgroup_uncharge_start();
> >  	while (!list_empty(page_list)) {
> >  		enum page_references references;
> >  		struct address_space *mapping;
> 
> Is this safe? We have a scheduling point a few lines below. What prevents
> the task from being moved while we are in the middle of the batch?

The batch is accounted in task_struct, so moving a batching task to
another CPU shouldn't be a problem.
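
To make that concrete, a simplified C sketch of a per-task uncharge batch
(type, field and function names here are illustrative only, not the actual
kernel definitions).  Because the state lives in the task rather than per
CPU, it survives the cond_resched() in the loop and any migration of the
task to another CPU.

struct mem_cgroup;                          /* opaque for the sketch */

struct memcg_batch_sketch {
	int do_batch;                       /* non-zero between _start and _end */
	struct mem_cgroup *memcg;           /* memcg of the first batched page */
	unsigned long nr_pages;             /* uncharges accumulated so far */
};

struct task_sketch {
	/* ... other per-task state ... */
	struct memcg_batch_sketch memcg_batch;  /* travels with the task */
};

/* models mem_cgroup_uncharge_start(): arm the batch on the current task */
static void uncharge_start_sketch(struct task_sketch *current_task)
{
	current_task->memcg_batch.do_batch++;   /* nesting is allowed */
	if (current_task->memcg_batch.do_batch == 1) {
		current_task->memcg_batch.memcg = NULL;
		current_task->memcg_batch.nr_pages = 0;
	}
}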

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-20 14:16   ` Johannes Weiner
@ 2012-07-20 14:38     ` Michal Hocko
  2012-07-20 15:12       ` Johannes Weiner
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2012-07-20 14:38 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Tim Chen, Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki,
	Minchan Kim, Kirill A. Shutemov, andi.kleen, linux-mm,
	linux-kernel

On Fri 20-07-12 16:16:25, Johannes Weiner wrote:
> On Fri, Jul 20, 2012 at 03:53:29PM +0200, Michal Hocko wrote:
> > On Thu 19-07-12 16:34:26, Tim Chen wrote:
> > [...]
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index 33dc256..aac5672 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > >  
> > >  	cond_resched();
> > >  
> > > +	mem_cgroup_uncharge_start();
> > >  	while (!list_empty(page_list)) {
> > >  		enum page_references references;
> > >  		struct address_space *mapping;
> > 
> > Is this safe? We have a scheduling point a few lines below. What prevents
> > the task from being moved while we are in the middle of the batch?
> 
> The batch is accounted in task_struct, so moving a batching task to
> another CPU shouldn't be a problem.

But it could also move to a different group, right?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-20 14:38     ` Michal Hocko
@ 2012-07-20 15:12       ` Johannes Weiner
  2012-07-20 16:31         ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Johannes Weiner @ 2012-07-20 15:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Tim Chen, Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki,
	Minchan Kim, Kirill A. Shutemov, andi.kleen, linux-mm,
	linux-kernel

On Fri, Jul 20, 2012 at 04:38:48PM +0200, Michal Hocko wrote:
> On Fri 20-07-12 16:16:25, Johannes Weiner wrote:
> > On Fri, Jul 20, 2012 at 03:53:29PM +0200, Michal Hocko wrote:
> > > On Thu 19-07-12 16:34:26, Tim Chen wrote:
> > > [...]
> > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > index 33dc256..aac5672 100644
> > > > --- a/mm/vmscan.c
> > > > +++ b/mm/vmscan.c
> > > > @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > > >  
> > > >  	cond_resched();
> > > >  
> > > > +	mem_cgroup_uncharge_start();
> > > >  	while (!list_empty(page_list)) {
> > > >  		enum page_references references;
> > > >  		struct address_space *mapping;
> > > 
> > > Is this safe? We have a scheduling point a few lines below. What prevents
> > > the task from being moved while we are in the middle of the batch?
> > 
> > The batch is accounted in task_struct, so moving a batching task to
> > another CPU shouldn't be a problem.
> 
> But it could also move to a different group, right?

The batch-uncharging task will remember the memcg of the first page it
processes, then pile every subsequent page belonging to the same memcg
on top.  It doesn't matter which group the task is in.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-20 15:12       ` Johannes Weiner
@ 2012-07-20 16:31         ` Michal Hocko
  0 siblings, 0 replies; 11+ messages in thread
From: Michal Hocko @ 2012-07-20 16:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Tim Chen, Andrew Morton, Mel Gorman, KAMEZAWA Hiroyuki,
	Minchan Kim, Kirill A. Shutemov, andi.kleen, linux-mm,
	linux-kernel

On Fri 20-07-12 17:12:16, Johannes Weiner wrote:
> On Fri, Jul 20, 2012 at 04:38:48PM +0200, Michal Hocko wrote:
> > On Fri 20-07-12 16:16:25, Johannes Weiner wrote:
> > > On Fri, Jul 20, 2012 at 03:53:29PM +0200, Michal Hocko wrote:
> > > > On Thu 19-07-12 16:34:26, Tim Chen wrote:
> > > > [...]
> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > > > index 33dc256..aac5672 100644
> > > > > --- a/mm/vmscan.c
> > > > > +++ b/mm/vmscan.c
> > > > > @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> > > > >  
> > > > >  	cond_resched();
> > > > >  
> > > > > +	mem_cgroup_uncharge_start();
> > > > >  	while (!list_empty(page_list)) {
> > > > >  		enum page_references references;
> > > > >  		struct address_space *mapping;
> > > > 
> > > > Is this safe? We have a scheduling point a few lines below. What prevents
> > > > the task from being moved while we are in the middle of the batch?
> > > 
> > > The batch is accounted in task_struct, so moving a batching task to
> > > another CPU shouldn't be a problem.
> > 
> > But it could also move to a different group, right?
> 
> The batch-uncharging task will remember the memcg of the first page it
> processes, then pile every subsequent page belonging to the same memcg
> on top.  It doesn't matter which group the task is in.

Ahh, you are right. I had missed the if (batch->memcg != memcg) check at the
end of mem_cgroup_do_uncharge().
Thanks!

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic
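
A simplified sketch of the decision discussed above (illustrative names; the
real check sits in mem_cgroup_do_uncharge(), with the shared lock taken in
res_counter_uncharge()): the batch is keyed to the memcg of the first page,
and pages charged to any other memcg fall back to a direct uncharge, so it
does not matter which cgroup the reclaiming task itself belongs to.

struct mem_cgroup;                          /* opaque for the sketch */

struct memcg_batch_sketch {
	int do_batch;                       /* non-zero between _start and _end */
	struct mem_cgroup *memcg;           /* memcg the batch is keyed to */
	unsigned long nr_pages;             /* deferred uncharges */
};

/* stand-in for res_counter_uncharge(): this is where the contended lock is */
static void direct_uncharge_sketch(struct mem_cgroup *memcg,
				   unsigned long nr_pages)
{
	(void)memcg;
	(void)nr_pages;
	/* lock the shared counter, subtract nr_pages, unlock */
}

/* models the batching decision inside mem_cgroup_do_uncharge() */
static void do_uncharge_sketch(struct memcg_batch_sketch *batch,
			       struct mem_cgroup *page_memcg,
			       unsigned long nr_pages)
{
	if (!batch->do_batch) {                 /* no start/end bracket active */
		direct_uncharge_sketch(page_memcg, nr_pages);
		return;
	}
	if (!batch->memcg)
		batch->memcg = page_memcg;      /* first page keys the batch */
	if (batch->memcg != page_memcg) {       /* foreign memcg: don't batch */
		direct_uncharge_sketch(page_memcg, nr_pages);
		return;
	}
	batch->nr_pages += nr_pages;            /* flushed at uncharge_end() */
}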

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list
  2012-07-20  3:19 ` Kamezawa Hiroyuki
  2012-07-20  4:25   ` Minchan Kim
@ 2012-07-20 16:38   ` Tim Chen
  1 sibling, 0 replies; 11+ messages in thread
From: Tim Chen @ 2012-07-20 16:38 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: Andrew Morton, Mel Gorman, Minchan Kim, Johannes Weiner,
	Kirill A. Shutemov, Andi Kleen, linux-mm, linux-kernel

On Fri, 2012-07-20 at 12:19 +0900, Kamezawa Hiroyuki wrote:

> 
> When I added batching, I didn't touch the page-reclaim path because it delays
> res_counter_uncharge() and makes more threads run into page reclaim.
> But judging from the profile above, batching seems required.
> 
> And because of the current per-zone-per-memcg LRU design, batching
> works very well: all the LRU pages that shrink_page_list() scans belong to
> the same memcg.
> 
> BTW, it would be better to show how much this improves things in the patch
> description.

I didn't put the specific improvement in the patch description because the
performance change is specific to my machine and benchmark, and the
improvement could vary for others.  However, I did include the specific
number in the body of my message.  Hope that is enough.
 

> 
> 
> > ---
> > Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 33dc256..aac5672 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -779,6 +779,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >
> >   	cond_resched();
> >
> > +	mem_cgroup_uncharge_start();
> >   	while (!list_empty(page_list)) {
> >   		enum page_references references;
> >   		struct address_space *mapping;
> > @@ -1026,6 +1027,7 @@ keep_lumpy:
> >
> >   	list_splice(&ret_pages, page_list);
> >   	count_vm_events(PGACTIVATE, pgactivate);
> > +	mem_cgroup_uncharge_end();
> 
> I guess placing mem_cgroup_uncharge_end() just after the loop would look better.

I initially thought of doing that.  I later pushed the statement down to
after list_splice(&ret_pages, page_list), as that's when the page reclaim
is actually completed.  It probably doesn't matter one way or the other;
I can move it to just after the loop if people think that's better.

Thanks for reviewing the change.

Tim


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-07-20 16:38 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-19 23:34 [PATCH] Cgroup: Fix memory accounting scalability in shrink_page_list Tim Chen
2012-07-20  3:19 ` Kamezawa Hiroyuki
2012-07-20  4:25   ` Minchan Kim
2012-07-20 16:38   ` Tim Chen
2012-07-20  6:27 ` Johannes Weiner
2012-07-20 11:19 ` Kirill A. Shutemov
2012-07-20 13:53 ` Michal Hocko
2012-07-20 14:16   ` Johannes Weiner
2012-07-20 14:38     ` Michal Hocko
2012-07-20 15:12       ` Johannes Weiner
2012-07-20 16:31         ` Michal Hocko
