linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm: vmscan: check mem cgroup over reclaimed
@ 2012-01-23  1:55 Hillf Danton
  2012-01-23 10:47 ` Johannes Weiner
  2012-01-23 19:04 ` Ying Han
  0 siblings, 2 replies; 12+ messages in thread
From: Hillf Danton @ 2012-01-23  1:55 UTC (permalink / raw)
  To: linux-mm
  Cc: Michal Hocko, KAMEZAWA Hiroyuki, Ying Han, Hugh Dickins,
	Andrew Morton, LKML, Hillf Danton

To avoid hurting the performance of the reclaimed workload, add an over-reclaim
check after shrinking an lru list when pages are reclaimed from a mem cgroup.

If over-reclaim occurs, skip shrinking the remaining lru lists and do not
continue reclaiming for reclaim/compaction.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
---

--- a/mm/vmscan.c	Mon Jan 23 00:23:10 2012
+++ b/mm/vmscan.c	Mon Jan 23 09:57:20 2012
@@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
 	unsigned long nr_reclaimed, nr_scanned;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
+	bool memcg_over_reclaimed = false;

 restart:
 	nr_reclaimed = 0;
@@ -2103,6 +2104,11 @@ restart:

 				nr_reclaimed += shrink_list(lru, nr_to_scan,
 							    mz, sc, priority);
+
+				memcg_over_reclaimed = !scanning_global_lru(mz)
+					&& (nr_reclaimed >= nr_to_reclaim);
+				if (memcg_over_reclaimed)
+					goto out;
 			}
 		}
 		/*
@@ -2116,6 +2122,7 @@ restart:
 		if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
 			break;
 	}
+out:
 	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;

@@ -2127,7 +2134,8 @@ restart:
 		shrink_active_list(SWAP_CLUSTER_MAX, mz, sc, priority, 0);

 	/* reclaim/compaction might need reclaim to continue */
-	if (should_continue_reclaim(mz, nr_reclaimed,
+	if (!memcg_over_reclaimed &&
+	    should_continue_reclaim(mz, nr_reclaimed,
 					sc->nr_scanned - nr_scanned, sc))
 		goto restart;
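As a rough model of the control flow this hunk introduces, the early exit stops shrinking the remaining lru lists as soon as a memcg scan meets its reclaim target (plain user-space C, illustrative only; the constants and the fixed per-list yield are made up and are not kernel behavior):

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LRU_LISTS 5

/* Toy stand-in: pretend each lru list yields a fixed 10 pages. */
static unsigned long shrink_list(int lru)
{
	(void)lru;
	return 10;
}

/*
 * Returns pages reclaimed; *lists_scanned reports how many lru lists
 * were actually shrunk. With the patch, a memcg scan breaks out as
 * soon as nr_to_reclaim is met; a global scan still walks every list.
 */
static unsigned long shrink_zone_model(bool memcg_scan,
				       unsigned long nr_to_reclaim,
				       int *lists_scanned)
{
	unsigned long nr_reclaimed = 0;
	int lru;

	*lists_scanned = 0;
	for (lru = 0; lru < NR_LRU_LISTS; lru++) {
		nr_reclaimed += shrink_list(lru);
		(*lists_scanned)++;
		if (memcg_scan && nr_reclaimed >= nr_to_reclaim)
			break;	/* the patch's "goto out" */
	}
	return nr_reclaimed;
}
```

With nr_to_reclaim = 25, a memcg scan stops after three lists (30 pages reclaimed), while a global scan still shrinks all five.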

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-23  1:55 [PATCH] mm: vmscan: check mem cgroup over reclaimed Hillf Danton
@ 2012-01-23 10:47 ` Johannes Weiner
  2012-01-23 12:30   ` Hillf Danton
  2012-01-23 19:04 ` Ying Han
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2012-01-23 10:47 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Ying Han,
	Hugh Dickins, Andrew Morton, LKML

On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
> To avoid reduction in performance of reclaimee, checking overreclaim is added
> after shrinking lru list, when pages are reclaimed from mem cgroup.
> 
> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
> reclaim for reclaim/compaction.
> 
> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> ---
> 
> --- a/mm/vmscan.c	Mon Jan 23 00:23:10 2012
> +++ b/mm/vmscan.c	Mon Jan 23 09:57:20 2012
> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
>  	unsigned long nr_reclaimed, nr_scanned;
>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>  	struct blk_plug plug;
> +	bool memcg_over_reclaimed = false;
> 
>  restart:
>  	nr_reclaimed = 0;
> @@ -2103,6 +2104,11 @@ restart:
> 
>  				nr_reclaimed += shrink_list(lru, nr_to_scan,
>  							    mz, sc, priority);
> +
> +				memcg_over_reclaimed = !scanning_global_lru(mz)
> +					&& (nr_reclaimed >= nr_to_reclaim);
> +				if (memcg_over_reclaimed)
> +					goto out;

Since this merge window, scanning_global_lru() is always false when
the memory controller is enabled, i.e. in most common configurations
and distribution kernels.

This will quite likely have bad effects on zone balancing, pressure
balancing between the anon and file lrus, etc., while you haven't shown
that any workloads actually benefit from this.

Submitting patches like this without describing a problematic scenario,
along with numbers demonstrating that the patch improves it, is not helpful.


* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-23 10:47 ` Johannes Weiner
@ 2012-01-23 12:30   ` Hillf Danton
  2012-01-24  8:33     ` Johannes Weiner
  0 siblings, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2012-01-23 12:30 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Ying Han,
	Hugh Dickins, Andrew Morton, LKML

On Mon, Jan 23, 2012 at 6:47 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
>> To avoid reduction in performance of reclaimee, checking overreclaim is added
>> after shrinking lru list, when pages are reclaimed from mem cgroup.
>>
>> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
>> reclaim for reclaim/compaction.
>>
>> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> ---
>>
>> --- a/mm/vmscan.c     Mon Jan 23 00:23:10 2012
>> +++ b/mm/vmscan.c     Mon Jan 23 09:57:20 2012
>> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
>>       unsigned long nr_reclaimed, nr_scanned;
>>       unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>>       struct blk_plug plug;
>> +     bool memcg_over_reclaimed = false;
>>
>>  restart:
>>       nr_reclaimed = 0;
>> @@ -2103,6 +2104,11 @@ restart:
>>
>>                               nr_reclaimed += shrink_list(lru, nr_to_scan,
>>                                                           mz, sc, priority);
>> +
>> +                             memcg_over_reclaimed = !scanning_global_lru(mz)
>> +                                     && (nr_reclaimed >= nr_to_reclaim);
>> +                             if (memcg_over_reclaimed)
>> +                                     goto out;
>
> Since this merge window, scanning_global_lru() is always false when
> the memory controller is enabled, i.e. most common configurations and
> distribution kernels.
>
> This will with quite likely have bad effects on zone balancing,
> pressure balancing between anon/file lru etc, while you haven't shown
> that any workloads actually benefit from this.
>
Hi Johannes

First of all, thanks for your comment.

The patch does affect zone balance and lru-list balance, but I don't think it
is solely responsible for that balance, because the soft limit embedded in a
mem cgroup is set up by users according to whatever tastes they have.

Though there is room to fine-tune the patch in one direction or another,
over-reclaim should not be neglected entirely but avoided as much as we can;
otherwise users are forced to set up soft limits with great care so as not to
upset zone balance.

Hillf


* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-23  1:55 [PATCH] mm: vmscan: check mem cgroup over reclaimed Hillf Danton
  2012-01-23 10:47 ` Johannes Weiner
@ 2012-01-23 19:04 ` Ying Han
  2012-01-24  3:45   ` Hillf Danton
  1 sibling, 1 reply; 12+ messages in thread
From: Ying Han @ 2012-01-23 19:04 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Hugh Dickins,
	Andrew Morton, LKML

On Sun, Jan 22, 2012 at 5:55 PM, Hillf Danton <dhillf@gmail.com> wrote:
> To avoid reduction in performance of reclaimee, checking overreclaim is added
> after shrinking lru list, when pages are reclaimed from mem cgroup.
>
> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
> reclaim for reclaim/compaction.
>
> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> ---
>
> --- a/mm/vmscan.c       Mon Jan 23 00:23:10 2012
> +++ b/mm/vmscan.c       Mon Jan 23 09:57:20 2012
> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
>        unsigned long nr_reclaimed, nr_scanned;
>        unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>        struct blk_plug plug;
> +       bool memcg_over_reclaimed = false;
>
>  restart:
>        nr_reclaimed = 0;
> @@ -2103,6 +2104,11 @@ restart:
>
>                                nr_reclaimed += shrink_list(lru, nr_to_scan,
>                                                            mz, sc, priority);
> +
> +                               memcg_over_reclaimed = !scanning_global_lru(mz)
> +                                       && (nr_reclaimed >= nr_to_reclaim);
> +                               if (memcg_over_reclaimed)
> +                                       goto out;

Why do we need the change here? Do we have numbers to demonstrate it?


>                        }
>                }
>                /*
> @@ -2116,6 +2122,7 @@ restart:
>                if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
>                        break;
>        }
> +out:
>        blk_finish_plug(&plug);
>        sc->nr_reclaimed += nr_reclaimed;
>
> @@ -2127,7 +2134,8 @@ restart:
>                shrink_active_list(SWAP_CLUSTER_MAX, mz, sc, priority, 0);
>
>        /* reclaim/compaction might need reclaim to continue */
> -       if (should_continue_reclaim(mz, nr_reclaimed,
> +       if (!memcg_over_reclaimed &&
> +           should_continue_reclaim(mz, nr_reclaimed,
>                                        sc->nr_scanned - nr_scanned, sc))

This changes the existing logic. What if nr_reclaimed is greater than
nr_to_reclaim but smaller than pages_for_compaction? The existing
logic is to continue reclaiming.

--Ying

>                goto restart;


* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-23 19:04 ` Ying Han
@ 2012-01-24  3:45   ` Hillf Danton
  2012-01-24 23:22     ` Ying Han
  0 siblings, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2012-01-24  3:45 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Hugh Dickins,
	Andrew Morton, LKML

Hi all

On Tue, Jan 24, 2012 at 3:04 AM, Ying Han <yinghan@google.com> wrote:
> On Sun, Jan 22, 2012 at 5:55 PM, Hillf Danton <dhillf@gmail.com> wrote:
>> To avoid reduction in performance of reclaimee, checking overreclaim is added
>> after shrinking lru list, when pages are reclaimed from mem cgroup.
>>
>> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
>> reclaim for reclaim/compaction.
>>
>> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> ---
>>
>> --- a/mm/vmscan.c       Mon Jan 23 00:23:10 2012
>> +++ b/mm/vmscan.c       Mon Jan 23 09:57:20 2012
>> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
>>        unsigned long nr_reclaimed, nr_scanned;
>>        unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>>        struct blk_plug plug;
>> +       bool memcg_over_reclaimed = false;
>>
>>  restart:
>>        nr_reclaimed = 0;
>> @@ -2103,6 +2104,11 @@ restart:
>>
>>                                nr_reclaimed += shrink_list(lru, nr_to_scan,
>>                                                            mz, sc, priority);
>> +
>> +                               memcg_over_reclaimed = !scanning_global_lru(mz)
>> +                                       && (nr_reclaimed >= nr_to_reclaim);
>> +                               if (memcg_over_reclaimed)
>> +                                       goto out;
>
> Why we need the change here? Do we have number to demonstrate?

See below please 8-)

>
>
>>                        }
>>                }
>>                /*
>> @@ -2116,6 +2122,7 @@ restart:
>>                if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
>>                        break;
>>        }
>> +out:
>>        blk_finish_plug(&plug);
>>        sc->nr_reclaimed += nr_reclaimed;
>>
>> @@ -2127,7 +2134,8 @@ restart:
>>                shrink_active_list(SWAP_CLUSTER_MAX, mz, sc, priority, 0);
>>
>>        /* reclaim/compaction might need reclaim to continue */
>> -       if (should_continue_reclaim(mz, nr_reclaimed,
>> +       if (!memcg_over_reclaimed &&
>> +           should_continue_reclaim(mz, nr_reclaimed,
>>                                        sc->nr_scanned - nr_scanned, sc))
>
> This changes the existing logic. What if the nr_reclaimed is greater
> than nr_to_reclaim, but smaller than pages_for_compaction? The
> existing logic is to continue reclaiming.
>
With a soft limit available, what if nr_to_reclaim were set to the number of
pages exceeding the soft limit? With over-reclaim left unchecked, what is the
soft limit actually targeting?

Thanks
Hillf


* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-23 12:30   ` Hillf Danton
@ 2012-01-24  8:33     ` Johannes Weiner
  2012-01-24  9:08       ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2012-01-24  8:33 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Ying Han,
	Hugh Dickins, Andrew Morton, LKML

On Mon, Jan 23, 2012 at 08:30:42PM +0800, Hillf Danton wrote:
> On Mon, Jan 23, 2012 at 6:47 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
> >> To avoid reduction in performance of reclaimee, checking overreclaim is added
> >> after shrinking lru list, when pages are reclaimed from mem cgroup.
> >>
> >> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
> >> reclaim for reclaim/compaction.
> >>
> >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> >> ---
> >>
> >> --- a/mm/vmscan.c     Mon Jan 23 00:23:10 2012
> >> +++ b/mm/vmscan.c     Mon Jan 23 09:57:20 2012
> >> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
> >>       unsigned long nr_reclaimed, nr_scanned;
> >>       unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> >>       struct blk_plug plug;
> >> +     bool memcg_over_reclaimed = false;
> >>
> >>  restart:
> >>       nr_reclaimed = 0;
> >> @@ -2103,6 +2104,11 @@ restart:
> >>
> >>                               nr_reclaimed += shrink_list(lru, nr_to_scan,
> >>                                                           mz, sc, priority);
> >> +
> >> +                             memcg_over_reclaimed = !scanning_global_lru(mz)
> >> +                                     && (nr_reclaimed >= nr_to_reclaim);
> >> +                             if (memcg_over_reclaimed)
> >> +                                     goto out;
> >
> > Since this merge window, scanning_global_lru() is always false when
> > the memory controller is enabled, i.e. most common configurations and
> > distribution kernels.
> >
> > This will with quite likely have bad effects on zone balancing,
> > pressure balancing between anon/file lru etc, while you haven't shown
> > that any workloads actually benefit from this.
> >
> Hi Johannes
> 
> Thanks for your comment, first.
> 
> Impact on zone balance and lru-list balance is introduced actually, but I
> dont think the patch is totally responsible for the balance mentioned,
> because soft limit, embedded in mem cgroup, is setup by users according to
> whatever tastes they have.
> 
> Though there is room for the patch to be fine tuned in this direction or that,
> over reclaim should not be neglected entirely, but be avoided as much as we
> could, or users are enforced to set up soft limit with much care not to mess
> up zone balance.

Overreclaim is absolutely horrible with soft limits, but I think there
are more direct reasons for it than nr_to_reclaim being checked only
after a full zone scan; for example, soft limit reclaim is invoked on
zones that are totally fine.
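A hedged sketch of that more direct fix, invoking soft limit reclaim only on zones actually under pressure, might look like the following (illustrative user-space C; the helper names and the watermark comparison are hypothetical, not the kernel's zone_watermark_ok() API):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: treat a zone as under pressure only when its free
 * pages have dropped to (or below) its high watermark. */
static bool zone_is_under_pressure(unsigned long free_pages,
				   unsigned long wmark_high)
{
	return free_pages <= wmark_high;
}

/* Skip soft limit reclaim entirely for zones that are fine, instead of
 * invoking it on every zone in the zonelist. */
static bool should_soft_limit_reclaim(unsigned long free_pages,
				      unsigned long wmark_high)
{
	return zone_is_under_pressure(free_pages, wmark_high);
}
```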


* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-24  8:33     ` Johannes Weiner
@ 2012-01-24  9:08       ` KAMEZAWA Hiroyuki
  2012-01-24 23:33         ` Ying Han
  0 siblings, 1 reply; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-24  9:08 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hillf Danton, linux-mm, Michal Hocko, Ying Han, Hugh Dickins,
	Andrew Morton, LKML

On Tue, 24 Jan 2012 09:33:47 +0100
Johannes Weiner <hannes@cmpxchg.org> wrote:

> On Mon, Jan 23, 2012 at 08:30:42PM +0800, Hillf Danton wrote:
> > On Mon, Jan 23, 2012 at 6:47 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > > On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
> > >> To avoid reduction in performance of reclaimee, checking overreclaim is added
> > >> after shrinking lru list, when pages are reclaimed from mem cgroup.
> > >>
> > >> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
> > >> reclaim for reclaim/compaction.
> > >>
> > >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> > >> ---
> > >>
> > >> --- a/mm/vmscan.c     Mon Jan 23 00:23:10 2012
> > >> +++ b/mm/vmscan.c     Mon Jan 23 09:57:20 2012
> > >> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
> > >>       unsigned long nr_reclaimed, nr_scanned;
> > >>       unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> > >>       struct blk_plug plug;
> > >> +     bool memcg_over_reclaimed = false;
> > >>
> > >>  restart:
> > >>       nr_reclaimed = 0;
> > >> @@ -2103,6 +2104,11 @@ restart:
> > >>
> > >>                               nr_reclaimed += shrink_list(lru, nr_to_scan,
> > >>                                                           mz, sc, priority);
> > >> +
> > >> +                             memcg_over_reclaimed = !scanning_global_lru(mz)
> > >> +                                     && (nr_reclaimed >= nr_to_reclaim);
> > >> +                             if (memcg_over_reclaimed)
> > >> +                                     goto out;
> > >
> > > Since this merge window, scanning_global_lru() is always false when
> > > the memory controller is enabled, i.e. most common configurations and
> > > distribution kernels.
> > >
> > > This will with quite likely have bad effects on zone balancing,
> > > pressure balancing between anon/file lru etc, while you haven't shown
> > > that any workloads actually benefit from this.
> > >
> > Hi Johannes
> > 
> > Thanks for your comment, first.
> > 
> > Impact on zone balance and lru-list balance is introduced actually, but I
> > dont think the patch is totally responsible for the balance mentioned,
> > because soft limit, embedded in mem cgroup, is setup by users according to
> > whatever tastes they have.
> > 
> > Though there is room for the patch to be fine tuned in this direction or that,
> > over reclaim should not be neglected entirely, but be avoided as much as we
> > could, or users are enforced to set up soft limit with much care not to mess
> > up zone balance.
> 
> Overreclaim is absolutely horrible with soft limits, but I think there
> are more direct reasons than checking nr_to_reclaim only after a full
> zone scan, for example, soft limit reclaim is invoked on zones that
> are totally fine.
> 


IIUC:
 - Because the whole zonelist is visited by alloc_pages(), _all_ zones in the
   zonelist are under memory shortage.
 - It takes care of zone/node balancing.

I know this 'full zone scan' affects the latency of alloc_pages() if the number
of nodes is big.

IMHO, in the case of direct reclaim caused by a memcg's limit, we should avoid
the full zone scan, because the reclaim is not caused by any memory shortage in
the zonelist.

In the case of global memory reclaim, kswapd doesn't use the zonelist.

So only global direct reclaim is a problem here. I think doing the full zone
scan will reduce future calls to try_to_free_pages() and may reduce lock
contention, but it imposes too much of a penalty on a single thread.

In a typical case, considering 4-node x86-64 NUMA, a GFP_HIGHUSER_MOVABLE
allocation failure will reclaim 4*ZONE_NORMAL + ZONE_DMA32 = 160 pages per scan.

With 16 nodes, it will be 16*ZONE_NORMAL + ZONE_DMA32 = 544 pages per scan.

32 pages may be too small, but don't we need some threshold to quit the
full zone scan?

Here, the topic is softlimit reclaim. I think:

1. A follow-up to the following comment (*) is required:
==
                        nr_soft_scanned = 0;
                        nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
                                                sc->order, sc->gfp_mask,
                                                &nr_soft_scanned);
                        sc->nr_reclaimed += nr_soft_reclaimed;
                        sc->nr_scanned += nr_soft_scanned;
                        /* need some check for avoid more shrink_zone() */ <----(*)
==

2. Some threshold for avoiding the full zone scan may be good.
   (But this may need deep discussion...)

3. As for the patch, I think it will not break zone balancing if (*) is
   handled in a good way.

   This check is not good:

+				memcg_over_reclaimed = !scanning_global_lru(mz)
+					&& (nr_reclaimed >= nr_to_reclaim);

   
  I would like something like the following:

  If (we-are-doing-softlimit-reclaim-for-global-direct-reclaim &&
      res_counter_soft_limit_excess(memcg->res))
       memcg_over_reclaimed = true;

Then another memcg will be picked up and soft-limit-reclaim() will continue.

Thanks,
-Kame

* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-24  3:45   ` Hillf Danton
@ 2012-01-24 23:22     ` Ying Han
  2012-01-25  1:47       ` Hillf Danton
  0 siblings, 1 reply; 12+ messages in thread
From: Ying Han @ 2012-01-24 23:22 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Hugh Dickins,
	Andrew Morton, LKML

On Mon, Jan 23, 2012 at 7:45 PM, Hillf Danton <dhillf@gmail.com> wrote:
> Hi all
>
> On Tue, Jan 24, 2012 at 3:04 AM, Ying Han <yinghan@google.com> wrote:
>> On Sun, Jan 22, 2012 at 5:55 PM, Hillf Danton <dhillf@gmail.com> wrote:
>>> To avoid reduction in performance of reclaimee, checking overreclaim is added
>>> after shrinking lru list, when pages are reclaimed from mem cgroup.
>>>
>>> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
>>> reclaim for reclaim/compaction.
>>>
>>> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>>> ---
>>>
>>> --- a/mm/vmscan.c       Mon Jan 23 00:23:10 2012
>>> +++ b/mm/vmscan.c       Mon Jan 23 09:57:20 2012
>>> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
>>>        unsigned long nr_reclaimed, nr_scanned;
>>>        unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>>>        struct blk_plug plug;
>>> +       bool memcg_over_reclaimed = false;
>>>
>>>  restart:
>>>        nr_reclaimed = 0;
>>> @@ -2103,6 +2104,11 @@ restart:
>>>
>>>                                nr_reclaimed += shrink_list(lru, nr_to_scan,
>>>                                                            mz, sc, priority);
>>> +
>>> +                               memcg_over_reclaimed = !scanning_global_lru(mz)
>>> +                                       && (nr_reclaimed >= nr_to_reclaim);
>>> +                               if (memcg_over_reclaimed)
>>> +                                       goto out;
>>
>> Why we need the change here? Do we have number to demonstrate?
>
> See below please 8-)
>
>>
>>
>>>                        }
>>>                }
>>>                /*
>>> @@ -2116,6 +2122,7 @@ restart:
>>>                if (nr_reclaimed >= nr_to_reclaim && priority < DEF_PRIORITY)
>>>                        break;
>>>        }
>>> +out:
>>>        blk_finish_plug(&plug);
>>>        sc->nr_reclaimed += nr_reclaimed;
>>>
>>> @@ -2127,7 +2134,8 @@ restart:
>>>                shrink_active_list(SWAP_CLUSTER_MAX, mz, sc, priority, 0);
>>>
>>>        /* reclaim/compaction might need reclaim to continue */
>>> -       if (should_continue_reclaim(mz, nr_reclaimed,
>>> +       if (!memcg_over_reclaimed &&
>>> +           should_continue_reclaim(mz, nr_reclaimed,
>>>                                        sc->nr_scanned - nr_scanned, sc))
>>
>> This changes the existing logic. What if the nr_reclaimed is greater
>> than nr_to_reclaim, but smaller than pages_for_compaction? The
>> existing logic is to continue reclaiming.
>>
> With soft limit available, what if nr_to_reclaim set to be the number of
> pages exceeding soft limit? With over reclaim abused, what are the targets
> of soft limit?

nr_to_reclaim is set to SWAP_CLUSTER_MAX (32) for direct reclaim
and ULONG_MAX for background reclaim. I am not sure we can set it that
way, but it is possible for res_counter_soft_limit_excess to equal that
target value. The current soft limit mechanism provides a clue about
WHERE to reclaim pages when there is memory pressure; it doesn't change
the reclaim target from what it was before.

Over-reclaiming a cgroup under its soft limit is bad, but we should be
careful not to introduce side effects before providing that guarantee.
Here, should_continue_reclaim() has logic for freeing a bit more
order-0 pages for compaction, and that logic is changed by this patch.
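The interaction described here can be modeled in a few lines (simplified user-space C; the real should_continue_reclaim() also checks scan progress and the size of the inactive lists, though the 2^(order+1) target follows the kernel's pages_for_compaction heuristic):

```c
#include <assert.h>
#include <stdbool.h>

/* Reclaim/compaction wants roughly 2^(order+1) order-0 pages free
 * before compaction is attempted (the pages_for_compaction heuristic). */
static unsigned long pages_for_compaction(int order)
{
	return 2UL << order;
}

/*
 * Simplified: even after nr_to_reclaim is met, the existing logic keeps
 * reclaiming until the compaction target is met too. The patch's
 * !memcg_over_reclaimed check short-circuits this for memcg scans.
 */
static bool should_continue_reclaim_model(unsigned long nr_reclaimed, int order)
{
	return nr_reclaimed < pages_for_compaction(order);
}
```

For a THP-sized request (order 9) the target is 1024 pages, so reclaim would continue even when nr_reclaimed already exceeds a 32-page nr_to_reclaim.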

--Ying


> Thanks
> Hillf


* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-24  9:08       ` KAMEZAWA Hiroyuki
@ 2012-01-24 23:33         ` Ying Han
  2012-01-26  9:16           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 12+ messages in thread
From: Ying Han @ 2012-01-24 23:33 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, Hillf Danton, linux-mm, Michal Hocko,
	Hugh Dickins, Andrew Morton, LKML

On Tue, Jan 24, 2012 at 1:08 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 24 Jan 2012 09:33:47 +0100
> Johannes Weiner <hannes@cmpxchg.org> wrote:
>
>> On Mon, Jan 23, 2012 at 08:30:42PM +0800, Hillf Danton wrote:
>> > On Mon, Jan 23, 2012 at 6:47 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > > On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
>> > >> To avoid reduction in performance of reclaimee, checking overreclaim is added
>> > >> after shrinking lru list, when pages are reclaimed from mem cgroup.
>> > >>
>> > >> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
>> > >> reclaim for reclaim/compaction.
>> > >>
>> > >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
>> > >> ---
>> > >>
>> > >> --- a/mm/vmscan.c     Mon Jan 23 00:23:10 2012
>> > >> +++ b/mm/vmscan.c     Mon Jan 23 09:57:20 2012
>> > >> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
>> > >>       unsigned long nr_reclaimed, nr_scanned;
>> > >>       unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>> > >>       struct blk_plug plug;
>> > >> +     bool memcg_over_reclaimed = false;
>> > >>
>> > >>  restart:
>> > >>       nr_reclaimed = 0;
>> > >> @@ -2103,6 +2104,11 @@ restart:
>> > >>
>> > >>                               nr_reclaimed += shrink_list(lru, nr_to_scan,
>> > >>                                                           mz, sc, priority);
>> > >> +
>> > >> +                             memcg_over_reclaimed = !scanning_global_lru(mz)
>> > >> +                                     && (nr_reclaimed >= nr_to_reclaim);
>> > >> +                             if (memcg_over_reclaimed)
>> > >> +                                     goto out;
>> > >
>> > > Since this merge window, scanning_global_lru() is always false when
>> > > the memory controller is enabled, i.e. most common configurations and
>> > > distribution kernels.
>> > >
>> > > This will with quite likely have bad effects on zone balancing,
>> > > pressure balancing between anon/file lru etc, while you haven't shown
>> > > that any workloads actually benefit from this.
>> > >
>> > Hi Johannes
>> >
>> > Thanks for your comment, first.
>> >
>> > Impact on zone balance and lru-list balance is introduced actually, but I
>> > dont think the patch is totally responsible for the balance mentioned,
>> > because soft limit, embedded in mem cgroup, is setup by users according to
>> > whatever tastes they have.
>> >
>> > Though there is room for the patch to be fine tuned in this direction or that,
>> > over reclaim should not be neglected entirely, but be avoided as much as we
>> > could, or users are enforced to set up soft limit with much care not to mess
>> > up zone balance.
>>
>> Overreclaim is absolutely horrible with soft limits, but I think there
>> are more direct reasons than checking nr_to_reclaim only after a full
>> zone scan, for example, soft limit reclaim is invoked on zones that
>> are totally fine.
>>
>
>
> IIUC..
>  - Because zonelist is all visited by alloc_pages(), _all_ zones in zonelist
>   are in memory shortage.
>  - taking care of zone/node balancing.
>
> I know this 'full zone scan' affects latency of alloc_pages() if the number
> of node is big.

>
> IMHO, in case of direct-reclaim caused by memcg's limit, we should avoid
> full zone scan because the reclaim is not caused by any memory shortage in zonelist.
>
> In case of global memory reclaim, kswapd doesn't use zonelist.
>
> So, only global-direct-reclaim is a problem here.
> I think do-full-zone-scan will reduce the calls of try_to_free_pages()
> in future and may reduce lock contention but adds a thread too much
> penalty.

> In typical case, considering 4-node x86/64 NUMA, GFP_HIGHUSER_MOVABLE
> allocation failure will reclaim 4*ZONE_NORMAL+ZONE_DMA32 = 160pages per scan.
>
> If 16-node, it will be 16*ZONE_NORMAL+ZONE_DMA32 = 544? pages per scan.
>
> 32pages may be too small but don't we need to have some threshold to quit
> full-zone-scan ?

Sorry, I am confused. Are we talking about doing full zonelist scanning
within a memcg, or doing anon/file lru balancing within a zone? AFAIU,
it is the latter.

In this patch, we do an early breakout (memcg_over_reclaimed) without
finishing the scan of the other lrus per memcg per zone. I think the
concern is: what are the side effects of that?

> Here, the topic is about softlimit reclaim. I think...
>
> 1. follow up for following comment(*) is required.
> ==
>                        nr_soft_scanned = 0;
>                        nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
>                                                sc->order, sc->gfp_mask,
>                                                &nr_soft_scanned);
>                        sc->nr_reclaimed += nr_soft_reclaimed;
>                        sc->nr_scanned += nr_soft_scanned;
>                        /* need some check for avoid more shrink_zone() */ <----(*)
> ==
>
> 2. some threshold for avoinding full zone scan may be good.
>   (But this may need deep discussion...)
>
> 3. About the patch, I think it will not break zone-balancing if (*) is
>   handled in a good way.
>
>   This check is not good.
>
> +                               memcg_over_reclaimed = !scanning_global_lru(mz)
> +                                       && (nr_reclaimed >= nr_to_reclaim);
>
>
>  I like following
>
>  If (we-are-doing-softlimit-reclaim-for-global-direct-reclaim &&
>      res_counter_soft_limit_excess(memcg->res))
>       memcg_over_reclaimed = true;

This condition looks quite similar to what we've discussed on another
thread, except that we do allow over-reclaim under the soft limit after
a certain priority loop (assuming we have hard-to-reclaim memory in
other cgroups above their soft limits).

Some work needs to be done on the current soft limit implementation
(like reverting the rb-tree) before we can go further and optimize it.
It would be nice to settle that first part before everything else.

--Ying

> Then another memcg will be picked up and soft-limit-reclaim() will continue.
>
> Thanks,
> -Kame

* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-24 23:22     ` Ying Han
@ 2012-01-25  1:47       ` Hillf Danton
  2012-01-25 19:20         ` Ying Han
  0 siblings, 1 reply; 12+ messages in thread
From: Hillf Danton @ 2012-01-25  1:47 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Hugh Dickins,
	Andrew Morton, LKML

On Wed, Jan 25, 2012 at 7:22 AM, Ying Han <yinghan@google.com> wrote:
> On Mon, Jan 23, 2012 at 7:45 PM, Hillf Danton <dhillf@gmail.com> wrote:
>> With soft limit available, what if nr_to_reclaim is set to the number of
>> pages exceeding the soft limit? And if over-reclaim is abused, what do the
>> targets of soft limit mean?
>
> The nr_to_reclaim is set to SWAP_CLUSTER_MAX (32) for direct reclaim
> and ULONG_MAX for background reclaim. Not sure we can set it that way,
> but it is possible for res_counter_soft_limit_excess to equal that
> target value. The current soft limit mechanism provides a clue about
> WHERE to reclaim pages when there is memory pressure; it doesn't change
> the reclaim target from what it was before.
>

Decrementing sc->nr_to_reclaim was tried in another patch; you already saw it.

> Over-reclaiming a cgroup under its softlimit is bad, but we should be
> careful not to introduce side effects before providing that guarantee.

Yes 8-)

> Here, should_continue_reclaim() has logic for freeing a few more
> order-0 pages for compaction. That logic is changed by this patch.
>

Compaction is meant to increase the success rate of THP allocation, and in
turn to back higher performance. Under soft limit, a performance guarantee is
not an extra request but is treated with less care.

Which do you prefer, compaction or guarantee?

Thanks
Hillf

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-25  1:47       ` Hillf Danton
@ 2012-01-25 19:20         ` Ying Han
  0 siblings, 0 replies; 12+ messages in thread
From: Ying Han @ 2012-01-25 19:20 UTC (permalink / raw)
  To: Hillf Danton
  Cc: linux-mm, Michal Hocko, KAMEZAWA Hiroyuki, Hugh Dickins,
	Andrew Morton, LKML

On Tue, Jan 24, 2012 at 5:47 PM, Hillf Danton <dhillf@gmail.com> wrote:
> On Wed, Jan 25, 2012 at 7:22 AM, Ying Han <yinghan@google.com> wrote:
>> On Mon, Jan 23, 2012 at 7:45 PM, Hillf Danton <dhillf@gmail.com> wrote:
>>> With soft limit available, what if nr_to_reclaim is set to the number of
>>> pages exceeding the soft limit? And if over-reclaim is abused, what do the
>>> targets of soft limit mean?
>>
>> The nr_to_reclaim is set to SWAP_CLUSTER_MAX (32) for direct reclaim
>> and ULONG_MAX for background reclaim. Not sure we can set it that way,
>> but it is possible for res_counter_soft_limit_excess to equal that
>> target value. The current soft limit mechanism provides a clue about
>> WHERE to reclaim pages when there is memory pressure; it doesn't change
>> the reclaim target from what it was before.
>>
>
> Decrementing sc->nr_to_reclaim was tried in another patch; you already saw it.
>
>> Over-reclaiming a cgroup under its softlimit is bad, but we should be
>> careful not to introduce side effects before providing that guarantee.
>
> Yes 8-)
>
>> Here, should_continue_reclaim() has logic for freeing a few more
>> order-0 pages for compaction. That logic is changed by this patch.
>>
>
> Compaction is meant to increase the success rate of THP allocation, and in
> turn to back higher performance. Under soft limit, a performance guarantee is
> not an extra request but is treated with less care.
>
> Which do you prefer, compaction or guarantee?

Compaction is something we are already supporting, while the softlimit
implementation is a new design. I would say that we need to guarantee
that no regression is introduced by any new code.

--Ying

> Thanks
> Hillf

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: vmscan: check mem cgroup over reclaimed
  2012-01-24 23:33         ` Ying Han
@ 2012-01-26  9:16           ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 12+ messages in thread
From: KAMEZAWA Hiroyuki @ 2012-01-26  9:16 UTC (permalink / raw)
  To: Ying Han
  Cc: Johannes Weiner, Hillf Danton, linux-mm, Michal Hocko,
	Hugh Dickins, Andrew Morton, LKML

On Tue, 24 Jan 2012 15:33:11 -0800
Ying Han <yinghan@google.com> wrote:

> On Tue, Jan 24, 2012 at 1:08 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Tue, 24 Jan 2012 09:33:47 +0100
> > Johannes Weiner <hannes@cmpxchg.org> wrote:
> >
> >> On Mon, Jan 23, 2012 at 08:30:42PM +0800, Hillf Danton wrote:
> >> > On Mon, Jan 23, 2012 at 6:47 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> >> > > On Mon, Jan 23, 2012 at 09:55:07AM +0800, Hillf Danton wrote:
> >> > >> To avoid reduction in performance of reclaimee, checking overreclaim is added
> >> > >> after shrinking lru list, when pages are reclaimed from mem cgroup.
> >> > >>
> >> > >> If over reclaim occurs, shrinking remaining lru lists is skipped, and no more
> >> > >> reclaim for reclaim/compaction.
> >> > >>
> >> > >> Signed-off-by: Hillf Danton <dhillf@gmail.com>
> >> > >> ---
> >> > >>
> >> > >> --- a/mm/vmscan.c     Mon Jan 23 00:23:10 2012
> >> > >> +++ b/mm/vmscan.c     Mon Jan 23 09:57:20 2012
> >> > >> @@ -2086,6 +2086,7 @@ static void shrink_mem_cgroup_zone(int p
> >> > >>       unsigned long nr_reclaimed, nr_scanned;
> >> > >>       unsigned long nr_to_reclaim = sc->nr_to_reclaim;
> >> > >>       struct blk_plug plug;
> >> > >> +     bool memcg_over_reclaimed = false;
> >> > >>
> >> > >>  restart:
> >> > >>       nr_reclaimed = 0;
> >> > >> @@ -2103,6 +2104,11 @@ restart:
> >> > >>
> >> > >>                               nr_reclaimed += shrink_list(lru, nr_to_scan,
> >> > >>                                                           mz, sc, priority);
> >> > >> +
> >> > >> +                             memcg_over_reclaimed = !scanning_global_lru(mz)
> >> > >> +                                     && (nr_reclaimed >= nr_to_reclaim);
> >> > >> +                             if (memcg_over_reclaimed)
> >> > >> +                                     goto out;
> >> > >
> >> > > Since this merge window, scanning_global_lru() is always false when
> >> > > the memory controller is enabled, i.e. most common configurations and
> >> > > distribution kernels.
> >> > >
> >> > > This will with quite likely have bad effects on zone balancing,
> >> > > pressure balancing between anon/file lru etc, while you haven't shown
> >> > > that any workloads actually benefit from this.
> >> > >
> >> > Hi Johannes
> >> >
> >> > Thanks for your comment, first.
> >> >
> >> > Impact on zone balance and lru-list balance is introduced actually, but I
> >> > don't think the patch is totally responsible for the balance mentioned,
> >> > because soft limit, embedded in mem cgroup, is set up by users according to
> >> > whatever tastes they have.
> >> >
> >> > Though there is room for the patch to be fine-tuned in this direction or that,
> >> > over-reclaim should not be neglected entirely, but should be avoided as much
> >> > as we can, or users are forced to set up the soft limit with great care so as
> >> > not to mess up zone balance.
> >>
> >> Overreclaim is absolutely horrible with soft limits, but I think there
> >> are more direct reasons than checking nr_to_reclaim only after a full
> >> zone scan, for example, soft limit reclaim is invoked on zones that
> >> are totally fine.
> >>
> >
> >
> > IIUC..
> >  - Because zonelist is all visited by alloc_pages(), _all_ zones in zonelist
> >   are in memory shortage.
> >  - taking care of zone/node balancing.
> >
> > I know this 'full zone scan' affects latency of alloc_pages() if the number
> > of node is big.
> 
> >
> > IMHO, in case of direct-reclaim caused by memcg's limit, we should avoid
> > full zone scan because the reclaim is not caused by any memory shortage in zonelist.
> >

This text is talking about memcg's direct reclaim scanning caused by 'limit'.


> > In case of global memory reclaim, kswapd doesn't use zonelist.
> >
> > So, only global-direct-reclaim is a problem here.
> > I think a full zone scan will reduce future calls of try_to_free_pages()
> > and may reduce lock contention, but it adds too much penalty to a single
> > thread.
> 
> > In typical case, considering 4-node x86/64 NUMA, GFP_HIGHUSER_MOVABLE
> > allocation failure will reclaim 4*ZONE_NORMAL+ZONE_DMA32 = 160pages per scan.
> >
> > If 16-node, it will be 16*ZONE_NORMAL+ZONE_DMA32 = 544? pages per scan.
> >
> > 32 pages may be too small, but don't we need some threshold to quit the
> > full zone scan?
> 
> Sorry I am confused. Are we talking about doing full zonelist scanning
> within a memcg or doing anon/file lru balance within a zone? AFAIU, it
> is the later one.
> 
I'm sorry for the confusion.

The text above is talking about global lru scanning, not memcg-related reclaim.



> In this patch, we do an early breakout (memcg_over_reclaimed) without
> finishing the scan of the other lrus per-memcg-per-zone. I think the
> concern is: what is the side effect of that?
> 
> > Here, the topic is about softlimit reclaim. I think...
> >
> > 1. follow up for following comment(*) is required.
> > ==
> >                        nr_soft_scanned = 0;
> >                        nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone,
> >                                                sc->order, sc->gfp_mask,
> >                                                &nr_soft_scanned);
> >                        sc->nr_reclaimed += nr_soft_reclaimed;
> >                        sc->nr_scanned += nr_soft_scanned;
> >                        /* need some check for avoid more shrink_zone() */ <----(*)
> > ==
> >
> > 2. some threshold for avoiding full zone scan may be good.
> >   (But this may need deep discussion...)
> >
> > 3. About the patch, I think it will not break zone-balancing if (*) is
> >   handled in a good way.
> >
> >   This check is not good.
> >
> > +                               memcg_over_reclaimed = !scanning_global_lru(mz)
> > +                                       && (nr_reclaimed >= nr_to_reclaim);
> >
> >
> >  I like the following
> >
> >  If (we-are-doing-softlimit-reclaim-for-global-direct-reclaim &&
> >      res_counter_soft_limit_excess(memcg->res))
> >       memcg_over_reclaimed = true;
> 
> This condition looks quite similar to what we discussed on another
> thread, except that we do allow over-reclaim under softlimit after a
> certain priority loop (assuming we have hard-to-reclaim memory on other
> cgroups above their softlimit).
> 

Yes, I've cut this from that thread.


> There is some work needed (like reverting the rb-tree) on the current
> soft limit implementation before we can go further and optimize it. It
> would be nice to settle that first part before everything else.
> 
Agreed.

I personally think Johannes' cleanup should go first, and that removing
the rb-tree before any optimization is better.

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2012-01-26  9:17 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-23  1:55 [PATCH] mm: vmscan: check mem cgroup over reclaimed Hillf Danton
2012-01-23 10:47 ` Johannes Weiner
2012-01-23 12:30   ` Hillf Danton
2012-01-24  8:33     ` Johannes Weiner
2012-01-24  9:08       ` KAMEZAWA Hiroyuki
2012-01-24 23:33         ` Ying Han
2012-01-26  9:16           ` KAMEZAWA Hiroyuki
2012-01-23 19:04 ` Ying Han
2012-01-24  3:45   ` Hillf Danton
2012-01-24 23:22     ` Ying Han
2012-01-25  1:47       ` Hillf Danton
2012-01-25 19:20         ` Ying Han
