linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages
@ 2013-01-03 18:09 Michal Hocko
  2013-01-03 20:24 ` Andrew Morton
  2013-01-29  8:51 ` [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix Michal Hocko
  0 siblings, 2 replies; 6+ messages in thread
From: Michal Hocko @ 2013-01-03 18:09 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, KAMEZAWA Hiroyuki, Johannes Weiner, Tejun Heo,
	Glauber Costa, Li Zefan, Andrew Morton

Hi,
I have posted this quite some time ago
(https://lkml.org/lkml/2012/12/14/102) but it probably slipped through
---
>From 28b4e10bc3c18b82bee695b76f4bf25c03baa5f8 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Fri, 14 Dec 2012 11:12:43 +0100
Subject: [PATCH] memcg,vmscan: do not break out targeted reclaim without
 reclaimed pages

Targeted (hard resp. soft) reclaim has traditionally tried to scan one
group with decreasing priority until nr_to_reclaim (SWAP_CLUSTER_MAX
pages) is reclaimed or all priorities are exhausted. The reclaim is
then retried until the limit is met.

This approach, however, doesn't work well with deeper hierarchies where
groups higher in the hierarchy do not have any or only very few pages
(this usually happens if those groups do not have any tasks and they
have only re-parented pages after some of their children is removed).
Those groups are reclaimed with decreasing priority pointlessly as there
is nothing to reclaim from them.

An easiest fix is to break out of the memcg iteration loop in shrink_zone
only if the whole hierarchy has been visited or sufficient pages have
been reclaimed. This is also more natural because the reclaimer expects
that the hierarchy under the given root is reclaimed. As a result we can
simplify the soft limit reclaim which does its own iteration.

Reported-by: Ying Han <yinghan@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
---
 mm/vmscan.c |   19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 916afbc..bfd41a6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1973,18 +1973,17 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 			shrink_lruvec(lruvec, sc);
 
 			/*
-			 * Limit reclaim has historically picked one
-			 * memcg and scanned it with decreasing
-			 * priority levels until nr_to_reclaim had
-			 * been reclaimed.  This priority cycle is
-			 * thus over after a single memcg.
-			 *
-			 * Direct reclaim and kswapd, on the other
-			 * hand, have to scan all memory cgroups to
-			 * fulfill the overall scan target for the
+			 * Direct reclaim and kswapd have to scan all memory
+			 * cgroups to fulfill the overall scan target for the
 			 * zone.
+			 *
+			 * Limit reclaim, on the other hand, only cares about
+			 * nr_to_reclaim pages to be reclaimed and it will
+			 * retry with decreasing priority if one round over the
+			 * whole hierarchy is not sufficient.
 			 */
-			if (!global_reclaim(sc)) {
+			if (!global_reclaim(sc) &&
+					sc->nr_to_reclaim >= sc->nr_reclaimed) {
 				mem_cgroup_iter_break(root, memcg);
 				break;
 			}
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages
  2013-01-03 18:09 [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages Michal Hocko
@ 2013-01-03 20:24 ` Andrew Morton
  2013-01-04 14:52   ` Michal Hocko
  2013-01-29  8:51 ` [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix Michal Hocko
  1 sibling, 1 reply; 6+ messages in thread
From: Andrew Morton @ 2013-01-03 20:24 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, Johannes Weiner,
	Tejun Heo, Glauber Costa, Li Zefan

On Thu, 3 Jan 2013 19:09:01 +0100
Michal Hocko <mhocko@suse.cz> wrote:

> Hi,
> I have posted this quite some time ago
> (https://lkml.org/lkml/2012/12/14/102) but it probably slipped through
> ---
> >From 28b4e10bc3c18b82bee695b76f4bf25c03baa5f8 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.cz>
> Date: Fri, 14 Dec 2012 11:12:43 +0100
> Subject: [PATCH] memcg,vmscan: do not break out targeted reclaim without
>  reclaimed pages
> 
> Targeted (hard resp. soft) reclaim has traditionally tried to scan one
> group with decreasing priority until nr_to_reclaim (SWAP_CLUSTER_MAX
> pages) is reclaimed or all priorities are exhausted. The reclaim is
> then retried until the limit is met.
> 
> This approach, however, doesn't work well with deeper hierarchies where
> groups higher in the hierarchy do not have any or only very few pages
> (this usually happens if those groups do not have any tasks and they
> have only re-parented pages after some of their children is removed).
> Those groups are reclaimed with decreasing priority pointlessly as there
> is nothing to reclaim from them.
> 
> An easiest fix is to break out of the memcg iteration loop in shrink_zone
> only if the whole hierarchy has been visited or sufficient pages have
> been reclaimed. This is also more natural because the reclaimer expects
> that the hierarchy under the given root is reclaimed. As a result we can
> simplify the soft limit reclaim which does its own iteration.
> 
> Reported-by: Ying Han <yinghan@google.com>

But what was in that report?

My guess would be "excessive CPU consumption", and perhaps "excessive
reclaim in the higher-level memcgs".

IOW, what are the user-visible effects of this change?

(And congrats - you're the first person I've sent that sentence to this
year!  But not, I fear, the last)


I don't really understand what prevents limit reclaim from stealing
lots of pages from the top-level groups.  How do we ensure
balancing/fairness in this case?


> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1973,18 +1973,17 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)

shrink_zone() might be getting a bit bloaty for CONFIG_MEMCG=n kernels.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages
  2013-01-03 20:24 ` Andrew Morton
@ 2013-01-04 14:52   ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2013-01-04 14:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, KAMEZAWA Hiroyuki, Johannes Weiner,
	Tejun Heo, Glauber Costa, Li Zefan

On Thu 03-01-13 12:24:04, Andrew Morton wrote:
> On Thu, 3 Jan 2013 19:09:01 +0100
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Hi,
> > I have posted this quite some time ago
> > (https://lkml.org/lkml/2012/12/14/102) but it probably slipped through
> > ---
> > >From 28b4e10bc3c18b82bee695b76f4bf25c03baa5f8 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.cz>
> > Date: Fri, 14 Dec 2012 11:12:43 +0100
> > Subject: [PATCH] memcg,vmscan: do not break out targeted reclaim without
> >  reclaimed pages
> > 
> > Targeted (hard resp. soft) reclaim has traditionally tried to scan one
> > group with decreasing priority until nr_to_reclaim (SWAP_CLUSTER_MAX
> > pages) is reclaimed or all priorities are exhausted. The reclaim is
> > then retried until the limit is met.
> > 
> > This approach, however, doesn't work well with deeper hierarchies where
> > groups higher in the hierarchy do not have any or only very few pages
> > (this usually happens if those groups do not have any tasks and they
> > have only re-parented pages after some of their children is removed).
> > Those groups are reclaimed with decreasing priority pointlessly as there
> > is nothing to reclaim from them.
> > 
> > An easiest fix is to break out of the memcg iteration loop in shrink_zone
> > only if the whole hierarchy has been visited or sufficient pages have
> > been reclaimed. This is also more natural because the reclaimer expects
> > that the hierarchy under the given root is reclaimed. As a result we can
> > simplify the soft limit reclaim which does its own iteration.
> > 
> > Reported-by: Ying Han <yinghan@google.com>
> 
> But what was in that report?

Well, Ying was complaining that targeted reclaim differs a lot from
the global one regarding iteration because we reclaim each group in at
the same priority because falling to lower one which targeted reclaim
hammers one group for each zone and priority first before it visits a
next group.
More on that here: https://lkml.org/lkml/2012/12/13/712

> My guess would be "excessive CPU consumption", and perhaps "excessive
> reclaim in the higher-level memcgs".
> IOW, what are the user-visible effects of this change?

I would expect a smaller CPU consumption because save a lot of
per-zone-in-zonelist * per-prio loops without any useful work to do with
deeper hierarchies. I haven't measured that though. I do not expect the
win would be huge because shrink_lruvec should be mostly noop for groups
without any pages (after Johannes recent changes in that area). The
patch is more trying to make it more consistent with the global reclaim.

> (And congrats - you're the first person I've sent that sentence to this
> year!  But not, I fear, the last)
>
> I don't really understand what prevents limit reclaim from stealing
> lots of pages from the top-level groups.  How do we ensure
> balancing/fairness in this case?

All groups in a hierarchy (with root_mem_cgroup for the global
reclaim) are visited in the round robin fashion (per-node
per-zone per-priority).  Last visited group is cached in iterator
(mem_cgroup_per_zone::mem_cgroup_reclaim_iter) and the following reclaim
will start with the next one.
Over reclaim is not an issue because of nr_to_reclaim checks both up the
way in do_try_to_free_pages and shrink_lruvec which make sure we back
off after SWAP_CLUSTER_MAX have been reclaimed.

> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1973,18 +1973,17 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
> 
> shrink_zone() might be getting a bit bloaty for CONFIG_MEMCG=n kernels.

Most of the code should be compiled out because global_reclaim is
compile time true and mem_cgroup_iter returns NULL right away. So all we
do is we waste mem_cgroup_reclaim_cookie on the stack but I guess gcc
should be able to optimize that one out as well (mine does that).

If you are talking about uncompiled code then yes it is getting messy.
Maybe we want to have something like mem_cgroup_shrink_zone and keep
only:
	do {
		mem_cgroup_shrink_zone();
	} while (should_continue_reclaim(...));

here. Dunno if that is a huge win though.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix
  2013-01-03 18:09 [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages Michal Hocko
  2013-01-03 20:24 ` Andrew Morton
@ 2013-01-29  8:51 ` Michal Hocko
  2013-01-30 16:22   ` Johannes Weiner
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2013-01-29  8:51 UTC (permalink / raw)
  To: Ying Han, Andrew Morton
  Cc: linux-kernel, KAMEZAWA Hiroyuki, Johannes Weiner, Tejun Heo,
	Glauber Costa, linux-mm, Li Zefan, Andrew Morton

Ying has noticed me (via private email) that the patch is bogus because
the break out condition is incorrect. She said she would post a fix
but she's been probably too busy. If she doesn't oppose, could you add
the follow up fix, please?

I am really sorry about this mess.
---
>From 6d23b59e96b8173fae2d0d397cb5e99f16899874 Mon Sep 17 00:00:00 2001
From: Ying Han <yinghan@google.com>
Date: Tue, 29 Jan 2013 09:42:28 +0100
Subject: [PATCH] mmotm:
 memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch
 fix

We should break out of the hierarchy loop only if nr_reclaimed exceeded
nr_to_reclaim and not vice-versa. This patch fixes the condition.

Signed-off-by: Ying Han <yinghan@google.com>
---
 mm/vmscan.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d75c1ec..7528eae 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1985,7 +1985,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
 			 * whole hierarchy is not sufficient.
 			 */
 			if (!global_reclaim(sc) &&
-					sc->nr_to_reclaim >= sc->nr_reclaimed) {
+					sc->nr_to_reclaim <= sc->nr_reclaimed) {
 				mem_cgroup_iter_break(root, memcg);
 				break;
 			}
-- 
1.7.10.4

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix
  2013-01-29  8:51 ` [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix Michal Hocko
@ 2013-01-30 16:22   ` Johannes Weiner
  2013-01-30 16:37     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Johannes Weiner @ 2013-01-30 16:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Ying Han, Andrew Morton, linux-kernel, KAMEZAWA Hiroyuki,
	Tejun Heo, Glauber Costa, linux-mm, Li Zefan

On Tue, Jan 29, 2013 at 09:51:04AM +0100, Michal Hocko wrote:
> Ying has noticed me (via private email) that the patch is bogus because
> the break out condition is incorrect. She said she would post a fix
> but she's been probably too busy. If she doesn't oppose, could you add
> the follow up fix, please?
> 
> I am really sorry about this mess.
> ---
> >From 6d23b59e96b8173fae2d0d397cb5e99f16899874 Mon Sep 17 00:00:00 2001
> From: Ying Han <yinghan@google.com>
> Date: Tue, 29 Jan 2013 09:42:28 +0100
> Subject: [PATCH] mmotm:
>  memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch
>  fix
> 
> We should break out of the hierarchy loop only if nr_reclaimed exceeded
> nr_to_reclaim and not vice-versa. This patch fixes the condition.
> 
> Signed-off-by: Ying Han <yinghan@google.com>
> ---
>  mm/vmscan.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index d75c1ec..7528eae 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1985,7 +1985,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
>  			 * whole hierarchy is not sufficient.
>  			 */
>  			if (!global_reclaim(sc) &&
> -					sc->nr_to_reclaim >= sc->nr_reclaimed) {
> +					sc->nr_to_reclaim <= sc->nr_reclaimed) {

This is just a really weird ordering of the operands, isn't it?  You
compare the constant to the variable, like if (42 == foo->nr_pages).

    if (sc->nr_reclaimed >= sc->nr_to_reclaim)

would be less surprising.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix
  2013-01-30 16:22   ` Johannes Weiner
@ 2013-01-30 16:37     ` Michal Hocko
  0 siblings, 0 replies; 6+ messages in thread
From: Michal Hocko @ 2013-01-30 16:37 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Ying Han, Andrew Morton, linux-kernel, KAMEZAWA Hiroyuki,
	Tejun Heo, Glauber Costa, linux-mm, Li Zefan

On Wed 30-01-13 11:22:57, Johannes Weiner wrote:
> On Tue, Jan 29, 2013 at 09:51:04AM +0100, Michal Hocko wrote:
> > Ying has noticed me (via private email) that the patch is bogus because
> > the break out condition is incorrect. She said she would post a fix
> > but she's been probably too busy. If she doesn't oppose, could you add
> > the follow up fix, please?
> > 
> > I am really sorry about this mess.
> > ---
> > >From 6d23b59e96b8173fae2d0d397cb5e99f16899874 Mon Sep 17 00:00:00 2001
> > From: Ying Han <yinghan@google.com>
> > Date: Tue, 29 Jan 2013 09:42:28 +0100
> > Subject: [PATCH] mmotm:
> >  memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch
> >  fix
> > 
> > We should break out of the hierarchy loop only if nr_reclaimed exceeded
> > nr_to_reclaim and not vice-versa. This patch fixes the condition.
> > 
> > Signed-off-by: Ying Han <yinghan@google.com>
> > ---
> >  mm/vmscan.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index d75c1ec..7528eae 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1985,7 +1985,7 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
> >  			 * whole hierarchy is not sufficient.
> >  			 */
> >  			if (!global_reclaim(sc) &&
> > -					sc->nr_to_reclaim >= sc->nr_reclaimed) {
> > +					sc->nr_to_reclaim <= sc->nr_reclaimed) {
> 
> This is just a really weird ordering of the operands, isn't it?  You
> compare the constant to the variable, like if (42 == foo->nr_pages).
> 
>     if (sc->nr_reclaimed >= sc->nr_to_reclaim)
> 
> would be less surprising.

No objections from me.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-01-30 16:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-01-03 18:09 [PATCH -repost] memcg,vmscan: do not break out targeted reclaim without reclaimed pages Michal Hocko
2013-01-03 20:24 ` Andrew Morton
2013-01-04 14:52   ` Michal Hocko
2013-01-29  8:51 ` [PATCH] mmotm: memcgvmscan-do-not-break-out-targeted-reclaim-without-reclaimed-pages.patch fix Michal Hocko
2013-01-30 16:22   ` Johannes Weiner
2013-01-30 16:37     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).