* [PATCH 0/2] Eliminate hangs when using frequent high-order allocations V4
From: Mel Gorman @ 2011-05-23  9:53 UTC
  To: Andrew Morton
  Cc: James Bottomley, Colin King, Raghavendra D Prabhu, Jan Kara,
	Chris Mason, Christoph Lameter, Pekka Enberg, Rik van Riel,
	Johannes Weiner, Minchan Kim, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable, Mel Gorman

(Resending as the updated patch 2 appears to have gotten lost in a
"twisty maze of threads all similar" while questing towards mmotm)

Changelog since V3
  o Call cond_resched() in shrink_slab() when it does nothing rather
    than having kswapd sleep for HZ/10 when it needs to schedule

Changelog since V2
  o Drop all SLUB latency-reducing patches.

Changelog since V1
  o kswapd should sleep if need_resched
  o Remove __GFP_REPEAT from GFP flags when speculatively using high
    orders so direct/compaction exits earlier
  o Remove __GFP_NORETRY for correctness
  o Correct logic in sleeping_prematurely
  o Leave SLUB using the default slub_max_order

There are a few reports of people experiencing hangs when copying
large amounts of data, with kswapd using a large amount of CPU, which
appear to be due to recent reclaim changes. SLUB using high orders
is the trigger but not the root cause, as SLUB has been using high
orders for a while. The root cause is bugs introduced into reclaim,
which are addressed by the following two patches.

Patch 1 corrects logic introduced by commit [1741c877: mm:
	kswapd: keep kswapd awake for high-order allocations until
	a percentage of the node is balanced] to allow kswapd to
	go to sleep when balanced for high orders.

Patch 2 notes that it is possible for kswapd to miss every
	cond_resched() and updates shrink_slab() so it'll at least
	reach that scheduling point.

Chris Wood reports that these two patches in isolation are sufficient
to prevent the system hanging. AFAIK, they should also resolve similar
hangs experienced by James Bottomley.

These should also be considered for -stable for both 2.6.38 and 2.6.39.

-- 
1.7.3.4



* [PATCH 1/2] mm: vmscan: Correct use of pgdat_balanced in sleeping_prematurely
From: Mel Gorman @ 2011-05-23  9:53 UTC
  To: Andrew Morton
  Cc: James Bottomley, Colin King, Raghavendra D Prabhu, Jan Kara,
	Chris Mason, Christoph Lameter, Pekka Enberg, Rik van Riel,
	Johannes Weiner, Minchan Kim, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable, Mel Gorman

From: Johannes Weiner <hannes@cmpxchg.org>

Johannes Weiner pointed out that the logic in commit [1741c877: mm:
kswapd: keep kswapd awake for high-order allocations until a percentage
of the node is balanced] is backwards. Instead of allowing kswapd to go
to sleep when balancing for high order allocations, it keeps kswapd
running uselessly.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
---
 mm/vmscan.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8bfd450..1aa262b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2286,7 +2286,7 @@ static bool sleeping_prematurely(pg_data_t *pgdat, int order, long remaining,
 	 * must be balanced
 	 */
 	if (order)
-		return pgdat_balanced(pgdat, balanced, classzone_idx);
+		return !pgdat_balanced(pgdat, balanced, classzone_idx);
 	else
 		return !all_zones_ok;
 }
-- 
1.7.3.4
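
To make the inverted logic concrete, here is a minimal userspace
sketch of the intended return-value semantics; the stub names and the
main() harness are illustrative stand-ins, not the kernel
implementation. sleeping_prematurely() answers "would it be premature
for kswapd to sleep now?", so a node that is balanced for a
high-order request must yield false:

/*
 * Sketch only: hypothetical stubs modelling sleeping_prematurely(),
 * not the real kernel data structures.
 */
#include <assert.h>
#include <stdbool.h>

/* Stand-in for pgdat_balanced(): true if enough of the node is balanced. */
static bool pgdat_balanced_stub(bool node_balanced)
{
	return node_balanced;
}

/* Returns true when sleeping now would be premature, i.e. kswapd
 * must keep running. */
static bool sleeping_prematurely_stub(int order, bool node_balanced,
				      bool all_zones_ok)
{
	if (order)
		return !pgdat_balanced_stub(node_balanced); /* the fix: negated */
	else
		return !all_zones_ok;
}

int main(void)
{
	/* Balanced for a high-order request: sleeping is not premature. */
	assert(!sleeping_prematurely_stub(3, true, true));
	/* Unbalanced: kswapd must stay awake. */
	assert(sleeping_prematurely_stub(3, false, false));
	return 0;
}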



* [PATCH 2/2] mm: vmscan: Correctly check if reclaimer should schedule during shrink_slab
From: Mel Gorman @ 2011-05-23  9:53 UTC
  To: Andrew Morton
  Cc: James Bottomley, Colin King, Raghavendra D Prabhu, Jan Kara,
	Chris Mason, Christoph Lameter, Pekka Enberg, Rik van Riel,
	Johannes Weiner, Minchan Kim, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable, Mel Gorman

It has been reported on some laptops that kswapd is consuming large
amounts of CPU and not being scheduled when SLUB is enabled during
large amounts of file copying. It is expected that this is due to
kswapd missing every cond_resched() point because:

shrink_page_list() calls cond_resched() if inactive pages were isolated
        which in turn may not happen if all_unreclaimable is set in
        shrink_zones(). If, for whatever reason, all_unreclaimable is
        set on all zones, we can miss calling cond_resched().

balance_pgdat() only calls cond_resched() if the zones are not
        balanced. For a high-order allocation that is balanced, it
        checks order-0 again. During that window, order-0 might have
        become unbalanced so it loops again for order-0 and returns
        that it was reclaiming for order-0 to kswapd(). It can then
        find that a caller has rewoken kswapd for a high-order and
        re-enters balance_pgdat() without ever calling cond_resched().

shrink_slab() only calls cond_resched() if we are reclaiming slab
	pages. If there are a large number of direct reclaimers, the
	shrinker_rwsem can be contended and prevent kswapd calling
	cond_resched().

This patch modifies the shrink_slab() case. If the semaphore is
contended, the caller will still call cond_resched(). After each
successful call into a shrinker, the cond_resched() check remains
in case one shrinker is particularly slow.

This patch replaces
mm-vmscan-if-kswapd-has-been-running-too-long-allow-it-to-sleep.patch
in -mm.

[mgorman@suse.de: Preserve call to cond_resched after each call into shrinker]
From: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 1aa262b..cc1470b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -230,8 +230,11 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 	if (scanned == 0)
 		scanned = SWAP_CLUSTER_MAX;
 
-	if (!down_read_trylock(&shrinker_rwsem))
-		return 1;	/* Assume we'll be able to shrink next time */
+	if (!down_read_trylock(&shrinker_rwsem)) {
+		/* Assume we'll be able to shrink next time */
+		ret = 1;
+		goto out;
+	}
 
 	list_for_each_entry(shrinker, &shrinker_list, list) {
 		unsigned long long delta;
@@ -282,6 +285,8 @@ unsigned long shrink_slab(unsigned long scanned, gfp_t gfp_mask,
 		shrinker->nr += total_scan;
 	}
 	up_read(&shrinker_rwsem);
+out:
+	cond_resched();
 	return ret;
 }
 
-- 
1.7.3.4
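
As a reading aid, the control flow of the change can be sketched as a
userspace analogue; the pthread rwlock stands in for shrinker_rwsem
and sched_yield() for cond_resched(), so the names and types here are
assumptions for illustration, not the kernel API:

/* Sketch of the patched control flow: even when the lock is contended,
 * execution falls through the scheduling point before returning. */
#include <pthread.h>
#include <sched.h>

static pthread_rwlock_t shrinker_lock = PTHREAD_RWLOCK_INITIALIZER;

unsigned long shrink_slab_sketch(void)
{
	unsigned long ret = 0;

	if (pthread_rwlock_tryrdlock(&shrinker_lock) != 0) {
		/* Assume we'll be able to shrink next time */
		ret = 1;
		goto out;
	}

	/* ... walk the registered shrinkers here ... */

	pthread_rwlock_unlock(&shrinker_lock);
out:
	sched_yield();	/* stands in for cond_resched() */
	return ret;
}

int main(void)
{
	return (int)shrink_slab_sketch();
}

Both the contended and uncontended paths now funnel through the same
scheduling point, which is what the goto-out restructuring in the
diff above achieves.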



* Re: [PATCH 1/2] mm: vmscan: Correct use of pgdat_balanced in sleeping_prematurely
From: Minchan Kim @ 2011-05-23 15:46 UTC
  To: Mel Gorman
  Cc: Andrew Morton, James Bottomley, Colin King, Raghavendra D Prabhu,
	Jan Kara, Chris Mason, Christoph Lameter, Pekka Enberg,
	Rik van Riel, Johannes Weiner, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable

On Mon, May 23, 2011 at 6:53 PM, Mel Gorman <mgorman@suse.de> wrote:
> From: Johannes Weiner <hannes@cmpxchg.org>
>
> Johannes Weiner pointed out that the logic in commit [1741c877: mm:
> kswapd: keep kswapd awake for high-order allocations until a percentage
> of the node is balanced] is backwards. Instead of allowing kswapd to go
> to sleep when balancing for high order allocations, it keeps kswapd
> running uselessly.
>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>


-- 
Kind regards,
Minchan Kim


* Re: [PATCH 2/2] mm: vmscan: Correctly check if reclaimer should schedule during shrink_slab
From: Andrew Morton @ 2011-05-23 20:03 UTC
  To: Mel Gorman
  Cc: James Bottomley, Colin King, Raghavendra D Prabhu, Jan Kara,
	Chris Mason, Christoph Lameter, Pekka Enberg, Rik van Riel,
	Johannes Weiner, Minchan Kim, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable

On Mon, 23 May 2011 10:53:55 +0100
Mel Gorman <mgorman@suse.de> wrote:

> It has been reported on some laptops that kswapd is consuming large
> amounts of CPU and not being scheduled when SLUB is enabled during
> large amounts of file copying. It is expected that this is due to
> kswapd missing every cond_resched() point because:
> 
> shrink_page_list() calls cond_resched() if inactive pages were isolated
>         which in turn may not happen if all_unreclaimable is set in
>         shrink_zones(). If, for whatever reason, all_unreclaimable is
>         set on all zones, we can miss calling cond_resched().
> 
> balance_pgdat() only calls cond_resched() if the zones are not
>         balanced. For a high-order allocation that is balanced, it
>         checks order-0 again. During that window, order-0 might have
>         become unbalanced so it loops again for order-0 and returns
>         that it was reclaiming for order-0 to kswapd(). It can then
>         find that a caller has rewoken kswapd for a high-order and
>         re-enters balance_pgdat() without ever calling cond_resched().
> 
> shrink_slab() only calls cond_resched() if we are reclaiming slab
> 	pages. If there are a large number of direct reclaimers, the
> 	shrinker_rwsem can be contended and prevent kswapd calling
> 	cond_resched().
> 
> This patch modifies the shrink_slab() case. If the semaphore is
> contended, the caller will still call cond_resched(). After each
> successful call into a shrinker, the cond_resched() check remains
> in case one shrinker is particularly slow.

So CONFIG_PREEMPT=y kernels don't exhibit this problem?

I'm still unconvinced that we know what's going on here.  What's kswapd
*doing* with all those cycles?  And if kswapd is now scheduling away,
who is doing that work instead?  Direct reclaim?


* Re: [PATCH 2/2] mm: vmscan: Correctly check if reclaimer should schedule during shrink_slab
From: James Bottomley @ 2011-05-23 20:07 UTC
  To: Andrew Morton
  Cc: Mel Gorman, Colin King, Raghavendra D Prabhu, Jan Kara,
	Chris Mason, Christoph Lameter, Pekka Enberg, Rik van Riel,
	Johannes Weiner, Minchan Kim, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable

On Mon, 2011-05-23 at 13:03 -0700, Andrew Morton wrote:
> On Mon, 23 May 2011 10:53:55 +0100
> Mel Gorman <mgorman@suse.de> wrote:
> 
> > It has been reported on some laptops that kswapd is consuming large
> > amounts of CPU and not being scheduled when SLUB is enabled during
> > large amounts of file copying. It is expected that this is due to
> > kswapd missing every cond_resched() point because:
> > 
> > shrink_page_list() calls cond_resched() if inactive pages were isolated
> >         which in turn may not happen if all_unreclaimable is set in
> >         shrink_zones(). If, for whatever reason, all_unreclaimable is
> >         set on all zones, we can miss calling cond_resched().
> > 
> > balance_pgdat() only calls cond_resched() if the zones are not
> >         balanced. For a high-order allocation that is balanced, it
> >         checks order-0 again. During that window, order-0 might have
> >         become unbalanced so it loops again for order-0 and returns
> >         that it was reclaiming for order-0 to kswapd(). It can then
> >         find that a caller has rewoken kswapd for a high-order and
> >         re-enters balance_pgdat() without ever calling cond_resched().
> > 
> > shrink_slab() only calls cond_resched() if we are reclaiming slab
> > 	pages. If there are a large number of direct reclaimers, the
> > 	shrinker_rwsem can be contended and prevent kswapd calling
> > 	cond_resched().
> > 
> > This patch modifies the shrink_slab() case. If the semaphore is
> > contended, the caller will still call cond_resched(). After each
> > successful call into a shrinker, the cond_resched() check remains
> > in case one shrinker is particularly slow.
> 
> So CONFIG_PREEMPT=y kernels don't exhibit this problem?

Yes, they do.  They just don't hang on my sandybridge system in the same
way as non-PREEMPT kernels do.  I'm still sure it's got something to
do with rescheduling kswapd onto a different CPU ...

> I'm still unconvinced that we know what's going on here.  What's kswapd
> *doing* with all those cycles?  And if kswapd is now scheduling away,
> who is doing that work instead?  Direct reclaim?

Still in the dark about this one, too.

James




* Re: [PATCH 2/2] mm: vmscan: Correctly check if reclaimer should schedule during shrink_slab
From: Mel Gorman @ 2011-05-24  9:21 UTC
  To: James Bottomley
  Cc: Andrew Morton, Colin King, Raghavendra D Prabhu, Jan Kara,
	Chris Mason, Christoph Lameter, Pekka Enberg, Rik van Riel,
	Johannes Weiner, Minchan Kim, linux-fsdevel, linux-mm,
	linux-kernel, linux-ext4, stable

On Tue, May 24, 2011 at 12:07:36AM +0400, James Bottomley wrote:
> On Mon, 2011-05-23 at 13:03 -0700, Andrew Morton wrote:
> > On Mon, 23 May 2011 10:53:55 +0100
> > Mel Gorman <mgorman@suse.de> wrote:
> > 
> > > It has been reported on some laptops that kswapd is consuming large
> > > amounts of CPU and not being scheduled when SLUB is enabled during
> > > large amounts of file copying. It is expected that this is due to
> > > kswapd missing every cond_resched() point because:
> > > 
> > > shrink_page_list() calls cond_resched() if inactive pages were isolated
> > >         which in turn may not happen if all_unreclaimable is set in
> > >         shrink_zones(). If, for whatever reason, all_unreclaimable is
> > >         set on all zones, we can miss calling cond_resched().
> > > 
> > > balance_pgdat() only calls cond_resched() if the zones are not
> > >         balanced. For a high-order allocation that is balanced, it
> > >         checks order-0 again. During that window, order-0 might have
> > >         become unbalanced so it loops again for order-0 and returns
> > >         that it was reclaiming for order-0 to kswapd(). It can then
> > >         find that a caller has rewoken kswapd for a high-order and
> > >         re-enters balance_pgdat() without ever calling cond_resched().
> > > 
> > > shrink_slab() only calls cond_resched() if we are reclaiming slab
> > > 	pages. If there are a large number of direct reclaimers, the
> > > 	shrinker_rwsem can be contended and prevent kswapd calling
> > > 	cond_resched().
> > > 
> > > This patch modifies the shrink_slab() case. If the semaphore is
> > > contended, the caller will still call cond_resched(). After each
> > > successful call into a shrinker, the cond_resched() check remains
> > > in case one shrinker is particularly slow.
> > 
> > So CONFIG_PREEMPT=y kernels don't exhibit this problem?
> 
> Yes, they do.  They just don't hang on my sandybridge system in the same
> way as non-PREEMPT kernels do.  I'm still sure it's got something to
> do with rescheduling kswapd onto a different CPU ...
> 
> > I'm still unconvinced that we know what's going on here.  What's kswapd
> > *doing* with all those cycles?  And if kswapd is now scheduling away,
> > who is doing that work instead?  Direct reclaim?
> 
> Still in the dark about this one, too.
> 

I still very strongly suspect that what gets us into this situation
is all_unreclaimable being set when a large batch of dirty pages
in the LRU pushes the scanning rates high enough after slab has
been shrunk as far as it can be at this time. Without a local
reproduction case, I'm undecided as to how this should be
investigated other than sticking in printks when all_unreclaimable
is set that output the number of LRU pages - anon, file and dirty
(even though this information in itself will be incomplete) and
seeing what falls out. I'm trying to borrow a similar laptop but
haven't found someone with a similar model in the locality yet.
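
A rough sketch of the sort of instrumentation described above,
assuming a 2.6.38-era tree; the placement (inside a vmscan.c function
where a struct zone pointer is in scope) and the exact set of
counters printed are illustrative guesses, not a tested patch:

/* Hypothetical debug printk: dump LRU counts at the point a zone is
 * observed with all_unreclaimable set. */
if (zone->all_unreclaimable) {
	printk(KERN_DEBUG "%s all_unreclaimable: anon=%lu file=%lu dirty=%lu\n",
	       zone->name,
	       zone_page_state(zone, NR_ACTIVE_ANON) +
			zone_page_state(zone, NR_INACTIVE_ANON),
	       zone_page_state(zone, NR_ACTIVE_FILE) +
			zone_page_state(zone, NR_INACTIVE_FILE),
	       zone_page_state(zone, NR_FILE_DIRTY));
}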

-- 
Mel Gorman
SUSE Labs

