[PATCH 0/3] Reduce GFP_ATOMIC allocation failures, partial fix V3

* [PATCH 0/3] Reduce GFP_ATOMIC allocation failures, partial fix V3
@ 2009-10-27 13:40 ` Mel Gorman
  0 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-27 13:40 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: linux-kernel, linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki,
	Kernel Testers List, Mel Gorman

Since 2.6.31-rc1, there has been an increasing number of GFP_ATOMIC
failures. A significant number of these have been high-order GFP_ATOMIC
failures. While such failures are generally brushed away, there has been a
large increase in them recently, and the problem could lie in any of several
areas: core VM, page writeback or a specific driver. The bugs affected by
this that I am aware of are:

[Bug #14141] order 2 page allocation failures in iwlagn
[Bug #14141] order 2 page allocation failures (generic)
[Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100
[No BZ ID]   Kernel crash on 2.6.31.x (kcryptd: page allocation failure..)
[No BZ ID]   page allocation failure message kernel 2.6.31.4 (tty-related)
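
For readers unfamiliar with the "order" figures in these reports: an order-n
request asks for 2^n physically contiguous pages. The following is a minimal
sketch of that arithmetic, not kernel code. The helper names are hypothetical
and the 4 KiB page size is an assumption (it is typical for x86 but is not
stated anywhere in this thread).

```c
#include <assert.h>

/* Assumption: 4 KiB base pages (typical for x86); not stated in this thread. */
#define ASSUMED_PAGE_SIZE 4096UL

/* An order-n request asks for 2^n physically contiguous pages. */
static unsigned long order_to_pages(unsigned int order)
{
	assert(order < 32);	/* keep the shift well defined in this sketch */
	return 1UL << order;
}

/* Size in bytes of one order-n block under the page-size assumption above. */
static unsigned long order_to_bytes(unsigned int order)
{
	return order_to_pages(order) * ASSUMED_PAGE_SIZE;
}
```

Under that assumption, order_to_bytes(2) is 16384 (16 KiB) and
order_to_bytes(5) is 131072 (128 KiB), so the order:5 failure above needed 32
contiguous free pages from a context that cannot sleep.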

The three patches in this series partially address the problem. I am
proposing these for merging to mainline and -stable now to reduce the number
of duplicate bug reports. The following bug should be fixed by these patches:

[No BZ ID] page allocation failure message kernel 2.6.31.4 (tty-related)

The following bug becomes very difficult to reproduce with these patches:

[Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100

The rest of the bugs remain open.

If these patches are agreed upon, they should also be considered -stable
candidates. Patch 1 does not apply cleanly but I can supply a version
that does.

 mm/page_alloc.c |    4 ++--
 mm/vmscan.c     |    9 +++++++++
 2 files changed, 11 insertions(+), 2 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 115+ messages in thread

* [PATCH 1/3] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed
  2009-10-27 13:40 ` Mel Gorman
  (?)
@ 2009-10-27 13:40   ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-27 13:40 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: linux-kernel, linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki,
	Kernel Testers List, Mel Gorman

If a direct reclaim makes no forward progress, it considers whether it
should go OOM or not. Whether OOM is triggered or not, it may retry the
allocation afterwards. In times past, this would always wake kswapd as well
but currently, kswapd is not woken up after direct reclaim fails. For order-0
allocations, this makes little difference but if there is a heavy mix of
higher-order allocations that direct reclaim is failing for, it might mean
that kswapd is not rewoken for higher orders as often as it was previously.

This patch wakes up kswapd when an allocation is being retried after a direct
reclaim failure. It would be expected that kswapd is already awake, but
this has the effect of telling kswapd to reclaim at the higher order as well.
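
The behaviour described above can be modelled in user space. The following is
a minimal sketch, not kernel code: the node_model structure and the model_*
helpers are hypothetical, direct reclaim is assumed to fail on every attempt,
and kswapd is assumed to clear its pending order request each time it starts
a pass. The wake_on_restart flag selects whether the wakeup sits before the
restart point (unpatched) or after it (patched).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node state that kswapd consults. */
struct node_model {
	unsigned int kswapd_max_order;	/* highest order requested so far */
};

/* Models the wakeup: remember the highest order anyone has asked for. */
static void model_wake_kswapd(struct node_model *node, unsigned int order)
{
	if (node->kswapd_max_order < order)
		node->kswapd_max_order = order;
}

/* Models kswapd starting a pass: take the pending order and clear it. */
static unsigned int model_kswapd_pass(struct node_model *node)
{
	unsigned int order = node->kswapd_max_order;

	node->kswapd_max_order = 0;
	return order;
}

/*
 * Models an allocation that is retried `retries` times after failed direct
 * reclaim. Returns how many kswapd passes ran at (or above) the order the
 * allocation needs.
 */
static unsigned int model_slowpath(struct node_model *node, unsigned int order,
				   unsigned int retries, bool wake_on_restart)
{
	unsigned int useful_passes = 0;
	unsigned int attempt;

	if (!wake_on_restart)
		model_wake_kswapd(node, order);	/* once, before the label */

	for (attempt = 0; attempt <= retries; attempt++) {
		if (wake_on_restart)
			model_wake_kswapd(node, order);	/* on every restart */

		/* Direct reclaim fails here; meanwhile kswapd runs a pass. */
		if (model_kswapd_pass(node) >= order)
			useful_passes++;
	}
	return useful_passes;
}
```

In this model an order-2 allocation retried three times gets one kswapd pass
at its order without the rewake and four with it, while an order-0 allocation
gets four passes either way, which matches the "little difference" noted for
order-0 above.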

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bf72055..dfa4362 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1817,9 +1817,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
 		goto nopage;

+restart:
 	wake_all_kswapd(order, zonelist, high_zoneidx);

-restart:
 	/*
 	 * OK, we're below the kswapd watermark and have kicked background
 	 * reclaim. Now things get more complex, so set up alloc_flags according
-- 
1.6.3.3

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-27 13:40 ` Mel Gorman
@ 2009-10-27 13:40   ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-27 13:40 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: linux-kernel, linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki,
	Kernel Testers List, Mel Gorman

Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic
slightly by allowing rt_tasks that are handling an interrupt to set
ALLOC_HARDER. This patch brings the watermark logic more in line with
2.6.30.
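
To make the changed condition concrete, here is a minimal user-space sketch of
just the branch touched by the diff below. It is not kernel code: the
alloc_ctx_model structure and both helpers are hypothetical, the two booleans
stand in for rt_task(p) and in_interrupt(), and the earlier branches of
gfp_to_alloc_flags() are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the context an allocation is made from. */
struct alloc_ctx_model {
	bool current_is_rt;	/* stands in for rt_task(p) */
	bool in_interrupt;	/* stands in for in_interrupt() */
};

/* Condition before this patch: only the task's scheduling class is checked. */
static bool harder_before_patch(struct alloc_ctx_model ctx)
{
	return ctx.current_is_rt;
}

/* Condition after this patch: interrupt context no longer qualifies. */
static bool harder_after_patch(struct alloc_ctx_model ctx)
{
	return ctx.current_is_rt && !ctx.in_interrupt;
}
```

The case that changes is an interrupt arriving while a real-time task is
running: the task seen by the allocator is simply whichever one was
interrupted, so before the patch the interrupt's allocation would have
qualified for ALLOC_HARDER through this branch, and after it does not.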

[rientjes@google.com: Spotted the problem]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 mm/page_alloc.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dfa4362..7f2aa3e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
 		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
 		 */
 		alloc_flags &= ~ALLOC_CPUSET;
-	} else if (unlikely(rt_task(p)))
+	} else if (unlikely(rt_task(p)) && !in_interrupt())
 		alloc_flags |= ALLOC_HARDER;
 
 	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-27 13:40 ` Mel Gorman
@ 2009-10-27 13:40   ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-27 13:40 UTC (permalink / raw)
  To: Andrew Morton, stable
  Cc: linux-kernel, linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Rafael J. Wysocki,
	Kernel Testers List, Mel Gorman

When a high-order allocation fails, kswapd is kicked so that it reclaims
at a higher order to avoid direct reclaimers stalling and to help GFP_ATOMIC
allocations. Something has changed in recent kernels that affects the timing,
and high-order GFP_ATOMIC allocations are now failing more frequently,
particularly under pressure. This patch forces kswapd to notice sooner that
high-order allocations are occurring.
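
The early exit added by the diff below can be illustrated with a small
user-space model. This is a sketch under stated assumptions, not kernel code:
pgdat_model and model_balance_pass() are hypothetical, the per-zone watermark
check is reduced to a counter, and a waker is simulated by raising
kswapd_max_order just before a chosen zone is examined.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one pgdat field the new check reads. */
struct pgdat_model {
	unsigned int kswapd_max_order;
};

/*
 * Models one pass over the zones at the given order and returns how many
 * zones had their watermark checked before the pass ended. A simulated
 * waker raises kswapd_max_order to raised_order just before zone
 * raise_at_zone is examined.
 */
static unsigned int model_balance_pass(struct pgdat_model *pgdat,
				       unsigned int order,
				       unsigned int nr_zones,
				       unsigned int raise_at_zone,
				       unsigned int raised_order,
				       bool patched)
{
	unsigned int checked = 0;
	unsigned int zone;

	for (zone = 0; zone < nr_zones; zone++) {
		if (zone == raise_at_zone &&
		    pgdat->kswapd_max_order < raised_order)
			pgdat->kswapd_max_order = raised_order;

		/* The added check: give up early so kswapd can start over. */
		if (patched && pgdat->kswapd_max_order > order)
			break;

		checked++;	/* stands in for the zone_watermark_ok() test */
	}
	return checked;
}
```

With four zones, a pass running at order 0 and an order-3 request arriving
before the second zone, the unpatched model checks all four zones at the old
order, while the patched model stops after one. A pass already running at
order 3 is unaffected in either case.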

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/vmscan.c |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 64e4388..7eceb02 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2016,6 +2016,15 @@ loop_again:
 					priority != DEF_PRIORITY)
 				continue;
 
+			/*
+			 * Exit the function now and have kswapd start over
+			 * if it is known that higher orders are required
+			 */
+			if (pgdat->kswapd_max_order > order) {
+				all_zones_ok = 1;
+				goto out;
+			}
+
 			if (!zone_watermark_ok(zone, order,
 					high_wmark_pages(zone), end_zone, 0))
 				all_zones_ok = 0;
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-27 13:40   ` Mel Gorman
                     ` (2 preceding siblings ...)
  (?)
@ 2009-10-27 18:18   ` Rik van Riel
  -1 siblings, 0 replies; 115+ messages in thread
From: Rik van Riel @ 2009-10-27 18:18 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, Rafael J. Wysocki <rjw@sisk.pl>,
	Kernel Testers List <kernel-testers@vger.kernel.org>

On 10/27/2009 09:40 AM, Mel Gorman wrote:
> When a high-order allocation fails, kswapd is kicked so that it reclaims
> at a higher order to avoid direct reclaimers stalling and to help GFP_ATOMIC
> allocations. Something has changed in recent kernels that affects the timing,
> and high-order GFP_ATOMIC allocations are now failing more frequently,
> particularly under pressure. This patch forces kswapd to notice sooner that
> high-order allocations are occurring.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
@ 2009-10-27 20:09     ` Andrew Morton
  0 siblings, 0 replies; 115+ messages in thread
From: Andrew Morton @ 2009-10-27 20:09 UTC (permalink / raw)
  To: Mel Gorman
  Cc: stable, linux-kernel, linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Mel Gorman <mel@csn.ul.ie>

On Tue, 27 Oct 2009 13:40:32 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic
> slightly by allowing rt_tasks that are handling an interrupt to set
> ALLOC_HARDER. This patch brings the watermark logic more in line with
> 2.6.30.
> 
> [rientjes@google.com: Spotted the problem]
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
> Reviewed-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  mm/page_alloc.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dfa4362..7f2aa3e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
>  		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
>  		 */
>  		alloc_flags &= ~ALLOC_CPUSET;
> -	} else if (unlikely(rt_task(p)))
> +	} else if (unlikely(rt_task(p)) && !in_interrupt())
>  		alloc_flags |= ALLOC_HARDER;
>  
>  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {

What are the runtime-observable effects of this change?

The description is a bit waffly-sounding for a -stable backportable
thing, IMO.  What reason do the -stable maintainers and users have to
believe that this patch is needed, and an improvement?


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-27 13:40   ` Mel Gorman
@ 2009-10-27 20:19     ` Andrew Morton
  -1 siblings, 0 replies; 115+ messages in thread
From: Andrew Morton @ 2009-10-27 20:19 UTC (permalink / raw)
  To: Mel Gorman
  Cc: stable, linux-kernel, linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, Kernel Testers List,
	Mel Gorman

On Tue, 27 Oct 2009 13:40:33 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> When a high-order allocation fails, kswapd is kicked so that it reclaims
> at a higher order to avoid direct reclaimers stalling and to help GFP_ATOMIC
> allocations. Something has changed in recent kernels that affects the timing,
> and high-order GFP_ATOMIC allocations are now failing more frequently,
> particularly under pressure. This patch forces kswapd to notice sooner that
> high-order allocations are occurring.
> 

"something has changed"?  Shouldn't we find out what that is?

> ---
>  mm/vmscan.c |    9 +++++++++
>  1 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 64e4388..7eceb02 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2016,6 +2016,15 @@ loop_again:
>  					priority != DEF_PRIORITY)
>  				continue;
>  
> +			/*
> +			 * Exit the function now and have kswapd start over
> +			 * if it is known that higher orders are required
> +			 */
> +			if (pgdat->kswapd_max_order > order) {
> +				all_zones_ok = 1;
> +				goto out;
> +			}
> +
>  			if (!zone_watermark_ok(zone, order,
>  					high_wmark_pages(zone), end_zone, 0))
>  				all_zones_ok = 0;

So this handles the case where some concurrent thread or interrupt
increases pgdat->kswapd_max_order while kswapd was running
balance_pgdat(), yes?

Does that actually happen much?  Enough for this patch to make any
useful difference?

If one were to whack a printk in that `if' block, how often would it
trigger, and under what circumstances?


If the -stable maintainers were to ask me "why did you send this" then
right now my answer would have to be "I have no idea".  Help.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-27 20:09     ` Andrew Morton
  (?)
@ 2009-10-27 21:12       ` David Rientjes
  -1 siblings, 0 replies; 115+ messages in thread
From: David Rientjes @ 2009-10-27 21:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, stable, linux-kernel, linux-mm, Frans Pop,
	Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, kernel-testers, Mel Gorman

On Tue, 27 Oct 2009, Andrew Morton wrote:

> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index dfa4362..7f2aa3e 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
> >  		 */
> >  		alloc_flags &= ~ALLOC_CPUSET;
> > -	} else if (unlikely(rt_task(p)))
> > +	} else if (unlikely(rt_task(p)) && !in_interrupt())
> >  		alloc_flags |= ALLOC_HARDER;
> >  
> >  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> 
> What are the runtime-observable effects of this change?
> 

Giving rt tasks access to memory reserves is necessary to reduce latency;
the privilege does not apply to interrupts that subsequently get run on
the same cpu.

> The description is a bit waffly-sounding for a -stable backportable
> thing, IMO.  What reason do the -stable maintainers and users have to
> believe that this patch is needed, and an improvement?
> 

Allowing interrupts to allocate below the low watermark when not 
GFP_ATOMIC depletes memory reserves; this fixes an inconsistency 
introduced by the page allocator refactoring patchset that went into 
2.6.31.
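
A minimal userspace sketch of the flag selection discussed above, for
illustration only. The struct, the function name and the flag values are
hypothetical stand-ins, and the `!can_wait' branch is an assumption about
the code surrounding the quoted hunk; only the rt_task()/in_interrupt()
condition is taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values only; the kernel's real values differ. */
#define ALLOC_WMARK_MIN 0x01u
#define ALLOC_HARDER    0x10u
#define ALLOC_HIGH      0x20u
#define ALLOC_CPUSET    0x40u

/* Hypothetical stand-in for the allocation request and caller context. */
struct alloc_ctx {
	bool can_wait;     /* models the request being allowed to sleep */
	bool gfp_high;     /* models __GFP_HIGH being set */
	bool rt_task;      /* models rt_task(p) */
	bool in_interrupt; /* models in_interrupt() */
};

/* Model of the flag selection, with and without the patch applied. */
static unsigned int model_alloc_flags(const struct alloc_ctx *c, bool patched)
{
	unsigned int flags = ALLOC_WMARK_MIN | ALLOC_CPUSET;

	assert(c != 0);

	if (c->gfp_high)
		flags |= ALLOC_HIGH;

	if (!c->can_wait) {
		/* Assumed branch: atomic callers try harder, ignore cpusets. */
		flags |= ALLOC_HARDER;
		flags &= ~ALLOC_CPUSET;
	} else if (c->rt_task && (!patched || !c->in_interrupt)) {
		/* Patched: an interrupt no longer inherits the rt privilege. */
		flags |= ALLOC_HARDER;
	}

	return flags;
}
```

In this model, a non-atomic request made from interrupt context on top of
an rt task gets ALLOC_HARDER only when `patched' is false, which is the
inconsistency described above.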

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-27 20:19     ` Andrew Morton
  (?)
@ 2009-10-28  3:54       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 115+ messages in thread
From: KOSAKI Motohiro @ 2009-10-28  3:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kosaki.motohiro, Mel Gorman, stable, linux-kernel,
	linux-mm@kvack.org,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

> On Tue, 27 Oct 2009 13:40:33 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > When a high-order allocation fails, kswapd is kicked so that it reclaims
> > at a higher-order to avoid direct reclaimers stall and to help GFP_ATOMIC
> > allocations. Something has changed in recent kernels that affect the timing
> > where high-order GFP_ATOMIC allocations are now failing with more frequency,
> > particularly under pressure. This patch forces kswapd to notice sooner that
> > high-order allocations are occurring.
> 
> "something has changed"?  Shouldn't we find out what that is?

With this patch, if kswapd_max_order changes, kswapd quickly changes its
own reclaim order.

old:
  1. an order-0 allocation happens
  2. kswapd is kicked
  3. a high-order allocation happens
  4. kswapd_max_order changes, but kswapd continues order-0 reclaim
  5. kswapd finishes order-0 reclaim and exits balance_pgdat()
  6. kswapd() restarts balance_pgdat() with the high order

new:
  1. an order-0 allocation happens
  2. kswapd is kicked
  3. a high-order allocation happens
  4. kswapd_max_order changes
  5. kswapd notices it and quickly exits balance_pgdat()
  6. kswapd() restarts balance_pgdat() with the high order
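
The old and new sequences above can be sketched as a small userspace
model. This is only an illustration under stated assumptions: the names
model_balance_pass() and struct pgdat_model are hypothetical, the zone
count is arbitrary, and the model assumes the worst case where the zones
never become balanced, so an unpatched pass runs through every priority.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ZONES     3   /* arbitrary zone count for the model */
#define DEF_PRIORITY 12  /* number of scan priorities below the default */

/* Hypothetical stand-in for the per-node data kswapd looks at. */
struct pgdat_model {
	int kswapd_max_order; /* raised by allocators that wake kswapd */
};

/*
 * Model one balance pass at `order'. A request of `new_order' arrives
 * after `arrival_step' zone checks. Returns how many zone checks were
 * done before the pass returned, i.e. how long the high-order request
 * waited before kswapd could restart at the higher order.
 */
static int model_balance_pass(struct pgdat_model *pgdat, int order,
			      int arrival_step, int new_order, bool patched)
{
	int steps = 0;

	assert(pgdat != 0);

	for (int priority = DEF_PRIORITY; priority >= 0; priority--) {
		for (int zone = 0; zone < NR_ZONES; zone++) {
			/* Concurrent allocator raises the wanted order. */
			if (steps == arrival_step &&
			    new_order > pgdat->kswapd_max_order)
				pgdat->kswapd_max_order = new_order;

			/* The patch: bail out so the caller can restart. */
			if (patched && pgdat->kswapd_max_order > order)
				return steps;

			steps++;
		}
	}

	return steps;
}
```

With a request arriving early in the pass, the unpatched model finishes
all of its order-0 work first, while the patched model returns at the
next zone check.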

> 
> > ---
> >  mm/vmscan.c |    9 +++++++++
> >  1 files changed, 9 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 64e4388..7eceb02 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2016,6 +2016,15 @@ loop_again:
> >  					priority != DEF_PRIORITY)
> >  				continue;
> >  
> > +			/*
> > +			 * Exit the function now and have kswapd start over
> > +			 * if it is known that higher orders are required
> > +			 */
> > +			if (pgdat->kswapd_max_order > order) {
> > +				all_zones_ok = 1;
> > +				goto out;
> > +			}
> > +
> >  			if (!zone_watermark_ok(zone, order,
> >  					high_wmark_pages(zone), end_zone, 0))
> >  				all_zones_ok = 0;
> 
> So this handles the case where some concurrent thread or interrupt
> increases pgdat->kswapd_max_order while kswapd was running
> balance_pgdat(), yes?

Yes.

> Does that actually happen much?  Enough for this patch to make any
> useful difference?

In the typical use case, it doesn't improve things much. However, some
drivers use high-order allocations in interrupt context.
That means we need to reclaim quickly, before the GFP_ATOMIC allocation
fails.

I agree these drivers are ill, but...
we can't ignore end-user bug reports.


> 
> If one were to whack a printk in that `if' block, how often would it
> trigger, and under what circumstances?
> 
> 
> If the -stable maintainers were to ask me "why did you send this" then
> right now my answer would have to be "I have no idea".  Help.






^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-27 20:09     ` Andrew Morton
@ 2009-10-28 10:24       ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-28 10:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable, linux-kernel, linux-mm, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Mel Gorman <mel@csn.ul.ie>

On Tue, Oct 27, 2009 at 01:09:24PM -0700, Andrew Morton wrote:
> On Tue, 27 Oct 2009 13:40:32 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > Commit 341ce06f69abfafa31b9468410a13dbd60e2b237 altered watermark logic
> > slightly by allowing rt_tasks that are handling an interrupt to set
> > ALLOC_HARDER. This patch brings the watermark logic more in line with
> > 2.6.30.
> > 
> > [rientjes@google.com: Spotted the problem]
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
> > Reviewed-by: Rik van Riel <riel@redhat.com>
> > Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> > ---
> >  mm/page_alloc.c |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index dfa4362..7f2aa3e 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
> >  		 */
> >  		alloc_flags &= ~ALLOC_CPUSET;
> > -	} else if (unlikely(rt_task(p)))
> > +	} else if (unlikely(rt_task(p)) && !in_interrupt())
> >  		alloc_flags |= ALLOC_HARDER;
> >  
> >  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> 
> What are the runtime-observable effects of this change?
> 

A reduction in the high-order GFP_ATOMIC allocation failures reported at:

http://www.gossamer-threads.com/lists/linux/kernel/1144153

> The description is a bit waffly-sounding for a -stable backportable
> thing, IMO.  What reason do the -stable maintainers and users have to
> believe that this patch is needed, and an improvement?
> 

Allocation failure reports are occurring against 2.6.31.4 that did not
occur in 2.6.30. The bug reporter observes no such allocation failures
with this and the previous patch applied. The data is fuzzier than I'd
like but both patches do appear to be required.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-27 20:19     ` Andrew Morton
  (?)
@ 2009-10-28 10:29       ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-28 10:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable, linux-kernel, linux-mm, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Tue, Oct 27, 2009 at 01:19:05PM -0700, Andrew Morton wrote:
> On Tue, 27 Oct 2009 13:40:33 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > When a high-order allocation fails, kswapd is kicked so that it reclaims
> > at a higher-order to avoid direct reclaimers stall and to help GFP_ATOMIC
> > allocations. Something has changed in recent kernels that affect the timing
> > where high-order GFP_ATOMIC allocations are now failing with more frequency,
> > particularly under pressure. This patch forces kswapd to notice sooner that
> > high-order allocations are occurring.
> > 
> 
> "something has changed"?  Shouldn't we find out what that is?
> 

We've been trying but the answer right now is "lots". There were some
unintentional changes in the allocator itself, which are fixed by
patches 1 and 2 of this series. The two other major changes are:

iwlagn is now making high-order GFP_ATOMIC allocations, which didn't
help. This is being addressed separately and I believe the relevant
patches are now in mainline.

The other major change appears to be in page writeback. Reverting
commits 373c0a7e + 8aa7e847 significantly helps one bug reporter but
it's still unknown as to why that is.

The latter is still being investigated, but as the patches in this series
are known to help some bug reporters with their GFP_ATOMIC failures, and
the failures are being reported against latest mainline and -stable, I
felt it was best to help some of the bug reporters now to reduce
duplicate reports.

> > ---
> >  mm/vmscan.c |    9 +++++++++
> >  1 files changed, 9 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 64e4388..7eceb02 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2016,6 +2016,15 @@ loop_again:
> >  					priority != DEF_PRIORITY)
> >  				continue;
> >  
> > +			/*
> > +			 * Exit the function now and have kswapd start over
> > +			 * if it is known that higher orders are required
> > +			 */
> > +			if (pgdat->kswapd_max_order > order) {
> > +				all_zones_ok = 1;
> > +				goto out;
> > +			}
> > +
> >  			if (!zone_watermark_ok(zone, order,
> >  					high_wmark_pages(zone), end_zone, 0))
> >  				all_zones_ok = 0;
> 
> So this handles the case where some concurrent thread or interrupt
> increases pgdat->kswapd_max_order while kswapd was running
> balance_pgdat(), yes?
> 

Right.

> Does that actually happen much?  Enough for this patch to make any
> useful difference?
> 

Apparently, yes. Wireless drivers in particular seem to be very
high-order GFP_ATOMIC happy.

> If one were to whack a printk in that `if' block, how often would it
> trigger, and under what circumstances?

I don't know the frequency. The circumstances are "under load" when
there are drivers depending on high-order allocations but the
reproduction cases are unreliable.

Do you want me to slap together a patch that adds a vmstat counter for
this? I can then ask future bug reporters to examine that counter and see
if it really is a major factor for a lot of people or not.
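
As a rough illustration of what such a counter could look like, here is a
userspace sketch. The struct, the field name kswapd_highorder_restart and
the helper are all hypothetical; this is not the counter from any posted
patch, only a model of counting how often the new `if' block fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical event counter standing in for a vmstat event. */
struct reclaim_stats {
	unsigned long kswapd_highorder_restart;
};

/*
 * Returns true when the current pass should bail out because a higher
 * order has been requested, and records the event so it can be reported.
 */
static bool check_higher_order_request(int kswapd_max_order, int order,
				       struct reclaim_stats *stats)
{
	assert(stats != 0);

	if (kswapd_max_order > order) {
		stats->kswapd_highorder_restart++;
		return true;
	}

	return false;
}
```

A bug reporter could then compare the counter before and after a failing
workload to judge whether the early exit is a significant factor.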

> If the -stable maintainers were to ask me "why did you send this" then
> right now my answer would have to be "I have no idea".  Help.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
@ 2009-10-28 10:29       ` Mel Gorman
  0 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-10-28 10:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable-DgEjT+Ai2ygdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Tue, Oct 27, 2009 at 01:19:05PM -0700, Andrew Morton wrote:
> On Tue, 27 Oct 2009 13:40:33 +0000
> Mel Gorman <mel-wPRd99KPJ+uzQB+pC5nmwQ@public.gmane.org> wrote:
> 
> > When a high-order allocation fails, kswapd is kicked so that it reclaims
> > at a higher-order to avoid direct reclaimers stall and to help GFP_ATOMIC
> > allocations. Something has changed in recent kernels that affect the timing
> > where high-order GFP_ATOMIC allocations are now failing with more frequency,
> > particularly under pressure. This patch forces kswapd to notice sooner that
> > high-order allocations are occuring.
> > 
> 
> "something has changed"?  Shouldn't we find out what that is?
> 

We've been trying but the answer right now is "lots". There were some
changes in the allocator itself which were unintentional and fixed in
patches 1 and 2 of this series. The two other major changes are

iwlagn is now making high order GFP_ATOMIC allocations which didn't
help. This is being addressed separetly and I believe the relevant
patches are now in mainline.

The other major change appears to be in page writeback. Reverting
commits 373c0a7e + 8aa7e847 significantly helps one bug reporter but
it's still unknown as to why that is.

The latter is still being investigated but as the patches in this series
are known to help some bug reporters with their GFP_ATOMIC failures and
it is being reported against latest mainline and -stable, I felt it was
best to help some of the bug reporters now to reduce duplicate reports.

> > ---
> >  mm/vmscan.c |    9 +++++++++
> >  1 files changed, 9 insertions(+), 0 deletions(-)
> > 
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 64e4388..7eceb02 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2016,6 +2016,15 @@ loop_again:
> >  					priority != DEF_PRIORITY)
> >  				continue;
> >  
> > +			/*
> > +			 * Exit the function now and have kswapd start over
> > +			 * if it is known that higher orders are required
> > +			 */
> > +			if (pgdat->kswapd_max_order > order) {
> > +				all_zones_ok = 1;
> > +				goto out;
> > +			}
> > +
> >  			if (!zone_watermark_ok(zone, order,
> >  					high_wmark_pages(zone), end_zone, 0))
> >  				all_zones_ok = 0;
> 
> So this handles the case where some concurrent thread or interrupt
> increases pgdat->kswapd_max_order while kswapd was running
> balance_pgdat(), yes?
> 

Right.

> Does that actually happen much?  Enough for this patch to make any
> useful difference?
> 

Apparently, yes. Wireless drivers in particular seem to be very
high-order GFP_ATOMIC happy.

> If one were to whack a printk in that `if' block, how often would it
> trigger, and under what circumstances?

I don't know the frequency. The circumstances are "under load" when
there are drivers depending on high-order allocations but the
reproduction cases are unreliable.

Do you want me to slap together a patch that adds a vmstat counter for
this? I can then ask future bug reporters to examine that counter and see
if it really is a major factor for a lot of people or not.
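
As a rough illustration of what such a counter would measure, here is a
minimal userspace sketch of the early exit added by this patch. It is a
model only, not kernel code: struct node_model, balance_pass(),
kswapd_model() and the early_restarts counter are hypothetical names
invented for the example, and the concurrent waker is simulated by a
parameter rather than by a real interrupt or thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-node data kswapd looks at. */
struct node_model {
	int kswapd_max_order;         /* highest order requested by wakers */
	unsigned long early_restarts; /* hypothetical debug counter */
};

/*
 * Model of one balancing pass at a given order over nr_zones zones.
 * bump_at_zone and bump_to simulate a waker raising kswapd_max_order
 * while the pass is visiting that zone (-1 means no concurrent waker).
 * Returns true if the pass bailed out early.
 */
static bool balance_pass(struct node_model *node, int order, int nr_zones,
			 int bump_at_zone, int bump_to)
{
	assert(order >= 0 && nr_zones > 0);

	for (int zone = 0; zone < nr_zones; zone++) {
		if (zone == bump_at_zone && bump_to > node->kswapd_max_order)
			node->kswapd_max_order = bump_to;

		/* Same test as in the patch: a higher order is now wanted. */
		if (node->kswapd_max_order > order) {
			node->early_restarts++;
			return true;
		}
		/* ... watermark checks and reclaim would happen here ... */
	}
	return false;
}

/*
 * Simplified model of the kswapd loop: pick up the requested order,
 * reset it, run a pass, and start over whenever the pass bailed out.
 * Returns the order of the last pass that ran to completion.
 */
static int kswapd_model(struct node_model *node, int nr_zones,
			int bump_at_zone, int bump_to)
{
	int order;
	bool restarted;

	do {
		order = node->kswapd_max_order;
		node->kswapd_max_order = 0;
		restarted = balance_pass(node, order, nr_zones,
					 bump_at_zone, bump_to);
		bump_at_zone = -1; /* the simulated waker fires only once */
	} while (restarted);

	return order;
}
```

With a waker that asks for order 2 while an order-0 pass is visiting the
second of three zones, kswapd_model() bails out once, restarts, and
finishes at order 2 with early_restarts equal to 1. A real counter would
presumably be exported through vmstat rather than kept in a struct like
this.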

> If the -stable maintainers were to ask me "why did you send this" then
> right now my answer would have to be "I have no idea".  Help.
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 0/3] Reduce GFP_ATOMIC allocation failures, partial fix V3
  2009-10-27 13:40 ` Mel Gorman
  (?)
@ 2009-10-28 13:02   ` Karol Lewandowski
  -1 siblings, 0 replies; 115+ messages in thread
From: Karol Lewandowski @ 2009-10-28 13:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Frans Pop,
	Jiri Kosina, Sven Geggus, Karol Lewandowski, Tobias Oetiker,
	KOSAKI Motohiro, Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Rafael J. Wysocki, Kernel Testers List

On Tue, Oct 27, 2009 at 01:40:30PM +0000, Mel Gorman wrote:
> The following bug becomes very difficult to reproduce with these patches;
> 
> [Bug #14265] ifconfig: page allocation failure. order:5, mode:0x8020 w/ e100

Minor clarification -- the bug becomes difficult to reproduce _quickly_.

I've always seen this bug after many suspend-resume cycles (interleaved
with "real work").  Since testing one kernel in a normal usage scenario
would take many days, I've tried to imitate "real work" with lots of
memory-intensive/fragmenting processes.

However, this bug shows itself (sooner or later) in every kernel
except 2.6.30 (or earlier).

Thanks.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-28 10:29       ` Mel Gorman
@ 2009-10-28 19:47         ` Andrew Morton
  -1 siblings, 0 replies; 115+ messages in thread
From: Andrew Morton @ 2009-10-28 19:47 UTC (permalink / raw)
  To: Mel Gorman
  Cc: stable, linux-kernel, linux-mm, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wed, 28 Oct 2009 10:29:36 +0000
Mel Gorman <mel@csn.ul.ie> wrote:

> On Tue, Oct 27, 2009 at 01:19:05PM -0700, Andrew Morton wrote:
> > On Tue, 27 Oct 2009 13:40:33 +0000
> > Mel Gorman <mel@csn.ul.ie> wrote:
> > 
> > > When a high-order allocation fails, kswapd is kicked so that it reclaims
> > > at a higher order, to avoid stalling direct reclaimers and to help GFP_ATOMIC
> > > allocations. Something has changed in recent kernels that affects the timing,
> > > and high-order GFP_ATOMIC allocations are now failing more frequently,
> > > particularly under pressure. This patch forces kswapd to notice sooner that
> > > high-order allocations are occurring.
> > > 
> > 
> > "something has changed"?  Shouldn't we find out what that is?
> > 
> 
> We've been trying but the answer right now is "lots". There were some
> changes in the allocator itself which were unintentional and fixed in
> patches 1 and 2 of this series. The two other major changes are:
> 
> iwlagn is now making high-order GFP_ATOMIC allocations, which didn't
> help. This is being addressed separately and I believe the relevant
> patches are now in mainline.
> 
> The other major change appears to be in page writeback. Reverting
> commits 373c0a7e + 8aa7e847 significantly helps one bug reporter but
> it's still unknown as to why that is.

Peculiar.  Those changes are fairly remote from large-order-GFP_ATOMIC
allocations.

> ...
>
> Wireless drivers in particular seem to be very
> high-order GFP_ATOMIC happy.

It would be nice if we could find a way of preventing people from
attempting high-order atomic allocations in the first place - it's a bit
of a trap.

Maybe add a runtime warning which is suppressable by GFP_NOWARN (or a
new flag), then either fix existing callers or, after review, add the
flag.

Of course, this might just end up with people adding these hopeless
allocation attempts and just setting the nowarn flag :(
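
To make the proposal concrete, here is a minimal sketch of the kind of
check being suggested. Everything in it is an assumption for
illustration: the SK_* flag bits and SK_MAX_ORDER are hypothetical and do
not match the kernel's values, and should_warn_high_order_atomic() is an
invented name, not an existing kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this sketch; not the kernel's values. */
#define SK_GFP_WAIT   0x1u /* caller may sleep and enter reclaim */
#define SK_GFP_NOWARN 0x2u /* caller was reviewed and accepts failure */
#define SK_MAX_ORDER  10u  /* assumed largest order for the sketch */

/*
 * Warn when an allocation cannot sleep, asks for more than one page,
 * and has not been explicitly marked as reviewed.
 */
static bool should_warn_high_order_atomic(unsigned int flags,
					  unsigned int order)
{
	assert(order <= SK_MAX_ORDER);

	bool atomic = (flags & SK_GFP_WAIT) == 0;
	bool high_order = order > 0;
	bool silenced = (flags & SK_GFP_NOWARN) != 0;

	return atomic && high_order && !silenced;
}
```

An order-2 request without SK_GFP_WAIT would warn, while the same request
with SK_GFP_NOWARN set, or any order-0 request, would stay quiet. The
sketch also shows the weakness noted above: setting the flag is all it
takes to silence the check.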

> > If one were to whack a printk in that `if' block, how often would it
> > trigger, and under what circumstances?
> 
> I don't know the frequency. The circumstances are "under load" when
> there are drivers depending on high-order allocations but the
> reproduction cases are unreliable.
> 
> Do you want me to slap together a patch that adds a vmstat counter for
> this? I can then ask future bug reporters to examine that counter and see
> if it really is a major factor for a lot of people or not.

Something like that, if it will help us understand what's going on.  I
don't see a permanent need for that instrumentation but while this
problem is still in the research stage, sure, lard it up with debug
stuff?



It's very important to understand _why_ the VM got worse.  And, of
course, to fix that up.  But, separately, we should find a way of
preventing developers from using these very unreliable allocations.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-27 21:12       ` David Rientjes
  (?)
@ 2009-10-31 18:40         ` Pavel Machek
  -1 siblings, 0 replies; 115+ messages in thread
From: Pavel Machek @ 2009-10-31 18:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Mel Gorman, stable, linux-kernel, linux-mm,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, kernel-testers

On Tue 2009-10-27 14:12:36, David Rientjes wrote:
> On Tue, 27 Oct 2009, Andrew Morton wrote:
> 
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index dfa4362..7f2aa3e 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1769,7 +1769,7 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> > >  		 * See also cpuset_zone_allowed() comment in kernel/cpuset.c.
> > >  		 */
> > >  		alloc_flags &= ~ALLOC_CPUSET;
> > > -	} else if (unlikely(rt_task(p)))
> > > +	} else if (unlikely(rt_task(p)) && !in_interrupt())
> > >  		alloc_flags |= ALLOC_HARDER;
> > >  
> > >  	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> > 
> > > What are the runtime-observable effects of this change?
> > 
> 
> Giving rt tasks access to memory reserves is necessary to reduce latency;
> the privilege does not apply to interrupts that subsequently get run on
> the same cpu.

If an rt task needs to allocate memory like that, then it's broken,
anyway...

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 18:40         ` Pavel Machek
@ 2009-10-31 19:51           ` David Rientjes
  -1 siblings, 0 replies; 115+ messages in thread
From: David Rientjes @ 2009-10-31 19:51 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Mel Gorman, stable, linux-kernel, linux-mm,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, kernel-testers

On Sat, 31 Oct 2009, Pavel Machek wrote:

> > Giving rt tasks access to memory reserves is necessary to reduce latency;
> > the privilege does not apply to interrupts that subsequently get run on
> > the same cpu.
> 
> If an rt task needs to allocate memory like that, then it's broken,
> anyway...
> 

Um, no, it's a matter of the kernel implementation.  We allow such tasks
to allocate deeper into reserves so that the page allocator does not
incur a significant penalty when direct reclaim is required.
Background reclaim has already commenced at this point in the slowpath.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 19:51           ` David Rientjes
@ 2009-10-31 20:11             ` Pavel Machek
  -1 siblings, 0 replies; 115+ messages in thread
From: Pavel Machek @ 2009-10-31 20:11 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Mel Gorman, stable, linux-kernel, linux-mm,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, kernel-testers

On Sat 2009-10-31 12:51:14, David Rientjes wrote:
> On Sat, 31 Oct 2009, Pavel Machek wrote:
> 
> > > Giving rt tasks access to memory reserves is necessary to reduce latency;
> > > the privilege does not apply to interrupts that subsequently get run on
> > > the same cpu.
> > 
> > If an rt task needs to allocate memory like that, then it's broken,
> > anyway...
> 
> Um, no, it's a matter of the kernel implementation.  We allow such tasks
> to allocate deeper into reserves so that the page allocator does not
> incur a significant penalty when direct reclaim is required.
> Background reclaim has already commenced at this point in the slowpath.

But we can't guarantee that enough memory will be ready in the
reserves. So if a realtime task relies on it, it is broken, and will
fail to meet its deadlines from time to time.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 20:11             ` Pavel Machek
  (?)
@ 2009-10-31 21:19               ` David Rientjes
  -1 siblings, 0 replies; 115+ messages in thread
From: David Rientjes @ 2009-10-31 21:19 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Andrew Morton, Mel Gorman, stable, linux-kernel, linux-mm,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, kernel-testers

On Sat, 31 Oct 2009, Pavel Machek wrote:

> > Um, no, it's a matter of the kernel implementation.  We allow such tasks
> > to allocate deeper into reserves so that the page allocator does not
> > incur a significant penalty when direct reclaim is required.
> > Background reclaim has already commenced at this point in the slowpath.
> 
> But we can't guarantee that enough memory will be ready in the
> reserves. So if a realtime task relies on it, it is broken, and will
> fail to meet its deadlines from time to time.

This is truly a bizarre tangent to take; I don't quite understand the
point you're trying to make.  Memory reserves exist to prevent blocking
when we need memory the most (oom killed task or direct reclaim) and to
allocate from when we can't (GFP_ATOMIC) or shouldn't (rt tasks) utilize
direct reclaim.  The idea is to kick background reclaim first in the
slowpath so we're only below the low watermark for a short period and
allow the allocation to succeed.  If direct reclaim actually can't free
any memory, the oom killer will free it for us.

So the realtime[*] tasks aren't relying on it at all; the ALLOC_HARDER
exemption for them in the page allocator is a convenience to return
memory faster than otherwise when the fastpath fails.  I don't see much
point in arguing against that.

 [*] This is the current mainline definition of "realtime," which actually
     includes a large range of different priorities.  For strict realtime,
     you'd need to check out the -rt tree.
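
For readers following the argument, the exemption can be sketched in
userspace roughly as below. This is a simplified model under stated
assumptions, not the kernel's code: the SK_* bits, sk_alloc_flags() and
sk_effective_min() are hypothetical names, the rt_task && !in_interrupt
condition is the one from patch 2 in this series, and the watermark
fractions (half for high priority, a further quarter for "harder") are
recalled from kernels of this era rather than taken from the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bits for this sketch; the kernel's values differ. */
#define SK_GFP_WAIT     0x1u /* caller may sleep and enter reclaim */
#define SK_GFP_HIGH     0x2u /* high-priority request */
#define SK_ALLOC_HARDER 0x1u /* may dig deeper into the reserves */
#define SK_ALLOC_HIGH   0x2u /* may use part of the emergency pool */

/* Decide how far below the min watermark a request may go. */
static unsigned int sk_alloc_flags(unsigned int gfp, bool rt_task,
				   bool in_interrupt)
{
	unsigned int alloc_flags = 0;

	if (gfp & SK_GFP_HIGH)
		alloc_flags |= SK_ALLOC_HIGH;

	if (!(gfp & SK_GFP_WAIT)) {
		/* Cannot sleep or reclaim, so let it try harder. */
		alloc_flags |= SK_ALLOC_HARDER;
	} else if (rt_task && !in_interrupt) {
		/* Condition from patch 2: rt tasks only, not interrupts. */
		alloc_flags |= SK_ALLOC_HARDER;
	}

	return alloc_flags;
}

/* Lower the min watermark according to the flags (assumed fractions). */
static long sk_effective_min(long min_wmark, unsigned int alloc_flags)
{
	assert(min_wmark >= 0);

	long min = min_wmark;

	if (alloc_flags & SK_ALLOC_HIGH)
		min -= min / 2;
	if (alloc_flags & SK_ALLOC_HARDER)
		min -= min / 4;

	return min;
}
```

Under these assumptions, a sleeping allocation from an rt task in process
context sees a min watermark of 750 pages instead of 1000, while the same
allocation attempted from interrupt context gets no reduction. Nothing in
the model guarantees that the pages are actually there, which is the
distinction between a convenience and a guarantee being argued here.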

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 21:19               ` David Rientjes
@ 2009-10-31 22:29                 ` Pavel Machek
  -1 siblings, 0 replies; 115+ messages in thread
From: Pavel Machek @ 2009-10-31 22:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Mel Gorman, stable, linux-kernel, linux-mm,
	Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Rik van Riel,
	Christoph Lameter, Stephan von Krawczynski, kernel-testers

On Sat 2009-10-31 14:19:50, David Rientjes wrote:
> On Sat, 31 Oct 2009, Pavel Machek wrote:
> 
> > > Um, no, it's a matter of the kernel implementation.  We allow such tasks 
> > > to allocate deeper into reserves to avoid the page allocator from 
> > > incurring a significant penalty when direct reclaim is required.  
> > > Background reclaim has already commenced at this point in the
> > > slowpath.
> > 
> > But we can't guarantee that enough memory will be ready in the
> > reserves. So if realtime task relies on it, it is broken, and will
> > fail to meet its deadlines from time to time.
> 
> This is truly a bizarre tangent to take, I don't quite understand the 
> point you're trying to make.  Memory reserves exist to prevent blocking 
> when we need memory the most (oom killed task or direct reclaim) and to 
> allocate from when we can't (GFP_ATOMIC) or shouldn't (rt tasks) utilize 
> direct reclaim.  The idea is to kick background reclaim first in the 
> slowpath so we're only below the low watermark for a short period and 
> allow the allocation to succeed.  If direct reclaim actually can't free 
> any memory, the oom killer will free it for us.
> 
> So the realtime[*] tasks aren't relying on it at all; the ALLOC_HARDER 
> exemption for them in the page allocator is a convenience to return 
> memory faster than otherwise when the fastpath fails.  I don't see much 
> point in arguing against that.

Well, you are trying to make rt heuristic more precise. I believe it
would be better to simply remove it.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 22:29                 ` Pavel Machek
@ 2009-10-31 22:55                   ` Rik van Riel
  -1 siblings, 0 replies; 115+ messages in thread
From: Rik van Riel @ 2009-10-31 22:55 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Rientjes, Andrew Morton, Mel Gorman, stable, linux-kernel,
	linux-mm, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Christoph Lameter,
	Stephan von Krawczynski, kernel-testers

On 10/31/2009 06:29 PM, Pavel Machek wrote:

> Well, you are trying to make rt heuristic more precise.

No he's not.  He is trying to make it only apply to real time
tasks, not to interrupts that happen to interrupt the realtime
tasks.

 > I believe it would be better to simply remove it.

You are against trying to give the realtime tasks a best effort
advantage at memory allocation?

Realtime apps often *have* to allocate memory on the kernel side,
because they use network system calls, etc...

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 20:11             ` Pavel Machek
@ 2009-10-31 23:59               ` Rik van Riel
  -1 siblings, 0 replies; 115+ messages in thread
From: Rik van Riel @ 2009-10-31 23:59 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Rientjes, Andrew Morton, Mel Gorman, stable, linux-kernel,
	linux-mm, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Christoph Lameter,
	Stephan von Krawczynski, kernel-testers

On 10/31/2009 04:11 PM, Pavel Machek wrote:

> But we can't guarantee that enough memory will be ready in the
> reserves. So if realtime task relies on it, it is broken, and will
> fail to meet its deadlines from time to time.

Any realtime task that does networking (which may be the
majority of realtime tasks) relies on the kernel memory
allocator.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 22:55                   ` Rik van Riel
  (?)
@ 2009-11-01  7:35                     ` Pavel Machek
  -1 siblings, 0 replies; 115+ messages in thread
From: Pavel Machek @ 2009-11-01  7:35 UTC (permalink / raw)
  To: Rik van Riel
  Cc: David Rientjes, Andrew Morton, Mel Gorman, stable, linux-kernel,
	linux-mm, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Christoph Lameter,
	Stephan von Krawczynski, kernel-testers

> > I believe it would be better to simply remove it.
> 
> You are against trying to give the realtime tasks a best effort
> advantage at memory allocation?

Yes. Those memory reserves were for kernel, GFP_ATOMIC and stuff. Now
realtime tasks are allowed to eat into them. That feels wrong.

"realtime" tasks are not automatically "more important".

> Realtime apps often *have* to allocate memory on the kernel side,
> because they use network system calls, etc...

So what? As soon as they do that, they lose any guarantees, anyway.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use  ALLOC_HARDER
  2009-11-01  7:35                     ` Pavel Machek
  (?)
@ 2009-11-01 12:37                       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 115+ messages in thread
From: KOSAKI Motohiro @ 2009-11-01 12:37 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Rik van Riel, David Rientjes, Andrew Morton, Mel Gorman, stable,
	linux-kernel, linux-mm, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, Pekka Enberg,
	Christoph Lameter, Stephan von Krawczynski, kernel-testers

2009/11/1 Pavel Machek <pavel@ucw.cz>:
>> > I believe it would be better to simply remove it.
>>
>> You are against trying to give the realtime tasks a best effort
>> advantage at memory allocation?
>
> Yes. Those memory reserves were for kernel, GFP_ATOMIC and stuff. Now
> realtime tasks are allowed to eat into them. That feels wrong.
>
> "realtime" tasks are not automatically "more important".
>
>> Realtime apps often *have* to allocate memory on the kernel side,
>> because they use network system calls, etc...
>
> So what? As soon as they do that, they lose any guarantees, anyway.

Then your proposal would cause a regression for rt workloads. Ideas for
improvement are welcome, but we don't want to see any regression.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-01  7:35                     ` Pavel Machek
@ 2009-11-01 14:44                       ` Rik van Riel
  -1 siblings, 0 replies; 115+ messages in thread
From: Rik van Riel @ 2009-11-01 14:44 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Rientjes, Andrew Morton, Mel Gorman, stable, linux-kernel,
	linux-mm, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Christoph Lameter,
	Stephan von Krawczynski, kernel-testers

On 11/01/2009 02:35 AM, Pavel Machek wrote:
>>> I believe it would be better to simply remove it.
>>
>> You are against trying to give the realtime tasks a best effort
>> advantage at memory allocation?
>
> Yes. Those memory reserves were for kernel, GFP_ATOMIC and stuff. Now
> realtime tasks are allowed to eat into them. That feels wrong.
>
> "realtime" tasks are not automatically "more important".
>
>> Realtime apps often *have* to allocate memory on the kernel side,
>> because they use network system calls, etc...
>
> So what? As soon as they do that, they lose any guarantees, anyway.

They might lose the absolute guarantee, but that's no reason
not to give it our best effort!

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-01 14:44                       ` Rik van Riel
  (?)
@ 2009-11-01 19:32                         ` Pavel Machek
  -1 siblings, 0 replies; 115+ messages in thread
From: Pavel Machek @ 2009-11-01 19:32 UTC (permalink / raw)
  To: Rik van Riel
  Cc: David Rientjes, Andrew Morton, Mel Gorman, stable, linux-kernel,
	linux-mm, Frans Pop, Jiri Kosina, Sven Geggus, Karol Lewandowski,
	Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg, Christoph Lameter,
	Stephan von Krawczynski, kernel-testers

On Sun 2009-11-01 09:44:04, Rik van Riel wrote:
> On 11/01/2009 02:35 AM, Pavel Machek wrote:
> >>>I believe it would be better to simply remove it.
> >>
> >>You are against trying to give the realtime tasks a best effort
> >>advantage at memory allocation?
> >
> >Yes. Those memory reserves were for kernel, GFP_ATOMIC and stuff. Now
> >realtime tasks are allowed to eat into them. That feels wrong.
> >
> >"realtime" tasks are not automatically "more important".
> >
> >>Realtime apps often *have* to allocate memory on the kernel side,
> >>because they use network system calls, etc...
> >
> >So what? As soon as they do that, they lose any guarantees, anyway.
> 
> They might lose the absolute guarantee, but that's no reason
> not to give it our best effort!

You know, there's no reason not to give best effort to normal tasks,
too...

Well, OTOH that means that realtime tasks can now interfere with
interrupt memory allocations...

Anyway, I guess this is not terribly important...
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-10-28 19:47         ` Andrew Morton
  (?)
@ 2009-11-02 16:05           ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-02 16:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: stable, linux-kernel, linux-mm, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wed, Oct 28, 2009 at 12:47:56PM -0700, Andrew Morton wrote:
> On Wed, 28 Oct 2009 10:29:36 +0000
> Mel Gorman <mel@csn.ul.ie> wrote:
> 
> > On Tue, Oct 27, 2009 at 01:19:05PM -0700, Andrew Morton wrote:
> > > On Tue, 27 Oct 2009 13:40:33 +0000
> > > Mel Gorman <mel@csn.ul.ie> wrote:
> > > 
> > > > When a high-order allocation fails, kswapd is kicked so that it reclaims
> > > > at a higher-order to avoid direct reclaimers stalling and to help GFP_ATOMIC
> > > > allocations. Something has changed in recent kernels that affects the timing
> > > > where high-order GFP_ATOMIC allocations are now failing with more frequency,
> > > > particularly under pressure. This patch forces kswapd to notice sooner that
> > > > high-order allocations are occurring.
> > > > 
> > > 
> > > "something has changed"?  Shouldn't we find out what that is?
> > > 
> > 
> > We've been trying but the answer right now is "lots". There were some
> > changes in the allocator itself which were unintentional and fixed in
> > patches 1 and 2 of this series. The two other major changes are
> > 
> > iwlagn is now making high order GFP_ATOMIC allocations which didn't
> > help. This is being addressed separately and I believe the relevant
> > patches are now in mainline.
> > 
> > The other major change appears to be in page writeback. Reverting
> > commits 373c0a7e + 8aa7e847 significantly helps one bug reporter but
> > it's still unknown as to why that is.
> 
> Peculiar.  Those changes are fairly remote from large-order-GFP_ATOMIC
> allocations.
> 

Indeed. The significance of the patch seems to be how long and how often
processes sleep in the page allocator and what kswapd is doing.

> > ...
> >
> > Wireless drivers in particular seem to be very
> > high-order GFP_ATOMIC happy.
> 
> It would be nice if we could find a way of preventing people from
> attempting high-order atomic allocations in the first place - it's a bit
> of a trap.
> 

True.

> Maybe add a runtime warning which is suppressable by GFP_NOWARN (or a
> new flag), then either fix existing callers or, after review, add the
> flag.
> 
> Of course, this might just end up with people adding these hopeless
> allocation attempts and just setting the nowarn flag :(
> 

That's the difficulty but we should consider adding such warnings or
maintaining in-kernel the unique GFP_ATOMIC callers and their frequency.
It would require a lot of monitoring though and a fair amount of stick
beatings to get the callers corrected.

> > > If one were to whack a printk in that `if' block, how often would it
> > > trigger, and under what circumstances?
> > 
> > I don't know the frequency. The circumstances are "under load" when
> > there are drivers depending on high-order allocations but the
> > reproduction cases are unreliable.
> > 
> > Do you want me to slap together a patch that adds a vmstat counter for
> > this? I can then ask future bug reporters to examine that counter and see
> > if it really is a major factor for a lot of people or not.
> 
> Something like that, if it will help us understand what's going on.  I
> don't see a permanent need for that instrumentation but while this
> problem is still in the research stage, sure, lard it up with debug
> stuff?
> 

I have a candidate patch below. One of the reasons it took so long to
get out is what I found while developing the patch. I had added a
debugging patch to printk what kswapd was doing. One massive difference I
noted was that in 2.6.30 kswapd often went to sleep for 25 jiffies (HZ/10)
in balance_pgdat(). In 2.6.31 and particularly in mainline, it sleeps less
and for shorter intervals. When the sleep interval is low, kswapd notices
the watermarks are ok and goes back to sleep far quicker than 2.6.30
did.

One consequence of this is that kswapd is going back to sleep just as the
high watermark is clear but if it had slept for longer it would have found
that the zone quickly went back below the high watermark due to parallel
allocators. i.e. in 2.6.30, kswapd worked for longer than current mainline.

To see if there is any merit to this, the patch below also counts the number
of times that kswapd prematurely went to sleep. If kswapd is routinely going
to sleep with watermarks not being met, one correction might be to make
balance_pgdat() unconditionally sleep for HZ/10 instead of sleeping based on
congestion as this would bring kswapd closer in line with 2.6.30. Of course,
the pain in the neck is that the premature-sleep-check itself is happening
too quickly.
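
For anyone collecting these numbers on a reporter's machine, a small
userspace helper along the following lines could pull a single counter out of
/proc/vmstat. This is only a sketch: the function name is made up, it assumes
the usual "name value" line format, and the counters
kswapd_highorder_rewakeup and kswapd_slept_prematurely will only be present
on a kernel carrying the debugging patch below.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Scan a /proc/vmstat-style stream of "name value" pairs and return the
 * value recorded for 'name', or -1 if the counter is not present.
 * (Hypothetical helper, not part of any existing tool.)
 */
static long long vmstat_counter(FILE *f, const char *name)
{
	char key[128];
	unsigned long long val;

	assert(f != NULL && name != NULL);

	/* Always start from the top so the helper can be called repeatedly. */
	rewind(f);
	while (fscanf(f, "%127s %llu", key, &val) == 2) {
		if (strcmp(key, name) == 0)
			return (long long)val;
	}
	return -1;
}
```

Called on a stream opened with fopen("/proc/vmstat", "r"), a return value of
-1 for kswapd_slept_prematurely suggests that the kernel being tested does
not have the debugging patch applied.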

> It's very important to understand _why_ the VM got worse.  And, of
> course, to fix that up.  But, separately, we should find a way of
> preventing developers from using these very unreliable allocations.
> 

Agreed. I think the main thing that has changed is timing.  congestion_wait()
is now doing the "right" thing and sleeping until congestion is
cleared. Unfortunately, it feels like some users of congestion_wait(),
such as kswapd, really wanted to sleep for a fixed interval and not based
on congestion. The comment in balance_pgdat() appears to indicate this was
the expected behaviour.
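
The timing argument can be put as a toy model. This is a sketch under stated
assumptions, not kernel code: all three function names are invented, time is
counted in jiffies, and parallel allocators are assumed to consume pages at a
constant rate while kswapd sleeps.

```c
#include <assert.h>
#include <stdbool.h>

/* Jiffies slept when the sleeper always waits out the full timeout. */
static long sleep_fixed(long timeout)
{
	return timeout;
}

/* Jiffies slept when the sleeper is woken as soon as congestion clears. */
static long sleep_until_uncongested(long timeout, long clears_after)
{
	assert(timeout >= 0 && clears_after >= 0);
	return clears_after < timeout ? clears_after : timeout;
}

/*
 * Would the zone still look balanced (above the high watermark) when
 * kswapd wakes, given pages consumed by other allocators in the meantime?
 */
static bool looks_balanced_on_wake(long free_pages, long high_wmark,
				   long pages_allocated_per_jiffy, long slept)
{
	return free_pages - pages_allocated_per_jiffy * slept > high_wmark;
}
```

With 1000 free pages, a high watermark of 900 and 10 pages allocated per
jiffy, a fixed 25-jiffy sleep wakes to 750 free pages and kswapd keeps
reclaiming. A sleep that ends after 2 jiffies wakes to 980 free pages, so
kswapd sees the watermark as met and goes back to sleep while the zone is
still being drained.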

==== CUT HERE ====
vmscan: Help debug kswapd issues by counting number of rewakeups and premature sleeps

There is a growing amount of anecdotal evidence that high-order atomic
allocation failures have been increasing since 2.6.31-rc1. The two
strongest possibilities are a marked increase in the number of
GFP_ATOMIC allocations and alterations in timing. Debugging printk
patches have shown for example that kswapd is sleeping for shorter
intervals and going to sleep when watermarks are still not being met.

This patch adds two kswapd counters to help identify if timing is an
issue. The first counter kswapd_highorder_rewakeup counts the number of
times that kswapd stops reclaiming at one order and restarts at a higher
order. The second counter kswapd_slept_prematurely counts the number of
times kswapd went to sleep when the high watermark was not met.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
--- 
 include/linux/vmstat.h |    1 +
 mm/vmscan.c            |   17 ++++++++++++++++-
 mm/vmstat.c            |    2 ++
 3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 2d0f222..2e0d18d 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -40,6 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		PGSCAN_ZONE_RECLAIM_FAILED,
 #endif
 		PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
+		KSWAPD_HIGHORDER_REWAKEUP, KSWAPD_PREMATURE_SLEEP,
 		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 7eceb02..cf40136 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2021,6 +2021,7 @@ loop_again:
 			 * if it is known that higher orders are required
 			 */
 			if (pgdat->kswapd_max_order > order) {
+				count_vm_event(KSWAPD_HIGHORDER_REWAKEUP);
 				all_zones_ok = 1;
 				goto out;
 			}
@@ -2124,6 +2125,17 @@ out:
 	return sc.nr_reclaimed;
 }

+static int kswapd_sleeping_prematurely(int order)
+{
+	struct zone *zone;
+	for_each_populated_zone(zone)
+		if (!zone_watermark_ok(zone, order, high_wmark_pages(zone),
+								0, 0))
+			return 1;
+
+	return 0;
+}
+
 /*
  * The background pageout daemon, started as a kernel thread
  * from the init process.
@@ -2183,8 +2195,11 @@ static int kswapd(void *p)
 			 */
 			order = new_order;
 		} else {
-			if (!freezing(current))
+			if (!freezing(current)) {
+				if (kswapd_sleeping_prematurely(order))
+					count_vm_event(KSWAPD_PREMATURE_SLEEP);
 				schedule();
+			}

 			order = pgdat->kswapd_max_order;
 		}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c81321f..fa881c5 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -683,6 +683,8 @@ static const char * const vmstat_text[] = {
 	"slabs_scanned",
 	"kswapd_steal",
 	"kswapd_inodesteal",
+	"kswapd_highorder_rewakeup",
+	"kswapd_slept_prematurely",
 	"pageoutrun",
 	"allocstall",

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-01 14:44                       ` Rik van Riel
@ 2009-11-02 16:38                         ` Christoph Lameter
  -1 siblings, 0 replies; 115+ messages in thread
From: Christoph Lameter @ 2009-11-02 16:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Pavel Machek, David Rientjes, Andrew Morton, Mel Gorman, stable,
	linux-kernel, linux-mm, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg,
	Stephan von Krawczynski, kernel-testers

On Sun, 1 Nov 2009, Rik van Riel wrote:

> > So what? As soon as they do that, they lose any guarantees, anyway.
>
> They might lose the absolute guarantee, but that's no reason
> not to give it our best effort!

Then it's not realtime anymore. "Realtime" seems to be some wishy-washy
marketing term that flexes in a variety of ways.


^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-10-31 23:59               ` Rik van Riel
@ 2009-11-02 16:42                 ` Christoph Lameter
  -1 siblings, 0 replies; 115+ messages in thread
From: Christoph Lameter @ 2009-11-02 16:42 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Pavel Machek, David Rientjes, Andrew Morton, Mel Gorman, stable,
	linux-kernel, linux-mm, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg,
	Stephan von Krawczynski, kernel-testers

On Sat, 31 Oct 2009, Rik van Riel wrote:

> On 10/31/2009 04:11 PM, Pavel Machek wrote:
>
> > But we can't guarantee that enough memory will be ready in the
> > reserves. So if realtime task relies on it, it is broken, and will
> > fail to meet its deadlines from time to time.
>
> Any realtime task that does networking (which may be the
> majority of realtime tasks) relies on the kernel memory
> allocator.

What is realtime in this scenario? There are no guarantees that reclaim
won't have to occur. There are no guarantees anymore and therefore you
cannot really call this realtime.

Is realtime anything more than: "I want to have my patches merged"?

Give some criteria as to what realtime is please. I am all for decreasing
kernel latencies. But some of this work is adding bloat and increasing
overhead.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-02 16:05           ` Mel Gorman
@ 2009-11-02 17:32             ` Frans Pop
  -1 siblings, 0 replies; 115+ messages in thread
From: Frans Pop @ 2009-11-02 17:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Monday 02 November 2009, Mel Gorman wrote:
> vmscan: Help debug kswapd issues by counting number of rewakeups and
> premature sleeps
>
> There is a growing amount of anecdotal evidence that high-order atomic
> allocation failures have been increasing since 2.6.31-rc1. The two
> strongest possibilities are a marked increase in the number of
> GFP_ATOMIC allocations and alterations in timing. Debugging printk
> patches have shown for example that kswapd is sleeping for shorter
> intervals and going to sleep when watermarks are still not being met.
>
> This patch adds two kswapd counters to help identify if timing is an
> issue. The first counter kswapd_highorder_rewakeup counts the number of
> times that kswapd stops reclaiming at one order and restarts at a higher
> order. The second counter kswapd_slept_prematurely counts the number of
> times kswapd went to sleep when the high watermark was not met.

What testing would you like done with this patch?

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-02 17:32             ` Frans Pop
  (?)
@ 2009-11-02 17:38               ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-02 17:38 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Mon, Nov 02, 2009 at 06:32:54PM +0100, Frans Pop wrote:
> On Monday 02 November 2009, Mel Gorman wrote:
> > vmscan: Help debug kswapd issues by counting number of rewakeups and
> > premature sleeps
> >
> > There is a growing amount of anecdotal evidence that high-order atomic
> > allocation failures have been increasing since 2.6.31-rc1. The two
> > strongest possibilities are a marked increase in the number of
> > GFP_ATOMIC allocations and alterations in timing. Debugging printk
> > patches have shown for example that kswapd is sleeping for shorter
> > intervals and going to sleep when watermarks are still not being met.
> >
> > This patch adds two kswapd counters to help identify if timing is an
> > issue. The first counter kswapd_highorder_rewakeup counts the number of
> > times that kswapd stops reclaiming at one order and restarts at a higher
> > order. The second counter kswapd_slept_prematurely counts the number of
> > times kswapd went to sleep when the high watermark was not met.
> 
> What testing would you like done with this patch?
> 

Same reproduction as before, except please also post the contents of
/proc/vmstat as they were after the problem was triggered.
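
If it helps anyone pull just the two new counters out of a saved copy, a lookup over /proc/vmstat-style text ("name value", one pair per line) could look something like the sketch below. `vmstat_lookup()` is a hypothetical helper written for this example, not an existing tool or kernel interface.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" line in /proc/vmstat-style text.
 * Returns 0 and stores the value in *val on success, or -1 if the
 * name is absent or its value cannot be parsed.
 */
static int vmstat_lookup(const char *text, const char *name,
			 unsigned long *val)
{
	size_t len = strlen(name);
	const char *line = text;

	while (line && *line) {
		/* Require a space after the name so prefixes do not match. */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%lu", val) == 1 ? 0 : -1;
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}
```

The caller would read the file into a NUL-terminated buffer first and then ask for kswapd_highorder_rewakeup and kswapd_slept_prematurely by name. On a kernel without the patch both lookups simply return -1.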

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-02 17:38               ` Mel Gorman
@ 2009-11-02 20:36                 ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-02 20:36 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Mon, Nov 02, 2009 at 05:38:38PM +0000, Mel Gorman wrote:
> On Mon, Nov 02, 2009 at 06:32:54PM +0100, Frans Pop wrote:
> > On Monday 02 November 2009, Mel Gorman wrote:
> > > vmscan: Help debug kswapd issues by counting number of rewakeups and
> > > premature sleeps
> > >
> > > > There is a growing amount of anecdotal evidence that high-order atomic
> > > allocation failures have been increasing since 2.6.31-rc1. The two
> > > strongest possibilities are a marked increase in the number of
> > > GFP_ATOMIC allocations and alterations in timing. Debugging printk
> > > patches have shown for example that kswapd is sleeping for shorter
> > > intervals and going to sleep when watermarks are still not being met.
> > >
> > > This patch adds two kswapd counters to help identify if timing is an
> > > issue. The first counter kswapd_highorder_rewakeup counts the number of
> > > times that kswapd stops reclaiming at one order and restarts at a higher
> > > order. The second counter kswapd_slept_prematurely counts the number of
> > > times kswapd went to sleep when the high watermark was not met.
> > 
> > What testing would you like done with this patch?
> > 
> 
> Same reproduction as before except post what the contents of
> /proc/vmstat were after the problem was triggered.
> 

In the event there is a positive count for kswapd_slept_prematurely after
the error is produced, can you also check if the following patch makes a
difference and what the contents of vmstat are please? It alters how kswapd
behaves and when it goes to sleep.

Thanks

==== CUT HERE ====
vmscan: Have kswapd sleep for a short interval and double check it should be asleep

After kswapd balances all zones in a pgdat, it goes to sleep. In the event
of no IO congestion, kswapd can go to sleep very shortly after the high
watermark was reached. If there is a constant stream of allocations from
parallel processes, it can mean that kswapd went to sleep too quickly and
the high watermark is not being maintained for a sufficient length of time.

This patch makes kswapd go to sleep as a two-stage process. It first
tries to sleep for HZ/10. If it is woken up by another process or the
high watermark is no longer met, it's considered a premature sleep and
kswapd continues work. Otherwise it goes fully to sleep.

This adds more counters to distinguish between fast and slow breaches of
watermarks. A "fast" premature sleep is one where the low watermark was
hit in a very short time after kswapd going to sleep. A "slow" premature
sleep indicates that the high watermark was breached after a very short
interval.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/vmstat.h |    3 ++-
 mm/vmscan.c            |   31 +++++++++++++++++++++++++++----
 mm/vmstat.c            |    3 ++-
 3 files changed, 31 insertions(+), 6 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 2e0d18d..f344878 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -40,7 +40,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		PGSCAN_ZONE_RECLAIM_FAILED,
 #endif
 		PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
-		KSWAPD_HIGHORDER_REWAKEUP, KSWAPD_PREMATURE_SLEEP,
+		KSWAPD_HIGHORDER_REWAKEUP,
+		KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW,
 		PAGEOUTRUN, ALLOCSTALL, PGROTATED,
 #ifdef CONFIG_HUGETLB_PAGE
 		HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 11a69a8..70aeb05 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1905,10 +1905,14 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
 #endif
 
 /* is kswapd sleeping prematurely? */
-static int sleeping_prematurely(int order)
+static int sleeping_prematurely(int order, long remaining)
 {
 	struct zone *zone;
 
+	/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
+	if (remaining)
+		return 1;
+
 	/* If after HZ/10, a zone is below the high mark, it's premature */
 	for_each_populated_zone(zone)
 		if (!zone_watermark_ok(zone, order, high_wmark_pages(zone),
@@ -2209,9 +2213,28 @@ static int kswapd(void *p)
 			order = new_order;
 		} else {
 			if (!freezing(current)) {
-				if (sleeping_prematurely(order))
-					count_vm_event(KSWAPD_PREMATURE_SLEEP);
-				schedule();
+				long remaining = 0;
+
+				/* Try to sleep for a short interval */
+				if (!sleeping_prematurely(order, remaining)) {
+					remaining = schedule_timeout(HZ/10);
+					finish_wait(&pgdat->kswapd_wait, &wait);
+					prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
+				}
+
+				/*
+				 * After a short sleep, check if it was a
+				 * premature sleep. If not, then go fully
+				 * to sleep until explicitly woken up
+				 */
+				if (!sleeping_prematurely(order, remaining))
+					schedule();
+				else {
+					if (remaining)
+						count_vm_event(KSWAPD_PREMATURE_FAST);
+					else
+						count_vm_event(KSWAPD_PREMATURE_SLOW);
+				}
 			}
 
 			order = pgdat->kswapd_max_order;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index fa881c5..47a6914 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -684,7 +684,8 @@ static const char * const vmstat_text[] = {
 	"kswapd_steal",
 	"kswapd_inodesteal",
 	"kswapd_highorder_rewakeup",
-	"kswapd_slept_prematurely",
+	"kswapd_slept_prematurely_fast",
+	"kswapd_slept_prematurely_slow",
 	"pageoutrun",
 	"allocstall",
 
-- 
1.6.3.3
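
The decision made after the short sleep can be modelled outside the kernel.
The sketch below is a userspace approximation of the second
sleeping_prematurely() check and the counter selection in the patch above;
the enum, the function name classify_nap() and the zones_above_high
parameter are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of the short HZ/10 sleep described in the changelog. */
enum nap_result {
	NAP_FULL_SLEEP,		/* not premature: sleep until explicitly woken */
	NAP_PREMATURE_FAST,	/* woken before the timeout expired */
	NAP_PREMATURE_SLOW,	/* timeout expired, a zone is below its high mark */
};

/*
 * remaining:        jiffies left over from the timed sleep; non-zero means
 *                   something woke kswapd before HZ/10 had elapsed.
 * zones_above_high: stand-in for the zone_watermark_ok() loop; true when
 *                   every populated zone still meets its high watermark.
 */
static enum nap_result classify_nap(long remaining, bool zones_above_high)
{
	assert(remaining >= 0);

	/* An early wakeup is counted as "fast" without looking at the zones. */
	if (remaining)
		return NAP_PREMATURE_FAST;

	/* The full interval elapsed, but the high watermark no longer holds. */
	if (!zones_above_high)
		return NAP_PREMATURE_SLOW;

	return NAP_FULL_SLEEP;
}
```

Note that when the first check already finds a zone below its high mark,
the patch skips the timed sleep and leaves remaining at 0, so that case
would also be counted as "slow" in this model.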


^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-02 16:42                 ` Christoph Lameter
  (?)
@ 2009-11-02 20:53                   ` David Rientjes
  -1 siblings, 0 replies; 115+ messages in thread
From: David Rientjes @ 2009-11-02 20:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, Pavel Machek, Andrew Morton, Mel Gorman, stable,
	linux-kernel, linux-mm, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg,
	Stephan von Krawczynski, kernel-testers

On Mon, 2 Nov 2009, Christoph Lameter wrote:

> What is realtime in this scenario? There are no guarantees that reclaim
> won't have to occur. There are no guarantees anymore and therefore you
> cannot really call this realtime.
> 

Realtime in this scenario is anything with a priority of MAX_RT_PRIO or 
lower.

> Is realtime anything more than: "I want to have my patches merged"?
> 

These allocations are not using ~__GFP_WAIT for a reason, they can block 
on direct reclaim.

But we're convoluting this issue _way_ more than it needs to be.  We have 
used ALLOC_HARDER for these tasks as a convenience for over four years.  
The fix here is to address an omission in the page allocator refactoring
code that went into 2.6.31 that dropped the check for !in_interrupt().

If you'd like to raise the concern about the rt exemption being given 
ALLOC_HARDER, then it is separate from this fix.
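
To make the dropped check concrete, here is a minimal userspace model of
the rt exemption with the !in_interrupt() test restored. It is only a
sketch under stated assumptions: ALLOC_HARDER_MODEL is an illustrative
value rather than the kernel's real flag bit, rt_alloc_flags() is a
hypothetical name, and any other path that may set ALLOC_HARDER is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the kernel's real ALLOC_* bits are not reproduced. */
#define ALLOC_HARDER_MODEL 0x1

/*
 * task_is_rt:   the current task has a realtime priority.
 * in_interrupt: the allocation is made from interrupt context, where
 *               "current" is merely whichever task was interrupted.
 *
 * Without the in_interrupt test, an interrupt arriving while an rt task
 * runs would pick up that task's ALLOC_HARDER privilege.
 */
static int rt_alloc_flags(bool task_is_rt, bool in_interrupt)
{
	int flags = 0;

	if (task_is_rt && !in_interrupt)
		flags |= ALLOC_HARDER_MODEL;
	return flags;
}
```

Only the combination of an rt task in process context yields the flag in
this model; whether rt tasks should get the exemption at all is the
separate question raised above.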

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-02 20:53                   ` David Rientjes
@ 2009-11-03 17:10                     ` Christoph Lameter
  -1 siblings, 0 replies; 115+ messages in thread
From: Christoph Lameter @ 2009-11-03 17:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: Rik van Riel, Pavel Machek, Andrew Morton, Mel Gorman, stable,
	linux-kernel, linux-mm, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg,
	Stephan von Krawczynski, kernel-testers

On Mon, 2 Nov 2009, David Rientjes wrote:

> Realtime in this scenario is anything with a priority of MAX_RT_PRIO or
> lower.

If you don't know what "realtime" is then we cannot really implement
"realtime" behavior in the page allocator.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-02 17:38               ` Mel Gorman
                                 ` (2 preceding siblings ...)
  (?)
@ 2009-11-03 22:01               ` Frans Pop
  2009-11-03 22:08                   ` Mel Gorman
  -1 siblings, 1 reply; 115+ messages in thread
From: Frans Pop @ 2009-11-03 22:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

[-- Attachment #1: Type: text/plain, Size: 1390 bytes --]

On Monday 02 November 2009, Mel Gorman wrote:
> On Mon, Nov 02, 2009 at 06:32:54PM +0100, Frans Pop wrote:
> > On Monday 02 November 2009, Mel Gorman wrote:
> > > vmscan: Help debug kswapd issues by counting number of rewakeups and
> > > premature sleeps
> > >
> > > > There is a growing amount of anecdotal evidence that high-order
> > > atomic allocation failures have been increasing since 2.6.31-rc1.
> > > The two strongest possibilities are a marked increase in the number
> > > of GFP_ATOMIC allocations and alterations in timing. Debugging
> > > printk patches have shown for example that kswapd is sleeping for
> > > shorter intervals and going to sleep when watermarks are still not
> > > being met.
> > >
> > > This patch adds two kswapd counters to help identify if timing is an
> > > issue. The first counter kswapd_highorder_rewakeup counts the number
> > > of times that kswapd stops reclaiming at one order and restarts at a
> > > higher order. The second counter kswapd_slept_prematurely counts the
> > > number of times kswapd went to sleep when the high watermark was not
> > > met.
> >
> > What testing would you like done with this patch?
>
> Same reproduction as before except post what the contents of
> /proc/vmstat were after the problem was triggered.

With a representative test I get 0 for kswapd_slept_prematurely.
Tested with .32-rc6 + patches 1-3 + this patch.


[-- Attachment #2: vmstat --]
[-- Type: text/plain, Size: 1407 bytes --]

nr_free_pages 4841
nr_inactive_anon 103124
nr_active_anon 305446
nr_inactive_file 20214
nr_active_file 9217
nr_unevictable 400
nr_mlock 400
nr_anon_pages 364727
nr_mapped 2907
nr_file_pages 74823
nr_dirty 1
nr_writeback 0
nr_slab_reclaimable 2749
nr_slab_unreclaimable 4024
nr_page_table_pages 4286
nr_kernel_stack 177
nr_unstable 0
nr_bounce 0
nr_vmscan_write 226841
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 18
pgpgin 651718
pgpgout 918016
pswpin 10144
pswpout 226833
pgalloc_dma 2193
pgalloc_dma32 1965234
pgalloc_normal 0
pgalloc_movable 0
pgfree 1972499
pgactivate 124982
pgdeactivate 387354
pgfault 2237876
pgmajfault 7305
pgrefill_dma 1538
pgrefill_dma32 388961
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 67
pgsteal_dma32 305556
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 192
pgscan_kswapd_dma32 419147
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 576
pgscan_direct_dma32 299638
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 2504
slabs_scanned 40960
kswapd_steal 250714
kswapd_inodesteal 6259
kswapd_highorder_rewakeup 22
kswapd_slept_prematurely 0
pageoutrun 3502
allocstall 975
pgrotated 226573
unevictable_pgs_culled 4251
unevictable_pgs_scanned 0
unevictable_pgs_rescued 33344
unevictable_pgs_mlocked 43192
unevictable_pgs_munlocked 42780
unevictable_pgs_cleared 2
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-03 22:01               ` Frans Pop
  2009-11-03 22:08                   ` Mel Gorman
@ 2009-11-03 22:08                   ` Mel Gorman
  0 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-03 22:08 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Tue, Nov 03, 2009 at 11:01:50PM +0100, Frans Pop wrote:
> On Monday 02 November 2009, Mel Gorman wrote:
> > On Mon, Nov 02, 2009 at 06:32:54PM +0100, Frans Pop wrote:
> > > On Monday 02 November 2009, Mel Gorman wrote:
> > > > vmscan: Help debug kswapd issues by counting number of rewakeups and
> > > > premature sleeps
> > > >
> > > > > There is a growing amount of anecdotal evidence that high-order
> > > > atomic allocation failures have been increasing since 2.6.31-rc1.
> > > > The two strongest possibilities are a marked increase in the number
> > > > of GFP_ATOMIC allocations and alterations in timing. Debugging
> > > > printk patches have shown for example that kswapd is sleeping for
> > > > shorter intervals and going to sleep when watermarks are still not
> > > > being met.
> > > >
> > > > This patch adds two kswapd counters to help identify if timing is an
> > > > issue. The first counter kswapd_highorder_rewakeup counts the number
> > > > of times that kswapd stops reclaiming at one order and restarts at a
> > > > higher order. The second counter kswapd_slept_prematurely counts the
> > > > number of times kswapd went to sleep when the high watermark was not
> > > > met.
> > >
> > > What testing would you like done with this patch?
> >
> > Same reproduction as before except post what the contents of
> > /proc/vmstat were after the problem was triggered.
> 
> With a representative test I get 0 for kswapd_slept_prematurely.
> Tested with .32-rc6 + patches 1-3 + this patch.
> 

Assuming the problem actually reproduced, can you still retest with the
patch I posted as a follow-up and see if fast or slow premature sleeps
are happening and if the problem still occurs please? It's still
possible that, with the patch as-is, it is timing related. After I posted
this patch, I continued testing and found I could get counts fairly
reliably if kswapd was calling printk() before making the premature
check so the window appears to be very small.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
@ 2009-11-03 22:08                   ` Mel Gorman
  0 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-03 22:08 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Tue, Nov 03, 2009 at 11:01:50PM +0100, Frans Pop wrote:
> On Monday 02 November 2009, Mel Gorman wrote:
> > On Mon, Nov 02, 2009 at 06:32:54PM +0100, Frans Pop wrote:
> > > On Monday 02 November 2009, Mel Gorman wrote:
> > > > vmscan: Help debug kswapd issues by counting number of rewakeups and
> > > > premature sleeps
> > > >
> > > > There is a growing amount of anedotal evidence that high-order
> > > > atomic allocation failures have been increasing since 2.6.31-rc1.
> > > > The two strongest possibilities are a marked increase in the number
> > > > of GFP_ATOMIC allocations and alterations in timing. Debugging
> > > > printk patches have shown for example that kswapd is sleeping for
> > > > shorter intervals and going to sleep when watermarks are still not
> > > > being met.
> > > >
> > > > This patch adds two kswapd counters to help identify if timing is an
> > > > issue. The first counter kswapd_highorder_rewakeup counts the number
> > > > of times that kswapd stops reclaiming at one order and restarts at a
> > > > higher order. The second counter kswapd_slept_prematurely counts the
> > > > number of times kswapd went to sleep when the high watermark was not
> > > > met.
> > >
> > > What testing would you like done with this patch?
> >
> > Same reproduction as before except post what the contents of
> > /proc/vmstat were after the problem was triggered.
> 
> With a representative test I get 0 for kswapd_slept_prematurely.
> Tested with .32-rc6 + patches 1-3 + this patch.
> 

Assuming the problem actually reproduced, can you still retest with the
patch I posted as a follow-up and see if fast or slow premature sleeps
are happening and if the problem still occurs please? It's still
possible with the patch as-is could be timing related. After I posted
this patch, I continued testing and found I could get counts fairly
reliably if kswapd was calling printk() before making the premature
check so the window appears to be very small.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
@ 2009-11-03 22:08                   ` Mel Gorman
  0 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-03 22:08 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable-DgEjT+Ai2ygdnm+yROfE0A,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg,
	Rik van Riel, Christoph Lameter, Stephan von Krawczynski,
	Kernel Testers List

On Tue, Nov 03, 2009 at 11:01:50PM +0100, Frans Pop wrote:
> On Monday 02 November 2009, Mel Gorman wrote:
> > On Mon, Nov 02, 2009 at 06:32:54PM +0100, Frans Pop wrote:
> > > On Monday 02 November 2009, Mel Gorman wrote:
> > > > vmscan: Help debug kswapd issues by counting number of rewakeups and
> > > > premature sleeps
> > > >
> > > > There is a growing amount of anedotal evidence that high-order
> > > > atomic allocation failures have been increasing since 2.6.31-rc1.
> > > > The two strongest possibilities are a marked increase in the number
> > > > of GFP_ATOMIC allocations and alterations in timing. Debugging
> > > > printk patches have shown for example that kswapd is sleeping for
> > > > shorter intervals and going to sleep when watermarks are still not
> > > > being met.
> > > >
> > > > This patch adds two kswapd counters to help identify if timing is an
> > > > issue. The first counter kswapd_highorder_rewakeup counts the number
> > > > of times that kswapd stops reclaiming at one order and restarts at a
> > > > higher order. The second counter kswapd_slept_prematurely counts the
> > > > number of times kswapd went to sleep when the high watermark was not
> > > > met.
> > >
> > > What testing would you like done with this patch?
> >
> > Same reproduction as before except post what the contents of
> > /proc/vmstat were after the problem was triggered.
> 
> With a representative test I get 0 for kswapd_slept_prematurely.
> Tested with .32-rc6 + patches 1-3 + this patch.
> 

Assuming the problem actually reproduced, can you still retest with the
patch I posted as a follow-up and see if fast or slow premature sleeps
are happening and if the problem still occurs please? It's still
possible that, with the patch as-is, the result is timing related. After I posted
this patch, I continued testing and found I could get counts fairly
reliably if kswapd was calling printk() before making the premature
check so the window appears to be very small.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-03 22:08                   ` Mel Gorman
  (?)
  (?)
@ 2009-11-04  0:01                   ` Frans Pop
  2009-11-04  1:18                       ` Mel Gorman
  -1 siblings, 1 reply; 115+ messages in thread
From: Frans Pop @ 2009-11-04  0:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

[-- Attachment #1: Type: text/plain, Size: 1675 bytes --]

On Tuesday 03 November 2009, you wrote:
> > With a representative test I get 0 for kswapd_slept_prematurely.
> > Tested with .32-rc6 + patches 1-3 + this patch.
>
> Assuming the problem actually reproduced, can you still retest with the

Yes, it does.

> patch I posted as a follow-up and see if fast or slow premature sleeps
> are happening and if the problem still occurs please? It's still
> possible with the patch as-is could be timing related. After I posted
> this patch, I continued testing and found I could get counts fairly
> reliably if kswapd was calling printk() before making the premature
> check so the window appears to be very small.

Tested with .32-rc6 and .31.1. With that follow-up patch I still get 
freezes and SKB allocation errors. And I don't get anywhere near the fast, 
smooth and reliable behavior I get when I do the congestion_wait() 
reverts.

The new case does trigger as you can see below, but I'm afraid I don't see 
it making any significant difference for my test. Hope the data is still 
useful for you.

From vmstat for .32-rc6:
kswapd_highorder_rewakeup 8
kswapd_slept_prematurely_fast 329
kswapd_slept_prematurely_slow 55

From vmstat for .31.1:
kswapd_highorder_rewakeup 20
kswapd_slept_prematurely_fast 307
kswapd_slept_prematurely_slow 105
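
For anyone who wants to pull the same counters out of /proc/vmstat without grepping by hand, here is a minimal userspace sketch. It assumes the usual one "name value" pair per line layout seen in the attachments below; the helper names vmstat_lookup() and read_file() are made up for illustration and are not part of any patch in this thread. The kswapd_* counters only exist on kernels carrying the debug patches, so the lookup reports a missing counter rather than guessing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "name value" line in vmstat-formatted text.
 * Returns 0 and stores the value on success, -1 if the counter is
 * absent (for example when the debug patches are not applied).
 * Hypothetical helper, for illustration only.
 */
int vmstat_lookup(const char *text, const char *name, unsigned long *value)
{
	const char *line = text;
	size_t len;

	assert(text && name && value);
	len = strlen(name);

	while (*line) {
		/* Require the full name followed by a space, not a prefix */
		if (strncmp(line, name, len) == 0 && line[len] == ' ')
			return sscanf(line + len, "%lu", value) == 1 ? 0 : -1;

		/* Advance to the start of the next line */
		line = strchr(line, '\n');
		if (!line)
			break;
		line++;
	}
	return -1;
}

/* Read a file such as /proc/vmstat into buf; returns bytes read or -1 */
long read_file(const char *path, char *buf, size_t size)
{
	FILE *f = fopen(path, "r");
	size_t n;

	if (!f || size == 0) {
		if (f)
			fclose(f);
		return -1;
	}
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return (long)n;
}
```

Typical use would be to call read_file("/proc/vmstat", buf, sizeof(buf)) right after the failure has been triggered and then call vmstat_lookup() once per counter of interest. Matching on the name plus a trailing space keeps kswapd_slept_prematurely from accidentally matching the _fast and _slow variants.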

If you'd like me to test with the congestion_wait() revert on top of this 
for comparison, please let me know.

Cheers,
FJP

P.S. Your follow-up patch did not apply cleanly on top of the debug one as 
you seem to have made some changes between posting them (dropped kswapd_ 
from the sleeping_prematurely() function name and added a comment).


[-- Attachment #2: vmstat.32 --]
[-- Type: text/plain, Size: 1451 bytes --]

nr_free_pages 4798
nr_inactive_anon 102550
nr_active_anon 305242
nr_inactive_file 17876
nr_active_file 13213
nr_unevictable 400
nr_mlock 400
nr_anon_pages 376898
nr_mapped 2769
nr_file_pages 63678
nr_dirty 18
nr_writeback 0
nr_slab_reclaimable 2236
nr_slab_unreclaimable 3984
nr_page_table_pages 3996
nr_kernel_stack 173
nr_unstable 0
nr_bounce 0
nr_vmscan_write 215582
nr_writeback_temp 0
nr_isolated_anon 0
nr_isolated_file 0
nr_shmem 17
pgpgin 607186
pgpgout 872956
pswpin 9397
pswpout 215580
pgalloc_dma 2128
pgalloc_dma32 1922180
pgalloc_normal 0
pgalloc_movable 0
pgfree 1929319
pgactivate 122493
pgdeactivate 383992
pgfault 2210388
pgmajfault 6625
pgrefill_dma 1792
pgrefill_dma32 386511
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 41
pgsteal_dma32 295511
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 64
pgscan_kswapd_dma32 379687
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 36768
pgscan_direct_dma32 5233523
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 2416
slabs_scanned 42240
kswapd_steal 241253
kswapd_inodesteal 6252
kswapd_highorder_rewakeup 20
kswapd_slept_prematurely_fast 307
kswapd_slept_prematurely_slow 105
pageoutrun 3394
allocstall 964
pgrotated 215342
unevictable_pgs_culled 4247
unevictable_pgs_scanned 0
unevictable_pgs_rescued 33344
unevictable_pgs_mlocked 43192
unevictable_pgs_munlocked 42780
unevictable_pgs_cleared 2
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

[-- Attachment #3: vmstat.31 --]
[-- Type: text/plain, Size: 1375 bytes --]

nr_free_pages 5730
nr_inactive_anon 101680
nr_active_anon 304236
nr_inactive_file 18296
nr_active_file 14717
nr_unevictable 408
nr_mlock 408
nr_anon_pages 347177
nr_mapped 2751
nr_file_pages 93394
nr_dirty 8
nr_writeback 0
nr_slab_reclaimable 2218
nr_slab_unreclaimable 3670
nr_page_table_pages 3976
nr_unstable 0
nr_bounce 0
nr_vmscan_write 238631
nr_writeback_temp 0
pgpgin 594630
pgpgout 964231
pswpin 8629
pswpout 238627
pgalloc_dma 2169
pgalloc_dma32 1869092
pgalloc_normal 0
pgalloc_movable 0
pgfree 1877147
pgactivate 116309
pgdeactivate 372861
pgfault 2152528
pgmajfault 6806
pgrefill_dma 1410
pgrefill_dma32 375616
pgrefill_normal 0
pgrefill_movable 0
pgsteal_dma 54
pgsteal_dma32 285950
pgsteal_normal 0
pgsteal_movable 0
pgscan_kswapd_dma 96
pgscan_kswapd_dma32 564994
pgscan_kswapd_normal 0
pgscan_kswapd_movable 0
pgscan_direct_dma 448
pgscan_direct_dma32 268795
pgscan_direct_normal 0
pgscan_direct_movable 0
pginodesteal 2411
slabs_scanned 41600
kswapd_steal 247394
kswapd_inodesteal 6479
kswapd_highorder_rewakeup 8
kswapd_slept_prematurely_fast 329
kswapd_slept_prematurely_slow 55
pageoutrun 3525
allocstall 686
pgrotated 238322
unevictable_pgs_culled 4254
unevictable_pgs_scanned 0
unevictable_pgs_rescued 33336
unevictable_pgs_mlocked 43192
unevictable_pgs_munlocked 42772
unevictable_pgs_cleared 2
unevictable_pgs_stranded 0
unevictable_pgs_mlockfreed 0

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-04  0:01                   ` Frans Pop
@ 2009-11-04  1:18                       ` Mel Gorman
  0 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-04  1:18 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wed, Nov 04, 2009 at 01:01:46AM +0100, Frans Pop wrote:
> On Tuesday 03 November 2009, you wrote:
> > > With a representative test I get 0 for kswapd_slept_prematurely.
> > > Tested with .32-rc6 + patches 1-3 + this patch.
> >
> > Assuming the problem actually reproduced, can you still retest with the
> 
> Yes, it does.
> 
> > patch I posted as a follow-up and see if fast or slow premature sleeps
> > are happening and if the problem still occurs please? It's still
> > possible with the patch as-is could be timing related. After I posted
> > this patch, I continued testing and found I could get counts fairly
> > reliably if kswapd was calling printk() before making the premature
> > check so the window appears to be very small.
> 
> Tested with .32-rc6 and .31.1. With that follow-up patch I still get 
> freezes and SKB allocation errors. And I don't get anywhere near the fast, 
> smooth and reliable behavior I get when I do the congestion_wait() 
> reverts.
> 

Yeah. What it really appears to get down to is that the congestion changes
have altered the timing in a manner that, frankly, I'm not sure how to define
as "good" or "bad". The congestion changes in themselves appear sane and
the amount of time callers sleep appears better but the actual result sucks
for the constant stream of high-order allocations that are occurring from
the driver. This abuse of high-order atomics has been addressed but it's
showing up in other horrible ways.

> The new case does trigger as you can see below, but I'm afraid I don't see 
> it making any significant difference for my test. Hope the data is still 
> useful for you.
> 
> From vmstat for .32-rc6:
> kswapd_highorder_rewakeup 8
> kswapd_slept_prematurely_fast 329
> kswapd_slept_prematurely_slow 55
> 
> From vmstat for .31.1:
> kswapd_highorder_rewakeup 20
> kswapd_slept_prematurely_fast 307
> kswapd_slept_prematurely_slow 105
> 

This is useful.

The high premature_fast shows that after kswapd apparently finishes its work,
the high watermarks are being breached very quickly (the fast counter
being positive). The "slow" counter is even worse. Your machine is getting
from the high to low watermark quickly without kswapd noticing and processes
depending on the atomics are not waiting long enough.

> If you'd like me to test with the congestion_wait() revert on top of this 
> for comparison, please let me know.
> 

No, there is resistance to rolling back the congestion_wait() changes from
what I gather because they were introduced for sane reasons. The consequence
is just that the reliability of high-order atomics is impacted because more
processes are making forward progress where previously they would have waited
until kswapd had done work. Your driver has already been fixed in this regard
and maybe it's a case that the other atomic users simply have to be fixed to
"not do that".

> P.S. Your follow-up patch did not apply cleanly on top of the debug one as 
> you seem to have made some changes between posting them (dropped kswapd_ 
> from the sleeping_prematurely() function name and added a comment).
> 

Sorry about that. Clearly I've gotten out of sync slightly with the
patchset I'm testing and basing upon as opposed to what I'm posting
here.

Here is yet another patch to be applied on top of the rest of the
patches. Sorry about any typos, I was out for a friend's birthday and I
have a few beers on me but it boots on qemu and passes basic stress tests
at least. The intention of the patch is to delay those high-order allocations
that can wait, so that kswapd can do work in parallel. It will only help
the case where there is a mix of high-order allocations that can sleep and
those that can't. Because the main burst of your allocations appears to be
high-order atomics, it might not help but it might delay order-1 allocations
due to many instances of fork() in your workload if 8K stacks are being used.

==== CUT HERE ====
page allocator: Sleep where the intention was to sleep instead of waiting on congestion

At two points during page allocation, it is possible for the process to
sleep for a short interval depending on congestion. There is some anecdotal
evidence that since 2.6.31-rc1, the processes are sleeping for less time
than before as the congestion_wait() logic has improved.

However, one consequence of this is that processes are waking up too
quickly, finding that forward progress is still difficult and failing
too early. This patch causes processes to sleep for a fixed interval
instead of sleeping depending on congestion.

With this patch applied, the number of premature sleeps of kswapd as
measured by kswapd_slept_prematurely is reduced while running a stress
test based on parallel executions of dd under QEMU. Furthermore, under
the stress test, the number of oom-killer occurrences is drastically
reduced.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 mm/page_alloc.c |    9 ++++++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2bc2ac6..5884d6f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1726,8 +1726,10 @@ __alloc_pages_high_priority(gfp_t gfp_mask, unsigned int order,
 			zonelist, high_zoneidx, ALLOC_NO_WATERMARKS,
 			preferred_zone, migratetype);

-		if (!page && gfp_mask & __GFP_NOFAIL)
-			congestion_wait(BLK_RW_ASYNC, HZ/50);
+		if (!page && gfp_mask & __GFP_NOFAIL) {
+			set_current_state(TASK_INTERRUPTIBLE);
+			schedule_timeout(HZ/50);
+		}
 	} while (!page && (gfp_mask & __GFP_NOFAIL));

 	return page;
@@ -1898,7 +1900,8 @@ rebalance:
 	pages_reclaimed += did_some_progress;
 	if (should_alloc_retry(gfp_mask, order, pages_reclaimed)) {
 		/* Wait for some write requests to complete then retry */
-		congestion_wait(BLK_RW_ASYNC, HZ/50);
+		set_current_state(TASK_INTERRUPTIBLE);
+		schedule_timeout(HZ/50);
 		goto rebalance;
 	}

-- 
1.6.3.3
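
As a side note on the HZ/50 timeout used at both call sites in the patch above: it is expressed in jiffies, so the number of ticks depends on the kernel's tick rate, while the wall-clock interval works out to the same 20ms for the common settings. The userspace sketch below just does that arithmetic; the function names and the list of tick rates are chosen for illustration and are not taken from the kernel. Because the task is put in TASK_INTERRUPTIBLE, the 20ms should be read as an upper bound, since the sleep can end early on a wakeup or a pending signal.

```c
#include <assert.h>
#include <stdio.h>

/* Timeout as written in the patch: HZ/50 jiffies (integer division). */
unsigned int timeout_jiffies(unsigned int hz)
{
	assert(hz >= 50);	/* below 50 the division would yield 0 */
	return hz / 50;
}

/* Convert a jiffy count back to milliseconds for the given tick rate. */
unsigned int jiffies_to_ms(unsigned int jiffies, unsigned int hz)
{
	assert(hz > 0);
	return jiffies * 1000 / hz;
}

/* Print the timeout for a few tick rates (illustrative selection). */
void print_timeouts(void)
{
	static const unsigned int hz_values[] = { 100, 250, 300, 1000 };
	size_t i;

	for (i = 0; i < sizeof(hz_values) / sizeof(hz_values[0]); i++) {
		unsigned int hz = hz_values[i];
		unsigned int j = timeout_jiffies(hz);

		printf("HZ=%u: HZ/50 = %u jiffies = %u ms\n",
		       hz, j, jiffies_to_ms(j, hz));
	}
}
```

If the full interval is wanted regardless of signals, an uninterruptible sleep such as schedule_timeout_uninterruptible() would be the obvious alternative, at the cost of the task counting towards load average while it waits. Whether that trade-off is wanted here is a judgment call and not something the patch above decides.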

^ permalink raw reply related	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-03 17:10                     ` Christoph Lameter
  (?)
@ 2009-11-04  1:46                       ` David Rientjes
  -1 siblings, 0 replies; 115+ messages in thread
From: David Rientjes @ 2009-11-04  1:46 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Rik van Riel, Pavel Machek, Andrew Morton, Mel Gorman, stable,
	linux-kernel, linux-mm, Frans Pop, Jiri Kosina, Sven Geggus,
	Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro, Pekka Enberg,
	Stephan von Krawczynski, kernel-testers

On Tue, 3 Nov 2009, Christoph Lameter wrote:

> If you dont know what "realtime" is then we cannot really implement
> "realtime" behavior in the page allocator.
> 

It's not intended to implement realtime behavior!

This is a convenience given to rt_task() to reduce latency when possible 
by avoiding direct reclaim and allowing background reclaim to bring us 
back over the low watermark.

That's been in the page allocator for over four years and is not intended 
to implement realtime behavior.  These tasks do not rely on memory 
reserves being available.

Is it really hard to believe that tasks with such high priorities are 
given an exemption in the page allocator so that we reclaim in the 
background instead of directly?

I hope we can move this to another thread if people would like to remove 
this exemption completely instead of talking about this trivial fix, which 
I doubt there's any objection to.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-04  1:18                       ` Mel Gorman
  (?)
@ 2009-11-04  2:05                         ` Frans Pop
  -1 siblings, 0 replies; 115+ messages in thread
From: Frans Pop @ 2009-11-04  2:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wednesday 04 November 2009, Mel Gorman wrote:
> > If you'd like me to test with the congestion_wait() revert on top of
> > this for comparison, please let me know.
>
> No, there is resistance to rolling back the congestion_wait() changes

I've never promoted the revert as a solution. It just shows the cause of a 
regression.

> from what I gather because they were introduced for sane reasons. The
> consequence is just that the reliability of high-order atomics are
> impacted because more processes are making forward progress where
> previously they would have waited until kswapd had done work. Your
> driver has already been fixed in this regard and maybe it's a case that
> the other atomic users simply have to be fixed to "not do that".

The problem is that although my driver has been fixed so that it no longer 
causes the SKB allocation errors, the other, also rather serious, behavior 
change is still there: due to swapping, my 3rd gitk takes up to twice as 
long to load, with desktop freezes of up to 45 seconds or so.

Although that's somewhat separate from the issue that started this whole 
investigation, I still feel that should be sorted out as well.

The congestion_wait() change, even if theoretically valid, introduced a 
very real regression IMO. Such long desktop freezes during swapping should 
be avoided; .30 and earlier simply behaved a whole lot better in the same 
situation.

Cheers,
FJP

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-04  1:18                       ` Mel Gorman
  (?)
@ 2009-11-04  2:08                         ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-04  2:08 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wed, Nov 04, 2009 at 01:18:11AM +0000, Mel Gorman wrote:
> > From vmstat for .31.1:
> > kswapd_highorder_rewakeup 20
> > kswapd_slept_prematurely_fast 307
> > kswapd_slept_prematurely_slow 105
> > 
> 
> This is useful.
> 
> The high premature_fast shows that after kswapd apparently finishes its work,
> the high watermarks are being breached very quickly (the fast counter
> being positive). The "slow" counter is even worse. Your machine is getting
> from the high to low watermark quickly without kswapd noticing and processes
> depending on the atomics are not waiting long enough.
> 

Sorry, that should have been

 The premature_fast shows that after kswapd finishes its work, the low
 watermarks are being breached very quickly as kswapd is being rewoken.
 The "slow" counter is slightly worse. Just after kswapd sleeps, the
 high watermark is being breached again.

Either counter being positive implies that kswapd is having to do too
much work while parallel allocators are chewing up the high-order pages
quickly. The effect of the patch should still be to slow the rate at which
high-order pages are consumed, but it assumes there are enough high-order
requests that can go to sleep.
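
For anyone who wants to watch the same counters over time, here is a rough
sketch in Python. It assumes the counters show up in /proc/vmstat as
"name value" lines, as in the output quoted above. The kswapd_* names come
from the debugging patch in this thread and will not exist on an unpatched
kernel; the function names are made up for illustration.

```python
def parse_vmstat(text):
    """Parse 'name value' lines (the /proc/vmstat layout) into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def kswapd_counters(counters, prefix="kswapd_"):
    """Keep only the counters whose name starts with the given prefix."""
    return {k: v for k, v in counters.items() if k.startswith(prefix)}


def read_vmstat(path="/proc/vmstat"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Calling kswapd_counters(read_vmstat()) before and after a test run and
comparing the two dicts gives the per-run increase in each counter.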

Mentioning sleep, I'm going to get some.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-04  2:05                         ` Frans Pop
@ 2009-11-04  2:08                           ` Frans Pop
  -1 siblings, 0 replies; 115+ messages in thread
From: Frans Pop @ 2009-11-04  2:08 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wednesday 04 November 2009, Frans Pop wrote:
> The congestion_wait() change, even if theoretically valid, introduced a
> very real regression IMO. Such long desktop freezes during swapping
> should be avoided; .30 and earlier simply behaved a whole lot better in
> the same situation.

I'll see if your new patch helps for this case.

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-04  1:46                       ` David Rientjes
  (?)
@ 2009-11-04  9:01                         ` Pavel Machek
  -1 siblings, 0 replies; 115+ messages in thread
From: Pavel Machek @ 2009-11-04  9:01 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Rik van Riel, Andrew Morton, Mel Gorman,
	stable, linux-kernel, linux-mm, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Stephan von Krawczynski, kernel-testers


> I hope we can move this to another thread if people would like to remove 
> this exemption completely instead of talking about this trivial fix, which 
> I doubt there's any objection to.

I'm arguing that this "trivial fix" is wrong, and that you should just
remove those two lines.

If going into reserves from interrupts hurts, doing that from task
context will hurt, too. A "realtime" task should not normally be allowed
to "hurt" the system like that.
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-04  2:05                         ` Frans Pop
  (?)
@ 2009-11-04 15:48                           ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-04 15:48 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wed, Nov 04, 2009 at 03:05:55AM +0100, Frans Pop wrote:
> On Wednesday 04 November 2009, Mel Gorman wrote:
> > > If you'd like me to test with the congestion_wait() revert on top of
> > > this for comparison, please let me know.
> >
> > No, there is resistance to rolling back the congestion_wait() changes
> 
> I've never promoted the revert as a solution. It just shows the cause of a 
> regression.
> 

Yeah, I still haven't managed to figure out what exactly is wrong in there
other than "something changed with timing" and writeback behaves differently. I
still don't know the why of it because I haven't dug into that area in
depth in the past and have failed to reproduce this. "My desktop is fine" :/

> > from what I gather because they were introduced for sane reasons. The
> > consequence is just that the reliability of high-order atomics is
> > impacted because more processes are making forward progress where
> > previously they would have waited until kswapd had done work. Your
> > driver has already been fixed in this regard and maybe it's a case that
> > the other atomic users simply have to be fixed to "not do that".
> 
> The problem is that although my driver has been fixed so that it no longer 
> causes the SKB allocation errors, the also rather serious behavior change 
> where due to swapping my 3rd gitk takes up to twice as long to load with 
> desktop freezes of up to 45 seconds or so is still there.
> 
> Although that's somewhat separate from the issue that started this whole 
> investigation, I still feel that should be sorted out as well.
> 

You're right. That behaviour sucks.

> The congestion_wait() change, even if theoretically valid, introduced a 
> very real regression IMO. Such long desktop freezes during swapping should 
> be avoided; .30 and earlier simply behaved a whole lot better in the same 
> situation.
> 

Agreed. I'll start from scratch again trying to reproduce what you're seeing
locally. I'll try breaking my network card so that it's making high-order
atomics and see where I get. Machines that were previously tied up are now
free so I might have a better chance.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit
  2009-11-04 15:48                           ` Mel Gorman
  (?)
@ 2009-11-04 20:57                             ` Frans Pop
  -1 siblings, 0 replies; 115+ messages in thread
From: Frans Pop @ 2009-11-04 20:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Kernel Testers List

On Wednesday 04 November 2009, Mel Gorman wrote:
> Agreed. I'll start from scratch again trying to reproduce what you're
> seeing locally. I'll try breaking my network card so that it's making
> high-order atomics and see where I get. Machines that were previously
> tied up are now free so I might have a better chance.

Hmmm. IMO you're looking at this from the wrong side. You don't need to 
break your network card because the SKB problems are only the *result* of 
the change, not the *cause*.

I can reproduce the desktop freeze just as easily when I'm using wired 
(e1000e) networking and when I'm not streaming music at all, but just 
loading that 3rd gitk instance.

So it's not
  "I get a desktop freeze because of high order allocations from wireless
   during swapping",
but
  "during very heavy swapping on a system with an encrypted LVM volume
   group containing (encrypted) fs and (encrypted) swap, the swapping
   gets into some semi-stalled state *causing* a long desktop freeze
   and, if there also happens to be some process trying higher order
   allocations, failures of those allocations".

I have tried to indicate this in the past, but it may have gotten lost in 
the complexity of the issue.

An important clue is still IMO that during the first part of the freezes 
there is very little disk activity for a long time. Why would that be when 
the system is supposed to be swapping like hell?

Cheers,
FJP

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit (data on latencies available)
  2009-11-04 20:57                             ` Frans Pop
  (?)
@ 2009-11-05 16:48                               ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-05 16:48 UTC (permalink / raw)
  To: Frans Pop
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Jens Axboe, Chris Mason,
	Kernel Testers List

On Wed, Nov 04, 2009 at 09:57:21PM +0100, Frans Pop wrote:
> On Wednesday 04 November 2009, Mel Gorman wrote:
> > Agreed. I'll start from scratch again trying to reproduce what you're
> > seeing locally. I'll try breaking my network card so that it's making
> > high-order atomics and see where I get. Machines that were previously
> > tied up are now free so I might have a better chance.
> 
> Hmmm. IMO you're looking at this from the wrong side. You don't need to 
> break your network card because the SKB problems are only the *result* of 
> the change, not the *cause*.
> 

They are a symptom though, albeit a dramatic one, of the change in
timing.

> I can reproduce the desktop freeze just as easily when I'm using wired 
> (e1000e) networking and when I'm not streaming music at all, but just 
> loading that 3rd gitk instance.
> 

No one likes desktop freezes, but they are hard to measure and to
reproduce reliably across multiple kernels.  However, I think I might have
something to help this side of things out.

> So it's not
>   "I get a desktop freeze because of high order allocations from wireless
>    during swapping",
> but
>   "during very heavy swapping on a system with an encrypted LVM volume
>    group containing (encrypted) fs and (encrypted) swap, the swapping
>    gets into some semi-stalled state *causing* a long desktop freeze
>    and, if there also happens to be some process trying higher order
>    allocations, failures of those allocations".
> 

Right, so it's a related problem, but not the root cause.

> I have tried to indicate this in the past, but it may have gotten lost in 
> the complexity of the issue.
> 

I got it all right, but felt that the page allocation problems were both
compounding the problem and easier to measure.

> An important clue is still IMO that during the first part of the freezes 
> there is very little disk activity for a long time. Why would that be when 
> the system is supposed to be swapping like hell?
> 

One possible guess is that the system as a whole decides everything is
congested and waits for something else to make forward progress. I
really think the people who were involved in the writeback changes need
to get in here and help out.

In the interest of getting something more empirical, I sat down from scratch
with the view to recreating your case and I believe I was successful. I was
able to reproduce your problem after a fashion and generate some figures -
crucially including some latency figures.

I don't have a fix for this, but I'm hoping someone will follow the notes
to recreate the reproduction case and add their own instrumentation to pin
this down.

Steps to setup and reproduce are;

1. X86-64 AMD Phenom booted with mem=512MB. I expect any machine will do
	as long as it is limited to 512MB, given the size of the workload involved.

2. An encrypted work partition and swap partition were created. On my
   own setup, I gave no passphrase so it'd be easier to activate without
   interaction but there are multiple options. I should have taken better
   notes but the setup goes something like this;

	cryptsetup create -y crypt-partition /dev/sda5
	pvcreate /dev/mapper/crypt-partition
	vgcreate crypt-volume /dev/mapper/crypt-partition
	lvcreate -L 5G -n crypt-logical crypt-volume
	lvcreate -L 2G -n crypt-swap crypt-volume
	mkfs -t ext3 /dev/crypt-volume/crypt-logical
	mkswap /dev/crypt-volume/crypt-swap

3. With the partition mounted on /scratch, I
	cd /scratch
	mkdir music
	git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux-2.6

4. On a normal partition, I expand a tarball containing test scripts available at
	http://www.csn.ul.ie/~mel/postings/latency-20091105/latency-tests-with-results.tar.gz

	There are two helper programs that run as part of the test - a fake
	music player and a fake gitk.

	The fake music player uses rsync with bandwidth limits to start
	downloading a music folder from another machine. It's bandwidth limited
	to simulate playing music over NFS. I believe it generates similar, if
	not identical, traffic to a music player. It occurred to me afterwards that
	if one patched ogg123 to print a line when 1/10th of a second's worth
	of music was played, it could be used as an indirect measure of desktop
	interactivity and help pin down pesky "audio skips" bug reports.

	The fake gitk is based on observing roughly what gitk does using
	strace. It loads all the logs into a large buffer and then builds a
	very basic hash map of parent to child commits.  The data is stored
	because it was insufficient just to read the logs. It had to be kept in
	an in-memory buffer to generate swap.  It then discards the data and
	repeats the process a small number of times so the test
	is finite. When it processes a large number of commits, it outputs
	a line to stdout so that stalls can be observed. Ideal behaviour is
	that commits are read at a constant rate and latencies look flat.

	Output from the two programs is piped through another script -
	latency-output. It records how far into the test it was when the
	line was output and what the latency was since the last line
	appeared. The latency should always be very smooth. Because pipes
	buffer IO, the programs are all run via expect_unbuffered, which is
	available from expect-dev on Debian at least.

	All the tests are driven via run-test.sh. While the tests run,
	it records the kern.log to track page allocation failures, records
	nr_writeback at regular intervals and tracks Page IO and Swap IO.

5. For running an actual test, a kernel is built, booted, the
	crypted partition activated, lvm restarted,
	/dev/crypt-volume/crypt-logical mounted on /scratch, all
	swap partitions turned off and then the swap partition on
	/dev/crypt-volume/crypt-swap activated. I then run run-test.sh from
	the tarball.

6. I tested kernels 2.6.30, 2.6.31, 2.6.32-rc6,
	2.6.32-rc6-revert-8aa7e847, 2.6.32-rc6-patches123 where patches123
	are the patches in this thread and 2.6.32-rc6-patches45 which include
	the accounting patch and a delay for direct reclaimers posted within
	this thread. To simulate the wireless network card, I patched skbuff
	on all kernels to always allocate at least order-2. However, the
	latencies are expected to occur without order-2 atomic allocations
	from network being involved.
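
As an illustration of the fake gitk helper described in step 4, here is a
minimal sketch, not the script shipped in the tarball. It assumes each input
line has the form "<commit> <parent> [<parent>...]" (for example the output of
git log --pretty=format:"%H %P"); the function name, the parameters and the
progress message are all hypothetical.

```python
import sys
from collections import defaultdict


def fake_gitk(log_lines, passes=3, report_every=1000, out=sys.stdout):
    """Hypothetical model of the fake gitk: hold the log in memory, map
    parents to children, discard everything and repeat a few times."""
    log_lines = list(log_lines)           # allow several passes over the input
    processed = 0
    last_map = {}
    for _ in range(passes):               # small, finite number of repeats
        buffer = []                       # raw data kept on purpose to use memory
        children = defaultdict(list)      # parent id -> list of child ids
        for line in log_lines:
            fields = line.split()
            if not fields:                # ignore blank lines
                continue
            buffer.append(line)
            for parent in fields[1:]:
                children[parent].append(fields[0])
            processed += 1
            if processed % report_every == 0:
                # A progress line; gaps between these lines expose stalls.
                print(f"{processed} commits processed", file=out, flush=True)
        last_map = dict(children)
        del buffer, children              # throw the data away and start over
    return processed, last_map
```

Keeping the raw lines in a buffer as well as the map is deliberate here: as
noted in step 4, just reading the logs was not enough to generate swap.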

The tarball contains the scripts I used, generated graphs and the raw
data. Broadly speaking;
	2.6.30 was fine with rare fails although I did trigger page
		allocation failures during at least one test
	2.6.31 was mostly fine with occasional fails but ok latency-wise
	2.6.32-rc6 sucked with multiple failures and large latencies. On
		a few occasions, it's possible for this kernel to get into
		a page allocation failure lockup. I left one running and
		it was still locked up spewing out error messages 8 hours
		later. i.e. it's possible to almost live-lock this kernel
		using this workload
	2.6.32-rc6-revert-8aa7e847 smooths out the latencies but is not great.
		I suspect it made more of a difference to 2.6.31 than it
		does to mainline

	2.6.32-rc6-patches123 helps a little with latencies and has fewer
	failures.
		More importantly, the failures are hard to trigger. It was
		actually rare for a failure to occur. It just happened to
		occur on the final set of results I gathered so I think that's
		important. It's also important that they bring the allocator
		more in line with 2.6.30 behaviour. The most important
		contribution of all was that I couldn't live-lock the kernel
		with these patches applied but I can with the vanilla kernel.

	2.6.32-rc6-patches12345 did not significantly help leading me to
		conclude that the congestion_wait() called in the page
		allocator is not significant.

patches123 are the three patches that formed this thread originally.
Patches 4 and 5 are the accounting patch and the one that makes kswapd sleep
for a short interval before rechecking watermarks.

On the latency front, look at

http://www.csn.ul.ie/~mel/postings/latency-20091105/graphs/gitk-latency.ps
http://www.csn.ul.ie/~mel/postings/latency-20091105/graphs/gitk-latency-smooth.ps

Both graphs are based on the same data but the smooth one (plotted with
smooth bezier in gnuplot but otherwise based on the same data) is easier
to read for doing a direct comparison. The gitk-latency.ps is based on how
the fourth instance of fake-gitk was running. Every X number of commits, it
prints out how many commits it processed. It should be able to process them
at a constant rate so the Y bars should be all levelish.  2.6.30 is mostly
low with small spikes and 2.6.31 is not too bad.  However, mainline has
massive stalls evidenced by the sawtooth-like pattern where there were big
delays and latencies. It can't be seen in the graph but on a few occasions,
2.6.32-rc6 live-locked in order-2 allocation failures during the test.
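
The numbers behind these graphs come from the latency-output script described
in step 4. The sketch below only illustrates the idea (time since the start
of the test plus time since the previous line); it is not the script from the
tarball, and the function name, the injectable clock and the output format
are assumptions.

```python
import sys
import time


def latency_output(lines, clock=time.monotonic, out=sys.stdout):
    """For every input line, record how far into the run it arrived and
    how long it has been since the previous line."""
    start = last = clock()
    records = []
    for line in lines:
        now = clock()
        elapsed = now - start             # position within the test
        latency = now - last              # gap since the previous line
        text = line.rstrip("\n")
        records.append((elapsed, latency, text))
        print(f"{elapsed:.3f} {latency:.3f} {text}", file=out, flush=True)
        last = now
    return records
```

Calling latency_output(sys.stdin) at the end of a pipe gives one record per
progress line. As step 4 points out, the producer must not have its output
buffered by the pipe, otherwise the recorded latencies reflect buffer flushes
rather than real stalls.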

It's not super-clear from the IO statistics if IO was really happening or
not during the stalls and I can't hear the disks for activity. All that can
be seen on the graphs is the huge spike on pages queued during a period of
processes being stalled. What can be said is that this is probably very
similar to the desktop freezes Frans sees.
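
For anyone adding their own instrumentation, here is a minimal sketch of how
such IO statistics could be sampled. It assumes the counters are read from
/proc/vmstat and that pgpgin/pgpgout and pswpin/pswpout stand in for "Page IO"
and "Swap IO"; how run-test.sh actually gathers them is not stated here, so
the field selection and function names are hypothetical.

```python
import time

# Counters assumed to be of interest; all of them appear in /proc/vmstat.
FIELDS = ("nr_writeback", "pgpgin", "pgpgout", "pswpin", "pswpout")


def parse_vmstat(text, fields=FIELDS):
    """Return {name: value} for the requested counters in vmstat-style text."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in fields:
            values[parts[0]] = int(parts[1])
    return values


def sample_vmstat(interval=5.0, count=3, path="/proc/vmstat"):
    """Yield (timestamp, counters) `count` times, `interval` seconds apart."""
    for i in range(count):
        with open(path) as f:
            yield time.time(), parse_vmstat(f.read())
        if i + 1 < count:
            time.sleep(interval)
```

nr_writeback is an instantaneous value while the pg*/psw* counters are
cumulative, so the latter need to be differenced between samples before
being lined up against the stalls.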

Because of other reports, the slight improvements on latency and the removal
of a possible live-lock situation, I think patches 1-3 and the accounting
patch posted in this thread should go ahead. Patches 1,2 bring allocator
behaviour more in line with 2.6.30 and are a proper fix. Patch 3 makes a lot
of sense when there are a lot of high-order atomics going on so that kswapd
notices as fast as possible that it needs to do other work. The accounting
patch monitors what's going on with patch 3.

Beyond that, independent of any allocation failure problems, desktop
latency problems have been reported and I believe this is what I'm
seeing with the massive latencies and stalled processes. This could
lead to some very nasty bug reports when 2.6.32 comes out.

I'm going to rerun these through a profiler and see if something obvious
pops out and if not, then bisect 2.6.31..2.6.32-rc6. It would be great
if those involved in the IO-related changes could take a look at the
results and try reproducing the problem monitoring what they think is
important.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 2/3] page allocator: Do not allow interrupts to use ALLOC_HARDER
  2009-11-04  9:01                         ` Pavel Machek
  (?)
@ 2009-11-09 10:11                           ` Mel Gorman
  -1 siblings, 0 replies; 115+ messages in thread
From: Mel Gorman @ 2009-11-09 10:11 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Rientjes, Christoph Lameter, Rik van Riel, Andrew Morton,
	stable, linux-kernel, linux-mm, Frans Pop, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Stephan von Krawczynski, kernel-testers

On Wed, Nov 04, 2009 at 10:01:40AM +0100, Pavel Machek wrote:
> 
> > I hope we can move this to another thread if people would like to remove 
> > this exemption completely instead of talking about this trivial fix, which 
> > I doubt there's any objection to.
> 
> I'm arguing that this "trivial fix" is wrong, and that you should just
> remove those two lines.
> 
> If going into reserves from interrupts hurts, doing that from task
> context will hurt, too. "realtime" task should not be normally allowed
> to "hurt" the system like that.
> 									Pavel

As David points out, it has been the behaviour of the system for 4 years
and removing it should be made as a separate decision and not in the
guise of a fix. In the particular case causing concern, there are a lot
more allocations from interrupt due to network receive than there are
from the activities of tasks with a high priority.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 115+ messages in thread

* Re: [PATCH 3/3] vmscan: Force kswapd to take notice faster when high-order watermarks are being hit (data on latencies available)
  2009-11-05 16:48                               ` Mel Gorman
  (?)
@ 2009-11-12 11:36                                 ` Frans Pop
  -1 siblings, 0 replies; 115+ messages in thread
From: Frans Pop @ 2009-11-12 11:36 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, stable, linux-kernel, linux-mm, Jiri Kosina,
	Sven Geggus, Karol Lewandowski, Tobias Oetiker, KOSAKI Motohiro,
	Pekka Enberg, Rik van Riel, Christoph Lameter,
	Stephan von Krawczynski, Jens Axboe, Chris Mason,
	Kernel Testers List

First of all, sorry for not replying to this sooner. And my heartfelt 
appreciation for sticking with the issue. I wish I could do more to help 
resolve it instead of just reporting the problem.
I saw your blog post today and am looking forward to your results.

On Thursday 05 November 2009, Mel Gorman wrote:
> In the interest of getting something more empirical, I sat down from
> scratch with the view to recreating your case and I believe I was
> successful. I was able to reproduce your problem after a fashion and
> generate some figures - crucially including some latency figures.

I'm not sure if you have exactly reproduced what I'm seeing, mainly because 
I would have expected a clearer difference between .30 and .31 for the 
latency figures. There's also little difference in latency between .32-rc6 
with and without 8aa7e847 reverted.
So it looks as if latency is not a significant indicator of the effects of 
8aa7e847 in your test.

But if I look at your graphs for IO and writeback, then those *do* show a 
marked difference between .30 and .31. Those graphs also show a 
significant difference between .32-rc6 with and without 8aa7e847 reverted.
So that looks promising.

> Because of other reports, the slight improvements on latency and the
> removal of a possible live-lock situation, I think patches 1-3 and the
> accounting patch posted in this thread should go ahead. Patches 1,2
> bring allocator behaviour more in line with 2.6.30 and are a proper fix.
> Patch 3 makes a lot of sense when there are a lot of high-order atomics
> going on so that kswapd notices as fast as possible that it needs to do
> other work. The accounting patch monitors what's going on with patch 3.

Hmmm. What strikes me most about the latency graphs is how much worse it 
looks for .32 with your patches 1-3 applied than without. That seems to 
contradict what you say above.

The fact that all .32 latencies are worse than with either .30 or .31 is 
probably simply the result of the changes in the scheduler. It's one 
reason why I have tested most candidate patches against both .31 and .32.

As the latencies are not extreme in an absolute sense, I would say this 
need not indicate a problem. It just means you cannot easily compare 
latency figures for .30 and .31 with those for .32.

Cheers,
FJP

