* [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
@ 2013-03-20 18:19 ` Mel Gorman
  0 siblings, 0 replies; 28+ messages in thread
From: Mel Gorman @ 2013-03-20 18:19 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Michal Hocko, Hedi Berriche, linux-mm, linux-kernel

The following problem was reported against a distribution kernel when
zone_reclaim was enabled, but the same problem applies to the mainline
kernel. The reproduction case was as follows:

1. Run numactl -m +0 dd if=largefile of=/dev/null
   This allocates a large number of clean pages in node 0

2. numactl -N +0 memhog 0.5*Mg
   This starts a memory-using application in node 0.

The expected behaviour is that the clean pages are reclaimed and the
application uses node 0 for its memory. The observed behaviour was that
the memory for the memhog application was allocated off-node, since
commits cd38b11 (mm: page allocator: initialise ZLC for first zone
eligible for zone_reclaim) and 76d3fbf (mm: page allocator: reconsider
zones for allocation after direct reclaim).

The assumption of those patches was that it is always preferable to
allocate quickly rather than stall for long periods of time. They were
meant to ensure that a zone was only marked full when necessary, but an
important case was missed.

In the allocator fast path, only the low watermark is checked. If a
zone's free pages are between the low and min watermarks then an
allocation from the allocator's slow path will succeed. However,
zone_reclaim will only reclaim SWAP_CLUSTER_MAX or 1<<order pages.
There is no guarantee that this meets the low watermark, causing the
zone to be marked full prematurely.
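To illustrate, a toy model of the gap (hypothetical watermark values and
a simplified stand-in for zone_watermark_ok(); not the kernel's code):

```python
SWAP_CLUSTER_MAX = 32  # pages reclaimed by one order-0 zone_reclaim() pass

def watermark_ok(free_pages, mark):
    """Simplified stand-in for zone_watermark_ok(): enough free pages?"""
    return free_pages >= mark

# Hypothetical zone watermarks, in pages.
WMARK_MIN, WMARK_LOW = 128, 160

# The zone starts just below min; zone_reclaim() frees SWAP_CLUSTER_MAX
# pages, leaving the free count between the min and low watermarks.
free = 120 + SWAP_CLUSTER_MAX  # 152

assert not watermark_ok(free, WMARK_LOW)  # fast-path (low) check still fails...
assert watermark_ok(free, WMARK_MIN)      # ...but a min-watermark check passes,
                                          # so marking the zone full is premature
```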

This patch only marks the zone full after zone_reclaim() if the min
watermark is being checked or if page reclaim failed to make sufficient
progress.
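The resulting decision can be sketched as a toy model (hypothetical flag
and return values; the authoritative logic is the diff below):

```python
# Simplified model of the patched handling of the zone_reclaim() result.
ALLOC_WMARK_MIN = 0x01           # hypothetical flag value
ZONE_RECLAIM_SOME = "some"       # reclaimed only 1<<order pages
ZONE_RECLAIM_SUCCESS = "success"

def after_zone_reclaim(watermark_met, alloc_flags, ret):
    """What the allocator does with the zone after zone_reclaim()."""
    if watermark_met:
        return "try_this_zone"
    # Watermark not met: only mark the zone full when checking the min
    # watermark or when reclaim made insufficient progress.
    if (alloc_flags & ALLOC_WMARK_MIN) or ret == ZONE_RECLAIM_SOME:
        return "this_zone_full"
    return "continue"  # leave the zone unmarked for the slow path

# Low-watermark check fails despite successful reclaim: don't mark full.
assert after_zone_reclaim(False, 0, ZONE_RECLAIM_SUCCESS) == "continue"
# Min-watermark check fails: marking the zone full is justified.
assert after_zone_reclaim(False, ALLOC_WMARK_MIN, ZONE_RECLAIM_SUCCESS) == "this_zone_full"
```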

Reported-and-tested-by: Hedi Berriche <hedi@sgi.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8fcced7..adce823 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1940,9 +1940,24 @@ zonelist_scan:
 				continue;
 			default:
 				/* did we reclaim enough */
-				if (!zone_watermark_ok(zone, order, mark,
+				if (zone_watermark_ok(zone, order, mark,
 						classzone_idx, alloc_flags))
+					goto try_this_zone;
+
+				/*
+				 * Failed to reclaim enough to meet watermark.
+				 * Only mark the zone full if checking the min
+				 * watermark or if we failed to reclaim just
+				 * 1<<order pages or else the page allocator
+				 * fastpath will prematurely mark zones full
+				 * when the watermark is between the low and
+				 * min watermarks.
+				 */
+				if ((alloc_flags & ALLOC_WMARK_MIN) ||
+				    ret == ZONE_RECLAIM_SOME)
 					goto this_zone_full;
+
+				continue;
 			}
 		}
 

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-20 18:19 ` Mel Gorman
@ 2013-03-20 18:45   ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-03-20 18:45 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

On Wed 20-03-13 18:19:57, Mel Gorman wrote:
> The following problem was reported against a distribution kernel when
> zone_reclaim was enabled, but the same problem applies to the mainline
> kernel. The reproduction case was as follows:
>
> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>    This allocates a large number of clean pages in node 0
>
> 2. numactl -N +0 memhog 0.5*Mg
>    This starts a memory-using application in node 0.
>
> The expected behaviour is that the clean pages are reclaimed and the
> application uses node 0 for its memory. The observed behaviour was that
> the memory for the memhog application was allocated off-node, since
> commits cd38b11 (mm: page allocator: initialise ZLC for first zone
> eligible for zone_reclaim) and 76d3fbf (mm: page allocator: reconsider
> zones for allocation after direct reclaim).
>
> The assumption of those patches was that it is always preferable to
> allocate quickly rather than stall for long periods of time. They were
> meant to ensure that a zone was only marked full when necessary, but an
> important case was missed.
>
> In the allocator fast path, only the low watermark is checked. If a
> zone's free pages are between the low and min watermarks then an
> allocation from the allocator's slow path will succeed. However,
> zone_reclaim will only reclaim SWAP_CLUSTER_MAX or 1<<order pages.
> There is no guarantee that this meets the low watermark, causing the
> zone to be marked full prematurely.
>
> This patch only marks the zone full after zone_reclaim() if the min
> watermark is being checked or if page reclaim failed to make sufficient
> progress.
> 
> Reported-and-tested-by: Hedi Berriche <hedi@sgi.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Michal Hocko <mhocko@suse.cz>

> ---
>  mm/page_alloc.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fcced7..adce823 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1940,9 +1940,24 @@ zonelist_scan:
>  				continue;
>  			default:
>  				/* did we reclaim enough */
> -				if (!zone_watermark_ok(zone, order, mark,
> +				if (zone_watermark_ok(zone, order, mark,
>  						classzone_idx, alloc_flags))
> +					goto try_this_zone;
> +
> +				/*
> +				 * Failed to reclaim enough to meet watermark.
> +				 * Only mark the zone full if checking the min
> +				 * watermark or if we failed to reclaim just
> +				 * 1<<order pages or else the page allocator
> +				 * fastpath will prematurely mark zones full
> +				 * when the watermark is between the low and
> +				 * min watermarks.
> +				 */
> +				if ((alloc_flags & ALLOC_WMARK_MIN) ||
> +				    ret == ZONE_RECLAIM_SOME)
>  					goto this_zone_full;
> +
> +				continue;
>  			}
>  		}
>  

-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-20 18:19 ` Mel Gorman
@ 2013-03-21  2:32 ` Wanpeng Li
  -1 siblings, 0 replies; 28+ messages in thread
From: Wanpeng Li @ 2013-03-21  2:32 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

On Wed, Mar 20, 2013 at 06:19:57PM +0000, Mel Gorman wrote:
>The following problem was reported against a distribution kernel when
>zone_reclaim was enabled, but the same problem applies to the mainline
>kernel. The reproduction case was as follows:
>
>1. Run numactl -m +0 dd if=largefile of=/dev/null
>   This allocates a large number of clean pages in node 0
>
>2. numactl -N +0 memhog 0.5*Mg
>   This starts a memory-using application in node 0.
>
>The expected behaviour is that the clean pages are reclaimed and the
>application uses node 0 for its memory. The observed behaviour was that
>the memory for the memhog application was allocated off-node, since
>commits cd38b11 (mm: page allocator: initialise ZLC for first zone
>eligible for zone_reclaim) and 76d3fbf (mm: page allocator: reconsider
>zones for allocation after direct reclaim).
>
>The assumption of those patches was that it is always preferable to
>allocate quickly rather than stall for long periods of time. They were
>meant to ensure that a zone was only marked full when necessary, but an
>important case was missed.
>
>In the allocator fast path, only the low watermark is checked. If a
>zone's free pages are between the low and min watermarks then an
>allocation from the allocator's slow path will succeed. However,
>zone_reclaim will only reclaim SWAP_CLUSTER_MAX or 1<<order pages.
>There is no guarantee that this meets the low watermark, causing the
>zone to be marked full prematurely.
>
>This patch only marks the zone full after zone_reclaim() if the min
>watermark is being checked or if page reclaim failed to make sufficient
>progress.
>
>Reported-and-tested-by: Hedi Berriche <hedi@sgi.com>
>Signed-off-by: Mel Gorman <mgorman@suse.de>

Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>

>---
> mm/page_alloc.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 8fcced7..adce823 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -1940,9 +1940,24 @@ zonelist_scan:
> 				continue;
> 			default:
> 				/* did we reclaim enough */
>-				if (!zone_watermark_ok(zone, order, mark,
>+				if (zone_watermark_ok(zone, order, mark,
> 						classzone_idx, alloc_flags))
>+					goto try_this_zone;
>+
>+				/*
>+				 * Failed to reclaim enough to meet watermark.
>+				 * Only mark the zone full if checking the min
>+				 * watermark or if we failed to reclaim just
>+				 * 1<<order pages or else the page allocator
>+				 * fastpath will prematurely mark zones full
>+				 * when the watermark is between the low and
>+				 * min watermarks.
>+				 */
>+				if ((alloc_flags & ALLOC_WMARK_MIN) ||
>+				    ret == ZONE_RECLAIM_SOME)
> 					goto this_zone_full;
>+
>+				continue;
> 			}
> 		}
>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-20 18:19 ` Mel Gorman
@ 2013-03-21  2:33   ` Simon Jeons
  -1 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2013-03-21  2:33 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Michal Hocko, Hedi Berriche, linux-mm, linux-kernel

Hi Mel,
On 03/21/2013 02:19 AM, Mel Gorman wrote:
> The following problem was reported against a distribution kernel when
> zone_reclaim was enabled, but the same problem applies to the mainline
> kernel. The reproduction case was as follows:
>
> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>     This allocates a large number of clean pages in node 0

I'm confused: why does this need to allocate a large number of clean pages?

>
> 2. numactl -N +0 memhog 0.5*Mg
>     This starts a memory-using application in node 0.
>
> The expected behaviour is that the clean pages are reclaimed and the
> application uses node 0 for its memory. The observed behaviour was that
> the memory for the memhog application was allocated off-node, since
> commits cd38b11 (mm: page allocator: initialise ZLC for first zone
> eligible for zone_reclaim) and 76d3fbf (mm: page allocator: reconsider
> zones for allocation after direct reclaim).
>
> The assumption of those patches was that it is always preferable to
> allocate quickly rather than stall for long periods of time. They were
> meant to ensure that a zone was only marked full when necessary, but an
> important case was missed.
>
> In the allocator fast path, only the low watermark is checked. If a
> zone's free pages are between the low and min watermarks then an
> allocation from the allocator's slow path will succeed. However,
> zone_reclaim will only reclaim SWAP_CLUSTER_MAX or 1<<order pages.
> There is no guarantee that this meets the low watermark, causing the
> zone to be marked full prematurely.
>
> This patch only marks the zone full after zone_reclaim() if the min
> watermark is being checked or if page reclaim failed to make sufficient
> progress.
>
> Reported-and-tested-by: Hedi Berriche <hedi@sgi.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>   mm/page_alloc.c | 17 ++++++++++++++++-
>   1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8fcced7..adce823 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1940,9 +1940,24 @@ zonelist_scan:
>   				continue;
>   			default:
>   				/* did we reclaim enough */
> -				if (!zone_watermark_ok(zone, order, mark,
> +				if (zone_watermark_ok(zone, order, mark,
>   						classzone_idx, alloc_flags))
> +					goto try_this_zone;
> +
> +				/*
> +				 * Failed to reclaim enough to meet watermark.
> +				 * Only mark the zone full if checking the min
> +				 * watermark or if we failed to reclaim just
> +				 * 1<<order pages or else the page allocator
> +				 * fastpath will prematurely mark zones full
> +				 * when the watermark is between the low and
> +				 * min watermarks.
> +				 */
> +				if ((alloc_flags & ALLOC_WMARK_MIN) ||
> +				    ret == ZONE_RECLAIM_SOME)
>   					goto this_zone_full;
> +
> +				continue;
>   			}
>   		}
>   
>


* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-21  2:33   ` Simon Jeons
@ 2013-03-21  8:19     ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-03-21  8:19 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

On Thu 21-03-13 10:33:07, Simon Jeons wrote:
> Hi Mel,
> On 03/21/2013 02:19 AM, Mel Gorman wrote:
> >The following problem was reported against a distribution kernel when
> >zone_reclaim was enabled but the same problem applies to the mainline
> >kernel. The reproduction case was as follows
> >
> >1. Run numactl -m +0 dd if=largefile of=/dev/null
> >    This allocates a large number of clean pages in node 0
> 
> I'm confused: why does this need to allocate a large number of clean pages?

It reads from the file and puts its pages into the page cache. The pages
are not modified, so they are clean. The output file is /dev/null, so no
pages are written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the
input file by default, so the file's pages stay in the page cache.
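For illustration, the advice call that dd skips can be sketched like this
(a minimal Linux-oriented sketch using Python's os.posix_fadvise; recent
GNU dd can request the same behaviour with iflag=nocache):

```python
import os
import tempfile

# Read/write a file, then hint that its cached pages are no longer
# needed -- the fadvise call dd omits by default.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 4096)
    os.fsync(fd)  # DONTNEED only drops clean pages, so flush first
    # len=0 means "the whole file": drop its page-cache pages.
    os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
    os.unlink(path)
```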
-- 
Michal Hocko
SUSE Labs

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-21  8:19     ` Michal Hocko
@ 2013-03-21  8:32       ` Simon Jeons
  -1 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2013-03-21  8:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

Hi Michal,
On 03/21/2013 04:19 PM, Michal Hocko wrote:
> On Thu 21-03-13 10:33:07, Simon Jeons wrote:
>> Hi Mel,
>> On 03/21/2013 02:19 AM, Mel Gorman wrote:
>>> The following problem was reported against a distribution kernel when
>>> zone_reclaim was enabled but the same problem applies to the mainline
>>> kernel. The reproduction case was as follows
>>>
>>> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>>>     This allocates a large number of clean pages in node 0
>> I'm confused: why does this need to allocate a large number of clean pages?
> It reads from file and puts pages into the page cache. The pages are not
> modified so they are clean. Output file is /dev/null so no pages are
> written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the input file
> by default so pages from the file stay in the page cache

Thanks for the clarification, Michal.
So dd uses the page cache rather than direct I/O? Where can I get the
dd source code?
One offline question: when should the page cache be used, and when
should direct I/O be used?



* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-21  8:32       ` Simon Jeons
@ 2013-03-21  8:44         ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-03-21  8:44 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

On Thu 21-03-13 16:32:03, Simon Jeons wrote:
> Hi Michal,
> On 03/21/2013 04:19 PM, Michal Hocko wrote:
> >On Thu 21-03-13 10:33:07, Simon Jeons wrote:
> >>Hi Mel,
> >>On 03/21/2013 02:19 AM, Mel Gorman wrote:
> >>>The following problem was reported against a distribution kernel when
> >>>zone_reclaim was enabled but the same problem applies to the mainline
> >>>kernel. The reproduction case was as follows
> >>>
> >>>1. Run numactl -m +0 dd if=largefile of=/dev/null
> >>>    This allocates a large number of clean pages in node 0
> >>I'm confused: why does this allocate a large number of clean pages?
> >It reads from file and puts pages into the page cache. The pages are not
> >modified so they are clean. Output file is /dev/null so no pages are
> >written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the input file
> >by default so pages from the file stay in the page cache
> 
> Thanks for the clarification, Michal.

This is getting off-topic.

> So dd uses the page cache rather than direct IO?

No, not by default. You can use the direct option (iflag=direct /
oflag=direct); refer to man dd for more information.

> Where can I get the dd source code?

dd is part of coreutils: http://www.gnu.org/software/coreutils/
Please do not be afraid to use Google; most of these answers are
already out there...

> One offline question: when should the page cache be used, and when
> direct IO?

And this is really off-topic. The simplest answer: use direct IO when
you want to bypass kernel caching because you are managing the caching
yourself. Please try searching the web; it is full of more specific
examples.
-- 
Michal Hocko
SUSE Labs
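A minimal sketch of the direct option mentioned above (GNU dd's iflag=direct; note that some filesystems, tmpfs for example, refuse O_DIRECT, hence the fallback branch):

```shell
# Re-read a file with O_DIRECT via dd's direct flag; the read bypasses
# the page cache entirely. bs is kept at 4096 because O_DIRECT requires
# aligned transfers.
dd if=/dev/zero of=./direct_demo bs=4096 count=16 2>/dev/null
if dd if=./direct_demo of=/dev/null iflag=direct bs=4096 2>/dev/null; then
    echo "direct-read-ok"        # O_DIRECT read bypassed the page cache
else
    echo "direct-unsupported"    # filesystem (e.g. tmpfs) refused O_DIRECT
fi
rm -f ./direct_demo
```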

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-21  8:32       ` Simon Jeons
                         ` (2 preceding siblings ...)
  (?)
@ 2013-03-21  8:59       ` Wanpeng Li
  -1 siblings, 0 replies; 28+ messages in thread
From: Wanpeng Li @ 2013-03-21  8:59 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Michal Hocko, Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm,
	linux-kernel

On Thu, Mar 21, 2013 at 04:32:03PM +0800, Simon Jeons wrote:
>Hi Michal,
>On 03/21/2013 04:19 PM, Michal Hocko wrote:
>>On Thu 21-03-13 10:33:07, Simon Jeons wrote:
>>>Hi Mel,
>>>On 03/21/2013 02:19 AM, Mel Gorman wrote:
>>>>The following problem was reported against a distribution kernel when
>>>>zone_reclaim was enabled but the same problem applies to the mainline
>>>>kernel. The reproduction case was as follows
>>>>
>>>>1. Run numactl -m +0 dd if=largefile of=/dev/null
>>>>    This allocates a large number of clean pages in node 0
>>>I'm confused: why does this allocate a large number of clean pages?
>>It reads from file and puts pages into the page cache. The pages are not
>>modified so they are clean. Output file is /dev/null so no pages are
>>written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the input file
>>by default so pages from the file stay in the page cache
>
>Thanks for the clarification, Michal.
>So dd uses the page cache rather than direct IO? Where can I get the
>dd source code?
>One offline question: when should the page cache be used, and when
>direct IO?

Who prefers direct IO:
- users who believe they can manage caching of the file contents better
  than the kernel can;
- users who want to avoid overflowing the page cache with data that is
  unlikely to be of use in the near future.

Regards,
Wanpeng Li 
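A middle ground between those two motivations and full direct IO is buffered reading followed by POSIX_FADV_DONTNEED, as raised earlier in the thread; newer GNU dd exposes this as the nocache flag (a sketch assuming coreutils >= 8.11; the fallback covers older dd builds):

```shell
# Keep buffered IO but tell the kernel not to keep the data around:
# iflag=nocache makes dd issue posix_fadvise(POSIX_FADV_DONTNEED) on the
# input file after reading, dropping its clean cached pages.
dd if=/dev/zero of=/tmp/nocache_demo bs=1M count=1 2>/dev/null
if dd if=/tmp/nocache_demo of=/dev/null iflag=nocache bs=1M 2>/dev/null; then
    echo "cache-dropped"         # read done, cached pages advised away
else
    echo "nocache-unsupported"   # dd too old to know iflag=nocache
fi
rm -f /tmp/nocache_demo
```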


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-03-21  8:19     ` Michal Hocko
@ 2013-04-05  6:31       ` Simon Jeons
  -1 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2013-04-05  6:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

Hi Michal,
On 03/21/2013 04:19 PM, Michal Hocko wrote:
> On Thu 21-03-13 10:33:07, Simon Jeons wrote:
>> Hi Mel,
>> On 03/21/2013 02:19 AM, Mel Gorman wrote:
>>> The following problem was reported against a distribution kernel when
>>> zone_reclaim was enabled but the same problem applies to the mainline
>>> kernel. The reproduction case was as follows
>>>
>>> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>>>     This allocates a large number of clean pages in node 0
>> I'm confused: why does this allocate a large number of clean pages?
> It reads from file and puts pages into the page cache. The pages are not
> modified so they are clean. Output file is /dev/null so no pages are
> written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the input file
> by default so pages from the file stay in the page cache

I tried this on v3.9-rc5:
dd if=/dev/sda of=/dev/null bs=1MB
14813+0 records in
14812+0 records out
14812000000 bytes (15 GB) copied, 105.988 s, 140 MB/s

free -m -s 1

             total       used       free     shared    buffers     cached
Mem:          7912       1181       6731          0        663        239
-/+ buffers/cache:        277       7634
Swap:         8011          0       8011

It seems almost 15GB was copied before I stopped dd, but the used
memory I monitored during dd stayed around 1200MB. Weird; why?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-04-05  6:31       ` Simon Jeons
@ 2013-04-07  6:37         ` Simon Jeons
  -1 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2013-04-07  6:37 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

Ping!
On 04/05/2013 02:31 PM, Simon Jeons wrote:
> Hi Michal,
> On 03/21/2013 04:19 PM, Michal Hocko wrote:
>> On Thu 21-03-13 10:33:07, Simon Jeons wrote:
>>> Hi Mel,
>>> On 03/21/2013 02:19 AM, Mel Gorman wrote:
>>>> The following problem was reported against a distribution kernel when
>>>> zone_reclaim was enabled but the same problem applies to the mainline
>>>> kernel. The reproduction case was as follows
>>>>
>>>> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>>>>     This allocates a large number of clean pages in node 0
>>> I'm confused: why does this allocate a large number of clean pages?
>> It reads from file and puts pages into the page cache. The pages are not
>> modified so they are clean. Output file is /dev/null so no pages are
>> written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the input file
>> by default so pages from the file stay in the page cache
>
> I tried this on v3.9-rc5:
> dd if=/dev/sda of=/dev/null bs=1MB
> 14813+0 records in
> 14812+0 records out
> 14812000000 bytes (15 GB) copied, 105.988 s, 140 MB/s
>
> free -m -s 1
>
>              total       used       free     shared    buffers     cached
> Mem:          7912       1181       6731          0        663        239
> -/+ buffers/cache:        277       7634
> Swap:         8011          0       8011
>
> It seems almost 15GB was copied before I stopped dd, but the used
> memory I monitored during dd stayed around 1200MB. Weird; why?
>


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-04-05  6:31       ` Simon Jeons
@ 2013-04-09 10:05         ` Simon Jeons
  -1 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2013-04-09 10:05 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

Hi Michal,
On 04/05/2013 02:31 PM, Simon Jeons wrote:
> Hi Michal,
> On 03/21/2013 04:19 PM, Michal Hocko wrote:
>> On Thu 21-03-13 10:33:07, Simon Jeons wrote:
>>> Hi Mel,
>>> On 03/21/2013 02:19 AM, Mel Gorman wrote:
>>>> The following problem was reported against a distribution kernel when
>>>> zone_reclaim was enabled but the same problem applies to the mainline
>>>> kernel. The reproduction case was as follows
>>>>
>>>> 1. Run numactl -m +0 dd if=largefile of=/dev/null
>>>>     This allocates a large number of clean pages in node 0
>>> I'm confused: why does this allocate a large number of clean pages?
>> It reads from file and puts pages into the page cache. The pages are not
>> modified so they are clean. Output file is /dev/null so no pages are
>> written. dd doesn't call fadvise(POSIX_FADV_DONTNEED) on the input file
>> by default so pages from the file stay in the page cache
>
> I tried this on v3.9-rc5:
> dd if=/dev/sda of=/dev/null bs=1MB
> 14813+0 records in
> 14812+0 records out
> 14812000000 bytes (15 GB) copied, 105.988 s, 140 MB/s
>
> free -m -s 1
>
>              total       used       free     shared    buffers     cached
> Mem:          7912       1181       6731          0        663        239
> -/+ buffers/cache:        277       7634
> Swap:         8011          0       8011
>
> It seems almost 15GB was copied before I stopped dd, but the used
> memory I monitored during dd stayed around 1200MB. Weird; why?
>

Sorry to waste your time, but isn't the test result weird?


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-04-09 10:05         ` Simon Jeons
@ 2013-04-09 10:14           ` Michal Hocko
  -1 siblings, 0 replies; 28+ messages in thread
From: Michal Hocko @ 2013-04-09 10:14 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

On Tue 09-04-13 18:05:30, Simon Jeons wrote:
[...]
> >I tried this on v3.9-rc5:
> >dd if=/dev/sda of=/dev/null bs=1MB
> >14813+0 records in
> >14812+0 records out
> >14812000000 bytes (15 GB) copied, 105.988 s, 140 MB/s
> >
> >free -m -s 1
> >
> >             total       used       free     shared    buffers     cached
> >Mem:          7912       1181       6731          0        663        239
> >-/+ buffers/cache:        277       7634
> >Swap:         8011          0       8011
> >
> >It seems almost 15GB was copied before I stopped dd, but the used
> >memory I monitored during dd stayed around 1200MB. Weird; why?
> >
> 
> Sorry to waste your time, but isn't the test result weird?

I am not sure which values you have been watching, but you have to
realize that you are reading a _partition_, not a file, and those pages
go into buffers rather than the page cache.
-- 
Michal Hocko
SUSE Labs
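The accounting Michal refers to can be read straight out of /proc/meminfo, which is where free(1) gets its buffers and cached columns (a Linux-specific sketch):

```shell
# free(1) derives its "buffers" and "cached" columns from /proc/meminfo:
# raw block-device reads (like dd from /dev/sda) are accounted under
# Buffers, regular-file reads under Cached. Values are in kB.
grep -E '^(Buffers|Cached):' /proc/meminfo
```

Watching these two lines during the dd run, rather than the "used" total, shows the buffers counter absorbing the partition reads.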

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-04-09 10:14           ` Michal Hocko
@ 2013-04-09 10:20             ` Simon Jeons
  -1 siblings, 0 replies; 28+ messages in thread
From: Simon Jeons @ 2013-04-09 10:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm, linux-kernel

Hi Michal,
On 04/09/2013 06:14 PM, Michal Hocko wrote:
> On Tue 09-04-13 18:05:30, Simon Jeons wrote:
> [...]
>>> I tried this on v3.9-rc5:
>>> dd if=/dev/sda of=/dev/null bs=1MB
>>> 14813+0 records in
>>> 14812+0 records out
>>> 14812000000 bytes (15 GB) copied, 105.988 s, 140 MB/s
>>>
>>> free -m -s 1
>>>
>>>              total       used       free     shared    buffers     cached
>>> Mem:          7912       1181       6731          0        663        239
>>> -/+ buffers/cache:        277       7634
>>> Swap:         8011          0       8011
>>>
>>> It seems almost 15GB was copied before I stopped dd, but the used
>>> memory I monitored during dd stayed around 1200MB. Weird; why?
>>>
>> Sorry to waste your time, but isn't the test result weird?
> I am not sure which values you have been watching, but you have to
> realize that you are reading a _partition_, not a file, and those pages
> go into buffers rather than the page cache.

So the buffer cache is contained in the page cache, isn't it? Which value should I watch?



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim()
  2013-04-09 10:14           ` Michal Hocko
@ 2013-04-10  5:15             ` Ric Mason
  -1 siblings, 0 replies; 28+ messages in thread
From: Ric Mason @ 2013-04-10  5:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Simon Jeons, Mel Gorman, Andrew Morton, Hedi Berriche, linux-mm,
	linux-kernel

Hi Michal,
On 04/09/2013 06:14 PM, Michal Hocko wrote:
> On Tue 09-04-13 18:05:30, Simon Jeons wrote:
> [...]
>>> I tried this on v3.9-rc5:
>>> dd if=/dev/sda of=/dev/null bs=1MB
>>> 14813+0 records in
>>> 14812+0 records out
>>> 14812000000 bytes (15 GB) copied, 105.988 s, 140 MB/s
>>>
>>> free -m -s 1
>>>
>>>              total       used       free     shared    buffers     cached
>>> Mem:          7912       1181       6731          0        663        239
>>> -/+ buffers/cache:        277       7634
>>> Swap:         8011          0       8011
>>>
>>> It seems almost 15GB was copied before I stopped dd, but the used
>>> memory I monitored during dd stayed around 1200MB. Weird; why?
>>>
>> Sorry to waste your time, but isn't the test result weird?
> I am not sure which values you have been watching, but you have to
> realize that you are reading a _partition_, not a file, and those pages
> go into buffers rather than the page cache.

Interesting. ;-)

What's the difference between buffers and the page cache? Why don't the
buffers grow?



^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-04-10  5:15 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-20 18:19 [PATCH] mm: page_alloc: Avoid marking zones full prematurely after zone_reclaim() Mel Gorman
2013-03-20 18:45 ` Michal Hocko
2013-03-21  2:32 ` Wanpeng Li
2013-03-21  2:33 ` Simon Jeons
2013-03-21  8:19   ` Michal Hocko
2013-03-21  8:32     ` Simon Jeons
2013-03-21  8:44       ` Michal Hocko
2013-03-21  8:59       ` Wanpeng Li
2013-04-05  6:31     ` Simon Jeons
2013-04-07  6:37       ` Simon Jeons
2013-04-09 10:05       ` Simon Jeons
2013-04-09 10:14         ` Michal Hocko
2013-04-09 10:20           ` Simon Jeons
2013-04-10  5:15           ` Ric Mason
