* [PATCH v4] zone_reclaim is always 0 by default
@ 2009-06-04 10:23 ` KOSAKI Motohiro
  0 siblings, 0 replies; 39+ messages in thread
From: KOSAKI Motohiro @ 2009-06-04 10:23 UTC (permalink / raw)
  To: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton
  Cc: kosaki.motohiro


The current Linux policy is that zone_reclaim_mode is enabled by default if the machine
has a large remote node distance. That is because, until recently, a large
distance could be assumed to mean a large server.
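
For reference, the boot-time logic is roughly the following (a simplified
sketch of what mm/page_alloc.c did at the time, not the verbatim code):

	/* While building the zonelists, compare each remote node's distance
	 * with RECLAIM_DISTANCE; if any node is far enough away, switch
	 * zone reclaim on for the whole system. */
	int distance = node_distance(local_node, node);

	if (distance > RECLAIM_DISTANCE)
		zone_reclaim_mode = 1;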

Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have integrated memory
controllers linked by a point-to-point transport; in other words, software sees
them as NUMA systems. Some Core i7 machines report a large remote node distance.

Yanmin reported that zone_reclaim_mode=1 causes a large Apache regression:

    One Nehalem machine has 12GB of memory,
    but there is always 2GB free although applications access lots of files.
    Eventually we located the root cause as zone_reclaim_mode=1.

In effect, zone_reclaim_mode=1 means "I dislike remote node allocation more than
disk access". It improves performance for HPC workloads,
but it degrades performance for desktops, file servers and web servers.

In general, workload-dependent configuration shouldn't be part of the default settings.

However, the current code has been in place for about two years, and only the largest
POWER and IA64 HPC machines rely on this setting.

Thus, x86 and almost all other architectures switch to the new default, while only
powerpc and ia64 keep the current behaviour for backward compatibility.


Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Robin Holt <holt@sgi.com>
Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@ozlabs.org
---
 arch/powerpc/include/asm/topology.h |    6 ++++++
 include/linux/topology.h            |    7 +------
 2 files changed, 7 insertions(+), 6 deletions(-)

Index: b/include/linux/topology.h
===================================================================
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
 #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
 #endif
 #ifndef RECLAIM_DISTANCE
-/*
- * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
- * (in whatever arch specific measurement units returned by node_distance())
- * then switch on zone reclaim on boot.
- */
-#define RECLAIM_DISTANCE 20
+#define RECLAIM_DISTANCE INT_MAX
 #endif
 #ifndef PENALTY_FOR_NODE_WITH_CPUS
 #define PENALTY_FOR_NODE_WITH_CPUS	(1)
Index: b/arch/powerpc/include/asm/topology.h
===================================================================
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -10,6 +10,12 @@ struct device_node;
 
 #include <asm/mmzone.h>
 
+/*
+ * Distance above which we begin to use zone reclaim
+ */
+#define RECLAIM_DISTANCE 20
+
+
 static inline int cpu_to_node(int cpu)
 {
 	return numa_cpu_lookup_table[cpu];



* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-04 10:23 ` KOSAKI Motohiro
@ 2009-06-04 10:59   ` Wu Fengguang
  -1 siblings, 0 replies; 39+ messages in thread
From: Wu Fengguang @ 2009-06-04 10:59 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	linux-ia64, linuxppc-dev, LKML, linux-mm, Andrew Morton

On Thu, Jun 04, 2009 at 06:23:15PM +0800, KOSAKI Motohiro wrote:
> 
> The current Linux policy is that zone_reclaim_mode is enabled by default if the machine
> has a large remote node distance. That is because, until recently, a large
> distance could be assumed to mean a large server.
> 
> Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have integrated memory
> controllers linked by a point-to-point transport; in other words, software sees
> them as NUMA systems. Some Core i7 machines report a large remote node distance.
> 
> Yanmin reported that zone_reclaim_mode=1 causes a large Apache regression:
> 
>     One Nehalem machine has 12GB of memory,
>     but there is always 2GB free although applications access lots of files.
>     Eventually we located the root cause as zone_reclaim_mode=1.
> 
> In effect, zone_reclaim_mode=1 means "I dislike remote node allocation more than
> disk access". It improves performance for HPC workloads,
> but it degrades performance for desktops, file servers and web servers.
> 
> In general, workload-dependent configuration shouldn't be part of the default settings.
> 
> However, the current code has been in place for about two years, and only the largest
> POWER and IA64 HPC machines rely on this setting.
> 
> Thus, x86 and almost all other architectures switch to the new default, while only
> powerpc and ia64 keep the current behaviour for backward compatibility.

The above lines are too long. Limiting them to 72 columns would generally be
better, since git log adds additional leading whitespace.

Thank you for all the efforts!

Acked-by: Wu Fengguang <fengguang.wu@intel.com>

> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Robin Holt <holt@sgi.com>
> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@ozlabs.org
> ---
>  arch/powerpc/include/asm/topology.h |    6 ++++++
>  include/linux/topology.h            |    7 +------
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Index: b/include/linux/topology.h
> ===================================================================
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
>  #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>  #endif
>  #ifndef RECLAIM_DISTANCE
> -/*
> - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
> - * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> - */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE INT_MAX
>  #endif
>  #ifndef PENALTY_FOR_NODE_WITH_CPUS
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
> Index: b/arch/powerpc/include/asm/topology.h
> ===================================================================
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -10,6 +10,12 @@ struct device_node;
>  
>  #include <asm/mmzone.h>
>  
> +/*
> + * Distance above which we begin to use zone reclaim

s/begin to/default to/ ?

> + */
> +#define RECLAIM_DISTANCE 20
> +
> +
>  static inline int cpu_to_node(int cpu)
>  {
>  	return numa_cpu_lookup_table[cpu];
> 

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-04 10:23 ` KOSAKI Motohiro
@ 2009-06-04 12:24   ` Robin Holt
  -1 siblings, 0 replies; 39+ messages in thread
From: Robin Holt @ 2009-06-04 12:24 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

Acked-by: Robin Holt <holt@sgi.com>


On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
...
> In effect, zone_reclaim_mode=1 means "I dislike remote node allocation more than
> disk access". It improves performance for HPC workloads,
> but it degrades performance for desktops, file servers and web servers.

I still disagree with this statement, but I don't care that much.
Why not something more to the effect of:

Setting zone_reclaim_mode=1 causes memory allocations on a nearly
exhausted node to do direct reclaim within that node before attempting
off-node allocations.  For workloads where most pages are clean page
cache and easily reclaimed, this can result in excessive disk activity
versus a fairer balance of memory across nodes.
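
For context, zone_reclaim_mode is a bit mask, so the "1" discussed here sets
only the basic reclaim bit (per Documentation/sysctl/vm.txt of that era,
paraphrased):

	/*
	 * zone_reclaim_mode is ORed together from:
	 *   1 = zone reclaim on (reclaim clean file-backed pages node-locally)
	 *   2 = zone reclaim also writes dirty pages out
	 *   4 = zone reclaim also swaps pages
	 */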

If you disagree, don't respond, just ignore.

...
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
>  #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>  #endif
>  #ifndef RECLAIM_DISTANCE
> -/*
> - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
> - * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> - */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE INT_MAX

Why remove this comment?  It seems more-or-less a reasonable statement.

Thanks,
Robin

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-04 10:23 ` KOSAKI Motohiro
@ 2009-06-08 11:50   ` Mel Gorman
  -1 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-08 11:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> 
> The current Linux policy is that zone_reclaim_mode is enabled by default if the machine
> has a large remote node distance. That is because, until recently, a large
> distance could be assumed to mean a large server.
> 

We don't make assumptions about the server being large, small or otherwise.
An affinity table reporting a distance of 20 or more is saying "remote memory
has twice the latency of local memory". This is true irrespective of workload
and implies that going off-node carries a real penalty.
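
For context, these distances come from the ACPI SLIT, which normalises the
local distance to 10; the generic fallbacks in include/linux/topology.h
reflect that (values as they stood at the time):

	#define LOCAL_DISTANCE	10	/* SLIT baseline: local access */
	#define REMOTE_DISTANCE	20	/* generic fallback for a remote node */

so a reported distance of 20 reads as roughly twice the local latency.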

> Unfortunately, recent x86 CPUs (e.g. Core i7, Opteron) have integrated memory
> controllers linked by a point-to-point transport; in other words, software sees
> them as NUMA systems. Some Core i7 machines report a large remote node distance.
> 

If they have a large remote node distance, they have a large remote node
distance. Now, if they are *lying* and remote memory is not really that
expensive, then perhaps we should be thinking of a per-arch, per-chip
modifier to the distances reported by ACPI.

> Yanmin reported that zone_reclaim_mode=1 causes a large Apache regression:
>
>     One Nehalem machine has 12GB of memory,
>     but there is always 2GB free although applications access lots of files.
>     Eventually we located the root cause as zone_reclaim_mode=1.
>
> In effect, zone_reclaim_mode=1 means "I dislike remote node allocation more than
> disk access". It improves performance for HPC workloads,
> but it degrades performance for desktops, file servers and web servers.
> 

How are you determining a performance regression on the desktop? On a
desktop, I would expect processes to be spread across the CPUs of the
different nodes. In that case, memory faulted on each CPU should be
faulted locally.

If there are local processes that access a lot of files, zone reclaim might
end up evicting those pages to keep memory local. That might be undesirable,
but it is explicitly documented:

"It may be beneficial to switch off zone reclaim if the system is used for a
file server and all of memory should be used for caching files from disk. In
that case the caching effect is more important than data locality."

Ideally we could detect if the machine was a file-server or not but no
such luck.

> In general, workload-dependent configuration shouldn't be part of the default settings.
>
> However, the current code has been in place for about two years, and only the largest
> POWER and IA64 HPC machines rely on this setting.
>
> Thus, x86 and almost all other architectures switch to the new default, while only
> powerpc and ia64 keep the current behaviour for backward compatibility.
> 

What if it's x86-64-based NUMA but not i7-based? There, the
NUMA distances might really mean something, and the zone_reclaim behaviour
is desirable.

I think if we're going down the road of changing the default, it shouldn't be
done as per-architecture defaults as such. Other choices for addressing this might be:

1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
   (or some other sensible figure) on i7

2. There should be a per-arch modifier callback for the affinity
   distances. If the x86 code detects the CPU is an i7, it can reduce the
   reported latencies to be more in line with expected reality.

3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
   overall is free. The difficulty is figuring out if the allocation is for
   file pages.

4. Change zone_reclaim_mode default to mean "do your best to figure it
   out". Patch 1 would default large distances to 1 to see what happens.
   Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1

	If we have locally reclaimed 2% of the node's memory in file pages
	within the last 5 seconds while >= 20% of total physical memory was
	free, then set reclaim_mode to 0 on the assumption that the node is
	mostly caching pages and shouldn't be reclaimed, to avoid excessive IO

Option 1 would appear to be the most straightforward, but option 2 should
also be doable. Options 3 and 4 could turn into a rat's nest, and I would
consider those approaches a bit more drastic.
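
As a purely illustrative sketch of option 1 (the variable, the init hook and
the exact model check below are assumptions, not existing kernel interfaces):

	/* Hypothetical: make the threshold a runtime value on x86 and lower
	 * it on Core i7 (Nehalem), where the reported SLIT distances can
	 * overstate the real cost of going off-node.
	 * boot_cpu_data comes from <asm/processor.h>. */
	int reclaim_distance __read_mostly = 20;	/* current compile-time default */

	static void __init x86_tune_reclaim_distance(void)
	{
		/* 0x1a is one Nehalem model number, shown only as an example */
		if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
		    boot_cpu_data.x86 == 6 && boot_cpu_data.x86_model == 0x1a)
			reclaim_distance = 5;	/* the figure suggested above for i7 */
	}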

> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Robin Holt <holt@sgi.com>
> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@ozlabs.org
> ---
>  arch/powerpc/include/asm/topology.h |    6 ++++++
>  include/linux/topology.h            |    7 +------
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Index: b/include/linux/topology.h
> ===================================================================
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
>  #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>  #endif
>  #ifndef RECLAIM_DISTANCE
> -/*
> - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
> - * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> - */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE INT_MAX
>  #endif
>  #ifndef PENALTY_FOR_NODE_WITH_CPUS
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
> Index: b/arch/powerpc/include/asm/topology.h
> ===================================================================
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -10,6 +10,12 @@ struct device_node;
>  
>  #include <asm/mmzone.h>
>  
> +/*
> + * Distance above which we begin to use zone reclaim
> + */
> +#define RECLAIM_DISTANCE 20
> +
> +

Where is the ia64-specific modifier to RECLAIM_DISTANCE?

>  static inline int cpu_to_node(int cpu)
>  {
>  	return numa_cpu_lookup_table[cpu];
> 
> 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
@ 2009-06-08 11:50   ` Mel Gorman
  0 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-08 11:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> 
> Current linux policy is, zone_reclaim_mode is enabled by default if the machine
> has large remote node distance. it's because we could assume that large distance
> mean large server until recently.
> 

We don't make assumptions about the server being large, small or otherwise. The
affinity tables reporting a distance of 20 or more is saying "remote memory
has twice the latency of local memory". This is true irrespective of workload
and implies that going off-node has a real penalty regardless of workload.

> Unfortunately, recent modern x86 CPU (e.g. Core i7, Opeteron) have P2P transport
> memory controller. IOW it's seen as NUMA from software view.
> Some Core i7 machine has large remote node distance.
> 

If they have large remote node distance, they have large remote node
distance. Now, if they are *lying* and remote memory is not really that
expensive, then prehaps we should be thinking of a per-arch-per-chip
modifier to the distances reported by ACPI.

> Yanmin reported zone_reclaim_mode=1 cause large apache regression.
> 
>     One Nehalem machine has 12GB memory,
>     but there is always 2GB free although applications accesses lots of files.
>     Eventually we located the root cause as zone_reclaim_mode=1.
> 
> Actually, zone_reclaim_mode=1 mean "I dislike remote node allocation rather than
> disk access", it makes performance improvement to HPC workload.
> but it makes performance degression to desktop, file server and web server.
> 

How are you determining a performance regression to desktop? On a
desktop, I would expect processes to be spread on the different CPUs for
each of the nodes. In that case, memory faulted on each CPU should be
faulted locally.

If there are local processes that access a lot of files, then it might end
up reclaiming those to keep memory local and this might be undesirable
but this is explicitly documented;

"It may be beneficial to switch off zone reclaim if the system is used for a
file server and all of memory should be used for caching files from disk. In
that case the caching effect is more important than data locality."

Ideally we could detect if the machine was a file-server or not but no
such luck.

> In general, workload depended configration shouldn't put into default settings.
> 
> However, current code is long standing about two year. Highest POWER and IA64 HPC machine
> (only) use this setting.
> 
> Thus, x86 and almost rest architecture change default setting, but Only power and ia64
> remain current configuration for backward-compatibility.
> 

What about if it's x86-64-based NUMA but it's not i7 based. There, the
NUMA distances might really mean something and that zone_reclaim behaviour
is desirable.

I think if we're going down the road of setting the default, it shouldn't be
per-architecture defaults as such. Other choices for addressing this might be;

1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
   (or some other sensible figure) on i7

2. There should be a per-arch modifier callback for the affinity
   distances. If the x86 code detects the CPU is an i7, it can reduce the
   reported latencies to be more in line with expected reality.

3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
   overall is free. The difficulty is figuring out if the allocation is for
   file pages.

4. Change zone_reclaim_mode default to mean "do your best to figure it
   out". Patch 1 would default large distances to 1 to see what happens.
   Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1

	If we have locally reclaimed 2% of the nodes memory in file pages
	within the last 5 seconds when >= 20% of total physical memory was
	free, then set the reclaim_mode to 0 on the assumption the node is
	mostly caching pages and shouldn't be reclaimed to avoid excessive IO

Option 1 would appear to be the most straight-forward but option 2
should be doable. Option 3 and 4 could turn into a rats nest and I would
consider those approaches a bit more drastic.

> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Robin Holt <holt@sgi.com>
> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@ozlabs.org
> ---
>  arch/powerpc/include/asm/topology.h |    6 ++++++
>  include/linux/topology.h            |    7 +------
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Index: b/include/linux/topology.h
> ===================================================================
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
>  #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>  #endif
>  #ifndef RECLAIM_DISTANCE
> -/*
> - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
> - * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> - */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE INT_MAX
>  #endif
>  #ifndef PENALTY_FOR_NODE_WITH_CPUS
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
> Index: b/arch/powerpc/include/asm/topology.h
> ===================================================================
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -10,6 +10,12 @@ struct device_node;
>  
>  #include <asm/mmzone.h>
>  
> +/*
> + * Distance above which we begin to use zone reclaim
> + */
> +#define RECLAIM_DISTANCE 20
> +
> +

Where is the ia-64-specific modifier to RECAIM_DISTANCE?

>  static inline int cpu_to_node(int cpu)
>  {
>  	return numa_cpu_lookup_table[cpu];
> 
> 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
@ 2009-06-08 11:50   ` Mel Gorman
  0 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-08 11:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Rik van Riel, Christoph Lameter, linux-mm, Zhang, Yanmin, LKML,
	linuxppc-dev, Robin Holt, linux-ia64, Andrew Morton,
	Wu Fengguang

On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> 
> Current linux policy is, zone_reclaim_mode is enabled by default if the machine
> has large remote node distance. it's because we could assume that large distance
> mean large server until recently.
> 

We don't make assumptions about the server being large, small or otherwise. The
affinity tables reporting a distance of 20 or more is saying "remote memory
has twice the latency of local memory". This is true irrespective of workload
and implies that going off-node has a real penalty regardless of workload.

> Unfortunately, recent modern x86 CPU (e.g. Core i7, Opeteron) have P2P transport
> memory controller. IOW it's seen as NUMA from software view.
> Some Core i7 machine has large remote node distance.
> 

If they have large remote node distance, they have large remote node
distance. Now, if they are *lying* and remote memory is not really that
expensive, then prehaps we should be thinking of a per-arch-per-chip
modifier to the distances reported by ACPI.

> Yanmin reported zone_reclaim_mode=1 cause large apache regression.
> 
>     One Nehalem machine has 12GB memory,
>     but there is always 2GB free although applications accesses lots of files.
>     Eventually we located the root cause as zone_reclaim_mode=1.
> 
> Actually, zone_reclaim_mode=1 mean "I dislike remote node allocation rather than
> disk access", it makes performance improvement to HPC workload.
> but it makes performance degression to desktop, file server and web server.
> 

How are you determining a performance regression to desktop? On a
desktop, I would expect processes to be spread on the different CPUs for
each of the nodes. In that case, memory faulted on each CPU should be
faulted locally.

If there are local processes that access a lot of files, then it might end
up reclaiming those to keep memory local and this might be undesirable
but this is explicitly documented;

"It may be beneficial to switch off zone reclaim if the system is used for a
file server and all of memory should be used for caching files from disk. In
that case the caching effect is more important than data locality."

Ideally we could detect if the machine was a file-server or not but no
such luck.

> In general, workload depended configration shouldn't put into default settings.
> 
> However, current code is long standing about two year. Highest POWER and IA64 HPC machine
> (only) use this setting.
> 
> Thus, x86 and almost rest architecture change default setting, but Only power and ia64
> remain current configuration for backward-compatibility.
> 

What about if it's x86-64-based NUMA but it's not i7 based. There, the
NUMA distances might really mean something and that zone_reclaim behaviour
is desirable.

I think if we're going down the road of setting the default, it shouldn't be
per-architecture defaults as such. Other choices for addressing this might be;

1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
   (or some other sensible figure) on i7

2. There should be a per-arch modifier callback for the affinity
   distances. If the x86 code detects the CPU is an i7, it can reduce the
   reported latencies to be more in line with expected reality.

3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
   overall is free. The difficulty is figuring out if the allocation is for
   file pages.

4. Change zone_reclaim_mode default to mean "do your best to figure it
   out". Patch 1 would default large distances to 1 to see what happens.
   Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1

	If we have locally reclaimed 2% of the nodes memory in file pages
	within the last 5 seconds when >= 20% of total physical memory was
	free, then set the reclaim_mode to 0 on the assumption the node is
	mostly caching pages and shouldn't be reclaimed to avoid excessive IO

Option 1 would appear to be the most straight-forward but option 2
should be doable. Option 3 and 4 could turn into a rats nest and I would
consider those approaches a bit more drastic.

> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Robin Holt <holt@sgi.com>
> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@ozlabs.org
> ---
>  arch/powerpc/include/asm/topology.h |    6 ++++++
>  include/linux/topology.h            |    7 +------
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Index: b/include/linux/topology.h
> ===================================================================
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
>  #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>  #endif
>  #ifndef RECLAIM_DISTANCE
> -/*
> - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
> - * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> - */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE INT_MAX
>  #endif
>  #ifndef PENALTY_FOR_NODE_WITH_CPUS
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
> Index: b/arch/powerpc/include/asm/topology.h
> ===================================================================
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -10,6 +10,12 @@ struct device_node;
>  
>  #include <asm/mmzone.h>
>  
> +/*
> + * Distance above which we begin to use zone reclaim
> + */
> +#define RECLAIM_DISTANCE 20
> +
> +

Where is the ia-64-specific modifier to RECAIM_DISTANCE?

>  static inline int cpu_to_node(int cpu)
>  {
>  	return numa_cpu_lookup_table[cpu];
> 
> 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
@ 2009-06-08 11:50   ` Mel Gorman
  0 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-08 11:50 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> 
> Current linux policy is, zone_reclaim_mode is enabled by default if the machine
> has large remote node distance. it's because we could assume that large distance
> mean large server until recently.
> 

We don't make assumptions about the server being large, small or otherwise. The
affinity tables reporting a distance of 20 or more is saying "remote memory
has twice the latency of local memory". This is true irrespective of workload
and implies that going off-node has a real penalty regardless of workload.

> Unfortunately, recent modern x86 CPU (e.g. Core i7, Opeteron) have P2P transport
> memory controller. IOW it's seen as NUMA from software view.
> Some Core i7 machine has large remote node distance.
> 

If they have large remote node distance, they have large remote node
distance. Now, if they are *lying* and remote memory is not really that
expensive, then prehaps we should be thinking of a per-arch-per-chip
modifier to the distances reported by ACPI.

> Yanmin reported zone_reclaim_mode=1 cause large apache regression.
> 
>     One Nehalem machine has 12GB memory,
>     but there is always 2GB free although applications accesses lots of files.
>     Eventually we located the root cause as zone_reclaim_mode=1.
> 
> Actually, zone_reclaim_mode=1 mean "I dislike remote node allocation rather than
> disk access", it makes performance improvement to HPC workload.
> but it makes performance degression to desktop, file server and web server.
> 

How are you determining a performance regression to desktop? On a
desktop, I would expect processes to be spread on the different CPUs for
each of the nodes. In that case, memory faulted on each CPU should be
faulted locally.

If there are local processes that access a lot of files, then it might end
up reclaiming those to keep memory local and this might be undesirable
but this is explicitly documented;

"It may be beneficial to switch off zone reclaim if the system is used for a
file server and all of memory should be used for caching files from disk. In
that case the caching effect is more important than data locality."

Ideally we could detect if the machine was a file-server or not but no
such luck.

> In general, workload depended configration shouldn't put into default settings.
> 
> However, current code is long standing about two year. Highest POWER and IA64 HPC machine
> (only) use this setting.
> 
> Thus, x86 and almost rest architecture change default setting, but Only power and ia64
> remain current configuration for backward-compatibility.
> 

What about if it's x86-64-based NUMA but it's not i7 based. There, the
NUMA distances might really mean something and that zone_reclaim behaviour
is desirable.

I think if we're going down the road of setting the default, it shouldn't be
per-architecture defaults as such. Other choices for addressing this might be;

1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
   (or some other sensible figure) on i7

2. There should be a per-arch modifier callback for the affinity
   distances. If the x86 code detects the CPU is an i7, it can reduce the
   reported latencies to be more in line with expected reality.

3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
   overall is free. The difficulty is figuring out if the allocation is for
   file pages.

4. Change zone_reclaim_mode default to mean "do your best to figure it
   out". Patch 1 would default large distances to 1 to see what happens.
   Then apply a heuristic when in figure-it-out mode and using reclaim_mode = 1

	If we have locally reclaimed 2% of the nodes memory in file pages
	within the last 5 seconds when >= 20% of total physical memory was
	free, then set the reclaim_mode to 0 on the assumption the node is
	mostly caching pages and shouldn't be reclaimed to avoid excessive IO

Option 1 would appear to be the most straight-forward but option 2
should be doable. Option 3 and 4 could turn into a rats nest and I would
consider those approaches a bit more drastic.

> 
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Christoph Lameter <cl@linux-foundation.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Robin Holt <holt@sgi.com>
> Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: linux-ia64@vger.kernel.org
> Cc: linuxppc-dev@ozlabs.org
> ---
>  arch/powerpc/include/asm/topology.h |    6 ++++++
>  include/linux/topology.h            |    7 +------
>  2 files changed, 7 insertions(+), 6 deletions(-)
> 
> Index: b/include/linux/topology.h
> ===================================================================
> --- a/include/linux/topology.h
> +++ b/include/linux/topology.h
> @@ -54,12 +54,7 @@ int arch_update_cpu_topology(void);
>  #define node_distance(from,to)	((from) == (to) ? LOCAL_DISTANCE : REMOTE_DISTANCE)
>  #endif
>  #ifndef RECLAIM_DISTANCE
> -/*
> - * If the distance between nodes in a system is larger than RECLAIM_DISTANCE
> - * (in whatever arch specific measurement units returned by node_distance())
> - * then switch on zone reclaim on boot.
> - */
> -#define RECLAIM_DISTANCE 20
> +#define RECLAIM_DISTANCE INT_MAX
>  #endif
>  #ifndef PENALTY_FOR_NODE_WITH_CPUS
>  #define PENALTY_FOR_NODE_WITH_CPUS	(1)
> Index: b/arch/powerpc/include/asm/topology.h
> ===================================================================
> --- a/arch/powerpc/include/asm/topology.h
> +++ b/arch/powerpc/include/asm/topology.h
> @@ -10,6 +10,12 @@ struct device_node;
>  
>  #include <asm/mmzone.h>
>  
> +/*
> + * Distance above which we begin to use zone reclaim
> + */
> +#define RECLAIM_DISTANCE 20
> +
> +

Where is the ia-64-specific modifier to RECLAIM_DISTANCE?

>  static inline int cpu_to_node(int cpu)
>  {
>  	return numa_cpu_lookup_table[cpu];
> 
> 
-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-08 11:50   ` Mel Gorman
@ 2009-06-09  9:55     ` Robin Holt
  -1 siblings, 0 replies; 39+ messages in thread
From: Robin Holt @ 2009-06-09  9:55 UTC (permalink / raw)
  To: Mel Gorman
  Cc: KOSAKI Motohiro, Christoph Lameter, Rik van Riel, Robin Holt,
	Zhang, Yanmin, Wu Fengguang, linux-ia64, linuxppc-dev, LKML,
	linux-mm, Andrew Morton

On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:

Let me start by saying I agree completely with everything you wrote and
still disagree with this patch, but was willing to compromise and work
around this for our upcoming x86_64 machine by putting a "value add"
into our packaging of adding a sysctl that turns reclaim back on.

...
> > Index: b/arch/powerpc/include/asm/topology.h
> > ===================================================================
> > --- a/arch/powerpc/include/asm/topology.h
> > +++ b/arch/powerpc/include/asm/topology.h
> > @@ -10,6 +10,12 @@ struct device_node;
> >  
> >  #include <asm/mmzone.h>
> >  
> > +/*
> > + * Distance above which we begin to use zone reclaim
> > + */
> > +#define RECLAIM_DISTANCE 20
> > +
> > +
> 
> Where is the ia-64-specific modifier to RECLAIM_DISTANCE?

It was already defined as 15 in arch/ia64/include/asm/topology.h

Thanks,
Robin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-09  9:55     ` Robin Holt
@ 2009-06-09 10:37       ` Mel Gorman
  -1 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-09 10:37 UTC (permalink / raw)
  To: Robin Holt
  Cc: KOSAKI Motohiro, Christoph Lameter, Rik van Riel, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
> On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:
> 
> Let me start by saying I agree completely with everything you wrote and
> still disagree with this patch, but was willing to compromise and work
> around this for our upcoming x86_64 machine by putting a "value add"
> into our packaging of adding a sysctl that turns reclaim back on.
> 

To be honest, I'm more leaning towards a NACK than an ACK on this one. I
don't support enough NUMA machines to feel strongly enough about it but
unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's
might be there seems ill-advised to me and will have other consequences for
existing more traditional x86-64 NUMA machines.

> ...
> > > Index: b/arch/powerpc/include/asm/topology.h
> > > ===================================================================
> > > --- a/arch/powerpc/include/asm/topology.h
> > > +++ b/arch/powerpc/include/asm/topology.h
> > > @@ -10,6 +10,12 @@ struct device_node;
> > >  
> > >  #include <asm/mmzone.h>
> > >  
> > > +/*
> > > + * Distance above which we begin to use zone reclaim
> > > + */
> > > +#define RECLAIM_DISTANCE 20
> > > +
> > > +
> > 
> > Where is the ia-64-specific modifier to RECLAIM_DISTANCE?
> 
> It was already defined as 15 in arch/ia64/include/asm/topology.h
> 

/me slaps self

thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-09 10:37       ` Mel Gorman
@ 2009-06-09 12:02         ` Robin Holt
  -1 siblings, 0 replies; 39+ messages in thread
From: Robin Holt @ 2009-06-09 12:02 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Robin Holt, KOSAKI Motohiro, Christoph Lameter, Rik van Riel,
	Zhang, Yanmin, Wu Fengguang, linux-ia64, linuxppc-dev, LKML,
	linux-mm, Andrew Morton

On Tue, Jun 09, 2009 at 11:37:55AM +0100, Mel Gorman wrote:
> On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
> > On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:
> > 
> > Let me start by saying I agree completely with everything you wrote and
> > still disagree with this patch, but was willing to compromise and work
> > around this for our upcoming x86_64 machine by putting a "value add"
> > into our packaging of adding a sysctl that turns reclaim back on.
> > 
> 
> To be honest, I'm more leaning towards a NACK than an ACK on this one. I
> don't support enough NUMA machines to feel strongly enough about it but
> unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7's
> might be there seems ill-advised to me and will have other consequences for
> existing more traditional x86-64 NUMA machines.

I was sort-of planning on coming up with an x86_64 arch specific function
for setting zone_reclaim_mode, but didn't like the direction things
were going.

Something to the effect of...
--- 20090609.orig/mm/page_alloc.c       2009-06-09 06:51:34.000000000 -0500
+++ 20090609/mm/page_alloc.c    2009-06-09 06:55:00.160762069 -0500
@@ -2326,12 +2326,7 @@ static void build_zonelists(pg_data_t *p
        while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
                int distance = node_distance(local_node, node);
 
-               /*
-                * If another node is sufficiently far away then it is better
-                * to reclaim pages in a zone before going off node.
-                */
-               if (distance > RECLAIM_DISTANCE)
-                       zone_reclaim_mode = 1;
+               zone_reclaim_mode = arch_zone_reclaim_mode(distance);
 
                /*
                 * We don't want to pressure a particular node.

And then letting each arch define an arch_zone_reclaim_mode().  If other
values are needed in the determination, we would add parameters to
reflect this.

For ia64, add

static inline int ia64_zone_reclaim_mode(int distance)
{
	/* keep the long-standing ia64 threshold of 15 */
	if (distance > 15)
		return 1;
	return 0;
}

#define	arch_zone_reclaim_mode(_d)	ia64_zone_reclaim_mode(_d)


Then, inside x86_64_zone_reclaim_mode(), I could make it something like
	if (distance > 40 || is_uv_system())
		return 1;
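
Spelled out in full (illustration only, mirroring the ia64 example above;
the distance threshold of 40 is just the figure used here and
is_uv_system() is the existing UV detection helper):

static inline int x86_64_zone_reclaim_mode(int distance)
{
	if (distance > 40 || is_uv_system())
		return 1;
	return 0;
}

#define	arch_zone_reclaim_mode(_d)	x86_64_zone_reclaim_mode(_d)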

In the end, I didn't think this fight was worth fighting given how ugly
this felt.  Upon second thought, I am beginning to think it is not that
bad, but I also don't think it is that good either.

Thanks,
Robin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-08 11:50   ` Mel Gorman
@ 2009-06-09 13:48     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 39+ messages in thread
From: KOSAKI Motohiro @ 2009-06-09 13:48 UTC (permalink / raw)
  To: Mel Gorman
  Cc: kosaki.motohiro, Christoph Lameter, Rik van Riel, Robin Holt,
	Zhang, Yanmin, Wu Fengguang, linux-ia64, linuxppc-dev, LKML,
	linux-mm, Andrew Morton

Hi

Sorry for the late response; my e-mail reading speed is very slow ;-)

First, could you please read the past thread?
I think many of the topics in this mail have already been discussed.


> On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> > 
> > Current linux policy is, zone_reclaim_mode is enabled by default if the machine
> > has large remote node distance. it's because we could assume that large distance
> > mean large server until recently.
> > 
> 
> We don't make assumptions about the server being large, small or otherwise. The
> affinity tables reporting a distance of 20 or more is saying "remote memory
> has twice the latency of local memory". This is true irrespective of workload
> and implies that going off-node has a real penalty regardless of workload.

No.
Now we are talking about off-node allocation vs unnecessary file cache dropping.
IOW, off-node allocation vs disk access.

Then the trade-off doesn't only depend on the off-node distance, but also on
the workload's IO tendency and the IO speed.

Fujitsu has a 64-core ia64 HPC box, and zone-reclaim sometimes caused performance
degradation even on that box.

So I don't think this problem is a small vs large machine issue,
nor an i7 issue.
High-speed P2P CPUs with integrated memory controllers expose an old issue.


> > In general, workload depended configration shouldn't put into default settings.
> > 
> > However, current code is long standing about two year. Highest POWER and IA64 HPC machine
> > (only) use this setting.
> > 
> > Thus, x86 and almost rest architecture change default setting, but Only power and ia64
> > remain current configuration for backward-compatibility.
> > 
> 
> What if it's x86-64-based NUMA but not i7 based? There, the
> NUMA distances might really mean something and that zone_reclaim behaviour
> is desirable.

hmmm..
I don't want to ignore AMD; I think it's a common characteristic of P2P,
integrated-memory-controller machines.

Also, I don't want to detect the CPU family or similar, because we would need to
update such code every time Intel makes a new cpu.

Can we detect P2P interconnect machines? I'm not sure.


> I think if we're going down the road of setting the default, it shouldn't be
> per-architecture defaults as such. Other choices for addressing this might be;
> 
> 1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
>    (or some other sensible figure) on i7
> 
> 2. There should be a per-arch modifier callback for the affinity
>    distances. If the x86 code detects the CPU is an i7, it can reduce the
>    reported latencies to be more in line with expected reality.
> 
> 3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
>    overall is free. The difficulty is figuring out if the allocation is for
>    file pages.
> 
> 4. Change zone_reclaim_mode default to mean "do your best to figure it
>    out". Patch 1 would default large distances to 1 to see what happens.
>    Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1
> 
> 	If we have locally reclaimed 2% of the nodes memory in file pages
> 	within the last 5 seconds when >= 20% of total physical memory was
> 	free, then set the reclaim_mode to 0 on the assumption the node is
> 	mostly caching pages and shouldn't be reclaimed to avoid excessive IO
> 
> Option 1 would appear to be the most straight-forward but option 2
> should be doable. Option 3 and 4 could turn into a rats nest and I would
> consider those approaches a bit more drastic.

hmhm. 
I think the key point of options 1 and 2 is a proper way of detecting the hardware.

Options 3 and 4 are the ideas I prefer; I like a workload-adapted heuristic.
But you already pointed out it's hard, because the page allocator doesn't know
the allocation purpose ;)


> > @@ -10,6 +10,12 @@ struct device_node;
> >  
> >  #include <asm/mmzone.h>
> >  
> > +/*
> > + * Distance above which we begin to use zone reclaim
> > + */
> > +#define RECLAIM_DISTANCE 20
> > +
> > +
> 
> Where is the ia-64-specific modifier to RECLAIM_DISTANCE?


arch/ia64/include/asm/topology.h has

	/*
	 * Distance above which we begin to use zone reclaim
	 */
	#define RECLAIM_DISTANCE 15


I don't think distance==15 is a proper machine-independent definition,
but it is a long-lived definition ;)





^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-09 13:48     ` KOSAKI Motohiro
@ 2009-06-09 14:38       ` Mel Gorman
  -1 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-09 14:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

On Tue, Jun 09, 2009 at 10:48:34PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> Sorry for the late response; my e-mail reading speed is very slow ;-)
> 
> First, could you please read the past thread?
> I think many of the topics in this mail have already been discussed.
> 

I think I caught them all but the horrible fact of the matter is that
whether zone_reclaim_mode should be 1 or 0 on NUMA machines is "it depends".
There are arguments for both and no clear winner.

> 
> > On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> > > 
> > > Current linux policy is, zone_reclaim_mode is enabled by default if the machine
> > > has large remote node distance. it's because we could assume that large distance
> > > mean large server until recently.
> > > 
> > 
> > We don't make assumptions about the server being large, small or otherwise. The
> > affinity tables reporting a distance of 20 or more is saying "remote memory
> > has twice the latency of local memory". This is true irrespective of workload
> > and implies that going off-node has a real penalty regardless of workload.
> 
> No.
> Now we are talking about off-node allocation vs unnecessary file cache dropping.
> IOW, off-node allocation vs disk access.
> 

Even if we used GFP flags to identify the file pages, there is no guarantee
that we are taking the correct action to keep "relevant" pages in memory.

> Then the trade-off doesn't only depend on the off-node distance, but also on
> the workload's IO tendency and the IO speed.
> 
> Fujitsu has a 64-core ia64 HPC box, and zone-reclaim sometimes caused performance
> degradation even on that box.
> 

I bet if it was 0, the off-node accesses would sometimes cause
"performance degradation" as well :(

> So I don't think this problem is a small vs large machine issue,
> nor an i7 issue.
> High-speed P2P CPUs with integrated memory controllers expose an old issue.
> 
> 
> > > In general, workload depended configration shouldn't put into default settings.
> > > 
> > > However, current code is long standing about two year. Highest POWER and IA64 HPC machine
> > > (only) use this setting.
> > > 
> > > Thus, x86 and almost rest architecture change default setting, but Only power and ia64
> > > remain current configuration for backward-compatibility.
> > > 
> > 
> > What if it's x86-64-based NUMA but not i7 based? There, the
> > NUMA distances might really mean something and that zone_reclaim behaviour
> > is desirable.
> 
> hmmm..
> I don't want to ignore AMD; I think it's a common characteristic of P2P,
> integrated-memory-controller machines.
> 
> Also, I don't want to detect the CPU family or similar, because we would need to
> update such code every time Intel makes a new cpu.
> 
> Can we detect P2P interconnect machines? I'm not sure.
> 

I've no idea. It's not just I7 because some of the AMD chips will have
integrated memory controllers as well. We were somewhat depending on the
affinity information providing the necessary information.

> > I think if we're going down the road of setting the default, it shouldn't be
> > per-architecture defaults as such. Other choices for addressing this might be;
> > 
> > 1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
> >    (or some other sensible figure) on i7
> > 
> > 2. There should be a per-arch modifier callback for the affinity
> >    distances. If the x86 code detects the CPU is an i7, it can reduce the
> >    reported latencies to be more in line with expected reality.
> > 
> > 3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
> >    overall is free. The difficulty is figuring out if the allocation is for
> >    file pages.
> > 
> > 4. Change zone_reclaim_mode default to mean "do your best to figure it
> >    out". Patch 1 would default large distances to 1 to see what happens.
> >    Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1
> > 
> > 	If we have locally reclaimed 2% of the nodes memory in file pages
> > 	within the last 5 seconds when >= 20% of total physical memory was
> > 	free, then set the reclaim_mode to 0 on the assumption the node is
> > 	mostly caching pages and shouldn't be reclaimed to avoid excessive IO
> > 
> > Option 1 would appear to be the most straight-forward but option 2
> > should be doable. Option 3 and 4 could turn into a rats nest and I would
> > consider those approaches a bit more drastic.
> 
> hmhm. 
> I think the key point of options 1 and 2 is a proper way of detecting the hardware.
> 
> Options 3 and 4 are the ideas I prefer; I like a workload-adapted heuristic.
> But you already pointed out it's hard, because the page allocator doesn't know
> the allocation purpose ;)
> 

Option 3 may not be doable. Even if the allocations are tagged as "this is
a file-backed allocation", we have no way of detecting how important
that is to the overall workload. Option 4 would be the preference. It's
a heuristic that might let us down, but the administrator can override
it and fix the reclaim_mode in the event we get it wrong.
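
To make that concrete, a back-of-the-envelope sketch of such a heuristic
(recent_file_pages_reclaimed() is an invented name, none of this is proposed
code, and the 2%, 5-second and 20% figures are just the ones from the
description above) could look like:

static void auto_tune_zone_reclaim(pg_data_t *pgdat)
{
	unsigned long node_pages = pgdat->node_present_pages;
	unsigned long free = nr_free_pages();	/* pages free system-wide */
	unsigned long total = totalram_pages;	/* total physical pages */
	unsigned long reclaimed;

	/* invented helper: file pages reclaimed locally in the last 5 seconds */
	reclaimed = recent_file_pages_reclaimed(pgdat, 5 * HZ);

	/* >= 20% of memory free overall, yet this node reclaimed >= 2% of itself */
	if (free * 5 >= total && reclaimed * 50 >= node_pages)
		zone_reclaim_mode = 0;
}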

> 
> > > @@ -10,6 +10,12 @@ struct device_node;
> > >  
> > >  #include <asm/mmzone.h>
> > >  
> > > +/*
> > > + * Distance above which we begin to use zone reclaim
> > > + */
> > > +#define RECLAIM_DISTANCE 20
> > > +
> > > +
> > 
> > Where is the ia-64-specific modifier to RECLAIM_DISTANCE?
> 
> 
> arch/ia64/include/asm/topology.h has
> 
> 	/*
> 	 * Distance above which we begin to use zone reclaim
> 	 */
> 	#define RECLAIM_DISTANCE 15
> 
> 
> I don't think distance==15 is a proper machine-independent definition,
> but it is a long-lived definition ;)
> 
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
@ 2009-06-09 14:38       ` Mel Gorman
  0 siblings, 0 replies; 39+ messages in thread
From: Mel Gorman @ 2009-06-09 14:38 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Christoph Lameter, Rik van Riel, Robin Holt, Zhang, Yanmin,
	Wu Fengguang, linux-ia64, linuxppc-dev, LKML, linux-mm,
	Andrew Morton

On Tue, Jun 09, 2009 at 10:48:34PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> sorry for late responce. my e-mail reading speed is very slow ;-)
> 
> First, Could you please read past thread?
> I think many topic of this mail are already discussed.
> 

I think I caught them all but the horrible fact of the matter is that
whether zone_reclaim_mode should be 1 or 0 on NUMA machines is "it depends".
There are arguements for both and no clear winner.

> 
> > On Thu, Jun 04, 2009 at 07:23:15PM +0900, KOSAKI Motohiro wrote:
> > > 
> > > Current linux policy is, zone_reclaim_mode is enabled by default if the machine
> > > has large remote node distance. it's because we could assume that large distance
> > > mean large server until recently.
> > > 
> > 
> > We don't make assumptions about the server being large, small or otherwise. The
> > affinity tables reporting a distance of 20 or more is saying "remote memory
> > has twice the latency of local memory". This is true irrespective of workload
> > and implies that going off-node has a real penalty regardless of workload.
> 
> No.
> Now, we talk about off-node allocation vs unnecessary file cache dropping.
> IOW, off-node allocation vs disk access.
> 

Even if we used GFP flags to identify the file pages, there is no guarantee
that we are taking the correct action to keep "relevant" pages in memory.

> Then, the worth doesn't only depend on off-node distance, but also depend on
> workload IO tendency and IO speed.
> 
> Fujitsu has 64 core ia64 HPC box, zone-reclaim sometimes made performance
> degression although its box. 
> 

I bet if it was 0, that the off-node accesses would somewtimes make
"performance degression" as well :(

> So, I don't think this problem is small vs large machine issue.
> nor i7 issue.
> high-speed P2P CPU integrated memory controller expose old issue.
> 
> 
> > > In general, workload depended configration shouldn't put into default settings.
> > > 
> > > However, current code is long standing about two year. Highest POWER and IA64 HPC machine
> > > (only) use this setting.
> > > 
> > > Thus, x86 and almost rest architecture change default setting, but Only power and ia64
> > > remain current configuration for backward-compatibility.
> > > 
> > 
> > What about if it's x86-64-based NUMA but it's not i7 based. There, the
> > NUMA distances might really mean something and that zone_reclaim behaviour
> > is desirable.
> 
> hmmm..
> I don't hope ignore AMD, I think it's common characterastic of P2P and
> integrated memory controller machine.
> 
> Also, I don't hope detect CPU family or similar, because we need update
> such code evey when Intel makes new cpu.
> 
> Can we detect P2P interconnect machine? I'm not sure.
> 

I've no idea. It's not just I7 because some of the AMD chips will have
integrated memory controllers as well. We were somewhat depending on the
affinity information providing the necessary information.

> > I think if we're going down the road of setting the default, it shouldn't be
> > per-architecture defaults as such. Other choices for addressing this might be;
> > 
> > 1. Make RECLAIM_DISTANCE a variable on x86. Set it to 20 by default, and 5
> >    (or some other sensible figure) on i7
> > 
> > 2. There should be a per-arch modifier callback for the affinity
> >    distances. If the x86 code detects the CPU is an i7, it can reduce the
> >    reported latencies to be more in line with expected reality.
> > 
> > 3. Do not use zone_reclaim() for file-backed data if more than 20% of memory
> >    overall is free. The difficulty is figuring out if the allocation is for
> >    file pages.
> > 
> > 4. Change zone_reclaim_mode default to mean "do your best to figure it
> >    out". Patch 1 would default large distances to 1 to see what happens.
> >    Then apply a heuristic when in figure-it-out mode and using reclaim_mode == 1
> > 
> > 	If we have locally reclaimed 2% of the node's memory in file pages
> > 	within the last 5 seconds when >= 20% of total physical memory was
> > 	free, then set the reclaim_mode to 0 on the assumption the node is
> > 	mostly caching pages and shouldn't be reclaimed to avoid excessive IO
> > 
> > Option 1 would appear to be the most straightforward but option 2
> > should be doable. Options 3 and 4 could turn into a rat's nest and I would
> > consider those approaches a bit more drastic.
> 
> hmhm.
> I think the key point of options 1 and 2 is finding a proper way to detect the hardware.
> 
> Options 3 and 4 are the more appealing ideas to me; I like a workload-adaptive heuristic.
> But as you already pointed out, that's hard, because the page allocator doesn't know
> the purpose of an allocation ;)
> 

Option 3 may be undoable. Even if the allocations are tagged as "this is
a file-backed allocation", we have no way of detecting how important
that is to the overall workload. Option 4 would be the preference. It's
a heuristic that might let us down, but the administrator can override
it and fix the reclaim_mode in the event we get it wrong.
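
To make that concrete, here is a rough, untested sketch of what the
figure-it-out heuristic could look like. None of the helpers or thresholds
below are existing kernel interfaces; node_file_pages_reclaimed_last() is
made up for illustration and the 2%/20%/5-second numbers simply mirror the
figures quoted above.

static void zone_reclaim_autotune(pg_data_t *pgdat)
{
	unsigned long node_pages  = pgdat->node_present_pages;
	unsigned long reclaimed   = node_file_pages_reclaimed_last(pgdat, 5 * HZ);
	unsigned long free_pages  = global_page_state(NR_FREE_PAGES);
	unsigned long total_pages = totalram_pages;

	/*
	 * If we locally reclaimed >= 2% of this node's memory from file
	 * pages within the last 5 seconds while >= 20% of all physical
	 * memory was free, assume the node is mostly caching pages and
	 * stop zone reclaim to avoid unnecessary IO.
	 */
	if (reclaimed >= node_pages / 50 && free_pages >= total_pages / 5)
		zone_reclaim_mode = 0;
}

Whether 2%/20%/5 seconds are the right figures would need experiments on
the machines discussed above.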

> 
> > > @@ -10,6 +10,12 @@ struct device_node;
> > >  
> > >  #include <asm/mmzone.h>
> > >  
> > > +/*
> > > + * Distance above which we begin to use zone reclaim
> > > + */
> > > +#define RECLAIM_DISTANCE 20
> > > +
> > > +
> > 
> > Where is the ia64-specific modifier to RECLAIM_DISTANCE?
> 
> 
> arch/ia64/include/asm/topology.h has
> 
> 	/*
> 	 * Distance above which we begin to use zone reclaim
> 	 */
> 	#define RECLAIM_DISTANCE 15
> 
> 
> I don't think distance==15 is a proper machine-independent definition,
> but it is a long-lived definition ;)
> 
> 
> 
> 

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH v4] zone_reclaim is always 0 by default
  2009-06-09 12:02         ` Robin Holt
@ 2009-06-09 19:47           ` Andrew Morton
  -1 siblings, 0 replies; 39+ messages in thread
From: Andrew Morton @ 2009-06-09 19:47 UTC (permalink / raw)
  To: Robin Holt
  Cc: mel, holt, kosaki.motohiro, cl, riel, yanmin.zhang, fengguang.wu,
	linux-ia64, linuxppc-dev, linux-kernel, linux-mm

On Tue, 9 Jun 2009 07:02:14 -0500
Robin Holt <holt@sgi.com> wrote:

> On Tue, Jun 09, 2009 at 11:37:55AM +0100, Mel Gorman wrote:
> > On Tue, Jun 09, 2009 at 04:55:07AM -0500, Robin Holt wrote:
> > > On Mon, Jun 08, 2009 at 12:50:48PM +0100, Mel Gorman wrote:
> > > 
> > > Let me start by saying I agree completely with everything you wrote and
> > > still disagree with this patch, but I was willing to compromise and work
> > > around this for our upcoming x86_64 machine by putting a "value add"
> > > into our packaging: a sysctl that turns reclaim back on.
> > > 
> > 
> > To be honest, I'm leaning more towards a NACK than an ACK on this one. I
> > don't support enough NUMA machines to feel strongly about it, but
> > unconditionally setting zone_reclaim_mode to 0 on x86-64 just because i7s
> > might be there seems ill-advised to me and will have other consequences for
> > existing, more traditional x86-64 NUMA machines.
> 
> I was sort-of planning on coming up with an x86_64 arch specific function
> for setting zone_reclaim_mode, but didn't like the direction things
> were going.
> 
> Something to the effect of...
> --- 20090609.orig/mm/page_alloc.c       2009-06-09 06:51:34.000000000 -0500
> +++ 20090609/mm/page_alloc.c    2009-06-09 06:55:00.160762069 -0500
> @@ -2326,12 +2326,7 @@ static void build_zonelists(pg_data_t *p
>         while ((node = find_next_best_node(local_node, &used_mask)) >= 0) {
>                 int distance = node_distance(local_node, node);
>  
> -               /*
> -                * If another node is sufficiently far away then it is better
> -                * to reclaim pages in a zone before going off node.
> -                */
> -               if (distance > RECLAIM_DISTANCE)
> -                       zone_reclaim_mode = 1;
> +               zone_reclaim_mode = arch_zone_reclaim_mode(distance);
>  
>                 /*
>                  * We don't want to pressure a particular node.
> 
> And then letting each arch define an arch_zone_reclaim_mode().  If other
> values are needed in the determination, we would add parameters to
> reflect this.
> 
> For ia64, add
> 
> static inline int ia64_zone_reclaim_mode(int distance)
> {
> 	if (distance > 15)
> 		return 1;
> 	return 0;
> }
> 
> #define	arch_zone_reclaim_mode(_d)	ia64_zone_reclaim_mode(_d)
> 
> 
> Then, inside x86_64_zone_reclaim_mode(), I could make it something like
> 	if (distance > 40 || is_uv_system())
> 		return 1;
> 
> In the end, I didn't think this fight was worth fighting given how ugly
> this felt.  Upon second thought, I am beginning to think it is not that
> bad, but I also don't think it is that good either.
> 
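
To spell out the fallback that Robin's callback scheme implies, a generic
default for architectures that don't define their own hook might be a sketch
like this; arch_zone_reclaim_mode() is his proposed name, not an existing
interface, and the body just preserves today's behaviour:

#ifndef arch_zone_reclaim_mode
static inline int arch_zone_reclaim_mode(int distance)
{
	/* Keep the current default: reclaim locally past RECLAIM_DISTANCE */
	return distance > RECLAIM_DISTANCE ? 1 : 0;
}
#endif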

We've done worse before now...

Is it not possible to work out at runtime whether zone reclaim mode is
beneficial?

Given that zone_reclaim_mode is settable from initscripts, why all the
fuss?

Is anyone testing RECLAIM_WRITE and RECLAIM_SWAP, btw?
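
For anyone following along, zone_reclaim_mode is a bitmask, so an initscript
can pick the behaviour per machine. From memory of mm/vmscan.c and
Documentation/sysctl/vm.txt, so treat the exact comments as illustrative:

#define RECLAIM_OFF	0
#define RECLAIM_ZONE	(1<<0)	/* Zone reclaim on: reclaim locally before going off-node */
#define RECLAIM_WRITE	(1<<1)	/* Write out dirty pages during zone reclaim */
#define RECLAIM_SWAP	(1<<2)	/* Swap pages out during zone reclaim */

e.g. "echo 3 > /proc/sys/vm/zone_reclaim_mode" would exercise the
RECLAIM_WRITE path as well.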

The root cause of this problem: having something called "mode".  Any
time we put a "mode" in the kernel, we get in a mess trying to work out
when to set it and to what.

I think I'll drop this patch for now.


^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2009-06-09 19:48 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-06-04 10:23 [PATCH v4] zone_reclaim is always 0 by default KOSAKI Motohiro
2009-06-04 10:59 ` Wu Fengguang
2009-06-04 12:24 ` Robin Holt
2009-06-08 11:50 ` Mel Gorman
2009-06-09  9:55   ` Robin Holt
2009-06-09 10:37     ` Mel Gorman
2009-06-09 12:02       ` Robin Holt
2009-06-09 19:47         ` Andrew Morton
2009-06-09 13:48   ` KOSAKI Motohiro
2009-06-09 14:38     ` Mel Gorman
