* [RFC 1/3] Framework for accurate node based statistics
@ 2005-12-06 18:28 ` Christoph Lameter
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 18:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hugh Dickins, Nick Piggin, linux-mm, Andi Kleen, Marcelo Tosatti,
	Christoph Lameter

[RFC] Framework for accurate node based statistics

Currently we have various vm counters that are split per cpu. This arrangement
does not allow access to the per node statistics that are important for
optimizing VM behavior on NUMA architectures. All one can tell from the
per_cpu differential variables is how much a certain counter was changed by a
given cpu; it is impossible to deduce how many pages of a certain type exist
on each node.

This patch introduces a generic framework for accurate per node vm
statistics through a large per node and per cpu array. The numbers are
consolidated into global and per node counters when the slab reaper runs
(every 3 seconds or so). VM functions can then check these statistics by
simply reading the node specific or global counter.

A significant problem with this approach is that the statistics are only
consolidated every 3 seconds or so, so readers may see slightly stale values.
I have tried various other approaches, but they typically end up adding
atomic variables to critical VM paths. I'd be glad if someone else had a
bright idea on how to improve the situation.

Two follow-up patches convert two important counters to per node operation;
many more may prove useful in the future.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc3/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc3.orig/mm/page_alloc.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/mm/page_alloc.c	2005-12-01 00:38:05.000000000 -0800
@@ -557,6 +557,33 @@ static int rmqueue_bulk(struct zone *zon
 }
 
 #ifdef CONFIG_NUMA
+static DEFINE_SPINLOCK(node_stat_lock);
+unsigned long vm_stat_global[NR_STAT_ITEMS];
+unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
+int vm_stat_diff[NR_CPUS][MAX_NUMNODES][NR_STAT_ITEMS];
+
+void refresh_vm_stats(void) {
+	int cpu;
+	int node;
+	int i;
+
+	spin_lock(&node_stat_lock);
+
+	cpu = get_cpu();
+	for_each_online_node(node)
+		for(i = 0; i < NR_STAT_ITEMS; i++) {
+			int * p = vm_stat_diff[cpu][node]+i;
+			if (*p) {
+				vm_stat_node[node][i] += *p;
+				vm_stat_global[i] += *p;
+				*p = 0;
+			}
+		}
+	put_cpu();
+
+	spin_unlock(&node_stat_lock);
+}
+
 /* Called from the slab reaper to drain remote pagesets */
 void drain_remote_pages(void)
 {
Index: linux-2.6.15-rc3/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc3.orig/include/linux/page-flags.h	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/include/linux/page-flags.h	2005-12-01 00:35:38.000000000 -0800
@@ -163,6 +163,27 @@ extern void __mod_page_state(unsigned lo
 	} while (0)
 
 /*
+ * Node based accounting with per cpu differentials.
+ */
+enum node_stat_item { };
+#define NR_STAT_ITEMS 0
+
+extern unsigned long vm_stat_global[NR_STAT_ITEMS];
+extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
+extern int vm_stat_diff[NR_CPUS][MAX_NUMNODES][NR_STAT_ITEMS];
+
+static inline void mod_node_page_state(int node, enum node_stat_item item, int delta)
+{
+	vm_stat_diff[get_cpu()][node][item] += delta;
+	put_cpu();
+}
+
+#define inc_node_page_state(node, item) mod_node_page_state(node, item, 1)
+#define dec_node_page_state(node, item) mod_node_page_state(node, item, -1)
+#define add_node_page_state(node, item, delta) mod_node_page_state(node, item, delta)
+#define sub_node_page_state(node, item, delta) mod_node_page_state(node, item, -(delta))
+
+/*
  * Manipulation of page state flags
  */
 #define PageLocked(page)		\
Index: linux-2.6.15-rc3/include/linux/gfp.h
===================================================================
--- linux-2.6.15-rc3.orig/include/linux/gfp.h	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/include/linux/gfp.h	2005-12-01 00:34:02.000000000 -0800
@@ -153,8 +153,10 @@ extern void FASTCALL(free_cold_page(stru
 void page_alloc_init(void);
 #ifdef CONFIG_NUMA
 void drain_remote_pages(void);
+void refresh_vm_stats(void);
 #else
 static inline void drain_remote_pages(void) { };
+static inline void refresh_vm_stats(void) { }
 #endif
 
 #endif /* __LINUX_GFP_H */
Index: linux-2.6.15-rc3/mm/slab.c
===================================================================
--- linux-2.6.15-rc3.orig/mm/slab.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/mm/slab.c	2005-12-01 00:34:02.000000000 -0800
@@ -3359,6 +3359,7 @@ next:
 	check_irq_on();
 	up(&cache_chain_sem);
 	drain_remote_pages();
+	refresh_vm_stats();
 	/* Setup the next iteration */
 	schedule_delayed_work(&__get_cpu_var(reap_work), REAPTIMEOUT_CPUC);
 }


* [RFC 2/3] Make nr_mapped a per node counter
  2005-12-06 18:28 ` Christoph Lameter
@ 2005-12-06 18:28   ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 18:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hugh Dickins, Nick Piggin, linux-mm, Andi Kleen, Marcelo Tosatti,
	Christoph Lameter

Make nr_mapped a per node counter

The per cpu nr_mapped counter is important because it allows us to determine
how many pages of a node are not mapped, which permits a more efficient
means of deciding when a node should reclaim memory.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc3/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc3.orig/include/linux/page-flags.h	2005-12-01 00:35:38.000000000 -0800
+++ linux-2.6.15-rc3/include/linux/page-flags.h	2005-12-01 00:35:49.000000000 -0800
@@ -85,7 +85,6 @@ struct page_state {
 	unsigned long nr_writeback;	/* Pages under writeback */
 	unsigned long nr_unstable;	/* NFS unstable pages */
 	unsigned long nr_page_table_pages;/* Pages used for pagetables */
-	unsigned long nr_mapped;	/* mapped into pagetables */
 	unsigned long nr_slab;		/* In slab */
 #define GET_PAGE_STATE_LAST nr_slab
 
@@ -165,8 +164,8 @@ extern void __mod_page_state(unsigned lo
 /*
  * Node based accounting with per cpu differentials.
  */
-enum node_stat_item { };
-#define NR_STAT_ITEMS 0
+enum node_stat_item { NR_MAPPED };
+#define NR_STAT_ITEMS 1
 
 extern unsigned long vm_stat_global[NR_STAT_ITEMS];
 extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
Index: linux-2.6.15-rc3/drivers/base/node.c
===================================================================
--- linux-2.6.15-rc3.orig/drivers/base/node.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/drivers/base/node.c	2005-12-01 00:35:49.000000000 -0800
@@ -53,8 +53,6 @@ static ssize_t node_read_meminfo(struct 
 		ps.nr_dirty = 0;
 	if ((long)ps.nr_writeback < 0)
 		ps.nr_writeback = 0;
-	if ((long)ps.nr_mapped < 0)
-		ps.nr_mapped = 0;
 	if ((long)ps.nr_slab < 0)
 		ps.nr_slab = 0;
 
@@ -83,7 +81,7 @@ static ssize_t node_read_meminfo(struct 
 		       nid, K(i.freeram - i.freehigh),
 		       nid, K(ps.nr_dirty),
 		       nid, K(ps.nr_writeback),
-		       nid, K(ps.nr_mapped),
+		       nid, K(vm_stat_node[nid][NR_MAPPED]),
 		       nid, K(ps.nr_slab));
 	n += hugetlb_report_node_meminfo(nid, buf + n);
 	return n;
Index: linux-2.6.15-rc3/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.15-rc3.orig/fs/proc/proc_misc.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/fs/proc/proc_misc.c	2005-12-01 00:35:49.000000000 -0800
@@ -190,7 +190,7 @@ static int meminfo_read_proc(char *page,
 		K(i.freeswap),
 		K(ps.nr_dirty),
 		K(ps.nr_writeback),
-		K(ps.nr_mapped),
+		K(vm_stat_global[NR_MAPPED]),
 		K(ps.nr_slab),
 		K(allowed),
 		K(committed),
Index: linux-2.6.15-rc3/mm/vmscan.c
===================================================================
--- linux-2.6.15-rc3.orig/mm/vmscan.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/mm/vmscan.c	2005-12-01 00:35:49.000000000 -0800
@@ -967,7 +967,7 @@ int try_to_free_pages(struct zone **zone
 	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
-		sc.nr_mapped = read_page_state(nr_mapped);
+		sc.nr_mapped = vm_stat_global[NR_MAPPED];
 		sc.nr_scanned = 0;
 		sc.nr_reclaimed = 0;
 		sc.priority = priority;
@@ -1056,7 +1056,7 @@ loop_again:
 	sc.gfp_mask = GFP_KERNEL;
 	sc.may_writepage = 0;
 	sc.may_swap = 1;
-	sc.nr_mapped = read_page_state(nr_mapped);
+	sc.nr_mapped = vm_stat_global[NR_MAPPED];
 
 	inc_page_state(pageoutrun);
 
@@ -1373,7 +1373,7 @@ int zone_reclaim(struct zone *zone, gfp_
 	sc.gfp_mask = gfp_mask;
 	sc.may_writepage = 0;
 	sc.may_swap = 0;
-	sc.nr_mapped = read_page_state(nr_mapped);
+	sc.nr_mapped = vm_stat_global[NR_MAPPED];
 	sc.nr_scanned = 0;
 	sc.nr_reclaimed = 0;
 	/* scan at the highest priority */
Index: linux-2.6.15-rc3/mm/page-writeback.c
===================================================================
--- linux-2.6.15-rc3.orig/mm/page-writeback.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/mm/page-writeback.c	2005-12-01 00:35:49.000000000 -0800
@@ -111,7 +111,7 @@ static void get_writeback_state(struct w
 {
 	wbs->nr_dirty = read_page_state(nr_dirty);
 	wbs->nr_unstable = read_page_state(nr_unstable);
-	wbs->nr_mapped = read_page_state(nr_mapped);
+	wbs->nr_mapped = vm_stat_global[NR_MAPPED];
 	wbs->nr_writeback = read_page_state(nr_writeback);
 }
 
Index: linux-2.6.15-rc3/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc3.orig/mm/page_alloc.c	2005-12-01 00:34:02.000000000 -0800
+++ linux-2.6.15-rc3/mm/page_alloc.c	2005-12-01 00:35:49.000000000 -0800
@@ -1400,7 +1400,7 @@ void show_free_areas(void)
 		ps.nr_unstable,
 		nr_free_pages(),
 		ps.nr_slab,
-		ps.nr_mapped,
+		vm_stat_global[NR_MAPPED],
 		ps.nr_page_table_pages);
 
 	for_each_zone(zone) {
Index: linux-2.6.15-rc3/mm/rmap.c
===================================================================
--- linux-2.6.15-rc3.orig/mm/rmap.c	2005-11-28 19:51:27.000000000 -0800
+++ linux-2.6.15-rc3/mm/rmap.c	2005-12-01 00:35:49.000000000 -0800
@@ -454,7 +454,7 @@ void page_add_anon_rmap(struct page *pag
 
 		page->index = linear_page_index(vma, address);
 
-		inc_page_state(nr_mapped);
+		inc_node_page_state(page_to_nid(page), NR_MAPPED);
 	}
 	/* else checking page index and mapping is racy */
 }
@@ -471,7 +471,7 @@ void page_add_file_rmap(struct page *pag
 	BUG_ON(!pfn_valid(page_to_pfn(page)));
 
 	if (atomic_inc_and_test(&page->_mapcount))
-		inc_page_state(nr_mapped);
+		inc_node_page_state(page_to_nid(page), NR_MAPPED);
 }
 
 /**
@@ -495,7 +495,7 @@ void page_remove_rmap(struct page *page)
 		 */
 		if (page_test_and_clear_dirty(page))
 			set_page_dirty(page);
-		dec_page_state(nr_mapped);
+		dec_node_page_state(page_to_nid(page), NR_MAPPED);
 	}
 }
 


* [RFC 3/3] Make nr_pagecache a per node counter
  2005-12-06 18:28 ` Christoph Lameter
@ 2005-12-06 18:28   ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 18:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hugh Dickins, Nick Piggin, linux-mm, Andi Kleen, Marcelo Tosatti,
	Christoph Lameter

Make nr_pagecache a per node variable

The nr_pagecache atomic variable is a particularly ugly spot in the VM right
now. We ultimately need only a roughly accurate value. This patch makes
nr_pagecache conform to the other VM statistics.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc5/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/page-flags.h	2005-12-06 10:13:49.000000000 -0800
+++ linux-2.6.15-rc5/include/linux/page-flags.h	2005-12-06 10:15:59.000000000 -0800
@@ -164,8 +164,8 @@ extern void __mod_page_state(unsigned lo
 /*
  * Node based accounting with per cpu differentials.
  */
-enum node_stat_item { NR_MAPPED };
-#define NR_STAT_ITEMS 1
+enum node_stat_item { NR_MAPPED, NR_PAGECACHE };
+#define NR_STAT_ITEMS 2
 
 extern unsigned long vm_stat_global[NR_STAT_ITEMS];
 extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
Index: linux-2.6.15-rc5/include/linux/pagemap.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/pagemap.h	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/include/linux/pagemap.h	2005-12-06 10:15:59.000000000 -0800
@@ -99,49 +99,9 @@ int add_to_page_cache_lru(struct page *p
 extern void remove_from_page_cache(struct page *page);
 extern void __remove_from_page_cache(struct page *page);
 
-extern atomic_t nr_pagecache;
-
-#ifdef CONFIG_SMP
-
-#define PAGECACHE_ACCT_THRESHOLD        max(16, NR_CPUS * 2)
-DECLARE_PER_CPU(long, nr_pagecache_local);
-
-/*
- * pagecache_acct implements approximate accounting for pagecache.
- * vm_enough_memory() do not need high accuracy. Writers will keep
- * an offset in their per-cpu arena and will spill that into the
- * global count whenever the absolute value of the local count
- * exceeds the counter's threshold.
- *
- * MUST be protected from preemption.
- * current protection is mapping->page_lock.
- */
-static inline void pagecache_acct(int count)
-{
-	long *local;
-
-	local = &__get_cpu_var(nr_pagecache_local);
-	*local += count;
-	if (*local > PAGECACHE_ACCT_THRESHOLD || *local < -PAGECACHE_ACCT_THRESHOLD) {
-		atomic_add(*local, &nr_pagecache);
-		*local = 0;
-	}
-}
-
-#else
-
-static inline void pagecache_acct(int count)
-{
-	atomic_add(count, &nr_pagecache);
-}
-#endif
-
 static inline unsigned long get_page_cache_size(void)
 {
-	int ret = atomic_read(&nr_pagecache);
-	if (unlikely(ret < 0))
-		ret = 0;
-	return ret;
+	return vm_stat_global[NR_PAGECACHE];
 }
 
 /*
Index: linux-2.6.15-rc5/mm/swap_state.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/swap_state.c	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/mm/swap_state.c	2005-12-06 10:15:59.000000000 -0800
@@ -84,7 +84,7 @@ static int __add_to_swap_cache(struct pa
 			SetPageSwapCache(page);
 			set_page_private(page, entry.val);
 			total_swapcache_pages++;
-			pagecache_acct(1);
+			inc_node_page_state(page_to_nid(page), NR_PAGECACHE);
 		}
 		write_unlock_irq(&swapper_space.tree_lock);
 		radix_tree_preload_end();
@@ -129,7 +129,7 @@ void __delete_from_swap_cache(struct pag
 	set_page_private(page, 0);
 	ClearPageSwapCache(page);
 	total_swapcache_pages--;
-	pagecache_acct(-1);
+	dec_node_page_state(page_to_nid(page), NR_PAGECACHE);
 	INC_CACHE_INFO(del_total);
 }
 
Index: linux-2.6.15-rc5/mm/filemap.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/filemap.c	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/mm/filemap.c	2005-12-06 10:15:59.000000000 -0800
@@ -115,7 +115,7 @@ void __remove_from_page_cache(struct pag
 	radix_tree_delete(&mapping->page_tree, page->index);
 	page->mapping = NULL;
 	mapping->nrpages--;
-	pagecache_acct(-1);
+	dec_node_page_state(page_to_nid(page), NR_PAGECACHE);
 }
 
 void remove_from_page_cache(struct page *page)
@@ -390,7 +390,7 @@ int add_to_page_cache(struct page *page,
 			page->mapping = mapping;
 			page->index = offset;
 			mapping->nrpages++;
-			pagecache_acct(1);
+			inc_node_page_state(page_to_nid(page), NR_PAGECACHE);
 		}
 		write_unlock_irq(&mapping->tree_lock);
 		radix_tree_preload_end();

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [RFC 3/3] Make nr_pagecache a per node counter
@ 2005-12-06 18:28   ` Christoph Lameter
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 18:28 UTC (permalink / raw)
  To: linux-kernel
  Cc: Hugh Dickins, Nick Piggin, linux-mm, Andi Kleen, Marcelo Tosatti,
	Christoph Lameter

Make nr_pagecache a per node variable

The nr_pagecache atomic variable is a particular ugly spot in the VM right
now. We ultimately need a sortof accurate value. This patch makes nr_pagecache
conform to the other VM statistics

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.15-rc5/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/page-flags.h	2005-12-06 10:13:49.000000000 -0800
+++ linux-2.6.15-rc5/include/linux/page-flags.h	2005-12-06 10:15:59.000000000 -0800
@@ -164,8 +164,8 @@ extern void __mod_page_state(unsigned lo
 /*
  * Node based accounting with per cpu differentials.
  */
-enum node_stat_item { NR_MAPPED };
-#define NR_STAT_ITEMS 1
+enum node_stat_item { NR_MAPPED, NR_PAGECACHE };
+#define NR_STAT_ITEMS 2
 
 extern unsigned long vm_stat_global[NR_STAT_ITEMS];
 extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
Index: linux-2.6.15-rc5/include/linux/pagemap.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/pagemap.h	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/include/linux/pagemap.h	2005-12-06 10:15:59.000000000 -0800
@@ -99,49 +99,9 @@ int add_to_page_cache_lru(struct page *p
 extern void remove_from_page_cache(struct page *page);
 extern void __remove_from_page_cache(struct page *page);
 
-extern atomic_t nr_pagecache;
-
-#ifdef CONFIG_SMP
-
-#define PAGECACHE_ACCT_THRESHOLD        max(16, NR_CPUS * 2)
-DECLARE_PER_CPU(long, nr_pagecache_local);
-
-/*
- * pagecache_acct implements approximate accounting for pagecache.
- * vm_enough_memory() do not need high accuracy. Writers will keep
- * an offset in their per-cpu arena and will spill that into the
- * global count whenever the absolute value of the local count
- * exceeds the counter's threshold.
- *
- * MUST be protected from preemption.
- * current protection is mapping->page_lock.
- */
-static inline void pagecache_acct(int count)
-{
-	long *local;
-
-	local = &__get_cpu_var(nr_pagecache_local);
-	*local += count;
-	if (*local > PAGECACHE_ACCT_THRESHOLD || *local < -PAGECACHE_ACCT_THRESHOLD) {
-		atomic_add(*local, &nr_pagecache);
-		*local = 0;
-	}
-}
-
-#else
-
-static inline void pagecache_acct(int count)
-{
-	atomic_add(count, &nr_pagecache);
-}
-#endif
-
 static inline unsigned long get_page_cache_size(void)
 {
-	int ret = atomic_read(&nr_pagecache);
-	if (unlikely(ret < 0))
-		ret = 0;
-	return ret;
+	return vm_stat_global[NR_PAGECACHE];
 }
 
 /*
Index: linux-2.6.15-rc5/mm/swap_state.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/swap_state.c	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/mm/swap_state.c	2005-12-06 10:15:59.000000000 -0800
@@ -84,7 +84,7 @@ static int __add_to_swap_cache(struct pa
 			SetPageSwapCache(page);
 			set_page_private(page, entry.val);
 			total_swapcache_pages++;
-			pagecache_acct(1);
+			inc_node_page_state(page_to_nid(page), NR_PAGECACHE);
 		}
 		write_unlock_irq(&swapper_space.tree_lock);
 		radix_tree_preload_end();
@@ -129,7 +129,7 @@ void __delete_from_swap_cache(struct pag
 	set_page_private(page, 0);
 	ClearPageSwapCache(page);
 	total_swapcache_pages--;
-	pagecache_acct(-1);
+	dec_node_page_state(page_to_nid(page), NR_PAGECACHE);
 	INC_CACHE_INFO(del_total);
 }
 
Index: linux-2.6.15-rc5/mm/filemap.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/filemap.c	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/mm/filemap.c	2005-12-06 10:15:59.000000000 -0800
@@ -115,7 +115,7 @@ void __remove_from_page_cache(struct pag
 	radix_tree_delete(&mapping->page_tree, page->index);
 	page->mapping = NULL;
 	mapping->nrpages--;
-	pagecache_acct(-1);
+	dec_node_page_state(page_to_nid(page), NR_PAGECACHE);
 }
 
 void remove_from_page_cache(struct page *page)
@@ -390,7 +390,7 @@ int add_to_page_cache(struct page *page,
 			page->mapping = mapping;
 			page->index = offset;
 			mapping->nrpages++;
-			pagecache_acct(1);
+			inc_node_page_state(page_to_nid(page), NR_PAGECACHE);
 		}
 		write_unlock_irq(&mapping->tree_lock);
 		radix_tree_preload_end();

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 18:28 ` Christoph Lameter
@ 2005-12-06 18:35   ` Andi Kleen
  -1 siblings, 0 replies; 51+ messages in thread
From: Andi Kleen @ 2005-12-06 18:35 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, Nick Piggin, linux-mm, Andi Kleen,
	Marcelo Tosatti

> +static inline void mod_node_page_state(int node, enum node_stat_item item, int delta)
> +{
> +	vm_stat_diff[get_cpu()][node][item] += delta;
> +	put_cpu();

Instead of get/put_cpu I would use a local_t. This would give much better code
on i386/x86-64.  I have some plans to port all the MM statistics counters
over to local_t (still stuck), but for new code it should definitely be done.

-Andi



* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 18:35   ` Andi Kleen
@ 2005-12-06 19:08     ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 19:08 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Hugh Dickins, Nick Piggin, linux-mm, Marcelo Tosatti

On Tue, 6 Dec 2005, Andi Kleen wrote:

> > +static inline void mod_node_page_state(int node, enum node_stat_item item, int delta)
> > +{
> > +	vm_stat_diff[get_cpu()][node][item] += delta;
> > +	put_cpu();
> 
> Instead of get/put_cpu I would use a local_t. This would give much better code
> on i386/x86-64.  I have some plans to port over all the MM statistics counters
> over to local_t, still stuck, but for new code it should be definitely done.

Yuck. That code uses atomic operations and is not aware of atomic64_t.



* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 19:08     ` Christoph Lameter
@ 2005-12-06 19:26       ` Andi Kleen
  -1 siblings, 0 replies; 51+ messages in thread
From: Andi Kleen @ 2005-12-06 19:26 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, linux-kernel, Hugh Dickins, Nick Piggin, linux-mm,
	Marcelo Tosatti

On Tue, Dec 06, 2005 at 11:08:42AM -0800, Christoph Lameter wrote:
> On Tue, 6 Dec 2005, Andi Kleen wrote:
> 
> > > +static inline void mod_node_page_state(int node, enum node_stat_item item, int delta)
> > > +{
> > > +	vm_stat_diff[get_cpu()][node][item] += delta;
> > > +	put_cpu();
> > 
> > Instead of get/put_cpu I would use a local_t. This would give much better code
> > on i386/x86-64.  I have some plans to port over all the MM statistics counters
> > over to local_t, still stuck, but for new code it should be definitely done.
> 
> Yuck. That code uses atomic operations and is not aware of atomic64_t.

Hmm? What code are you looking at? 

At least i386/x86-64/generic don't use any atomic operations, just
normal local rmw that is non-atomic on the bus but atomic against interrupts.

Do you actually need 64bit? 

-Andi



* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 19:26       ` Andi Kleen
@ 2005-12-06 19:36         ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 19:36 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Hugh Dickins, Nick Piggin, linux-mm, Marcelo Tosatti

On Tue, 6 Dec 2005, Andi Kleen wrote:

> > Yuck. That code uses atomic operations and is not aware of atomic64_t.
> Hmm? What code are you looking at? 
include/asm-generic/local.h. This is the default, right? And 
include/asm-ia64/local.h.
 
> At least i386/x86-64/generic don't use any atomic operations, just
> normal non atomic on bus but atomic for interrupts local rmw.

inc/dec are atomic by default on x86_64?
 
> Do you actually need 64bit? 

32 bit limits us in the worst case to 8 Terabytes of RAM (assuming a very 
small page size of 4k and only 31 bits available for an atomic variable 
[sparc]). SGI already has installations with 15 Terabytes of RAM.



* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 19:36         ` Christoph Lameter
@ 2005-12-06 20:06           ` Andi Kleen
  -1 siblings, 0 replies; 51+ messages in thread
From: Andi Kleen @ 2005-12-06 20:06 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, linux-kernel, Hugh Dickins, Nick Piggin, linux-mm,
	Marcelo Tosatti

On Tue, Dec 06, 2005 at 11:36:43AM -0800, Christoph Lameter wrote:
> On Tue, 6 Dec 2005, Andi Kleen wrote:
> 
> > > Yuck. That code uses atomic operations and is not aware of atomic64_t.
> > Hmm? What code are you looking at? 
> include/asm-generic/local.h. this is the default right? And 
> include/asm-ia64/local.h.
>  
> > At least i386/x86-64/generic don't use any atomic operations, just
> > normal non atomic on bus but atomic for interrupts local rmw.
> 
> inc/dec are atomic by default on x86_64?

They are atomic against interrupts on the same CPU. And on Linux
also atomic against preempt moving you to another CPU. And all that
without the cost of a bus lock. And that is what local_t is about.

>  
> > Do you actually need 64bit? 
> 
> 32 bit limits us in the worst case to 8 Terabytes of RAM (assuming a very 
> small page size of 4k and 31 bit available for an atomic variable 
> [sparc]). SGI already has installations with 15 Terabytes of RAM.

Ok we'll need a local64_t then. No big deal - can be easily added.
Or perhaps better a long_local_t so that 32bit doesn't need to
pay the cost.

-Andi




* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 20:06           ` Andi Kleen
  (?)
@ 2005-12-06 22:52             ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 22:52 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Hugh Dickins, Nick Piggin, linux-mm, linux-ia64,
	Marcelo Tosatti

On Tue, 6 Dec 2005, Andi Kleen wrote:

> Ok we'll need a local64_t then. No big deal - can be easily added.
> Or perhaps better a long_local_t so that 32bit doesn't need to
> pay the cost.

I just saw that ia64 already has local_t as 64 bit, so that is no problem 
for us. Here is a patch that would convert the framework to use local_t.
Is that okay?

The problem with this solution is that the use of local_t will lead to the 
use of atomic operations (in case the preemption status is unknown). It 
may be better to use atomic operations and simply drop the per_cpu stuff. 
That way the summing of the per cpu variables is avoided and the 
stats are accurate in real time.

Seems that local.h is rarely used. There was an obvious mistake in there 
for ia64.

Index: linux-2.6.15-rc5/mm/page_alloc.c
===================================================================
--- linux-2.6.15-rc5.orig/mm/page_alloc.c	2005-12-06 10:13:49.000000000 -0800
+++ linux-2.6.15-rc5/mm/page_alloc.c	2005-12-06 14:43:41.000000000 -0800
@@ -560,26 +560,25 @@ static int rmqueue_bulk(struct zone *zon
 static spinlock_t node_stat_lock;
 unsigned long vm_stat_global[NR_STAT_ITEMS];
 unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
-int vm_stat_diff[NR_CPUS][MAX_NUMNODES][NR_STAT_ITEMS];
+DEFINE_PER_CPU(local_t [MAX_NUMNODES][NR_STAT_ITEMS], vm_stat_diff);
 
 void refresh_vm_stats(void) {
-	int cpu;
 	int node;
 	int i;
 
 	spin_lock(&node_stat_lock);
 
-	cpu = get_cpu();
 	for_each_online_node(node)
 		for(i = 0; i < NR_STAT_ITEMS; i++) {
-			int * p = vm_stat_diff[cpu][node]+i;
-			if (*p) {
-				vm_stat_node[node][i] += *p;
-				vm_stat_global[i] += *p;
-				*p = 0;
+			long v;
+
+			v = cpu_local_read(vm_stat_diff[node][i]);
+			if (v) {
+				vm_stat_node[node][i] += v;
+				vm_stat_global[i] += v;
+				cpu_local_set(vm_stat_diff[node][i], 0);
 			}
 		}
-	put_cpu();
 
 	spin_unlock(&node_stat_lock);
 }
Index: linux-2.6.15-rc5/include/linux/page-flags.h
===================================================================
--- linux-2.6.15-rc5.orig/include/linux/page-flags.h	2005-12-06 10:15:59.000000000 -0800
+++ linux-2.6.15-rc5/include/linux/page-flags.h	2005-12-06 14:47:03.000000000 -0800
@@ -8,6 +8,7 @@
 #include <linux/percpu.h>
 #include <linux/cache.h>
 #include <asm/pgtable.h>
+#include <asm/local.h>
 
 /*
  * Various page->flags bits:
@@ -169,12 +170,19 @@ enum node_stat_item { NR_MAPPED, NR_PAGE
 
 extern unsigned long vm_stat_global[NR_STAT_ITEMS];
 extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
-extern int vm_stat_diff[NR_CPUS][MAX_NUMNODES][NR_STAT_ITEMS];
+DECLARE_PER_CPU(local_t [MAX_NUMNODES][NR_STAT_ITEMS], vm_stat_diff);
 
 static inline void mod_node_page_state(int node, enum node_stat_item item, int delta)
 {
-	vm_stat_diff[get_cpu()][node][item] += delta;
-	put_cpu();
+	cpu_local_add(delta, vm_stat_diff[node][item]);
+}
+
+/*
+ * For use when we know that preemption is disabled. Avoids atomic operations.
+ */
+static inline void __mod_node_page_state(int node, enum node_stat_item item, int delta)
+{
+	__local_add(delta, &__get_cpu_var(vm_stat_diff[node][item]));
 }
 
 #define inc_node_page_state(node, item) mod_node_page_state(node, item, 1)
Index: linux-2.6.15-rc5/include/asm-ia64/local.h
===================================================================
--- linux-2.6.15-rc5.orig/include/asm-ia64/local.h	2005-12-03 21:10:42.000000000 -0800
+++ linux-2.6.15-rc5/include/asm-ia64/local.h	2005-12-06 14:39:47.000000000 -0800
@@ -17,7 +17,7 @@ typedef struct {
 #define local_set(l, i)	atomic64_set(&(l)->val, i)
 #define local_inc(l)	atomic64_inc(&(l)->val)
 #define local_dec(l)	atomic64_dec(&(l)->val)
-#define local_add(l)	atomic64_add(&(l)->val)
+#define local_add(i, l)	atomic64_add((i), &(l)->val)
 #define local_sub(l)	atomic64_sub(&(l)->val)
 
 /* Non-atomic variants, i.e., preemption disabled and won't be touched in interrupt, etc.  */



* Re: [RFC 2/3] Make nr_mapped a per node counter
  2005-12-06 18:28   ` Christoph Lameter
@ 2005-12-06 23:05     ` Nick Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2005-12-06 23:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

Christoph Lameter wrote:
> Make nr_mapped a per node counter
> 
> The per cpu nr_mapped counter is important because it allows a determination
> of how many pages of a node are not mapped, which would allow a more efficient
> means of determining when a node should reclaim memory.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.15-rc3/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6.15-rc3.orig/include/linux/page-flags.h	2005-12-01 00:35:38.000000000 -0800
> +++ linux-2.6.15-rc3/include/linux/page-flags.h	2005-12-01 00:35:49.000000000 -0800
> @@ -85,7 +85,6 @@ struct page_state {
>  	unsigned long nr_writeback;	/* Pages under writeback */
>  	unsigned long nr_unstable;	/* NFS unstable pages */
>  	unsigned long nr_page_table_pages;/* Pages used for pagetables */
> -	unsigned long nr_mapped;	/* mapped into pagetables */
>  	unsigned long nr_slab;		/* In slab */
>  #define GET_PAGE_STATE_LAST nr_slab
>  
> @@ -165,8 +164,8 @@ extern void __mod_page_state(unsigned lo
>  /*
>   * Node based accounting with per cpu differentials.
>   */
> -enum node_stat_item { };
> -#define NR_STAT_ITEMS 0
> +enum node_stat_item { NR_MAPPED };
> +#define NR_STAT_ITEMS 1
>  
>  extern unsigned long vm_stat_global[NR_STAT_ITEMS];
>  extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
> Index: linux-2.6.15-rc3/drivers/base/node.c
> ===================================================================
> --- linux-2.6.15-rc3.orig/drivers/base/node.c	2005-11-28 19:51:27.000000000 -0800
> +++ linux-2.6.15-rc3/drivers/base/node.c	2005-12-01 00:35:49.000000000 -0800
> @@ -53,8 +53,6 @@ static ssize_t node_read_meminfo(struct 
>  		ps.nr_dirty = 0;
>  	if ((long)ps.nr_writeback < 0)
>  		ps.nr_writeback = 0;
> -	if ((long)ps.nr_mapped < 0)
> -		ps.nr_mapped = 0;
>  	if ((long)ps.nr_slab < 0)
>  		ps.nr_slab = 0;
>  
> @@ -83,7 +81,7 @@ static ssize_t node_read_meminfo(struct 
>  		       nid, K(i.freeram - i.freehigh),
>  		       nid, K(ps.nr_dirty),
>  		       nid, K(ps.nr_writeback),
> -		       nid, K(ps.nr_mapped),
> +		       nid, K(vm_stat_node[nid][NR_MAPPED]),
>  		       nid, K(ps.nr_slab));
>  	n += hugetlb_report_node_meminfo(nid, buf + n);
>  	return n;
> Index: linux-2.6.15-rc3/fs/proc/proc_misc.c
> ===================================================================
> --- linux-2.6.15-rc3.orig/fs/proc/proc_misc.c	2005-11-28 19:51:27.000000000 -0800
> +++ linux-2.6.15-rc3/fs/proc/proc_misc.c	2005-12-01 00:35:49.000000000 -0800
> @@ -190,7 +190,7 @@ static int meminfo_read_proc(char *page,
>  		K(i.freeswap),
>  		K(ps.nr_dirty),
>  		K(ps.nr_writeback),
> -		K(ps.nr_mapped),
> +		K(vm_stat_global[NR_MAPPED]),
>  		K(ps.nr_slab),
>  		K(allowed),
>  		K(committed),
> Index: linux-2.6.15-rc3/mm/vmscan.c
> ===================================================================
> --- linux-2.6.15-rc3.orig/mm/vmscan.c	2005-11-28 19:51:27.000000000 -0800
> +++ linux-2.6.15-rc3/mm/vmscan.c	2005-12-01 00:35:49.000000000 -0800
> @@ -967,7 +967,7 @@ int try_to_free_pages(struct zone **zone
>  	}
>  
>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> -		sc.nr_mapped = read_page_state(nr_mapped);
> +		sc.nr_mapped = vm_stat_global[NR_MAPPED];
>  		sc.nr_scanned = 0;
>  		sc.nr_reclaimed = 0;
>  		sc.priority = priority;
> @@ -1056,7 +1056,7 @@ loop_again:
>  	sc.gfp_mask = GFP_KERNEL;
>  	sc.may_writepage = 0;
>  	sc.may_swap = 1;
> -	sc.nr_mapped = read_page_state(nr_mapped);
> +	sc.nr_mapped = vm_stat_global[NR_MAPPED];
>  

Any chance you can wrap these in macros? (something like read_page_node_state())

I gather Andrew did this so that they can easily be defined out for things
that don't want them (maybe, embedded systems).

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 


* Re: [RFC 2/3] Make nr_mapped a per node counter
@ 2005-12-06 23:05     ` Nick Piggin
  0 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2005-12-06 23:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

Christoph Lameter wrote:
> Make nr_mapped a per node counter
> 
> The per cpu nr_mapped counter is important because it allows a determination
> how many pages of a node are not mapped, which would allow a more effiecient
> means of determining when a node should reclaim memory.
> 
> Signed-off-by: Christoph Lameter <clameter@sgi.com>
> 
> Index: linux-2.6.15-rc3/include/linux/page-flags.h
> ===================================================================
> --- linux-2.6.15-rc3.orig/include/linux/page-flags.h	2005-12-01 00:35:38.000000000 -0800
> +++ linux-2.6.15-rc3/include/linux/page-flags.h	2005-12-01 00:35:49.000000000 -0800
> @@ -85,7 +85,6 @@ struct page_state {
>  	unsigned long nr_writeback;	/* Pages under writeback */
>  	unsigned long nr_unstable;	/* NFS unstable pages */
>  	unsigned long nr_page_table_pages;/* Pages used for pagetables */
> -	unsigned long nr_mapped;	/* mapped into pagetables */
>  	unsigned long nr_slab;		/* In slab */
>  #define GET_PAGE_STATE_LAST nr_slab
>  
> @@ -165,8 +164,8 @@ extern void __mod_page_state(unsigned lo
>  /*
>   * Node based accounting with per cpu differentials.
>   */
> -enum node_stat_item { };
> -#define NR_STAT_ITEMS 0
> +enum node_stat_item { NR_MAPPED };
> +#define NR_STAT_ITEMS 1
>  
>  extern unsigned long vm_stat_global[NR_STAT_ITEMS];
>  extern unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
> Index: linux-2.6.15-rc3/drivers/base/node.c
> ===================================================================
> --- linux-2.6.15-rc3.orig/drivers/base/node.c	2005-11-28 19:51:27.000000000 -0800
> +++ linux-2.6.15-rc3/drivers/base/node.c	2005-12-01 00:35:49.000000000 -0800
> @@ -53,8 +53,6 @@ static ssize_t node_read_meminfo(struct 
>  		ps.nr_dirty = 0;
>  	if ((long)ps.nr_writeback < 0)
>  		ps.nr_writeback = 0;
> -	if ((long)ps.nr_mapped < 0)
> -		ps.nr_mapped = 0;
>  	if ((long)ps.nr_slab < 0)
>  		ps.nr_slab = 0;
>  
> @@ -83,7 +81,7 @@ static ssize_t node_read_meminfo(struct 
>  		       nid, K(i.freeram - i.freehigh),
>  		       nid, K(ps.nr_dirty),
>  		       nid, K(ps.nr_writeback),
> -		       nid, K(ps.nr_mapped),
> +		       nid, K(vm_stat_node[nid][NR_MAPPED]),
>  		       nid, K(ps.nr_slab));
>  	n += hugetlb_report_node_meminfo(nid, buf + n);
>  	return n;
> Index: linux-2.6.15-rc3/fs/proc/proc_misc.c
> ===================================================================
> --- linux-2.6.15-rc3.orig/fs/proc/proc_misc.c	2005-11-28 19:51:27.000000000 -0800
> +++ linux-2.6.15-rc3/fs/proc/proc_misc.c	2005-12-01 00:35:49.000000000 -0800
> @@ -190,7 +190,7 @@ static int meminfo_read_proc(char *page,
>  		K(i.freeswap),
>  		K(ps.nr_dirty),
>  		K(ps.nr_writeback),
> -		K(ps.nr_mapped),
> +		K(vm_stat_global[NR_MAPPED]),
>  		K(ps.nr_slab),
>  		K(allowed),
>  		K(committed),
> Index: linux-2.6.15-rc3/mm/vmscan.c
> ===================================================================
> --- linux-2.6.15-rc3.orig/mm/vmscan.c	2005-11-28 19:51:27.000000000 -0800
> +++ linux-2.6.15-rc3/mm/vmscan.c	2005-12-01 00:35:49.000000000 -0800
> @@ -967,7 +967,7 @@ int try_to_free_pages(struct zone **zone
>  	}
>  
>  	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
> -		sc.nr_mapped = read_page_state(nr_mapped);
> +		sc.nr_mapped = vm_stat_global[NR_MAPPED];
>  		sc.nr_scanned = 0;
>  		sc.nr_reclaimed = 0;
>  		sc.priority = priority;
> @@ -1056,7 +1056,7 @@ loop_again:
>  	sc.gfp_mask = GFP_KERNEL;
>  	sc.may_writepage = 0;
>  	sc.may_swap = 1;
> -	sc.nr_mapped = read_page_state(nr_mapped);
> +	sc.nr_mapped = vm_stat_global[NR_MAPPED];
>  

Any chance you can wrap these in macros? (something like read_page_node_state())

I gather Andrew did this so that they can easily be defined out for things
that don't want them (maybe, embedded systems).
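For concreteness, a minimal userspace sketch of the kind of wrapper being suggested. The array and enum names (vm_stat_global, vm_stat_node, NR_MAPPED) come from the patch; the read_page_node_state()/read_page_global_state() wrappers themselves are hypothetical:

```c
/* Userspace model of the suggested macro wrappers. A config that
 * compiles the node statistics out (e.g. embedded) could define
 * both wrappers to 0 instead. */
#define MAX_NUMNODES 4

enum node_stat_item { NR_MAPPED };
#define NR_STAT_ITEMS 1

unsigned long vm_stat_global[NR_STAT_ITEMS];
unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];

/* Hypothetical wrapper names; readers of the counters would use these
 * instead of indexing the arrays directly. */
#define read_page_global_state(item)     (vm_stat_global[item])
#define read_page_node_state(nid, item)  (vm_stat_node[nid][item])
```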

-- 
SUSE Labs, Novell Inc.

Send instant messages to your online friends http://au.messenger.yahoo.com 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 18:28 ` Christoph Lameter
@ 2005-12-06 23:08   ` Nick Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2005-12-06 23:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

Christoph Lameter wrote:
> [RFC] Framework for accurate node based statistics
> 
> Currently we have various vm counters that are split per cpu. This arrangement
> does not allow access to per node statistics that are important to optimize
> VM behavior for NUMA architectures. All one can say from the per_cpu
> differential variables is how much a certain variable was changed by this cpu
> without being able to deduce how many pages in each node are of a certain type.
> 
> This patch introduces a generic framework to allow accurate per node vm
> statistics through a large per node and per cpu array. The numbers are
> consolidated when the slab drainer runs (every 3 seconds or so) into global
> and per node counters. VM functions can then check these statistics by
> simply accessing the node specific or global counter.
> 
> A significant problem with this approach is that the statistics are only
> accumulated every 3 seconds or so. I have tried various other approaches
> but they typically end up with having to add atomic variables to critical
> VM paths. I'd be glad if someone else had a bright idea on how to improve
> the situation.
> 

Why not have per-node * per-cpu counters?

Or even use the current per-zone * per-cpu counters, and work out your
node details from there?
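For reference, a userspace sketch of what the patch's consolidation step amounts to: each cpu keeps per-node differentials that are periodically folded into shared per-node and global counters. The flat-array layout and names here are assumptions for illustration, not the patch's exact code:

```c
#define NR_CPUS        2
#define MAX_NUMNODES   2
#define NR_STAT_ITEMS  1

/* Per-cpu, per-node differentials (local_t in the kernel patch). */
long vm_stat_diff[NR_CPUS][MAX_NUMNODES][NR_STAT_ITEMS];
unsigned long vm_stat_node[MAX_NUMNODES][NR_STAT_ITEMS];
unsigned long vm_stat_global[NR_STAT_ITEMS];

/* Fold one cpu's differentials into the shared counters and clear
 * them; the patch runs this from the slab drainer every ~3 seconds. */
static void refresh_vm_stats(int cpu)
{
	for (int node = 0; node < MAX_NUMNODES; node++)
		for (int item = 0; item < NR_STAT_ITEMS; item++)
			if (vm_stat_diff[cpu][node][item]) {
				vm_stat_node[node][item] += vm_stat_diff[cpu][node][item];
				vm_stat_global[item]     += vm_stat_diff[cpu][node][item];
				vm_stat_diff[cpu][node][item] = 0;
			}
}
```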

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 23:08   ` Nick Piggin
@ 2005-12-06 23:37     ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 23:37 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

On Wed, 7 Dec 2005, Nick Piggin wrote:

> Why not have per-node * per-cpu counters?

Yes, that is exactly what this patch implements.
 
> Or even use the current per-zone * per-cpu counters, and work out your
> node details from there?

I am not aware of any per-zone per cpu counters.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 23:37     ` Christoph Lameter
@ 2005-12-06 23:40       ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-06 23:40 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

On Tue, 6 Dec 2005, Christoph Lameter wrote:

> I am not aware of any per-zone per cpu counters.

Argh. Wrong. Yes, there are counters in the per cpu structures for each
zone. The counters introduced here could be folded into those, which would
give us zone based statistics; those may be better than per node statistics
for decision making about memory in a zone.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 22:52             ` Christoph Lameter
  (?)
@ 2005-12-07  5:50               ` Keith Owens
  -1 siblings, 0 replies; 51+ messages in thread
From: Keith Owens @ 2005-12-07  5:50 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, linux-kernel, Hugh Dickins, Nick Piggin, linux-mm,
	linux-ia64, Marcelo Tosatti

On Tue, 6 Dec 2005 14:52:33 -0800 (PST), 
Christoph Lameter <clameter@engr.sgi.com> wrote:
>+DEFINE_PER_CPU(local_t [MAX_NUMNODES][NR_STAT_ITEMS], vm_stat_diff);

How big is that array going to get?  The total per cpu data area is
limited to 64K on IA64 and we already use at least 34K.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-06 23:37     ` Christoph Lameter
@ 2005-12-07  6:44       ` Nick Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2005-12-07  6:44 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

Christoph Lameter wrote:
> On Wed, 7 Dec 2005, Nick Piggin wrote:
> 
> 
>>Why not have per-node * per-cpu counters?
> 
> 
> Yes, that is exactly what this patch implements.
>  

Sorry, I think I meant: why don't you just use the "add all counters
from all per-cpu of the node" in order to find the node-statistic?

Ie. like the node based page_state statistics that we already have.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-07  5:50               ` Keith Owens
  (?)
@ 2005-12-07 18:24                 ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-07 18:24 UTC (permalink / raw)
  To: Keith Owens
  Cc: Andi Kleen, linux-kernel, Hugh Dickins, Nick Piggin, linux-mm,
	linux-ia64, Marcelo Tosatti

On Wed, 7 Dec 2005, Keith Owens wrote:

> On Tue, 6 Dec 2005 14:52:33 -0800 (PST), 
> Christoph Lameter <clameter@engr.sgi.com> wrote:
> >+DEFINE_PER_CPU(local_t [MAX_NUMNODES][NR_STAT_ITEMS], vm_stat_diff);
> 
> How big is that array going to get?  The total per cpu data area is
> limited to 64K on IA64 and we already use at least 34K.

Maximum around 1k nodes and I guess we may end up with 16 counters:

1024*16*8 = 131k ?
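The estimate checks out: 1024 nodes x 16 counters x 8-byte local_t entries is 131072 bytes, i.e. 128 KiB per cpu, double the entire 64K IA64 per cpu area. A trivial sanity check:

```c
/* Back-of-envelope check of the figure above: 1k nodes, 16 counters,
 * 8-byte local_t entries. The parameters are the guesses from the
 * thread, not fixed kernel constants. */
static unsigned long vm_stat_diff_bytes(unsigned long nodes,
					unsigned long items,
					unsigned long entry_size)
{
	return nodes * items * entry_size;  /* 1024 * 16 * 8 = 131072 */
}
```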


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-07  6:44       ` Nick Piggin
@ 2005-12-07 18:27         ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-07 18:27 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

On Wed, 7 Dec 2005, Nick Piggin wrote:

> Sorry, I think I meant: why don't you just use the "add all counters
> from all per-cpu of the node" in order to find the node-statistic?

which function is that?

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC 1/3] Framework for accurate node based statistics
@ 2005-12-07 18:39 ` Luck, Tony
  0 siblings, 0 replies; 51+ messages in thread
From: Luck, Tony @ 2005-12-07 18:39 UTC (permalink / raw)
  To: Christoph Lameter, Keith Owens
  Cc: Andi Kleen, linux-kernel, Hugh Dickins, Nick Piggin, linux-mm,
	linux-ia64, Marcelo Tosatti

>> How big is that array going to get?  The total per cpu data area is
>> limited to 64K on IA64 and we already use at least 34K.
>
> Maximum around 1k nodes and I guess we may end up with 16 counters:
>
> 1024*16*8 = 131k ?

Ouch.

Can you live with a pointer to that monster block of space in the
per-cpu area?

Otherwise the next step up is a 256K per cpu area ... which I wouldn't
want to make the default (so we'll have another 2*X explosion in the
number of possible configs to test).

-Tony
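A userspace sketch of the indirection being suggested: only a pointer lives in the scarce per cpu area, and the large counter block is allocated separately at cpu bring-up. All names here are hypothetical:

```c
#include <stdlib.h>

#define MAX_NUMNODES  1024
#define NR_STAT_ITEMS 16
#define NR_CPUS       4

struct vm_stat_block {
	long diff[MAX_NUMNODES][NR_STAT_ITEMS];	/* 128 KiB on 64-bit */
};

/* Models DEFINE_PER_CPU(struct vm_stat_block *, vm_stat_diff):
 * the per cpu area holds only the pointer, not the block. */
static struct vm_stat_block *vm_stat_diff[NR_CPUS];

/* Allocate a cpu's block out of line, e.g. at cpu bring-up. */
static int vm_stat_init_cpu(int cpu)
{
	vm_stat_diff[cpu] = calloc(1, sizeof(struct vm_stat_block));
	return vm_stat_diff[cpu] ? 0 : -1;
}
```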

^ permalink raw reply	[flat|nested] 51+ messages in thread

* RE: [RFC 1/3] Framework for accurate node based statistics
  2005-12-07 18:39 ` Luck, Tony
  (?)
@ 2005-12-07 18:47   ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-07 18:47 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Keith Owens, Andi Kleen, linux-kernel, Hugh Dickins, Nick Piggin,
	linux-mm, linux-ia64, Marcelo Tosatti

On Wed, 7 Dec 2005, Luck, Tony wrote:

> Can you live with a pointer to that monster block of space in the
> per-cpu area?
> 
> Otherwise the next step up is a 256K per cpu area ... which I wouldn't
> want to make the default (so we'll have another 2*X explosion in the
> number of possible configs to test).

Let's wait. I just did this to show how local_t could be implemented. This 
is an RFC and the major problems (e.g. the 3 second delay) 
have not been addressed, so this is all vaporware for now.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-07 18:27         ` Christoph Lameter
@ 2005-12-07 22:59           ` Nick Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2005-12-07 22:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

Christoph Lameter wrote:
> On Wed, 7 Dec 2005, Nick Piggin wrote:
> 
> 
>>Sorry, I think I meant: why don't you just use the "add all counters
>>from all per-cpu of the node" in order to find the node-statistic?
> 
> 
> which function is that?
> 

I'm thinking of get_page_state_node... but that's not quite the same
thing. I guess sum all per-CPU counters from all zones in the node,
but that's going to be costly on big machines.

So I'm not sure, I guess I don't have any bright ideas... there is the
batching approach used by current pagecache_acct - is something like
that not sufficient either?
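The batching approach referred to here bounds the error: each cpu folds its delta into the shared counter as soon as the delta exceeds a small threshold, so the shared counter is never off by more than NR_CPUS * threshold, at the price of an occasional shared update. A minimal single-threaded sketch (names assumed):

```c
#define NR_CPUS   4
#define THRESHOLD 64

/* Shared counter; in the kernel this update would need an atomic op
 * or lock, which is exactly the cost being traded off. */
static long nr_mapped_global;
static long nr_mapped_delta[NR_CPUS];	/* models the per-cpu batch */

static void mod_nr_mapped(int cpu, int delta)
{
	nr_mapped_delta[cpu] += delta;
	if (nr_mapped_delta[cpu] >  THRESHOLD ||
	    nr_mapped_delta[cpu] < -THRESHOLD) {
		nr_mapped_global += nr_mapped_delta[cpu];
		nr_mapped_delta[cpu] = 0;
	}
}
```

The worst-case error of nr_mapped_global is NR_CPUS * THRESHOLD, independent of how long ago the counters were last drained, which is what the 3-second consolidation cannot guarantee.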

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-07 22:59           ` Nick Piggin
@ 2005-12-08  0:02             ` Christoph Lameter
  -1 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-08  0:02 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

On Thu, 8 Dec 2005, Nick Piggin wrote:

> Christoph Lameter wrote:
> > On Wed, 7 Dec 2005, Nick Piggin wrote:
> > > Sorry, I think I meant: why don't you just use the "add all counters
> > > from all per-cpu of the node" in order to find the node-statistic?
> > which function is that?
> > 
> 
> I'm thinking of get_page_state_node... but that's not quite the same
> thing. I guess sum all per-CPU counters from all zones in the node,
> but that's going to be costly on big machines.

The per cpu counters count when a cpu did an allocation. They do not count 
on which node the allocation was done and are therefore not useful for 
determining the memory use on one node.

> So I'm not sure, I guess I don't have any bright ideas... there is the
> batching approach used by current pagecache_acct - is something like
> that not sufficient either?

The framework provides a similar approach by keeping differential 
counters for each processor.


^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-08  0:02             ` Christoph Lameter
@ 2005-12-08  0:13               ` Nick Piggin
  -1 siblings, 0 replies; 51+ messages in thread
From: Nick Piggin @ 2005-12-08  0:13 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

Christoph Lameter wrote:
> On Thu, 8 Dec 2005, Nick Piggin wrote:
> 
> 
>>Christoph Lameter wrote:
>>
>>>On Wed, 7 Dec 2005, Nick Piggin wrote:
>>>
>>>>Sorry, I think I meant: why don't you just use the "add all counters
>>>>from all per-cpu of the node" in order to find the node-statistic?
>>>
>>>which function is that?
>>>
>>
>>I'm thinking of get_page_state_node... but that's not quite the same
>>thing. I guess sum all per-CPU counters from all zones in the node,
>>but that's going to be costly on big machines.
> 
> 
> The per cpu counters count when a cpu did an allocation. They do not count 
> on which node the allocation was done and are thereofre not useful to 
> determine the memory use on one node.
> 

Yes, not that exact function of course.

> 
>>So I'm not sure, I guess I don't have any bright ideas... there is the
>>batching approach used by current pagecache_acct - is something like
>>that not sufficient either?
> 
> 
> The framework provides a similar approach by keeping differential 
> counters for each processor.
> 

But the accounting delay has the unbounded error problem that the
batching approach does not.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [RFC 1/3] Framework for accurate node based statistics
  2005-12-08  0:13               ` Nick Piggin
@ 2005-12-08  0:35                 ` Christoph Lameter
  0 siblings, 0 replies; 51+ messages in thread
From: Christoph Lameter @ 2005-12-08  0:35 UTC (permalink / raw)
  To: Nick Piggin
  Cc: linux-kernel, Hugh Dickins, linux-mm, Andi Kleen, Marcelo Tosatti

On Thu, 8 Dec 2005, Nick Piggin wrote:

> > The framework provides a similar approach by keeping differential counters
> > for each processor.
> But the accounting delay has the unbounded error problem that the
> batching approach does not.

Ok. We could switch to batching in order to avoid using the 
slab reaper.

^ permalink raw reply	[flat|nested] 51+ messages in thread


end of thread, other threads:[~2005-12-08  0:35 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2005-12-06 18:28 [RFC 1/3] Framework for accurate node based statistics Christoph Lameter
2005-12-06 18:28 ` Christoph Lameter
2005-12-06 18:28 ` [RFC 2/3] Make nr_mapped a per node counter Christoph Lameter
2005-12-06 18:28   ` Christoph Lameter
2005-12-06 23:05   ` Nick Piggin
2005-12-06 23:05     ` Nick Piggin
2005-12-06 18:28 ` [RFC 3/3] Make nr_pagecache " Christoph Lameter
2005-12-06 18:28   ` Christoph Lameter
2005-12-06 18:35 ` [RFC 1/3] Framework for accurate node based statistics Andi Kleen
2005-12-06 18:35   ` Andi Kleen
2005-12-06 19:08   ` Christoph Lameter
2005-12-06 19:08     ` Christoph Lameter
2005-12-06 19:26     ` Andi Kleen
2005-12-06 19:26       ` Andi Kleen
2005-12-06 19:36       ` Christoph Lameter
2005-12-06 19:36         ` Christoph Lameter
2005-12-06 20:06         ` Andi Kleen
2005-12-06 20:06           ` Andi Kleen
2005-12-06 22:52           ` Christoph Lameter
2005-12-06 22:52             ` Christoph Lameter
2005-12-06 22:52             ` Christoph Lameter
2005-12-07  5:50             ` Keith Owens
2005-12-07  5:50               ` Keith Owens
2005-12-07  5:50               ` Keith Owens
2005-12-07 18:24               ` Christoph Lameter
2005-12-07 18:24                 ` Christoph Lameter
2005-12-07 18:24                 ` Christoph Lameter
2005-12-06 23:08 ` Nick Piggin
2005-12-06 23:08   ` Nick Piggin
2005-12-06 23:37   ` Christoph Lameter
2005-12-06 23:37     ` Christoph Lameter
2005-12-06 23:40     ` Christoph Lameter
2005-12-06 23:40       ` Christoph Lameter
2005-12-07  6:44     ` Nick Piggin
2005-12-07  6:44       ` Nick Piggin
2005-12-07 18:27       ` Christoph Lameter
2005-12-07 18:27         ` Christoph Lameter
2005-12-07 22:59         ` Nick Piggin
2005-12-07 22:59           ` Nick Piggin
2005-12-08  0:02           ` Christoph Lameter
2005-12-08  0:02             ` Christoph Lameter
2005-12-08  0:13             ` Nick Piggin
2005-12-08  0:13               ` Nick Piggin
2005-12-08  0:35               ` Christoph Lameter
2005-12-08  0:35                 ` Christoph Lameter
2005-12-07 18:39 Luck, Tony
2005-12-07 18:39 ` Luck, Tony
2005-12-07 18:39 ` Luck, Tony
2005-12-07 18:47 ` Christoph Lameter
2005-12-07 18:47   ` Christoph Lameter
2005-12-07 18:47   ` Christoph Lameter
