* [RFC 0/8] Use ZVCs for accurate writeback ratio determination
@ 2007-01-26  5:41 Christoph Lameter
  2007-01-26  5:41 ` [RFC 1/8] Use ZVC for inactive and active counts Christoph Lameter
                   ` (8 more replies)
  0 siblings, 9 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:41 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

The dirty ratio used to determine writeback behavior is currently based
on the total number of pages in the system.

However, not all pages in the system can be dirtied, so the ratio is
effectively always too low and can never reach 100%. The ratio may be
particularly skewed if large hugepage allocations, slab allocations or
device driver buffers remove large sections of memory from use. We may
then end up in a situation in which, for example, the background
writeback ratio of 40% can no longer be reached, which leads to
undesired writeback behavior: if, say, 65% of memory is pinned in
hugepages, at most 35% of memory can ever be dirty, so a 40% threshold
is unreachable.

This patchset fixes that issue by determining the ratio based on the
actual pages that may potentially be dirtied: the pages on the active
and inactive lists plus the free pages.

The problem with these counts has so far been that they are expensive
to calculate, because counts from multiple nodes and multiple zones
have to be summed up. This patchset turns the counters into ZVC
counters, so a current sum per zone, per node and for the whole system
is always available via global variables and is no longer expensive to
obtain.
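
In concrete terms, the writeback thresholds are then derived from a
base like the following (a minimal sketch of the idea; the actual
change is patch 8/8):

	unsigned long available_memory = global_page_state(NR_FREE_PAGES) +
				global_page_state(NR_INACTIVE) +
				global_page_state(NR_ACTIVE);

	/* A 40% background ratio now refers to dirtyable pages only */
	long background = (dirty_background_ratio * available_memory) / 100;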

The patchset results in some other good side effects:

- Removal of the various functions that sum up free, active
  and inactive page counts

- Cleanup of the functions that display information via the
  proc filesystem.


* [RFC 1/8] Use ZVC for inactive and active counts
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
@ 2007-01-26  5:41 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 2/8] Use ZVC for free_pages Christoph Lameter
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:41 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Use ZVC for nr_inactive and nr_active

The use of a ZVC for nr_inactive and nr_active allows a simplification
of some counter operations. More ZVC functionality is used for sums
etc. in the following patches.
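
For reference, the ZVC read side that these wrappers pair with looks
roughly like this in include/linux/vmstat.h of this era (a sketch for
orientation, not part of this patch):

	static inline unsigned long zone_page_state(struct zone *zone,
					enum zone_stat_item item)
	{
		long x = atomic_long_read(&zone->vm_stat[item]);
	#ifdef CONFIG_SMP
		/* per-cpu diffs may leave the sum transiently negative */
		if (x < 0)
			x = 0;
	#endif
		return x;
	}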

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/include/linux/mm_inline.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/mm_inline.h	2007-01-25 20:22:49.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/mm_inline.h	2007-01-25 20:22:52.000000000 -0800
@@ -1,30 +1,29 @@
-
 static inline void
 add_page_to_active_list(struct zone *zone, struct page *page)
 {
 	list_add(&page->lru, &zone->active_list);
-	zone->nr_active++;
+	__inc_zone_state(zone, NR_ACTIVE);
 }
 
 static inline void
 add_page_to_inactive_list(struct zone *zone, struct page *page)
 {
 	list_add(&page->lru, &zone->inactive_list);
-	zone->nr_inactive++;
+	__inc_zone_state(zone, NR_INACTIVE);
 }
 
 static inline void
 del_page_from_active_list(struct zone *zone, struct page *page)
 {
 	list_del(&page->lru);
-	zone->nr_active--;
+	__dec_zone_state(zone, NR_ACTIVE);
 }
 
 static inline void
 del_page_from_inactive_list(struct zone *zone, struct page *page)
 {
 	list_del(&page->lru);
-	zone->nr_inactive--;
+	__dec_zone_state(zone, NR_INACTIVE);
 }
 
 static inline void
@@ -33,9 +32,9 @@ del_page_from_lru(struct zone *zone, str
 	list_del(&page->lru);
 	if (PageActive(page)) {
 		__ClearPageActive(page);
-		zone->nr_active--;
+		__dec_zone_state(zone, NR_ACTIVE);
 	} else {
-		zone->nr_inactive--;
+		__dec_zone_state(zone, NR_INACTIVE);
 	}
 }
 
Index: linux-2.6.20-rc6/include/linux/mmzone.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/mmzone.h	2007-01-25 20:22:49.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/mmzone.h	2007-01-25 20:22:52.000000000 -0800
@@ -47,6 +47,8 @@ struct zone_padding {
 #endif
 
 enum zone_stat_item {
+	NR_INACTIVE,
+	NR_ACTIVE,
 	NR_ANON_PAGES,	/* Mapped anonymous pages */
 	NR_FILE_MAPPED,	/* pagecache pages mapped into pagetables.
 			   only modified from process context */
@@ -197,8 +199,6 @@ struct zone {
 	struct list_head	inactive_list;
 	unsigned long		nr_scan_active;
 	unsigned long		nr_scan_inactive;
-	unsigned long		nr_active;
-	unsigned long		nr_inactive;
 	unsigned long		pages_scanned;	   /* since last reclaim */
 	int			all_unreclaimable; /* All pages pinned */
 
Index: linux-2.6.20-rc6/include/linux/vmstat.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/vmstat.h	2007-01-25 20:22:49.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/vmstat.h	2007-01-25 20:22:52.000000000 -0800
@@ -186,6 +186,9 @@ void inc_zone_page_state(struct page *, 
 void dec_zone_page_state(struct page *, enum zone_stat_item);
 
 extern void inc_zone_state(struct zone *, enum zone_stat_item);
+extern void __inc_zone_state(struct zone *, enum zone_stat_item);
+extern void dec_zone_state(struct zone *, enum zone_stat_item);
+extern void __dec_zone_state(struct zone *, enum zone_stat_item);
 
 void refresh_cpu_vm_stats(int);
 void refresh_vm_stats(void);
Index: linux-2.6.20-rc6/mm/vmscan.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/vmscan.c	2007-01-25 20:22:49.000000000 -0800
+++ linux-2.6.20-rc6/mm/vmscan.c	2007-01-25 20:22:52.000000000 -0800
@@ -679,7 +679,7 @@ static unsigned long shrink_inactive_lis
 		nr_taken = isolate_lru_pages(sc->swap_cluster_max,
 					     &zone->inactive_list,
 					     &page_list, &nr_scan);
-		zone->nr_inactive -= nr_taken;
+		__mod_zone_page_state(zone, NR_INACTIVE, -nr_taken);
 		zone->pages_scanned += nr_scan;
 		spin_unlock_irq(&zone->lru_lock);
 
@@ -740,7 +740,8 @@ static inline void note_zone_scanning_pr
 
 static inline int zone_is_near_oom(struct zone *zone)
 {
-	return zone->pages_scanned >= (zone->nr_active + zone->nr_inactive)*3;
+	return zone->pages_scanned >= (zone_page_state(zone, NR_ACTIVE)
+				+ zone_page_state(zone, NR_INACTIVE))*3;
 }
 
 /*
@@ -825,7 +826,7 @@ force_reclaim_mapped:
 	pgmoved = isolate_lru_pages(nr_pages, &zone->active_list,
 				    &l_hold, &pgscanned);
 	zone->pages_scanned += pgscanned;
-	zone->nr_active -= pgmoved;
+	__mod_zone_page_state(zone, NR_ACTIVE, -pgmoved);
 	spin_unlock_irq(&zone->lru_lock);
 
 	while (!list_empty(&l_hold)) {
@@ -857,7 +858,7 @@ force_reclaim_mapped:
 		list_move(&page->lru, &zone->inactive_list);
 		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
-			zone->nr_inactive += pgmoved;
+			__mod_zone_page_state(zone, NR_INACTIVE, pgmoved);
 			spin_unlock_irq(&zone->lru_lock);
 			pgdeactivate += pgmoved;
 			pgmoved = 0;
@@ -867,7 +868,7 @@ force_reclaim_mapped:
 			spin_lock_irq(&zone->lru_lock);
 		}
 	}
-	zone->nr_inactive += pgmoved;
+	__mod_zone_page_state(zone, NR_INACTIVE, pgmoved);
 	pgdeactivate += pgmoved;
 	if (buffer_heads_over_limit) {
 		spin_unlock_irq(&zone->lru_lock);
@@ -885,14 +886,14 @@ force_reclaim_mapped:
 		list_move(&page->lru, &zone->active_list);
 		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
-			zone->nr_active += pgmoved;
+			__mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
 			pgmoved = 0;
 			spin_unlock_irq(&zone->lru_lock);
 			__pagevec_release(&pvec);
 			spin_lock_irq(&zone->lru_lock);
 		}
 	}
-	zone->nr_active += pgmoved;
+	__mod_zone_page_state(zone, NR_ACTIVE, pgmoved);
 
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	__count_vm_events(PGDEACTIVATE, pgdeactivate);
@@ -918,14 +919,16 @@ static unsigned long shrink_zone(int pri
 	 * Add one to `nr_to_scan' just to make sure that the kernel will
 	 * slowly sift through the active list.
 	 */
-	zone->nr_scan_active += (zone->nr_active >> priority) + 1;
+	zone->nr_scan_active +=
+		(zone_page_state(zone, NR_ACTIVE) >> priority) + 1;
 	nr_active = zone->nr_scan_active;
 	if (nr_active >= sc->swap_cluster_max)
 		zone->nr_scan_active = 0;
 	else
 		nr_active = 0;
 
-	zone->nr_scan_inactive += (zone->nr_inactive >> priority) + 1;
+	zone->nr_scan_inactive +=
+		(zone_page_state(zone, NR_INACTIVE) >> priority) + 1;
 	nr_inactive = zone->nr_scan_inactive;
 	if (nr_inactive >= sc->swap_cluster_max)
 		zone->nr_scan_inactive = 0;
@@ -1037,7 +1040,8 @@ unsigned long try_to_free_pages(struct z
 		if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
 			continue;
 
-		lru_pages += zone->nr_active + zone->nr_inactive;
+		lru_pages += zone_page_state(zone, NR_ACTIVE)
+				+ zone_page_state(zone, NR_INACTIVE);
 	}
 
 	for (priority = DEF_PRIORITY; priority >= 0; priority--) {
@@ -1182,7 +1186,8 @@ loop_again:
 		for (i = 0; i <= end_zone; i++) {
 			struct zone *zone = pgdat->node_zones + i;
 
-			lru_pages += zone->nr_active + zone->nr_inactive;
+			lru_pages += zone_page_state(zone, NR_ACTIVE)
+					+ zone_page_state(zone, NR_INACTIVE);
 		}
 
 		/*
@@ -1219,8 +1224,9 @@ loop_again:
 			if (zone->all_unreclaimable)
 				continue;
 			if (nr_slab == 0 && zone->pages_scanned >=
-				    (zone->nr_active + zone->nr_inactive) * 6)
-				zone->all_unreclaimable = 1;
+				(zone_page_state(zone, NR_ACTIVE)
+				+ zone_page_state(zone, NR_INACTIVE)) * 6)
+					zone->all_unreclaimable = 1;
 			/*
 			 * If we've done a decent amount of scanning and
 			 * the reclaim ratio is low, start doing writepage
@@ -1385,18 +1391,22 @@ static unsigned long shrink_all_zones(un
 
 		/* For pass = 0 we don't shrink the active list */
 		if (pass > 0) {
-			zone->nr_scan_active += (zone->nr_active >> prio) + 1;
+			zone->nr_scan_active +=
+				(zone_page_state(zone, NR_ACTIVE) >> prio) + 1;
 			if (zone->nr_scan_active >= nr_pages || pass > 3) {
 				zone->nr_scan_active = 0;
-				nr_to_scan = min(nr_pages, zone->nr_active);
+				nr_to_scan = min(nr_pages,
+					zone_page_state(zone, NR_ACTIVE));
 				shrink_active_list(nr_to_scan, zone, sc, prio);
 			}
 		}
 
-		zone->nr_scan_inactive += (zone->nr_inactive >> prio) + 1;
+		zone->nr_scan_inactive +=
+			(zone_page_state(zone, NR_INACTIVE) >> prio) + 1;
 		if (zone->nr_scan_inactive >= nr_pages || pass > 3) {
 			zone->nr_scan_inactive = 0;
-			nr_to_scan = min(nr_pages, zone->nr_inactive);
+			nr_to_scan = min(nr_pages,
+				zone_page_state(zone, NR_INACTIVE));
 			ret += shrink_inactive_list(nr_to_scan, zone, sc);
 			if (ret >= nr_pages)
 				return ret;
@@ -1408,12 +1418,7 @@ static unsigned long shrink_all_zones(un
 
 static unsigned long count_lru_pages(void)
 {
-	struct zone *zone;
-	unsigned long ret = 0;
-
-	for_each_zone(zone)
-		ret += zone->nr_active + zone->nr_inactive;
-	return ret;
+	return global_page_state(NR_ACTIVE) + global_page_state(NR_INACTIVE);
 }
 
 /*
Index: linux-2.6.20-rc6/mm/vmstat.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/vmstat.c	2007-01-25 20:22:49.000000000 -0800
+++ linux-2.6.20-rc6/mm/vmstat.c	2007-01-25 20:22:52.000000000 -0800
@@ -19,12 +19,10 @@ void __get_zone_counts(unsigned long *ac
 	struct zone *zones = pgdat->node_zones;
 	int i;
 
-	*active = 0;
-	*inactive = 0;
+	*active = node_page_state(pgdat->node_id, NR_ACTIVE);
+	*inactive = node_page_state(pgdat->node_id, NR_INACTIVE);
 	*free = 0;
 	for (i = 0; i < MAX_NR_ZONES; i++) {
-		*active += zones[i].nr_active;
-		*inactive += zones[i].nr_inactive;
 		*free += zones[i].free_pages;
 	}
 }
@@ -34,14 +32,12 @@ void get_zone_counts(unsigned long *acti
 {
 	struct pglist_data *pgdat;
 
-	*active = 0;
-	*inactive = 0;
+	*active = global_page_state(NR_ACTIVE);
+	*inactive = global_page_state(NR_INACTIVE);
 	*free = 0;
 	for_each_online_pgdat(pgdat) {
 		unsigned long l, m, n;
 		__get_zone_counts(&l, &m, &n, pgdat);
-		*active += l;
-		*inactive += m;
 		*free += n;
 	}
 }
@@ -239,7 +235,7 @@ EXPORT_SYMBOL(mod_zone_page_state);
  * in between and therefore the atomicity vs. interrupt cannot be exploited
  * in a useful way here.
  */
-static void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
+void __inc_zone_state(struct zone *zone, enum zone_stat_item item)
 {
 	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
 	s8 *p = pcp->vm_stat_diff + item;
@@ -260,9 +256,8 @@ void __inc_zone_page_state(struct page *
 }
 EXPORT_SYMBOL(__inc_zone_page_state);
 
-void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
+void __dec_zone_state(struct zone *zone, enum zone_stat_item item)
 {
-	struct zone *zone = page_zone(page);
 	struct per_cpu_pageset *pcp = zone_pcp(zone, smp_processor_id());
 	s8 *p = pcp->vm_stat_diff + item;
 
@@ -275,6 +270,11 @@ void __dec_zone_page_state(struct page *
 		*p = overstep;
 	}
 }
+
+void __dec_zone_page_state(struct page *page, enum zone_stat_item item)
+{
+	__dec_zone_state(page_zone(page), item);
+}
 EXPORT_SYMBOL(__dec_zone_page_state);
 
 void inc_zone_state(struct zone *zone, enum zone_stat_item item)
@@ -454,6 +454,8 @@ const struct seq_operations fragmentatio
 
 static const char * const vmstat_text[] = {
 	/* Zoned VM counters */
+	"nr_active",
+	"nr_inactive",
 	"nr_anon_pages",
 	"nr_mapped",
 	"nr_file_pages",
@@ -529,8 +531,6 @@ static int zoneinfo_show(struct seq_file
 			   "\n        min      %lu"
 			   "\n        low      %lu"
 			   "\n        high     %lu"
-			   "\n        active   %lu"
-			   "\n        inactive %lu"
 			   "\n        scanned  %lu (a: %lu i: %lu)"
 			   "\n        spanned  %lu"
 			   "\n        present  %lu",
@@ -538,8 +538,6 @@ static int zoneinfo_show(struct seq_file
 			   zone->pages_min,
 			   zone->pages_low,
 			   zone->pages_high,
-			   zone->nr_active,
-			   zone->nr_inactive,
 			   zone->pages_scanned,
 			   zone->nr_scan_active, zone->nr_scan_inactive,
 			   zone->spanned_pages,
Index: linux-2.6.20-rc6/mm/page_alloc.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/page_alloc.c	2007-01-25 20:22:49.000000000 -0800
+++ linux-2.6.20-rc6/mm/page_alloc.c	2007-01-25 20:22:52.000000000 -0800
@@ -1616,8 +1616,8 @@ void show_free_areas(void)
 			K(zone->pages_min),
 			K(zone->pages_low),
 			K(zone->pages_high),
-			K(zone->nr_active),
-			K(zone->nr_inactive),
+			K(zone_page_state(zone, NR_ACTIVE)),
+			K(zone_page_state(zone, NR_INACTIVE)),
 			K(zone->present_pages),
 			zone->pages_scanned,
 			(zone->all_unreclaimable ? "yes" : "no")
@@ -2684,8 +2684,6 @@ static void __meminit free_area_init_cor
 		INIT_LIST_HEAD(&zone->inactive_list);
 		zone->nr_scan_active = 0;
 		zone->nr_scan_inactive = 0;
-		zone->nr_active = 0;
-		zone->nr_inactive = 0;
 		zap_zone_vm_stats(zone);
 		atomic_set(&zone->reclaim_in_progress, 0);
 		if (!size)


* [RFC 2/8] Use ZVC for free_pages
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
  2007-01-26  5:41 ` [RFC 1/8] Use ZVC for inactive and active counts Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 3/8] Reorder ZVCs according to cacheline Christoph Lameter
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Use ZVC for free_pages

This again simplifies some of the VM counter calculations through the
use of the consolidated ZVC counters.
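
The consolidated global sum that replaces the summing loops below is
read via global_page_state(); roughly (again a sketch of the existing
ZVC infrastructure, not new code in this patch):

	extern atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];

	static inline unsigned long global_page_state(enum zone_stat_item item)
	{
		long x = atomic_long_read(&vm_stat[item]);
	#ifdef CONFIG_SMP
		if (x < 0)
			x = 0;
	#endif
		return x;
	}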

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/include/linux/mmzone.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/mmzone.h	2007-01-25 11:19:05.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/mmzone.h	2007-01-25 11:20:17.000000000 -0800
@@ -47,6 +47,7 @@ struct zone_padding {
 #endif
 
 enum zone_stat_item {
+	NR_FREE_PAGES,
 	NR_INACTIVE,
 	NR_ACTIVE,
 	NR_ANON_PAGES,	/* Mapped anonymous pages */
@@ -157,7 +158,6 @@ enum zone_type {
 
 struct zone {
 	/* Fields commonly accessed by the page allocator */
-	unsigned long		free_pages;
 	unsigned long		pages_min, pages_low, pages_high;
 	/*
 	 * We don't know if the memory that we're going to allocate will be freeable
Index: linux-2.6.20-rc6/mm/page_alloc.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/page_alloc.c	2007-01-25 11:18:41.000000000 -0800
+++ linux-2.6.20-rc6/mm/page_alloc.c	2007-01-25 11:19:46.000000000 -0800
@@ -395,7 +395,7 @@ static inline void __free_one_page(struc
 	VM_BUG_ON(page_idx & (order_size - 1));
 	VM_BUG_ON(bad_range(zone, page));
 
-	zone->free_pages += order_size;
+	__mod_zone_page_state(zone, NR_FREE_PAGES, order_size);
 	while (order < MAX_ORDER-1) {
 		unsigned long combined_idx;
 		struct free_area *area;
@@ -631,7 +631,7 @@ static struct page *__rmqueue(struct zon
 		list_del(&page->lru);
 		rmv_page_order(page);
 		area->nr_free--;
-		zone->free_pages -= 1UL << order;
+		__mod_zone_page_state(zone, NR_FREE_PAGES, - (1UL << order));
 		expand(zone, page, order, current_order, area);
 		return page;
 	}
@@ -990,7 +990,8 @@ int zone_watermark_ok(struct zone *z, in
 {
 	/* free_pages may go negative - that's OK */
 	unsigned long min = mark;
-	long free_pages = z->free_pages - (1 << order) + 1;
+	long free_pages = zone_page_state(z, NR_FREE_PAGES)
+				- (1 << order) + 1;
 	int o;
 
 	if (alloc_flags & ALLOC_HIGH)
@@ -1445,13 +1446,7 @@ EXPORT_SYMBOL(free_pages);
  */
 unsigned int nr_free_pages(void)
 {
-	unsigned int sum = 0;
-	struct zone *zone;
-
-	for_each_zone(zone)
-		sum += zone->free_pages;
-
-	return sum;
+	return global_page_state(NR_FREE_PAGES);
 }
 
 EXPORT_SYMBOL(nr_free_pages);
@@ -1459,13 +1454,7 @@ EXPORT_SYMBOL(nr_free_pages);
 #ifdef CONFIG_NUMA
 unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
 {
-	unsigned int sum = 0;
-	enum zone_type i;
-
-	for (i = 0; i < MAX_NR_ZONES; i++)
-		sum += pgdat->node_zones[i].free_pages;
-
-	return sum;
+	return node_page_state(pgdat->node_id, NR_FREE_PAGES);
 }
 #endif
 
@@ -1515,7 +1504,7 @@ void si_meminfo(struct sysinfo *val)
 {
 	val->totalram = totalram_pages;
 	val->sharedram = 0;
-	val->freeram = nr_free_pages();
+	val->freeram = global_page_state(NR_FREE_PAGES);
 	val->bufferram = nr_blockdev_pages();
 	val->totalhigh = totalhigh_pages;
 	val->freehigh = nr_free_highpages();
@@ -1530,7 +1519,7 @@ void si_meminfo_node(struct sysinfo *val
 	pg_data_t *pgdat = NODE_DATA(nid);
 
 	val->totalram = pgdat->node_present_pages;
-	val->freeram = nr_free_pages_pgdat(pgdat);
+	val->freeram = node_page_state(nid, NR_FREE_PAGES);
 #ifdef CONFIG_HIGHMEM
 	val->totalhigh = pgdat->node_zones[ZONE_HIGHMEM].present_pages;
 	val->freehigh = pgdat->node_zones[ZONE_HIGHMEM].free_pages;
@@ -1581,13 +1570,13 @@ void show_free_areas(void)
 	get_zone_counts(&active, &inactive, &free);
 
 	printk("Active:%lu inactive:%lu dirty:%lu writeback:%lu "
-		"unstable:%lu free:%u slab:%lu mapped:%lu pagetables:%lu\n",
+		"unstable:%lu free:%lu slab:%lu mapped:%lu pagetables:%lu\n",
 		active,
 		inactive,
 		global_page_state(NR_FILE_DIRTY),
 		global_page_state(NR_WRITEBACK),
 		global_page_state(NR_UNSTABLE_NFS),
-		nr_free_pages(),
+		global_page_state(NR_FREE_PAGES),
 		global_page_state(NR_SLAB_RECLAIMABLE) +
 			global_page_state(NR_SLAB_UNRECLAIMABLE),
 		global_page_state(NR_FILE_MAPPED),
@@ -1612,7 +1601,7 @@ void show_free_areas(void)
 			" all_unreclaimable? %s"
 			"\n",
 			zone->name,
-			K(zone->free_pages),
+			K(zone_page_state(zone, NR_FREE_PAGES)),
 			K(zone->pages_min),
 			K(zone->pages_low),
 			K(zone->pages_high),
@@ -2675,7 +2664,6 @@ static void __meminit free_area_init_cor
 		spin_lock_init(&zone->lru_lock);
 		zone_seqlock_init(zone);
 		zone->zone_pgdat = pgdat;
-		zone->free_pages = 0;
 
 		zone->prev_priority = DEF_PRIORITY;
 
Index: linux-2.6.20-rc6/mm/vmstat.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/vmstat.c	2007-01-25 11:19:34.000000000 -0800
+++ linux-2.6.20-rc6/mm/vmstat.c	2007-01-25 11:20:32.000000000 -0800
@@ -16,30 +16,17 @@
 void __get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free, struct pglist_data *pgdat)
 {
-	struct zone *zones = pgdat->node_zones;
-	int i;
-
 	*active = node_page_state(pgdat->node_id, NR_ACTIVE);
 	*inactive = node_page_state(pgdat->node_id, NR_INACTIVE);
-	*free = 0;
-	for (i = 0; i < MAX_NR_ZONES; i++) {
-		*free += zones[i].free_pages;
-	}
+	*free = node_page_state(pgdat->node_id, NR_FREE_PAGES);
 }
 
 void get_zone_counts(unsigned long *active,
 		unsigned long *inactive, unsigned long *free)
 {
-	struct pglist_data *pgdat;
-
 	*active = global_page_state(NR_ACTIVE);
 	*inactive = global_page_state(NR_INACTIVE);
-	*free = 0;
-	for_each_online_pgdat(pgdat) {
-		unsigned long l, m, n;
-		__get_zone_counts(&l, &m, &n, pgdat);
-		*free += n;
-	}
+	*free = global_page_state(NR_FREE_PAGES);
 }
 
 #ifdef CONFIG_VM_EVENT_COUNTERS
@@ -454,6 +441,7 @@ const struct seq_operations fragmentatio
 
 static const char * const vmstat_text[] = {
 	/* Zoned VM counters */
+	"nr_free_pages",
 	"nr_active",
 	"nr_inactive",
 	"nr_anon_pages",
@@ -534,7 +522,7 @@ static int zoneinfo_show(struct seq_file
 			   "\n        scanned  %lu (a: %lu i: %lu)"
 			   "\n        spanned  %lu"
 			   "\n        present  %lu",
-			   zone->free_pages,
+			   zone_page_state(zone, NR_FREE_PAGES),
 			   zone->pages_min,
 			   zone->pages_low,
 			   zone->pages_high,


* [RFC 3/8] Reorder ZVCs according to cacheline
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
  2007-01-26  5:41 ` [RFC 1/8] Use ZVC for inactive and active counts Christoph Lameter
  2007-01-26  5:42 ` [RFC 2/8] Use ZVC for free_pages Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 4/8] Drop free_pages() Christoph Lameter
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Reorder the ZVCs so that the main counters are in the same cacheline.

The global and per-zone counter sums are kept in arrays of longs.
Reorder the ZVCs so that the most frequently used ones fall into the
same cacheline. That way calculations of the global, node and per-zone
VM state touch only a single cacheline. This is mostly important for
64-bit systems, where one 128-byte cacheline takes only 8 longs.
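
The layout matters because the sums are plain arrays indexed by the
enum, so adjacent enum entries share cachelines. As an illustration
(sizes as assumed above):

	/* Global and per-zone sums are arrays indexed by zone_stat_item */
	atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];

	/* With 8-byte longs, entry i sits at byte offset i * 8, so the
	 * entries listed first in the enum land in the first cacheline. */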

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/include/linux/mmzone.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/mmzone.h	2007-01-25 11:20:59.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/mmzone.h	2007-01-25 11:28:07.000000000 -0800
@@ -47,6 +47,7 @@ struct zone_padding {
 #endif
 
 enum zone_stat_item {
+	/* First 128 byte cacheline (assuming 64 bit words) */
 	NR_FREE_PAGES,
 	NR_INACTIVE,
 	NR_ACTIVE,
@@ -54,11 +55,12 @@ enum zone_stat_item {
 	NR_FILE_MAPPED,	/* pagecache pages mapped into pagetables.
 			   only modified from process context */
 	NR_FILE_PAGES,
-	NR_SLAB_RECLAIMABLE,
-	NR_SLAB_UNRECLAIMABLE,
-	NR_PAGETABLE,	/* used for pagetables */
 	NR_FILE_DIRTY,
 	NR_WRITEBACK,
+	/* Second 128 byte cacheline */
+	NR_SLAB_RECLAIMABLE,
+	NR_SLAB_UNRECLAIMABLE,
+	NR_PAGETABLE,		/* used for pagetables */
 	NR_UNSTABLE_NFS,	/* NFS unstable pages */
 	NR_BOUNCE,
 	NR_VMSCAN_WRITE,
Index: linux-2.6.20-rc6/mm/vmstat.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/vmstat.c	2007-01-25 11:21:30.000000000 -0800
+++ linux-2.6.20-rc6/mm/vmstat.c	2007-01-25 11:22:22.000000000 -0800
@@ -447,11 +447,11 @@ static const char * const vmstat_text[] 
 	"nr_anon_pages",
 	"nr_mapped",
 	"nr_file_pages",
+	"nr_dirty",
+	"nr_writeback",
 	"nr_slab_reclaimable",
 	"nr_slab_unreclaimable",
 	"nr_page_table_pages",
-	"nr_dirty",
-	"nr_writeback",
 	"nr_unstable",
 	"nr_bounce",
 	"nr_vmscan_write",


* [RFC 4/8] Drop free_pages()
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
                   ` (2 preceding siblings ...)
  2007-01-26  5:42 ` [RFC 3/8] Reorder ZVCs according to cacheline Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 5/8] Drop nr_free_pages_pgdat() Christoph Lameter
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Further simplification for nr_free_pages().

nr_free_pages() is now a simple access to a global variable. Make it a
macro instead of a function. Since the definition of
global_page_state() is not available where swap.h declares it, the
macro defers the reference to the point of use.

nr_free_pages() now requires vmstat.h to be included. There is one
occurrence in power management where we need to add the include. We
refer directly to global_page_state() there to clarify why the
#include was added.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/include/linux/swap.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/swap.h	2007-01-25 11:18:12.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/swap.h	2007-01-25 20:19:24.000000000 -0800
@@ -170,11 +170,14 @@ extern void swapin_readahead(swp_entry_t
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
 extern long nr_swap_pages;
-extern unsigned int nr_free_pages(void);
 extern unsigned int nr_free_pages_pgdat(pg_data_t *pgdat);
 extern unsigned int nr_free_buffer_pages(void);
 extern unsigned int nr_free_pagecache_pages(void);
 
+/* Definition of global_page_state not available yet */
+#define nr_free_pages() global_page_state(NR_FREE_PAGES)
+
+
 /* linux/mm/swap.c */
 extern void FASTCALL(lru_cache_add(struct page *));
 extern void FASTCALL(lru_cache_add_active(struct page *));
Index: linux-2.6.20-rc6/mm/page_alloc.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/page_alloc.c	2007-01-25 11:19:46.000000000 -0800
+++ linux-2.6.20-rc6/mm/page_alloc.c	2007-01-25 20:19:24.000000000 -0800
@@ -1441,16 +1441,6 @@ fastcall void free_pages(unsigned long a
 
 EXPORT_SYMBOL(free_pages);
 
-/*
- * Total amount of free (allocatable) RAM:
- */
-unsigned int nr_free_pages(void)
-{
-	return global_page_state(NR_FREE_PAGES);
-}
-
-EXPORT_SYMBOL(nr_free_pages);
-
 #ifdef CONFIG_NUMA
 unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
 {
Index: linux-2.6.20-rc6/kernel/power/main.c
===================================================================
--- linux-2.6.20-rc6.orig/kernel/power/main.c	2007-01-25 20:19:17.000000000 -0800
+++ linux-2.6.20-rc6/kernel/power/main.c	2007-01-25 20:19:30.000000000 -0800
@@ -20,6 +20,7 @@
 #include <linux/cpu.h>
 #include <linux/resume-trace.h>
 #include <linux/freezer.h>
+#include <linux/vmstat.h>
 
 #include "power.h"
 
@@ -72,7 +73,8 @@ static int suspend_prepare(suspend_state
 		goto Thaw;
 	}
 
-	if ((free_pages = nr_free_pages()) < FREE_PAGE_NUMBER) {
+	if ((free_pages = global_page_state(NR_FREE_PAGES))
+			< FREE_PAGE_NUMBER) {
 		pr_debug("PM: free some memory\n");
 		shrink_all_memory(FREE_PAGE_NUMBER - free_pages);
 		if (nr_free_pages() < FREE_PAGE_NUMBER) {


* [RFC 5/8] Drop nr_free_pages_pgdat()
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
                   ` (3 preceding siblings ...)
  2007-01-26  5:42 ` [RFC 4/8] Drop free_pages() Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 6/8] Drop __get_zone_counts() Christoph Lameter
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Get rid of nr_free_pages_pgdat()

The function is now unnecessary: we can use the summing features of the
ZVCs to get the values we need.
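
For reference, node_page_state() sums the per-zone ZVC state over a
node's zones; it looks roughly like this in the mm/vmstat.c of this era
(a sketch for orientation):

	#ifdef CONFIG_NUMA
	unsigned long node_page_state(int node, enum zone_stat_item item)
	{
		struct zone *zones = NODE_DATA(node)->node_zones;

		return
	#ifdef CONFIG_ZONE_DMA32
			zone_page_state(&zones[ZONE_DMA32], item) +
	#endif
			zone_page_state(&zones[ZONE_NORMAL], item) +
	#ifdef CONFIG_HIGHMEM
			zone_page_state(&zones[ZONE_HIGHMEM], item) +
	#endif
			zone_page_state(&zones[ZONE_DMA], item);
	}
	#endif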

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.20-rc6.orig/arch/ia64/mm/init.c	2007-01-25 10:42:21.000000000 -0800
+++ linux-2.6.20-rc6/arch/ia64/mm/init.c	2007-01-25 10:43:05.000000000 -0800
@@ -67,7 +67,7 @@ max_pgt_pages(void)
 #ifndef	CONFIG_NUMA
 	node_free_pages = nr_free_pages();
 #else
-	node_free_pages = nr_free_pages_pgdat(NODE_DATA(numa_node_id()));
+	node_free_pages = node_page_state(numa_node_id(), NR_FREE_PAGES);
 #endif
 	max_pgt_pages = node_free_pages / PGT_FRACTION_OF_NODE_MEM;
 	max_pgt_pages = max(max_pgt_pages, MIN_PGT_PAGES);
Index: linux-2.6.20-rc6/include/linux/swap.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/swap.h	2007-01-25 10:41:14.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/swap.h	2007-01-25 10:41:22.000000000 -0800
@@ -170,7 +170,6 @@ extern void swapin_readahead(swp_entry_t
 extern unsigned long totalram_pages;
 extern unsigned long totalreserve_pages;
 extern long nr_swap_pages;
-extern unsigned int nr_free_pages_pgdat(pg_data_t *pgdat);
 extern unsigned int nr_free_buffer_pages(void);
 extern unsigned int nr_free_pagecache_pages(void);
 
Index: linux-2.6.20-rc6/mm/page_alloc.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/page_alloc.c	2007-01-25 10:41:29.000000000 -0800
+++ linux-2.6.20-rc6/mm/page_alloc.c	2007-01-25 10:41:43.000000000 -0800
@@ -1441,13 +1441,6 @@ fastcall void free_pages(unsigned long a
 
 EXPORT_SYMBOL(free_pages);
 
-#ifdef CONFIG_NUMA
-unsigned int nr_free_pages_pgdat(pg_data_t *pgdat)
-{
-	return node_page_state(pgdat->node_id, NR_FREE_PAGES);
-}
-#endif
-
 static unsigned int nr_free_zone_pages(int offset)
 {
 	/* Just pick one node, since fallback list is circular */


* [RFC 6/8] Drop __get_zone_counts()
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
                   ` (4 preceding siblings ...)
  2007-01-26  5:42 ` [RFC 5/8] Drop nr_free_pages_pgdat() Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 7/8] Drop get_zone_counts() Christoph Lameter
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Get rid of __get_zone_counts()

The values are readily available via the per-node and global ZVC sums.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/drivers/base/node.c
===================================================================
--- linux-2.6.20-rc6.orig/drivers/base/node.c	2007-01-25 20:29:22.000000000 -0800
+++ linux-2.6.20-rc6/drivers/base/node.c	2007-01-25 20:30:17.000000000 -0800
@@ -40,13 +40,8 @@ static ssize_t node_read_meminfo(struct 
 	int n;
 	int nid = dev->id;
 	struct sysinfo i;
-	unsigned long inactive;
-	unsigned long active;
-	unsigned long free;
 
 	si_meminfo_node(&i, nid);
-	__get_zone_counts(&active, &inactive, &free, NODE_DATA(nid));
-
 
 	n = sprintf(buf, "\n"
 		       "Node %d MemTotal:     %8lu kB\n"
@@ -74,8 +69,8 @@ static ssize_t node_read_meminfo(struct 
 		       nid, K(i.totalram),
 		       nid, K(i.freeram),
 		       nid, K(i.totalram - i.freeram),
-		       nid, K(active),
-		       nid, K(inactive),
+		       nid, node_page_state(nid, NR_ACTIVE),
+		       nid, node_page_state(nid, NR_INACTIVE),
 #ifdef CONFIG_HIGHMEM
 		       nid, K(i.totalhigh),
 		       nid, K(i.freehigh),
Index: linux-2.6.20-rc6/include/linux/mmzone.h
===================================================================
--- linux-2.6.20-rc6.orig/include/linux/mmzone.h	2007-01-25 20:29:55.000000000 -0800
+++ linux-2.6.20-rc6/include/linux/mmzone.h	2007-01-25 20:29:58.000000000 -0800
@@ -444,8 +444,6 @@ typedef struct pglist_data {
 
 #include <linux/memory_hotplug.h>
 
-void __get_zone_counts(unsigned long *active, unsigned long *inactive,
-			unsigned long *free, struct pglist_data *pgdat);
 void get_zone_counts(unsigned long *active, unsigned long *inactive,
 			unsigned long *free);
 void build_all_zonelists(void);
Index: linux-2.6.20-rc6/mm/readahead.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/readahead.c	2007-01-25 20:29:14.000000000 -0800
+++ linux-2.6.20-rc6/mm/readahead.c	2007-01-25 20:29:58.000000000 -0800
@@ -575,10 +575,6 @@ void handle_ra_miss(struct address_space
  */
 unsigned long max_sane_readahead(unsigned long nr)
 {
-	unsigned long active;
-	unsigned long inactive;
-	unsigned long free;
-
-	__get_zone_counts(&active, &inactive, &free, NODE_DATA(numa_node_id()));
-	return min(nr, (inactive + free) / 2);
+	return min(nr, (node_page_state(numa_node_id(), NR_INACTIVE)
+		+ node_page_state(numa_node_id(), NR_FREE_PAGES)) / 2);
 }
Index: linux-2.6.20-rc6/mm/vmstat.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/vmstat.c	2007-01-25 20:29:55.000000000 -0800
+++ linux-2.6.20-rc6/mm/vmstat.c	2007-01-25 20:29:58.000000000 -0800
@@ -13,14 +13,6 @@
 #include <linux/module.h>
 #include <linux/cpu.h>
 
-void __get_zone_counts(unsigned long *active, unsigned long *inactive,
-			unsigned long *free, struct pglist_data *pgdat)
-{
-	*active = node_page_state(pgdat->node_id, NR_ACTIVE);
-	*inactive = node_page_state(pgdat->node_id, NR_INACTIVE);
-	*free = node_page_state(pgdat->node_id, NR_FREE_PAGES);
-}
-
 void get_zone_counts(unsigned long *active,
 		unsigned long *inactive, unsigned long *free)
 {


* [RFC 7/8] Drop get_zone_counts()
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
                   ` (5 preceding siblings ...)
  2007-01-26  5:42 ` [RFC 6/8] Drop __get_zone_counts() Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26  5:42 ` [RFC 8/8] Fix writeback calculation Christoph Lameter
  2007-01-26 12:22 ` [RFC 0/8] Use ZVCs for accurate writeback ratio determination Nick Piggin
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

Get rid of get_zone_counts

Values are available via ZVC sums.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/fs/proc/proc_misc.c
===================================================================
--- linux-2.6.20-rc6.orig/fs/proc/proc_misc.c	2007-01-25 10:52:06.000000000 -0800
+++ linux-2.6.20-rc6/fs/proc/proc_misc.c	2007-01-25 10:52:51.000000000 -0800
@@ -121,16 +121,11 @@ static int meminfo_read_proc(char *page,
 {
 	struct sysinfo i;
 	int len;
-	unsigned long inactive;
-	unsigned long active;
-	unsigned long free;
 	unsigned long committed;
 	unsigned long allowed;
 	struct vmalloc_info vmi;
 	long cached;
 
-	get_zone_counts(&active, &inactive, &free);
-
 /*
  * display in kilobytes.
  */
@@ -187,8 +182,8 @@ static int meminfo_read_proc(char *page,
 		K(i.bufferram),
 		K(cached),
 		K(total_swapcache_pages),
-		K(active),
-		K(inactive),
+		K(global_page_state(NR_ACTIVE)),
+		K(global_page_state(NR_INACTIVE)),
 #ifdef CONFIG_HIGHMEM
 		K(i.totalhigh),
 		K(i.freehigh),
Index: linux-2.6.20-rc6/mm/page_alloc.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/page_alloc.c	2007-01-25 10:52:50.000000000 -0800
+++ linux-2.6.20-rc6/mm/page_alloc.c	2007-01-25 10:52:51.000000000 -0800
@@ -1524,9 +1524,6 @@ void si_meminfo_node(struct sysinfo *val
 void show_free_areas(void)
 {
 	int cpu;
-	unsigned long active;
-	unsigned long inactive;
-	unsigned long free;
 	struct zone *zone;
 
 	for_each_zone(zone) {
@@ -1550,12 +1547,10 @@ void show_free_areas(void)
 		}
 	}
 
-	get_zone_counts(&active, &inactive, &free);
-
 	printk("Active:%lu inactive:%lu dirty:%lu writeback:%lu "
 		"unstable:%lu free:%lu slab:%lu mapped:%lu pagetables:%lu\n",
-		active,
-		inactive,
+		global_page_state(NR_ACTIVE),
+		global_page_state(NR_INACTIVE),
 		global_page_state(NR_FILE_DIRTY),
 		global_page_state(NR_WRITEBACK),
 		global_page_state(NR_UNSTABLE_NFS),
Index: linux-2.6.20-rc6/mm/vmstat.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/vmstat.c	2007-01-25 10:52:51.000000000 -0800
+++ linux-2.6.20-rc6/mm/vmstat.c	2007-01-25 10:52:51.000000000 -0800
@@ -13,14 +13,6 @@
 #include <linux/module.h>
 #include <linux/cpu.h>
 
-void get_zone_counts(unsigned long *active,
-		unsigned long *inactive, unsigned long *free)
-{
-	*active = global_page_state(NR_ACTIVE);
-	*inactive = global_page_state(NR_INACTIVE);
-	*free = global_page_state(NR_FREE_PAGES);
-}
-
 #ifdef CONFIG_VM_EVENT_COUNTERS
 DEFINE_PER_CPU(struct vm_event_state, vm_event_states) = {{0}};
 EXPORT_PER_CPU_SYMBOL(vm_event_states);


* [RFC 8/8] Fix writeback calculation
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
                   ` (6 preceding siblings ...)
  2007-01-26  5:42 ` [RFC 7/8] Drop get_zone_counts() Christoph Lameter
@ 2007-01-26  5:42 ` Christoph Lameter
  2007-01-26 12:22 ` [RFC 0/8] Use ZVCs for accurate writeback ratio determination Nick Piggin
  8 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26  5:42 UTC (permalink / raw)
  To: akpm
  Cc: Peter Zijlstra, Nick Piggin, linux-mm, Christoph Lameter,
	Nikita Danilov, Andi Kleen

We can use the global ZVC counters to establish the exact size of the
LRU and the number of free pages. This allows a more accurate
determination of the dirty ratio.

This patch fixes the broken ratio calculations when large amounts of
memory are allocated to huge pages or other consumers that do not put
the pages onto the LRU.

However, we are unable to use the accurate base in the case of HIGHMEM
with an allocation that excludes HIGHMEM pages. In that case we just
fall back to the old scheme.
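
As a worked example (numbers invented for illustration): on a machine
with 4,000,000 pages of which 2,600,000 are pinned in hugepages, the
old base yields a 40% background threshold of 1,600,000 pages, yet
only 1,400,000 pages are on the LRU or free, so background writeback
never starts. With the new base the threshold is 0.4 * 1,400,000 =
560,000 pages, which is reachable.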

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6.20-rc6/mm/page-writeback.c
===================================================================
--- linux-2.6.20-rc6.orig/mm/page-writeback.c	2007-01-25 10:53:56.000000000 -0800
+++ linux-2.6.20-rc6/mm/page-writeback.c	2007-01-25 11:03:47.000000000 -0800
@@ -128,7 +128,9 @@ get_dirty_limits(long *pbackground, long
 	int unmapped_ratio;
 	long background;
 	long dirty;
-	unsigned long available_memory = vm_total_pages;
+	unsigned long available_memory = global_page_state(NR_FREE_PAGES) +
+			global_page_state(NR_INACTIVE) +
+			global_page_state(NR_ACTIVE);
 	struct task_struct *tsk;
 
 #ifdef CONFIG_HIGHMEM
@@ -137,7 +139,12 @@ get_dirty_limits(long *pbackground, long
 	 * we exclude high memory from our count.
 	 */
 	if (mapping && !(mapping_gfp_mask(mapping) & __GFP_HIGHMEM))
-		available_memory -= totalhigh_pages;
+		/*
+		 * This is not as accurate as the non-highmem calculation
+		 * but it has worked for years. So let it be as it was.
+		 * People seem to know how to deal with it.
+		 */
+		available_memory = vm_total_pages - totalhigh_pages;
 #endif
 
 


* Re: [RFC 0/8] Use ZVCs for accurate writeback ratio determination
  2007-01-26  5:41 [RFC 0/8] Use ZVCs for accurate writeback ratio determination Christoph Lameter
                   ` (7 preceding siblings ...)
  2007-01-26  5:42 ` [RFC 8/8] Fix writeback calculation Christoph Lameter
@ 2007-01-26 12:22 ` Nick Piggin
  2007-01-26 15:49   ` Christoph Lameter
  8 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2007-01-26 12:22 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Peter Zijlstra, linux-mm, Nikita Danilov, Andi Kleen

Christoph Lameter wrote:
> The dirty ratio used to determine writeback behavior is currently based
> on the total number of pages in the system.
> 
> However, not all pages in the system can be dirtied, so the ratio is
> effectively always too low and can never reach 100%. The ratio may be
> particularly skewed if large hugepage allocations, slab allocations or
> device driver buffers remove large sections of memory from use. We may
> then end up in a situation in which, for example, the background
> writeback ratio of 40% can no longer be reached, which leads to
> undesired writeback behavior.
> 
> This patchset fixes that issue by determining the ratio based on the
> actual pages that may potentially be dirtied: the pages on the active
> and inactive lists plus the free pages.

So you no longer account for reclaimable slab allocations, which
would be a significant change on some workloads. Any reason for
that?

-- 
SUSE Labs, Novell Inc.

* Re: [RFC 0/8] Use ZVCs for accurate writeback ratio determination
  2007-01-26 12:22 ` [RFC 0/8] Use ZVCs for accurate writeback ratio determination Nick Piggin
@ 2007-01-26 15:49   ` Christoph Lameter
  2007-01-29  2:40     ` Nick Piggin
  0 siblings, 1 reply; 13+ messages in thread
From: Christoph Lameter @ 2007-01-26 15:49 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, Peter Zijlstra, linux-mm, Nikita Danilov, Andi Kleen

On Fri, 26 Jan 2007, Nick Piggin wrote:

> So you no longer account for reclaimable slab allocations, which
> would be a significant change on some workloads. Any reason for
> that?

We could add NR_SLAB_RECLAIMABLE if that is a factor. However, these
pages cannot be dirtied. Yes, they may be reclaimed, and pages may then
become available again, but that is a difficult process without slab
defrag. Are you sure that these are significant?



* Re: [RFC 0/8] Use ZVCs for accurate writeback ratio determination
  2007-01-26 15:49   ` Christoph Lameter
@ 2007-01-29  2:40     ` Nick Piggin
  2007-01-29 16:56       ` Christoph Lameter
  0 siblings, 1 reply; 13+ messages in thread
From: Nick Piggin @ 2007-01-29  2:40 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, Peter Zijlstra, linux-mm, Nikita Danilov, Andi Kleen

Christoph Lameter wrote:
> On Fri, 26 Jan 2007, Nick Piggin wrote:
> 
> 
>>So you no longer account for reclaimable slab allocations, which
>>would be a significant change on some workloads. Any reason for
>>that?
> 
> 
> We could add NR_SLAB_RECLAIMABLE if that is a factor. However, these
> pages cannot be dirtied. Yes, they may be reclaimed, and pages may then
> become available again, but that is a difficult process without slab
> defrag. Are you sure that these are significant?

I think so. I have seen systems that get very full of dcache/icache,
with little to no pagecache. In that case it makes no sense to limit
dirty pages to a potentially small amount.

Slab reclaim does work. It may not be perfect, but I don't think that
should spill over into dirty page calculations. If anything we need to
improve slab reclaimability estimates for that.

-- 
SUSE Labs, Novell Inc.

* Re: [RFC 0/8] Use ZVCs for accurate writeback ratio determination
  2007-01-29  2:40     ` Nick Piggin
@ 2007-01-29 16:56       ` Christoph Lameter
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Lameter @ 2007-01-29 16:56 UTC (permalink / raw)
  To: Nick Piggin; +Cc: akpm, Peter Zijlstra, linux-mm, Nikita Danilov, Andi Kleen

On Mon, 29 Jan 2007, Nick Piggin wrote:

> > We could add NR_SLAB_RECLAIMABLE if that is a factor. However, these
> > pages cannot be dirtied. Yes, they may be reclaimed, and pages may
> > then become available again, but that is a difficult process without
> > slab defrag. Are you sure that these are significant?
> 
> I think so. I have seen systems that get very full of dcache/icache, and
> little to no pagecache. In that case it makes no sense to limit dirty
> pages to a potentially small amount.
> 
> Slab reclaim does work. It may not be perfect, but I don't think that
> should spill over into dirty page calculations. If anything we need to
> improve slab reclaimability estimates for that.

How about adding NR_SLAB_RECLAIMABLE / 2 to take into account the
reclaim problems that may leave many pages unrecoverable?
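
In terms of the patch 8/8 calculation, that would amount to something
like (hypothetical, not a posted patch):

	unsigned long available_memory = global_page_state(NR_FREE_PAGES) +
			global_page_state(NR_INACTIVE) +
			global_page_state(NR_ACTIVE) +
			/* assume half of reclaimable slab is recoverable */
			global_page_state(NR_SLAB_RECLAIMABLE) / 2;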

