* [PATCH v4 0/7] f/madvise(DONTNEED) support
@ 2010-12-05 17:29 ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim

Recently there was a report about working set page eviction caused by an
rsync workload. Application programmers want to use fadvise, but it is not
easy to use correctly. A detailed description can be found in [2/7].
 - [1/7] fixes checkpatch's reports in mm/swap.c
 - [2/7] moves invalidated pages which are dirty or under writeback from the
   active list to the head of the inactive list.
 - [3/7] moves memcg reclaimable pages to the tail of the inactive list.
 - [4/7] moves invalidated pages to the tail of the inactive list once their
   writeout completes, so they can be reclaimed as soon as possible.
 - [5/7] adds profiling information for evaluation.
 - [6/7] removes the zap_details NULL dependency from some functions, in
   preparation for the next patch.
 - [7/7] avoids calling mark_page_accessed in the madvise(MADV_DONTNEED) path.

This patch series is based on mmotm-12-02.
Before applying the series, please remove the patches below:
mm-deactivate-invalidated-pages.patch
mm-deactivate-invalidated-pages-fix.patch

Minchan Kim (7):
  Fix checkpatch's report in swap.c
  deactivate invalidated pages
  move memcg reclaimable page into tail of inactive list
  Reclaim invalidated page ASAP
  add profile information for invalidated page reclaim
  Remove zap_details NULL dependency
  Prevent activation of page in madvise_dontneed

 include/linux/memcontrol.h |    6 ++
 include/linux/mm.h         |   10 ++++
 include/linux/swap.h       |    1 +
 include/linux/vmstat.h     |    4 +-
 mm/madvise.c               |   13 +++--
 mm/memcontrol.c            |   27 +++++++++
 mm/memory.c                |   19 ++++---
 mm/mmap.c                  |    6 ++-
 mm/page-writeback.c        |   12 ++++-
 mm/swap.c                  |  127 +++++++++++++++++++++++++++++++++++++++++---
 mm/truncate.c              |   17 +++++--
 mm/vmstat.c                |    3 +
 12 files changed, 216 insertions(+), 29 deletions(-)


^ permalink raw reply	[flat|nested] 109+ messages in thread

* [PATCH v4 1/7] Fix checkpatch's report in swap.c
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim

checkpatch reports the following problems.
They are quite annoying, so this patch fixes them.

barrios@barrios-desktop:~/linux-2.6$ ./scripts/checkpatch.pl -f mm/swap.c
WARNING: line over 80 characters
+		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(mark_page_accessed);

ERROR: code indent should use tabs where possible
+  ^I^I}$

WARNING: please, no space before tabs
+  ^I^I}$

WARNING: please, no spaces at the start of a line
+  ^I^I}$

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(__pagevec_release);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(____pagevec_lru_add);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(pagevec_lookup);

WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
+EXPORT_SYMBOL(pagevec_lookup_tag);

total: 1 errors, 8 warnings, 517 lines checked

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 mm/swap.c |   10 +++-------
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index 3f48542..d5822b0 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -118,7 +118,8 @@ static void pagevec_move_tail(struct pagevec *pvec)
 			zone = pagezone;
 			spin_lock(&zone->lru_lock);
 		}
-		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
+		if (PageLRU(page) && !PageActive(page) &&
+					!PageUnevictable(page)) {
 			int lru = page_lru_base_type(page);
 			list_move_tail(&page->lru, &zone->lru[lru].list);
 			pgmoved++;
@@ -212,7 +213,6 @@ void mark_page_accessed(struct page *page)
 		SetPageReferenced(page);
 	}
 }
-
 EXPORT_SYMBOL(mark_page_accessed);
 
 void __lru_cache_add(struct page *page, enum lru_list lru)
@@ -371,7 +371,7 @@ void release_pages(struct page **pages, int nr, int cold)
 			}
 			__pagevec_free(&pages_to_free);
 			pagevec_reinit(&pages_to_free);
-  		}
+		}
 	}
 	if (zone)
 		spin_unlock_irqrestore(&zone->lru_lock, flags);
@@ -396,7 +396,6 @@ void __pagevec_release(struct pagevec *pvec)
 	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
 	pagevec_reinit(pvec);
 }
-
 EXPORT_SYMBOL(__pagevec_release);
 
 /*
@@ -438,7 +437,6 @@ void ____pagevec_lru_add(struct pagevec *pvec, enum lru_list lru)
 	release_pages(pvec->pages, pvec->nr, pvec->cold);
 	pagevec_reinit(pvec);
 }
-
 EXPORT_SYMBOL(____pagevec_lru_add);
 
 /*
@@ -481,7 +479,6 @@ unsigned pagevec_lookup(struct pagevec *pvec, struct address_space *mapping,
 	pvec->nr = find_get_pages(mapping, start, nr_pages, pvec->pages);
 	return pagevec_count(pvec);
 }
-
 EXPORT_SYMBOL(pagevec_lookup);
 
 unsigned pagevec_lookup_tag(struct pagevec *pvec, struct address_space *mapping,
@@ -491,7 +488,6 @@ unsigned pagevec_lookup_tag(struct pagevec *pvec, struct address_space *mapping,
 					nr_pages, pvec->pages);
 	return pagevec_count(pvec);
 }
-
 EXPORT_SYMBOL(pagevec_lookup_tag);
 
 /*
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 2/7] deactivate invalidated pages
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Peter Zijlstra, Wu Fengguang, Johannes Weiner, Nick Piggin,
	Mel Gorman

Recently, a thrashing problem was reported
(http://marc.info/?l=rsync&m=128885034930933&w=2).
It is caused by backup workloads (e.g. nightly rsync).
Such a workload creates use-once pages but touches them twice,
which promotes them to the active list and thus results in
working set page eviction.

Some application developers would like POSIX_FADV_NOREUSE to be
supported, but other OSes do not support it either.
(http://marc.info/?l=linux-mm&m=128928979512086&w=2)

As another approach, application developers use POSIX_FADV_DONTNEED,
but it has a problem: if the kernel finds a page dirty or under
writeback during invalidate_mapping_pages, it cannot drop the page.
This makes it very hard for application programmers to use, because
they always have to sync the data before calling
fadvise(..., POSIX_FADV_DONTNEED) to make sure the pages are
discardable. As a result, they cannot benefit from the kernel's
deferred writes and may see a performance loss.
(http://insights.oetiker.ch/linux/fadvise.html)
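
To illustrate the burden described above, here is a minimal userspace
sketch of the workaround applications are forced into today: sync the
data first so that fadvise(DONTNEED) can actually drop the pages. The
helper name and the use of sync_file_range are illustrative only, not
taken from the report above.

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper a backup-style writer calls after it has
 * written the range [off, off+len) into fd. */
void drop_written_range(int fd, off_t off, off_t len)
{
	int err;

	/*
	 * Force the dirty pages out first; without this, pages that are
	 * still dirty or under writeback survive POSIX_FADV_DONTNEED.
	 * The price is losing the kernel's deferred writeback.
	 */
	if (sync_file_range(fd, off, len,
			    SYNC_FILE_RANGE_WAIT_BEFORE |
			    SYNC_FILE_RANGE_WRITE |
			    SYNC_FILE_RANGE_WAIT_AFTER))
		perror("sync_file_range");

	err = posix_fadvise(fd, off, len, POSIX_FADV_DONTNEED);
	if (err)
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
}

With this series the explicit sync step is no longer needed just to make
the cache droppable: the failed invalidation itself becomes the reclaim
hint, as described below.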

In fact, invalidation is a strong hint to the reclaimer that we do not
use the page any more. So let's move pages that are still being written
to the head of the inactive list.

The reason for moving the page to the head rather than the tail is that
a dirty/writeback page will be flushed sooner or later anyway; this avoids
single-page writeout from pageout(), which is less effective than the
flusher threads' writeout.

Originally, I reused Peter's lru_demote with some changes, so I added
his Signed-off-by.

Reported-by: Ben Gamari <bgamari.foss@gmail.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>

Andrew, before applying this series, please drop the two patches below:
 mm-deactivate-invalidated-pages.patch
 mm-deactivate-invalidated-pages-fix.patch

Changelog since v3:
 - Change function comments - suggested by Johannes
 - Change function name - suggested by Johannes
 - add only dirty/writeback pages to the deactivate pagevec

Changelog since v2:
 - leave mapped pages alone - suggested by Mel
 - move the PG_reclaim-related part to the next patch.

Changelog since v1:
 - modify description
 - correct typo
 - add some comment

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/swap.h |    1 +
 mm/swap.c            |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/truncate.c        |   17 ++++++++--
 3 files changed, 92 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index eba53e7..605ab62 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -213,6 +213,7 @@ extern void mark_page_accessed(struct page *);
 extern void lru_add_drain(void);
 extern int lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
diff --git a/mm/swap.c b/mm/swap.c
index d5822b0..1f36f6f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -39,6 +39,8 @@ int page_cluster;
 
 static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
+
 
 /*
  * This path almost never happens for VM activity - pages are normally
@@ -267,6 +269,59 @@ void add_page_to_unevictable_list(struct page *page)
 }
 
 /*
+ * If the page can not be invalidated, it is moved to the
+ * inactive list to speed up its reclaim.  It is moved to the
+ * head of the list, rather than the tail, to give the flusher
+ * threads some time to write it out, as this is much more
+ * effective than the single-page writeout from reclaim.
+ */
+static void lru_deactivate(struct page *page, struct zone *zone)
+{
+	int lru, file;
+
+	if (!PageLRU(page) || !PageActive(page))
+		return;
+
+	/* Some processes are using the page */
+	if (page_mapped(page))
+		return;
+
+	file = page_is_file_cache(page);
+	lru = page_lru_base_type(page);
+	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
+	ClearPageActive(page);
+	ClearPageReferenced(page);
+	add_page_to_lru_list(zone, page, lru);
+	__count_vm_event(PGDEACTIVATE);
+
+	update_page_reclaim_stat(zone, page, file, 0);
+}
+
+static void ____pagevec_lru_deactivate(struct pagevec *pvec)
+{
+	int i;
+	struct zone *zone = NULL;
+
+	for (i = 0; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+		struct zone *pagezone = page_zone(page);
+
+		if (pagezone != zone) {
+			if (zone)
+				spin_unlock_irq(&zone->lru_lock);
+			zone = pagezone;
+			spin_lock_irq(&zone->lru_lock);
+		}
+		lru_deactivate(page, zone);
+	}
+	if (zone)
+		spin_unlock_irq(&zone->lru_lock);
+
+	release_pages(pvec->pages, pvec->nr, pvec->cold);
+	pagevec_reinit(pvec);
+}
+
+/*
  * Drain pages out of the cpu's pagevecs.
  * Either "cpu" is the current CPU, and preemption has already been
  * disabled; or "cpu" is being hot-unplugged, and is already dead.
@@ -292,6 +347,29 @@ static void drain_cpu_pagevecs(int cpu)
 		pagevec_move_tail(pvec);
 		local_irq_restore(flags);
 	}
+
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		____pagevec_lru_deactivate(pvec);
+}
+
+/**
+ * deactivate_page - forcefully deactivate a page
+ * @page: page to deactivate
+ *
+ * This function hints the VM that @page is a good reclaim candidate,
+ * for example if its invalidation fails due to the page being dirty
+ * or under writeback.
+ */
+void deactivate_page(struct page *page)
+{
+	if (likely(get_page_unless_zero(page))) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		if (!pagevec_add(pvec, page))
+			____pagevec_lru_deactivate(pvec);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
 }
 
 void lru_add_drain(void)
diff --git a/mm/truncate.c b/mm/truncate.c
index cd94607..ef03cbc 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -332,7 +332,8 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 {
 	struct pagevec pvec;
 	pgoff_t next = start;
-	unsigned long ret = 0;
+	unsigned long ret;
+	unsigned long count = 0;
 	int i;
 
 	pagevec_init(&pvec, 0);
@@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			if (lock_failed)
 				continue;
 
-			ret += invalidate_inode_page(page);
-
+			ret = invalidate_inode_page(page);
+			/*
+			 * If the page is dirty or under writeback, we can not
+			 * invalidate it now.  But we assume that attempted
+			 * invalidation is a hint that the page is no longer
+			 * of interest and try to speed up its reclaim.
+			 */
+			if (!ret && (PageDirty(page) || PageWriteback(page)))
+				deactivate_page(page);
+			count += ret;
 			unlock_page(page);
 			if (next > end)
 				break;
@@ -369,7 +378,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 		mem_cgroup_uncharge_end();
 		cond_resched();
 	}
-	return ret;
+	return count;
 }
 EXPORT_SYMBOL(invalidate_mapping_pages);
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Balbir Singh, KAMEZAWA Hiroyuki

Global page reclaim moves reclaimable pages to the tail of the inactive
list so they are reclaimed as soon as possible. This patch applies the
same rule to memcg. It helps prevent unnecessary working set page
eviction within a memcg.

Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
---
 include/linux/memcontrol.h |    6 ++++++
 mm/memcontrol.c            |   27 +++++++++++++++++++++++++++
 mm/swap.c                  |    3 ++-
 3 files changed, 35 insertions(+), 1 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 067115c..8317f5c 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -62,6 +62,7 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask);
 extern void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru);
 extern void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru);
+extern void mem_cgroup_rotate_reclaimable_page(struct page *page);
 extern void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru);
 extern void mem_cgroup_del_lru(struct page *page);
 extern void mem_cgroup_move_lists(struct page *page,
@@ -207,6 +208,11 @@ static inline void mem_cgroup_del_lru_list(struct page *page, int lru)
 	return ;
 }
 
+static inline void mem_cgroup_rotate_reclaimable_page(struct page *page)
+{
+	return ;
+}
+
 static inline void mem_cgroup_rotate_lru_list(struct page *page, int lru)
 {
 	return ;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 729beb7..f9435be 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -829,6 +829,33 @@ void mem_cgroup_del_lru(struct page *page)
 	mem_cgroup_del_lru_list(page, page_lru(page));
 }
 
+/*
+ * Writeback is about to end against a page which has been marked for immediate
+ * reclaim.  If it still appears to be reclaimable, move it to the tail of the
+ * inactive list.
+ */
+void mem_cgroup_rotate_reclaimable_page(struct page *page)
+{
+	struct mem_cgroup_per_zone *mz;
+	struct page_cgroup *pc;
+	enum lru_list lru = page_lru_base_type(page);
+
+	if (mem_cgroup_disabled())
+		return;
+
+	pc = lookup_page_cgroup(page);
+	/*
+	 * Used bit is set without atomic ops but after smp_wmb().
+	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
+	 */
+	smp_rmb();
+	/* unused or root page is not rotated. */
+	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
+		return;
+	mz = page_cgroup_zoneinfo(pc);
+	list_move_tail(&pc->lru, &mz->lists[lru]);
+}
+
 void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
 {
 	struct mem_cgroup_per_zone *mz;
diff --git a/mm/swap.c b/mm/swap.c
index 1f36f6f..0fe98e7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -122,8 +122,9 @@ static void pagevec_move_tail(struct pagevec *pvec)
 		}
 		if (PageLRU(page) && !PageActive(page) &&
 					!PageUnevictable(page)) {
-			int lru = page_lru_base_type(page);
+			enum lru_list lru = page_lru_base_type(page);
 			list_move_tail(&page->lru, &zone->lru[lru].list);
+			mem_cgroup_rotate_reclaimable_page(page);
 			pgmoved++;
 		}
 	}
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Wu Fengguang, Johannes Weiner, Nick Piggin

invalidate_mapping_pages is a strong hint to the reclaimer that the user
does not want to use the pages any more. So, to prevent working set page
eviction, this patch moves such pages to the tail of the inactive list by
setting PG_reclaim.

Please remember that pages on the inactive list are part of the working
set just as pages on the active list are. If we do not move invalidated
pages to the tail of the inactive list, pages near the tail can be evicted
even though we have a strong clue about which pages are useless. That is
clearly bad.

Currently PG_readahead and PG_reclaim share the same page flag.
Commit fe3cba17 added ClearPageReclaim to clear_page_dirty_for_io to
prevent fast reclaim of readahead-marked pages.

In this series, PG_reclaim is also used for invalidated pages. If the VM
finds that a page has been invalidated and is dirty, it sets PG_reclaim so
the page is reclaimed as soon as possible. But when the dirty page is later
written back, clear_page_dirty_for_io clears PG_reclaim unconditionally,
which defeats this series' goal.

I think it is okay to clear the readahead/reclaim marker when the page is
dirtied rather than at writeback time, so this patch moves the
ClearPageReclaim call into set_page_dirty().
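
For reference, this is roughly how the existing writeback-completion path
turns PG_reclaim into an immediate move to the inactive list's tail. It is
paraphrased from mm/filemap.c of this kernel generation and is not part of
this patch; details may differ slightly:

void end_page_writeback(struct page *page)
{
	/* a page marked PG_reclaim is rotated to the inactive tail */
	if (TestClearPageReclaim(page))
		rotate_reclaimable_page(page);	/* batches pagevec_move_tail() */

	if (!test_clear_page_writeback(page))
		BUG();

	smp_mb__after_clear_bit();
	wake_up_page(page, PG_writeback);
}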

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>

Changelog since v3:
 - move pages whose writeback has completed while in the pagevec to the
   inactive list's tail
	- suggested by Johannes

Changelog since v2:
 - put ClearPageReclaim in set_page_dirty - suggested by Wu.

Changelog since v1:
 - make the invalidated page reclaim asap - suggested by Andrew.
---
 mm/page-writeback.c |   12 +++++++++++-
 mm/swap.c           |   39 ++++++++++++++++++++++++++++++++++++---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index fc93802..88587a5 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -1250,6 +1250,17 @@ int set_page_dirty(struct page *page)
 {
 	struct address_space *mapping = page_mapping(page);
 
+	/*
+	 * readahead/lru_deactivate_page could remain
+	 * PG_readahead/PG_reclaim due to race with end_page_writeback
+	 * About readahead, if the page is written, the flags would be
+	 * reset. So no problem.
+	 * About lru_deactivate_page, if the page is redirty, the flag
+	 * will be reset. So no problem. but if the page is used by readahead
+	 * it will confuse readahead and  make it restart the size rampup
+	 * process. But it's a trivial problem.
+	 */
+	ClearPageReclaim(page);
 	if (likely(mapping)) {
 		int (*spd)(struct page *) = mapping->a_ops->set_page_dirty;
 #ifdef CONFIG_BLOCK
@@ -1307,7 +1318,6 @@ int clear_page_dirty_for_io(struct page *page)
 
 	BUG_ON(!PageLocked(page));
 
-	ClearPageReclaim(page);
 	if (mapping && mapping_cap_account_dirty(mapping)) {
 		/*
 		 * Yes, Virginia, this is indeed insane.
diff --git a/mm/swap.c b/mm/swap.c
index 0fe98e7..0f23998 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -275,26 +275,59 @@ void add_page_to_unevictable_list(struct page *page)
  * head of the list, rather than the tail, to give the flusher
  * threads some time to write it out, as this is much more
  * effective than the single-page writeout from reclaim.
+ *
+ * If the page isn't page_mapped and dirty/writeback, the page
+ * could reclaim asap using PG_reclaim.
+ *
+ * 1. active, mapped page -> none
+ * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
+ * 3. inactive, mapped page -> none
+ * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
+ * 5. Others -> none
+ *
+ * In 4, why it moves inactive's head, the VM expects the page would
+ * be write it out by flusher threads as this is much more effective
+ * than the single-page writeout from reclaim.
  */
 static void lru_deactivate(struct page *page, struct zone *zone)
 {
 	int lru, file;
+	bool active;
 
-	if (!PageLRU(page) || !PageActive(page))
+	if (!PageLRU(page))
 		return;
 
 	/* Some processes are using the page */
 	if (page_mapped(page))
 		return;
 
+	active = PageActive(page);
+
 	file = page_is_file_cache(page);
 	lru = page_lru_base_type(page);
-	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
+	del_page_from_lru_list(zone, page, lru + active);
 	ClearPageActive(page);
 	ClearPageReferenced(page);
 	add_page_to_lru_list(zone, page, lru);
-	__count_vm_event(PGDEACTIVATE);
 
+	if (PageWriteback(page) || PageDirty(page)) {
+		/*
+		 * PG_reclaim could be raced with end_page_writeback
+		 * It can make readahead confusing.  But race window
+		 * is _really_ small and  it's non-critical problem.
+		 */
+		SetPageReclaim(page);
+	} else {
+		/*
+		 * The page's writeback ends up during pagevec
+		 * We moves tha page into tail of inactive.
+		 */
+		list_move_tail(&page->lru, &zone->lru[lru].list);
+		mem_cgroup_rotate_reclaimable_page(page);
+	}
+
+	if (active)
+		__count_vm_event(PGDEACTIVATE);
 	update_page_reclaim_stat(zone, page, file, 0);
 }
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 5/7] add profile information for invalidated page reclaim
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Wu Fengguang, Johannes Weiner, Nick Piggin, Mel Gorman

This patch adds profiling information about invalidated page reclaim.
It is only for profiling during testing, so it should be dropped when the
series is merged.
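
With this patch applied, the new events appear in /proc/vmstat next to
pgdeactivate. A minimal sketch of a test helper for watching them during a
run (hypothetical, not part of the patch):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[128];
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), f)) {
		/* print only the counters this series touches */
		if (!strncmp(line, "pginvalidate", 12) ||
		    !strncmp(line, "pgreclaim", 9) ||
		    !strncmp(line, "pgdeactivate", 12))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}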

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/vmstat.h |    4 ++--
 mm/swap.c              |    3 +++
 mm/vmstat.c            |    3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 833e676..c38ad95 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -30,8 +30,8 @@
 
 enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
-		PGFREE, PGACTIVATE, PGDEACTIVATE,
-		PGFAULT, PGMAJFAULT,
+		PGFREE, PGACTIVATE, PGDEACTIVATE, PGINVALIDATE,
+		PGRECLAIM, PGFAULT, PGMAJFAULT,
 		FOR_ALL_ZONES(PGREFILL),
 		FOR_ALL_ZONES(PGSTEAL),
 		FOR_ALL_ZONES(PGSCAN_KSWAPD),
diff --git a/mm/swap.c b/mm/swap.c
index 0f23998..2f21e6e 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -317,6 +317,7 @@ static void lru_deactivate(struct page *page, struct zone *zone)
 		 * is _really_ small and  it's non-critical problem.
 		 */
 		SetPageReclaim(page);
+		__count_vm_event(PGRECLAIM);
 	} else {
 		/*
 		 * The page's writeback ends up during pagevec
@@ -328,6 +329,8 @@ static void lru_deactivate(struct page *page, struct zone *zone)
 
 	if (active)
 		__count_vm_event(PGDEACTIVATE);
+
+	__count_vm_event(PGINVALIDATE);
 	update_page_reclaim_stat(zone, page, file, 0);
 }
 
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 3555636..ef6102d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -818,6 +818,9 @@ static const char * const vmstat_text[] = {
 	"pgactivate",
 	"pgdeactivate",
 
+	"pginvalidate",
+	"pgreclaim",
+
 	"pgfault",
 	"pgmajfault",
 
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 6/7] Remove zap_details NULL dependency
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang,
	Hugh Dickins

Some functions that use zap_details depend on the assumption that the
zap_details parameter is NULL when certain fields are zero.

This patch removes that dependency to make the next patch easier to
review and merge. It should not change behavior.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h |    8 ++++++++
 mm/madvise.c       |   15 +++++++++------
 mm/memory.c        |   14 ++++++++------
 mm/mmap.c          |    6 ++++--
 4 files changed, 29 insertions(+), 14 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e097df6..6522ae4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -773,6 +773,14 @@ struct zap_details {
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
 };
 
+#define __ZAP_DETAILS_INITIALIZER(name) \
+		{ .nonlinear_vma = NULL \
+		, .check_mapping = NULL \
+		, .i_mmap_lock = NULL }
+
+#define DEFINE_ZAP_DETAILS(name)		\
+	struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		pte_t pte);
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b..bfa17aa 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -162,18 +162,21 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			     struct vm_area_struct ** prev,
 			     unsigned long start, unsigned long end)
 {
+	DEFINE_ZAP_DETAILS(details);
+
 	*prev = vma;
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
 	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
-		struct zap_details details = {
-			.nonlinear_vma = vma,
-			.last_index = ULONG_MAX,
-		};
+		details.nonlinear_vma = vma;
+		details.last_index = ULONG_MAX;
+
 		zap_page_range(vma, start, end - start, &details);
-	} else
-		zap_page_range(vma, start, end - start, NULL);
+	} else {
+
+		zap_page_range(vma, start, end - start, &details);
+	}
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index ebfeedf..c0879bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -900,6 +900,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 	init_rss_vec(rss);
 
+	if (!details->check_mapping && !details->nonlinear_vma)
+		details = NULL;
+
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	arch_enter_lazy_mmu_mode();
 	do {
@@ -1038,9 +1041,6 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 	pgd_t *pgd;
 	unsigned long next;
 
-	if (details && !details->check_mapping && !details->nonlinear_vma)
-		details = NULL;
-
 	BUG_ON(addr >= end);
 	mem_cgroup_uncharge_start();
 	tlb_start_vma(tlb, vma);
@@ -1102,7 +1102,7 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
 	int tlb_start_valid = 0;
 	unsigned long start = start_addr;
-	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
+	spinlock_t *i_mmap_lock = details->i_mmap_lock;
 	int fullmm = (*tlbp)->fullmm;
 	struct mm_struct *mm = vma->vm_mm;
 
@@ -1217,10 +1217,11 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size)
 {
+	DEFINE_ZAP_DETAILS(details);
 	if (address < vma->vm_start || address + size > vma->vm_end ||
 	    		!(vma->vm_flags & VM_PFNMAP))
 		return -1;
-	zap_page_range(vma, address, size, NULL);
+	zap_page_range(vma, address, size, &details);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);
@@ -2577,7 +2578,8 @@ restart:
 void unmap_mapping_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen, int even_cows)
 {
-	struct zap_details details;
+	DEFINE_ZAP_DETAILS(details);
+
 	pgoff_t hba = holebegin >> PAGE_SHIFT;
 	pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
diff --git a/mm/mmap.c b/mm/mmap.c
index b179abb..31d2594 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1900,11 +1900,12 @@ static void unmap_region(struct mm_struct *mm,
 	struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
 	struct mmu_gather *tlb;
 	unsigned long nr_accounted = 0;
+	DEFINE_ZAP_DETAILS(details);
 
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
-	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
+	unmap_vmas(&tlb, vma, start, end, &nr_accounted, &details);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 				 next? next->vm_start: 0);
@@ -2254,6 +2255,7 @@ void exit_mmap(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	unsigned long nr_accounted = 0;
 	unsigned long end;
+	DEFINE_ZAP_DETAILS(details);
 
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
@@ -2278,7 +2280,7 @@ void exit_mmap(struct mm_struct *mm)
 	tlb = tlb_gather_mmu(mm, 1);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
-	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
+	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, &details);
 	vm_unacct_memory(nr_accounted);
 
 	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 6/7] Remove zap_details NULL dependency
@ 2010-12-05 17:29   ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang,
	Hugh Dickins

Some functions that use zap_details depend on the assumption that the
zap_details parameter is passed as NULL when certain fields are 0.

This patch removes that dependency to make the next patch easier to
review and merge. It should not change behavior.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hughd@google.com>
---
 include/linux/mm.h |    8 ++++++++
 mm/madvise.c       |   15 +++++++++------
 mm/memory.c        |   14 ++++++++------
 mm/mmap.c          |    6 ++++--
 4 files changed, 29 insertions(+), 14 deletions(-)
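
To illustrate the new calling convention (a readability sketch only, not
part of the diff below): callers that used to pass a NULL zap_details now
define a zero-initialized one on the stack and always pass its address,
so callees no longer need to special-case NULL.

	/* before: callees had to tolerate details == NULL */
	zap_page_range(vma, start, end - start, NULL);

	/* after: always hand over an initialized details */
	DEFINE_ZAP_DETAILS(details);	/* all pointer fields start out NULL */
	zap_page_range(vma, start, end - start, &details);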

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e097df6..6522ae4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -773,6 +773,14 @@ struct zap_details {
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
 };
 
+#define __ZAP_DETAILS_INITIALIZER(name) \
+		{ .nonlinear_vma = NULL \
+		, .check_mapping = NULL \
+		, .i_mmap_lock = NULL }
+
+#define DEFINE_ZAP_DETAILS(name)		\
+	struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		pte_t pte);
 
diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b..bfa17aa 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -162,18 +162,21 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			     struct vm_area_struct ** prev,
 			     unsigned long start, unsigned long end)
 {
+	DEFINE_ZAP_DETAILS(details);
+
 	*prev = vma;
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
 	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
-		struct zap_details details = {
-			.nonlinear_vma = vma,
-			.last_index = ULONG_MAX,
-		};
+		details.nonlinear_vma = vma;
+		details.last_index = ULONG_MAX;
+
 		zap_page_range(vma, start, end - start, &details);
-	} else
-		zap_page_range(vma, start, end - start, NULL);
+	} else {
+
+		zap_page_range(vma, start, end - start, &details);
+	}
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index ebfeedf..c0879bb 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -900,6 +900,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 
 	init_rss_vec(rss);
 
+	if (!details->check_mapping && !details->nonlinear_vma)
+		details = NULL;
+
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	arch_enter_lazy_mmu_mode();
 	do {
@@ -1038,9 +1041,6 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 	pgd_t *pgd;
 	unsigned long next;
 
-	if (details && !details->check_mapping && !details->nonlinear_vma)
-		details = NULL;
-
 	BUG_ON(addr >= end);
 	mem_cgroup_uncharge_start();
 	tlb_start_vma(tlb, vma);
@@ -1102,7 +1102,7 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
 	int tlb_start_valid = 0;
 	unsigned long start = start_addr;
-	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
+	spinlock_t *i_mmap_lock = details->i_mmap_lock;
 	int fullmm = (*tlbp)->fullmm;
 	struct mm_struct *mm = vma->vm_mm;
 
@@ -1217,10 +1217,11 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
 int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size)
 {
+	DEFINE_ZAP_DETAILS(details);
 	if (address < vma->vm_start || address + size > vma->vm_end ||
 	    		!(vma->vm_flags & VM_PFNMAP))
 		return -1;
-	zap_page_range(vma, address, size, NULL);
+	zap_page_range(vma, address, size, &details);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);
@@ -2577,7 +2578,8 @@ restart:
 void unmap_mapping_range(struct address_space *mapping,
 		loff_t const holebegin, loff_t const holelen, int even_cows)
 {
-	struct zap_details details;
+	DEFINE_ZAP_DETAILS(details);
+
 	pgoff_t hba = holebegin >> PAGE_SHIFT;
 	pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
 
diff --git a/mm/mmap.c b/mm/mmap.c
index b179abb..31d2594 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1900,11 +1900,12 @@ static void unmap_region(struct mm_struct *mm,
 	struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
 	struct mmu_gather *tlb;
 	unsigned long nr_accounted = 0;
+	DEFINE_ZAP_DETAILS(details);
 
 	lru_add_drain();
 	tlb = tlb_gather_mmu(mm, 0);
 	update_hiwater_rss(mm);
-	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
+	unmap_vmas(&tlb, vma, start, end, &nr_accounted, &details);
 	vm_unacct_memory(nr_accounted);
 	free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
 				 next? next->vm_start: 0);
@@ -2254,6 +2255,7 @@ void exit_mmap(struct mm_struct *mm)
 	struct vm_area_struct *vma;
 	unsigned long nr_accounted = 0;
 	unsigned long end;
+	DEFINE_ZAP_DETAILS(details);
 
 	/* mm's last user has gone, and its about to be pulled down */
 	mmu_notifier_release(mm);
@@ -2278,7 +2280,7 @@ void exit_mmap(struct mm_struct *mm)
 	tlb = tlb_gather_mmu(mm, 1);
 	/* update_hiwater_rss(mm) here? but nobody should be looking */
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
-	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
+	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, &details);
 	vm_unacct_memory(nr_accounted);
 
 	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
-- 
1.7.0.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
  2010-12-05 17:29 ` Minchan Kim
@ 2010-12-05 17:29   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang,
	Hugh Dickins

Now zap_pte_range always activates pages which are pte_young &&
!VM_SequentialReadHint(vma). But in the case of madvise(MADV_DONTNEED),
that's unnecessary since the page won't be used any more.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hughd@google.com>

Changelog since v3:
 - Change variable name - suggested by Johannes
 - Union ignore_references with zap_details - suggested by Hugh

Changelog since v2:
 - remove unnecessary description

Changelog since v1:
 - change word from promote to activate
 - add activate argument to zap_pte_range and family function
---
 include/linux/mm.h |    4 +++-
 mm/madvise.c       |    6 +++---
 mm/memory.c        |    5 ++++-
 3 files changed, 10 insertions(+), 5 deletions(-)
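
As a rough sketch of how the pieces fit together (a condensed illustration
only, not a replacement for the hunks below):

	/* madvise_dontneed(): tell zap these pages will not be reused */
	DEFINE_ZAP_DETAILS(details);
	details.ignore_references = true;
	zap_page_range(vma, start, end - start, &details);

	/* zap_pte_range(): honour the hint and skip LRU activation */
	bool ignore_references = details->ignore_references;
	...
	if (pte_young(ptent) && likely(!VM_SequentialReadHint(vma)) &&
	    !ignore_references)
		mark_page_accessed(page);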

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6522ae4..e57190f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -771,12 +771,14 @@ struct zap_details {
 	pgoff_t last_index;			/* Highest page->index to unmap */
 	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
+	bool ignore_references;			/* For page activation */
 };
 
 #define __ZAP_DETAILS_INITIALIZER(name) \
                 { .nonlinear_vma = NULL \
 		, .check_mapping = NULL \
-		, .i_mmap_lock = NULL }
+		, .i_mmap_lock = NULL	\
+		, .ignore_references = false }
 
 #define DEFINE_ZAP_DETAILS(name)		\
 	struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
diff --git a/mm/madvise.c b/mm/madvise.c
index bfa17aa..8e7aba3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -163,6 +163,7 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			     unsigned long start, unsigned long end)
 {
 	DEFINE_ZAP_DETAILS(details);
+	details.ignore_references = true;
 
 	*prev = vma;
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
@@ -173,10 +174,9 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 		details.last_index = ULONG_MAX;
 
 		zap_page_range(vma, start, end - start, &details);
-	} else {
-
+	} else
 		zap_page_range(vma, start, end - start, &details);
-	}
+
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index c0879bb..44d87e1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -897,6 +897,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pte_t *pte;
 	spinlock_t *ptl;
 	int rss[NR_MM_COUNTERS];
+	bool ignore_references = details->ignore_references;
 
 	init_rss_vec(rss);
 
@@ -952,7 +953,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent) &&
-				    likely(!VM_SequentialReadHint(vma)))
+				    likely(!VM_SequentialReadHint(vma)) &&
+					!ignore_references)
 					mark_page_accessed(page);
 				rss[MM_FILEPAGES]--;
 			}
@@ -1218,6 +1220,7 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size)
 {
 	DEFINE_ZAP_DETAILS(details);
+	details.ignore_references = true;
 	if (address < vma->vm_start || address + size > vma->vm_end ||
 	    		!(vma->vm_flags & VM_PFNMAP))
 		return -1;
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 109+ messages in thread

* [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
@ 2010-12-05 17:29   ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-05 17:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Rik van Riel, KOSAKI Motohiro, linux-mm, LKML, Minchan Kim,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang,
	Hugh Dickins

Now zap_pte_range always activates pages which are pte_young &&
!VM_SequentialReadHint(vma). But in the case of madvise(MADV_DONTNEED),
that's unnecessary since the page won't be used any more.

Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Hugh Dickins <hughd@google.com>

Changelog since v3:
 - Change variable name - suggested by Johannes
 - Union ignore_references with zap_details - suggested by Hugh

Changelog since v2:
 - remove unnecessary description

Changelog since v1:
 - change word from promote to activate
 - add activate argument to zap_pte_range and family function
---
 include/linux/mm.h |    4 +++-
 mm/madvise.c       |    6 +++---
 mm/memory.c        |    5 ++++-
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 6522ae4..e57190f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -771,12 +771,14 @@ struct zap_details {
 	pgoff_t last_index;			/* Highest page->index to unmap */
 	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
+	bool ignore_references;			/* For page activation */
 };
 
 #define __ZAP_DETAILS_INITIALIZER(name) \
                 { .nonlinear_vma = NULL \
 		, .check_mapping = NULL \
-		, .i_mmap_lock = NULL }
+		, .i_mmap_lock = NULL	\
+		, .ignore_references = false }
 
 #define DEFINE_ZAP_DETAILS(name)		\
 	struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
diff --git a/mm/madvise.c b/mm/madvise.c
index bfa17aa..8e7aba3 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -163,6 +163,7 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			     unsigned long start, unsigned long end)
 {
 	DEFINE_ZAP_DETAILS(details);
+	details.ignore_references = true;
 
 	*prev = vma;
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
@@ -173,10 +174,9 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 		details.last_index = ULONG_MAX;
 
 		zap_page_range(vma, start, end - start, &details);
-	} else {
-
+	} else
 		zap_page_range(vma, start, end - start, &details);
-	}
+
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index c0879bb..44d87e1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -897,6 +897,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pte_t *pte;
 	spinlock_t *ptl;
 	int rss[NR_MM_COUNTERS];
+	bool ignore_references = details->ignore_references;
 
 	init_rss_vec(rss);
 
@@ -952,7 +953,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent) &&
-				    likely(!VM_SequentialReadHint(vma)))
+				    likely(!VM_SequentialReadHint(vma)) &&
+					!ignore_references)
 					mark_page_accessed(page);
 				rss[MM_FILEPAGES]--;
 			}
@@ -1218,6 +1220,7 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 		unsigned long size)
 {
 	DEFINE_ZAP_DETAILS(details);
+	details.ignore_references = true;
 	if (address < vma->vm_start || address + size > vma->vm_end ||
 	    		!(vma->vm_flags & VM_PFNMAP))
 		return -1;
-- 
1.7.0.4

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06  0:04     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 109+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-06  0:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Balbir Singh

On Mon,  6 Dec 2010 02:29:11 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Golbal page reclaim moves reclaimalbe pages into inactive list
> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-06  0:04     ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 109+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-06  0:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Balbir Singh

On Mon,  6 Dec 2010 02:29:11 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Golbal page reclaim moves reclaimalbe pages into inactive list
> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] Fix checkpatch's report in swap.c
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06  1:47     ` Rik van Riel
  -1 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  1:47 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> checkpatch reports following problems.
> It's a very annoying. This patch fixes it.

> total: 1 errors, 8 warnings, 517 lines checked
>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] Fix checkpatch's report in swap.c
@ 2010-12-06  1:47     ` Rik van Riel
  0 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  1:47 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> checkpatch reports following problems.
> It's a very annoying. This patch fixes it.

> total: 1 errors, 8 warnings, 517 lines checked
>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06  3:04     ` Rik van Riel
  -1 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  3:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Balbir Singh,
	KAMEZAWA Hiroyuki

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> Golbal page reclaim moves reclaimalbe pages into inactive list
> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.

The patch is right, but the description is wrong.

The rotate_reclaimable_page function moves just written out
pages, which the VM wanted to reclaim, to the end of the
inactive list.  That way the VM will find those pages first
next time it needs to free memory.
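
In code terms the rotation is roughly (a simplified sketch of the existing
pagevec_move_tail path in mm/swap.c):

	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
		enum lru_list lru = page_lru_base_type(page);
		/* reclaim scans from the tail, so it finds this page first */
		list_move_tail(&page->lru, &zone->lru[lru].list);
	}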

> Cc: Balbir Singh<balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-06  3:04     ` Rik van Riel
  0 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  3:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Balbir Singh,
	KAMEZAWA Hiroyuki

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> Golbal page reclaim moves reclaimalbe pages into inactive list
> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.

The patch is right, but the description is wrong.

The rotate_reclaimable_page function moves just written out
pages, which the VM wanted to reclaim, to the end of the
inactive list.  That way the VM will find those pages first
next time it needs to free memory.

> Cc: Balbir Singh<balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>

Reviewed-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06  3:24     ` Rik van Riel
  -1 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  3:24 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin, Mel Gorman

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> This patch adds profile information about invalidated page reclaim.
> It's just for profiling for test so it would be discard when the series
> are merged.
>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang<fengguang.wu@intel.com>
> Cc: Johannes Weiner<hannes@cmpxchg.org>
> Cc: Nick Piggin<npiggin@kernel.dk>
> Cc: Mel Gorman<mel@csn.ul.ie>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
@ 2010-12-06  3:24     ` Rik van Riel
  0 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  3:24 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin, Mel Gorman

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> This patch adds profile information about invalidated page reclaim.
> It's just for profiling for test so it would be discard when the series
> are merged.
>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang<fengguang.wu@intel.com>
> Cc: Johannes Weiner<hannes@cmpxchg.org>
> Cc: Nick Piggin<npiggin@kernel.dk>
> Cc: Mel Gorman<mel@csn.ul.ie>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 6/7] Remove zap_details NULL dependency
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06  3:25     ` Rik van Riel
  -1 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  3:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Johannes Weiner,
	Nick Piggin, Mel Gorman, Wu Fengguang, Hugh Dickins

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> Some functions that use zap_details depend on the assumption that the
> zap_details parameter is passed as NULL when certain fields are 0.
>
> This patch removes that dependency to make the next patch easier to
> review and merge. It should not change behavior.
>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner<hannes@cmpxchg.org>
> Cc: Nick Piggin<npiggin@kernel.dk>
> Cc: Mel Gorman<mel@csn.ul.ie>
> Cc: Wu Fengguang<fengguang.wu@intel.com>
> Cc: Hugh Dickins<hughd@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 6/7] Remove zap_details NULL dependency
@ 2010-12-06  3:25     ` Rik van Riel
  0 siblings, 0 replies; 109+ messages in thread
From: Rik van Riel @ 2010-12-06  3:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Johannes Weiner,
	Nick Piggin, Mel Gorman, Wu Fengguang, Hugh Dickins

On 12/05/2010 12:29 PM, Minchan Kim wrote:
> Some functions that use zap_details depend on the assumption that the
> zap_details parameter is passed as NULL when certain fields are 0.
>
> This patch removes that dependency to make the next patch easier to
> review and merge. It should not change behavior.
>
> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
> Cc: Rik van Riel<riel@redhat.com>
> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner<hannes@cmpxchg.org>
> Cc: Nick Piggin<npiggin@kernel.dk>
> Cc: Mel Gorman<mel@csn.ul.ie>
> Cc: Wu Fengguang<fengguang.wu@intel.com>
> Cc: Hugh Dickins<hughd@google.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06  3:34     ` Balbir Singh
  -1 siblings, 0 replies; 109+ messages in thread
From: Balbir Singh @ 2010-12-06  3:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	KAMEZAWA Hiroyuki

* MinChan Kim <minchan.kim@gmail.com> [2010-12-06 02:29:11]:

> Golbal page reclaim moves reclaimalbe pages into inactive list

Some typos here and Rik already pointed out some other changes.

> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> ---
>  include/linux/memcontrol.h |    6 ++++++
>  mm/memcontrol.c            |   27 +++++++++++++++++++++++++++
>  mm/swap.c                  |    3 ++-
>  3 files changed, 35 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 067115c..8317f5c 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -62,6 +62,7 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  					gfp_t gfp_mask);
>  extern void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru);
>  extern void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru);
> +extern void mem_cgroup_rotate_reclaimable_page(struct page *page);
>  extern void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru);
>  extern void mem_cgroup_del_lru(struct page *page);
>  extern void mem_cgroup_move_lists(struct page *page,
> @@ -207,6 +208,11 @@ static inline void mem_cgroup_del_lru_list(struct page *page, int lru)
>  	return ;
>  }
> 
> +static inline inline void mem_cgroup_rotate_reclaimable_page(struct page *page)
> +{
> +	return ;
> +}
> +
>  static inline void mem_cgroup_rotate_lru_list(struct page *page, int lru)
>  {
>  	return ;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 729beb7..f9435be 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -829,6 +829,33 @@ void mem_cgroup_del_lru(struct page *page)
>  	mem_cgroup_del_lru_list(page, page_lru(page));
>  }
> 
> +/*
> + * Writeback is about to end against a page which has been marked for immediate
> + * reclaim.  If it still appears to be reclaimable, move it to the tail of the
> + * inactive list.
> + */
> +void mem_cgroup_rotate_reclaimable_page(struct page *page)
> +{
> +	struct mem_cgroup_per_zone *mz;
> +	struct page_cgroup *pc;
> +	enum lru_list lru = page_lru_base_type(page);
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	pc = lookup_page_cgroup(page);
> +	/*
> +	 * Used bit is set without atomic ops but after smp_wmb().
> +	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> +	 */
> +	smp_rmb();
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
> +		return;
> +	mz = page_cgroup_zoneinfo(pc);
> +	list_move_tail(&pc->lru, &mz->lists[lru]);
> +}
> +
>  void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  {
>  	struct mem_cgroup_per_zone *mz;
> diff --git a/mm/swap.c b/mm/swap.c
> index 1f36f6f..0fe98e7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -122,8 +122,9 @@ static void pagevec_move_tail(struct pagevec *pvec)
>  		}
>  		if (PageLRU(page) && !PageActive(page) &&
>  					!PageUnevictable(page)) {
> -			int lru = page_lru_base_type(page);
> +			enum lru_list lru = page_lru_base_type(page);
>  			list_move_tail(&page->lru, &zone->lru[lru].list);
> +			mem_cgroup_rotate_reclaimable_page(page);
>  			pgmoved++;
>  		}
>  	}

Looks good, do you have any numbers, workloads that benefit? I agree
that keeping both global and memcg reclaim in sync is a good idea.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
 

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-06  3:34     ` Balbir Singh
  0 siblings, 0 replies; 109+ messages in thread
From: Balbir Singh @ 2010-12-06  3:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	KAMEZAWA Hiroyuki

* MinChan Kim <minchan.kim@gmail.com> [2010-12-06 02:29:11]:

> Golbal page reclaim moves reclaimalbe pages into inactive list

Some typos here and Rik already pointed out some other changes.

> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> ---
>  include/linux/memcontrol.h |    6 ++++++
>  mm/memcontrol.c            |   27 +++++++++++++++++++++++++++
>  mm/swap.c                  |    3 ++-
>  3 files changed, 35 insertions(+), 1 deletions(-)
> 
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 067115c..8317f5c 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -62,6 +62,7 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>  					gfp_t gfp_mask);
>  extern void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru);
>  extern void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru);
> +extern void mem_cgroup_rotate_reclaimable_page(struct page *page);
>  extern void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru);
>  extern void mem_cgroup_del_lru(struct page *page);
>  extern void mem_cgroup_move_lists(struct page *page,
> @@ -207,6 +208,11 @@ static inline void mem_cgroup_del_lru_list(struct page *page, int lru)
>  	return ;
>  }
> 
> +static inline inline void mem_cgroup_rotate_reclaimable_page(struct page *page)
> +{
> +	return ;
> +}
> +
>  static inline void mem_cgroup_rotate_lru_list(struct page *page, int lru)
>  {
>  	return ;
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 729beb7..f9435be 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -829,6 +829,33 @@ void mem_cgroup_del_lru(struct page *page)
>  	mem_cgroup_del_lru_list(page, page_lru(page));
>  }
> 
> +/*
> + * Writeback is about to end against a page which has been marked for immediate
> + * reclaim.  If it still appears to be reclaimable, move it to the tail of the
> + * inactive list.
> + */
> +void mem_cgroup_rotate_reclaimable_page(struct page *page)
> +{
> +	struct mem_cgroup_per_zone *mz;
> +	struct page_cgroup *pc;
> +	enum lru_list lru = page_lru_base_type(page);
> +
> +	if (mem_cgroup_disabled())
> +		return;
> +
> +	pc = lookup_page_cgroup(page);
> +	/*
> +	 * Used bit is set without atomic ops but after smp_wmb().
> +	 * For making pc->mem_cgroup visible, insert smp_rmb() here.
> +	 */
> +	smp_rmb();
> +	/* unused or root page is not rotated. */
> +	if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
> +		return;
> +	mz = page_cgroup_zoneinfo(pc);
> +	list_move_tail(&pc->lru, &mz->lists[lru]);
> +}
> +
>  void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>  {
>  	struct mem_cgroup_per_zone *mz;
> diff --git a/mm/swap.c b/mm/swap.c
> index 1f36f6f..0fe98e7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -122,8 +122,9 @@ static void pagevec_move_tail(struct pagevec *pvec)
>  		}
>  		if (PageLRU(page) && !PageActive(page) &&
>  					!PageUnevictable(page)) {
> -			int lru = page_lru_base_type(page);
> +			enum lru_list lru = page_lru_base_type(page);
>  			list_move_tail(&page->lru, &zone->lru[lru].list);
> +			mem_cgroup_rotate_reclaimable_page(page);
>  			pgmoved++;
>  		}
>  	}

Looks good, do you have any numbers, workloads that benefit? I agree
that keeping both global and memcg reclaim in sync is a good idea.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
 

-- 
	Three Cheers,
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-06 14:53     ` Mel Gorman
  -1 siblings, 0 replies; 109+ messages in thread
From: Mel Gorman @ 2010-12-06 14:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Johannes Weiner, Nick Piggin

On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> Recently, there was a reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens with backup workloads (e.g. a nightly rsync).
> That's because the workload creates use-once pages and touches
> each page twice. That promotes the page onto the active list,
> so it results in working set page eviction.
> 
> Some app developers want us to support POSIX_FADV_NOREUSE,
> but other OSes don't support it, either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
> 
> As another approach, app developers use POSIX_FADV_DONTNEED.
> But it has a problem: if the kernel finds a page under writeback
> during invalidate_mapping_pages, it can't drop it.
> That makes it very hard for application programmers to use,
> because they always have to sync the data before calling
> fadvise(..POSIX_FADV_DONTNEED) to make sure the pages are
> discardable. In the end, they can't use the kernel's deferred
> write, so they see a performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
> 
> In fact, invalidation is a very big hint to the reclaimer.
> It means we don't use the page any more. So let's move
> the page that is being written into the inactive list's head.
> 
> Why the head? A dirty/writeback page will be flushed sooner or
> later anyway, and this prevents writeout via pageout, which is
> less effective than the flusher's writeout.
> 
> Originally, I reused Peter's lru_demote with some changes, so I
> added his Signed-off-by.
> 
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> 
> Andrew. Before applying this series, please drop below two patches.
>  mm-deactivate-invalidated-pages.patch
>  mm-deactivate-invalidated-pages-fix.patch
> 
> Changelog since v3:
>  - Change function comments - suggested by Johannes
>  - Change function name - suggested by Johannes
>  - add only dirty/writeback pages to deactive pagevec
> 
> Changelog since v2:
>  - mapped page leaves alone - suggested by Mel
>  - pass part related PG_reclaim in next patch.
> 
> Changelog since v1:
>  - modify description
>  - correct typo
>  - add some comment
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> ---
>  include/linux/swap.h |    1 +
>  mm/swap.c            |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/truncate.c        |   17 ++++++++--
>  3 files changed, 92 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..605ab62 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -213,6 +213,7 @@ extern void mark_page_accessed(struct page *);
>  extern void lru_add_drain(void);
>  extern int lru_add_drain_all(void);
>  extern void rotate_reclaimable_page(struct page *page);
> +extern void deactivate_page(struct page *page);
>  extern void swap_setup(void);
>  
>  extern void add_page_to_unevictable_list(struct page *page);
> diff --git a/mm/swap.c b/mm/swap.c
> index d5822b0..1f36f6f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -39,6 +39,8 @@ int page_cluster;
>  
>  static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
> +static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
> +
>  
>  /*
>   * This path almost never happens for VM activity - pages are normally

Unnecessary whitespace change there but otherwise;

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-06 14:53     ` Mel Gorman
  0 siblings, 0 replies; 109+ messages in thread
From: Mel Gorman @ 2010-12-06 14:53 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Johannes Weiner, Nick Piggin

On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> Recently, there was a reported problem about thrashing.
> (http://marc.info/?l=rsync&m=128885034930933&w=2)
> It happens with backup workloads (e.g. a nightly rsync).
> That's because the workload creates use-once pages and touches
> each page twice. That promotes the page onto the active list,
> so it results in working set page eviction.
> 
> Some app developers want us to support POSIX_FADV_NOREUSE,
> but other OSes don't support it, either.
> (http://marc.info/?l=linux-mm&m=128928979512086&w=2)
> 
> As another approach, app developers use POSIX_FADV_DONTNEED.
> But it has a problem: if the kernel finds a page under writeback
> during invalidate_mapping_pages, it can't drop it.
> That makes it very hard for application programmers to use,
> because they always have to sync the data before calling
> fadvise(..POSIX_FADV_DONTNEED) to make sure the pages are
> discardable. In the end, they can't use the kernel's deferred
> write, so they see a performance loss.
> (http://insights.oetiker.ch/linux/fadvise.html)
> 
> In fact, invalidation is a very big hint to the reclaimer.
> It means we don't use the page any more. So let's move
> the page that is being written into the inactive list's head.
> 
> Why the head? A dirty/writeback page will be flushed sooner or
> later anyway, and this prevents writeout via pageout, which is
> less effective than the flusher's writeout.
> 
> Originally, I reused Peter's lru_demote with some changes, so I
> added his Signed-off-by.
> 
> Reported-by: Ben Gamari <bgamari.foss@gmail.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Signed-off-by: Peter Zijlstra <peterz@infradead.org>
> Acked-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> 
> Andrew. Before applying this series, please drop below two patches.
>  mm-deactivate-invalidated-pages.patch
>  mm-deactivate-invalidated-pages-fix.patch
> 
> Changelog since v3:
>  - Change function comments - suggested by Johannes
>  - Change function name - suggested by Johannes
>  - add only dirty/writeback pages to deactive pagevec
> 
> Changelog since v2:
>  - mapped page leaves alone - suggested by Mel
>  - pass part related PG_reclaim in next patch.
> 
> Changelog since v1:
>  - modify description
>  - correct typo
>  - add some comment
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> ---
>  include/linux/swap.h |    1 +
>  mm/swap.c            |   78 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/truncate.c        |   17 ++++++++--
>  3 files changed, 92 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index eba53e7..605ab62 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -213,6 +213,7 @@ extern void mark_page_accessed(struct page *);
>  extern void lru_add_drain(void);
>  extern int lru_add_drain_all(void);
>  extern void rotate_reclaimable_page(struct page *page);
> +extern void deactivate_page(struct page *page);
>  extern void swap_setup(void);
>  
>  extern void add_page_to_unevictable_list(struct page *page);
> diff --git a/mm/swap.c b/mm/swap.c
> index d5822b0..1f36f6f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -39,6 +39,8 @@ int page_cluster;
>  
>  static DEFINE_PER_CPU(struct pagevec[NR_LRU_LISTS], lru_add_pvecs);
>  static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
> +static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
> +
>  
>  /*
>   * This path almost never happens for VM activity - pages are normally

Unnecessary whitespace change there but otherwise;

Acked-by: Mel Gorman <mel@csn.ul.ie>

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-06  3:04     ` Rik van Riel
@ 2010-12-07  0:17       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  0:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Balbir Singh,
	KAMEZAWA Hiroyuki

On Mon, Dec 6, 2010 at 12:04 PM, Rik van Riel <riel@redhat.com> wrote:
> On 12/05/2010 12:29 PM, Minchan Kim wrote:
>>
>> Golbal page reclaim moves reclaimalbe pages into inactive list
>> to reclaim asap. This patch apply the rule in memcg.
>> It can help to prevent unnecessary working page eviction of memcg.
>
> The patch is right, but the description is wrong.
>
> The rotate_reclaimable_page function moves just written out
> pages, which the VM wanted to reclaim, to the end of the
> inactive list.  That way the VM will find those pages first
> next time it needs to free memory.

Will fix.

>
>> Cc: Balbir Singh<balbir@linux.vnet.ibm.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Rik van Riel<riel@redhat.com>
>> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
>
> Reviewed-by: Rik van Riel <riel@redhat.com>

Thanks, Rik.

>
> --
> All rights reversed
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-07  0:17       ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  0:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, KOSAKI Motohiro, linux-mm, LKML, Balbir Singh,
	KAMEZAWA Hiroyuki

On Mon, Dec 6, 2010 at 12:04 PM, Rik van Riel <riel@redhat.com> wrote:
> On 12/05/2010 12:29 PM, Minchan Kim wrote:
>>
>> Golbal page reclaim moves reclaimalbe pages into inactive list
>> to reclaim asap. This patch apply the rule in memcg.
>> It can help to prevent unnecessary working page eviction of memcg.
>
> The patch is right, but the description is wrong.
>
> The rotate_reclaimable_page function moves just written out
> pages, which the VM wanted to reclaim, to the end of the
> inactive list.  That way the VM will find those pages first
> next time it needs to free memory.

Will fix.

>
>> Cc: Balbir Singh<balbir@linux.vnet.ibm.com>
>> Cc: KAMEZAWA Hiroyuki<kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Rik van Riel<riel@redhat.com>
>> Cc: KOSAKI Motohiro<kosaki.motohiro@jp.fujitsu.com>
>> Signed-off-by: Minchan Kim<minchan.kim@gmail.com>
>
> Reviewed-by: Rik van Riel <riel@redhat.com>

Thanks, Rik.

>
> --
> All rights reversed
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-06  3:34     ` Balbir Singh
@ 2010-12-07  0:20       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  0:20 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	KAMEZAWA Hiroyuki

Hi Balbir,

On Mon, Dec 6, 2010 at 12:34 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * MinChan Kim <minchan.kim@gmail.com> [2010-12-06 02:29:11]:
>
>> Golbal page reclaim moves reclaimalbe pages into inactive list
>
> Some typos here and Rik already pointed out some other changes.
>
>> to reclaim asap. This patch apply the rule in memcg.
>> It can help to prevent unnecessary working page eviction of memcg.
>>
>> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> ---
>>  include/linux/memcontrol.h |    6 ++++++
>>  mm/memcontrol.c            |   27 +++++++++++++++++++++++++++
>>  mm/swap.c                  |    3 ++-
>>  3 files changed, 35 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 067115c..8317f5c 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -62,6 +62,7 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>>                                       gfp_t gfp_mask);
>>  extern void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru);
>>  extern void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru);
>> +extern void mem_cgroup_rotate_reclaimable_page(struct page *page);
>>  extern void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru);
>>  extern void mem_cgroup_del_lru(struct page *page);
>>  extern void mem_cgroup_move_lists(struct page *page,
>> @@ -207,6 +208,11 @@ static inline void mem_cgroup_del_lru_list(struct page *page, int lru)
>>       return ;
>>  }
>>
>> +static inline inline void mem_cgroup_rotate_reclaimable_page(struct page *page)
>> +{
>> +     return ;
>> +}
>> +
>>  static inline void mem_cgroup_rotate_lru_list(struct page *page, int lru)
>>  {
>>       return ;
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 729beb7..f9435be 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -829,6 +829,33 @@ void mem_cgroup_del_lru(struct page *page)
>>       mem_cgroup_del_lru_list(page, page_lru(page));
>>  }
>>
>> +/*
>> + * Writeback is about to end against a page which has been marked for immediate
>> + * reclaim.  If it still appears to be reclaimable, move it to the tail of the
>> + * inactive list.
>> + */
>> +void mem_cgroup_rotate_reclaimable_page(struct page *page)
>> +{
>> +     struct mem_cgroup_per_zone *mz;
>> +     struct page_cgroup *pc;
>> +     enum lru_list lru = page_lru_base_type(page);
>> +
>> +     if (mem_cgroup_disabled())
>> +             return;
>> +
>> +     pc = lookup_page_cgroup(page);
>> +     /*
>> +      * Used bit is set without atomic ops but after smp_wmb().
>> +      * For making pc->mem_cgroup visible, insert smp_rmb() here.
>> +      */
>> +     smp_rmb();
>> +     /* unused or root page is not rotated. */
>> +     if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
>> +             return;
>> +     mz = page_cgroup_zoneinfo(pc);
>> +     list_move_tail(&pc->lru, &mz->lists[lru]);
>> +}
>> +
>>  void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>>  {
>>       struct mem_cgroup_per_zone *mz;
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 1f36f6f..0fe98e7 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -122,8 +122,9 @@ static void pagevec_move_tail(struct pagevec *pvec)
>>               }
>>               if (PageLRU(page) && !PageActive(page) &&
>>                                       !PageUnevictable(page)) {
>> -                     int lru = page_lru_base_type(page);
>> +                     enum lru_list lru = page_lru_base_type(page);
>>                       list_move_tail(&page->lru, &zone->lru[lru].list);
>> +                     mem_cgroup_rotate_reclaimable_page(page);
>>                       pgmoved++;
>>               }
>>       }
>
> Looks good, do you have any numbers, workloads that benefit? I agree
> that keeping both global and memcg reclaim in sync is a good idea.

This patch series is for Ben's fadvise problem in rsync.
While fixing the global reclaim, I found this patch could help memcg, too.
If Ben is busy, I will measure the benefit.

>
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Thanks, Balbir.

>
>
> --
>        Three Cheers,
>        Balbir
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-07  0:20       ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  0:20 UTC (permalink / raw)
  To: balbir
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	KAMEZAWA Hiroyuki

Hi Balbir,

On Mon, Dec 6, 2010 at 12:34 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * MinChan Kim <minchan.kim@gmail.com> [2010-12-06 02:29:11]:
>
>> Golbal page reclaim moves reclaimalbe pages into inactive list
>
> Some typos here and Rik already pointed out some other changes.
>
>> to reclaim asap. This patch apply the rule in memcg.
>> It can help to prevent unnecessary working page eviction of memcg.
>>
>> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
>> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> ---
>>  include/linux/memcontrol.h |    6 ++++++
>>  mm/memcontrol.c            |   27 +++++++++++++++++++++++++++
>>  mm/swap.c                  |    3 ++-
>>  3 files changed, 35 insertions(+), 1 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 067115c..8317f5c 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -62,6 +62,7 @@ extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
>>                                       gfp_t gfp_mask);
>>  extern void mem_cgroup_add_lru_list(struct page *page, enum lru_list lru);
>>  extern void mem_cgroup_del_lru_list(struct page *page, enum lru_list lru);
>> +extern void mem_cgroup_rotate_reclaimable_page(struct page *page);
>>  extern void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru);
>>  extern void mem_cgroup_del_lru(struct page *page);
>>  extern void mem_cgroup_move_lists(struct page *page,
>> @@ -207,6 +208,11 @@ static inline void mem_cgroup_del_lru_list(struct page *page, int lru)
>>       return ;
>>  }
>>
>> +static inline inline void mem_cgroup_rotate_reclaimable_page(struct page *page)
>> +{
>> +     return ;
>> +}
>> +
>>  static inline void mem_cgroup_rotate_lru_list(struct page *page, int lru)
>>  {
>>       return ;
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index 729beb7..f9435be 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -829,6 +829,33 @@ void mem_cgroup_del_lru(struct page *page)
>>       mem_cgroup_del_lru_list(page, page_lru(page));
>>  }
>>
>> +/*
>> + * Writeback is about to end against a page which has been marked for immediate
>> + * reclaim.  If it still appears to be reclaimable, move it to the tail of the
>> + * inactive list.
>> + */
>> +void mem_cgroup_rotate_reclaimable_page(struct page *page)
>> +{
>> +     struct mem_cgroup_per_zone *mz;
>> +     struct page_cgroup *pc;
>> +     enum lru_list lru = page_lru_base_type(page);
>> +
>> +     if (mem_cgroup_disabled())
>> +             return;
>> +
>> +     pc = lookup_page_cgroup(page);
>> +     /*
>> +      * Used bit is set without atomic ops but after smp_wmb().
>> +      * For making pc->mem_cgroup visible, insert smp_rmb() here.
>> +      */
>> +     smp_rmb();
>> +     /* unused or root page is not rotated. */
>> +     if (!PageCgroupUsed(pc) || mem_cgroup_is_root(pc->mem_cgroup))
>> +             return;
>> +     mz = page_cgroup_zoneinfo(pc);
>> +     list_move_tail(&pc->lru, &mz->lists[lru]);
>> +}
>> +
>>  void mem_cgroup_rotate_lru_list(struct page *page, enum lru_list lru)
>>  {
>>       struct mem_cgroup_per_zone *mz;
>> diff --git a/mm/swap.c b/mm/swap.c
>> index 1f36f6f..0fe98e7 100644
>> --- a/mm/swap.c
>> +++ b/mm/swap.c
>> @@ -122,8 +122,9 @@ static void pagevec_move_tail(struct pagevec *pvec)
>>               }
>>               if (PageLRU(page) && !PageActive(page) &&
>>                                       !PageUnevictable(page)) {
>> -                     int lru = page_lru_base_type(page);
>> +                     enum lru_list lru = page_lru_base_type(page);
>>                       list_move_tail(&page->lru, &zone->lru[lru].list);
>> +                     mem_cgroup_rotate_reclaimable_page(page);
>>                       pgmoved++;
>>               }
>>       }
>
> Looks good, do you have any numbers, workloads that benefit? I agree
> that keeping both global and memcg reclaim in sync is a good idea.

This patch series is for Ben's fadvise problem in rsync.
While fixing the global reclaim, I found this patch could help memcg, too.
If Ben is busy, I will measure the benefit.

>
> Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>

Thanks, Balbir.

>
>
> --
>        Three Cheers,
>        Balbir
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 0/7] f/madivse(DONTNEED) support
       [not found] ` <AANLkTim71krrCcmhTTCZTzxeUDkvOdBTOkeYQu6EXt32@mail.gmail.com>
@ 2010-12-07  1:52   ` Ben Gamari
  2010-12-07  2:16     ` Minchan Kim
  0 siblings, 1 reply; 109+ messages in thread
From: Ben Gamari @ 2010-12-07  1:52 UTC (permalink / raw)
  To: Minchan Kim; +Cc: linux-mm

[-- Attachment #1: Type: text/plain, Size: 1584 bytes --]

On Tue, 7 Dec 2010 09:24:54 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
> Sorry for missing you in Cc.
> 
> Ben, could you test this series?
> You don't need to apply all the patches, just apply 2,3,4,5.
> These patches are based on mmotm-12-02, so if you have trouble
> applying them to your kernel, I can help you. :)
> 
I am very sorry for my lack of responsiveness. It's the last week of the
semester so things have been a bit busy. Nevertheless, I did do some
testing on v3 of the patch last week. Unfortunately, the results weren't
so promising, although this very well could be due to problems with my
test. For both patched and unpatched rsyncs and kernels I did roughly
the following,

  $ rm -Rf $DEST
  $ cat /proc/vmstat > vmstat-pre
  $ time rsync -a $SRC $DEST
  $ cat /proc/vmstat > vmstat-post
  $ time rsync -a $SRC $DEST
  $ cat /proc/vmstat > vmstat-post-warm

Where $DEST and $SRC both reside on a local SATA drive (hdparm reports
read speeds of about 100MByte/sec). I ran this (test.sh) three times on
both a patched kernel (2.6.37-rc3) and an unpatched kernel
(2.6.37-rc3-mm1). The results can be found in the attached tarball.
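
(For clarity, the "drop" rsync is the fadvise-patched one; roughly speaking it
asks the kernel to drop each destination file's page cache once the file has
been written, along the lines of the sketch below. This is only an
illustration, not the actual rsync patch, and drop_written_file() is just a
made-up helper name.)

	#include <fcntl.h>

	/*
	 * Sketch: tell the kernel we no longer need a just-written file's
	 * page-cache pages, so a long copy doesn't evict other processes'
	 * working set.
	 */
	static void drop_written_file(int fd)
	{
		/* offset 0, len 0 means "the whole file" */
		posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	}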

Judging by the results, something is horribly wrong. The "drop" (patched
rsync) runtimes are generally 3x longer than the "nodrop" (unpatched
rsync) runtimes in the case of the patched kernel. This suggests that
rsync is doing something I did not anticipate.

I'll redo the test tonight with v4 of the patch and will
investigate the source of the performance drop as soon as the
school-related workload subsides.

Cheers,

- Ben



[-- Attachment #2: patchv3-benchmark.tar.gz --]
[-- Type: application/octet-stream, Size: 14605 bytes --]

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 0/7] f/madivse(DONTNEED) support
  2010-12-07  1:52   ` [PATCH v4 0/7] f/madivse(DONTNEED) support Ben Gamari
@ 2010-12-07  2:16     ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  2:16 UTC (permalink / raw)
  To: Ben Gamari; +Cc: linux-mm

Hi Ben,
Thanks for the testing.

On Tue, Dec 7, 2010 at 10:52 AM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Tue, 7 Dec 2010 09:24:54 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
>> Sorry for missing you in Cc.
>>
>> Ben, could you test this series?
>> You don't need to apply all the patches; just apply 2, 3, 4 and 5.
>> These patches are based on mmotm-12-02, so if you have trouble applying
>> them to your kernel, I can help you. :)
>>
> I am very sorry for my lack of responsiveness. It's the last week of the
> semester so things have been a bit busy. Nevertheless, I did do some
> testing on v3 of the patch last week. Unfortunately, the results weren't
> so promising, although this very well could be due to problems with my
> test. For both patched and unpatched rsyncs and kernels I did roughly
> the following,
>
>  $ rm -Rf $DEST
>  $ cat /proc/vmstat > vmstat-pre
>  $ time rsync -a $SRC $DEST
>  $ cat /proc/vmstat > vmstat-post
>  $ time rsync -a $SRC $DEST
>  $ cat /proc/vmstat > vmstat-post-warm
>
> Where $DEST and $SRC both reside on a local SATA drive (hdparm reports
> read speeds of about 100MByte/sec). I ran this (test.sh) three times on
> both a patched kernel (2.6.37-rc3) and an unpatched kernel
> (2.6.37-rc3-mm1). The results can be found in the attached tarball.

1. 2.6.37-rc3?
2. 2.6.37-rc3-mm1?
3. 2.6.37-rc3-mm1-fadvise patch?

Maybe you tested it on 2 and 3.
To be clear, which kernels did you use?

>
> Judging by the results, something is horribly wrong. The "drop" (patched
> rsync) runtimes are generally 3x longer than the "nodrop" (unpatched
> rsync) runtimes in the case of the patched kernel. This suggests that
> rsync is doing something I did not anticipate.

Does it take 3x longer on the non-patched kernel as well?
If your workload doesn't use much memory, fadvise(DONTNEED) may take
longer because it has to invalidate the pages.
In that case, the goal isn't performance but preventing the eviction of
other processes' working-set pages.
If your workload uses a lot of memory, it can help performance as well,
because it lets the reclaimer reclaim unnecessary pages and keep the
working set.

Note: in my v3, I didn't rotate a page whose writeback had already
completed to the tail of the inactive list. But that case is very likely
to happen because the pages sit in a pagevec.

So I think v4 can help you more.
And please attach the cat /proc/vmstat output along with the time results.

Again, thanks for the testing, Ben.


>
> I'll redo the test tonight with v4 of the patch and will
> investigate the source of the performance drop as soon as the
> school-related workload subsides.
>
> Cheers,
>
> - Ben
>
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 6/7] Remove zap_details NULL dependency
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-07  4:26     ` Hugh Dickins
  -1 siblings, 0 replies; 109+ messages in thread
From: Hugh Dickins @ 2010-12-07  4:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Mon, 6 Dec 2010, Minchan Kim wrote:

> Some functions that use zap_details depend on the assumption that the
> zap_details parameter should be NULLed if some fields are 0.
> 
> This patch removes that dependency to make the next patch easier to
> review/merge. It should not change behavior.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Hugh Dickins <hughd@google.com>

Sorry, while I do like that you're now using the details block,
you seem to be adding overhead in various places without actually
simplifying anything - you insist that everything passes down an
initialized details block, and then in the end force the pointer
to NULL again in all the common cases.

Which seems odd.  I could understand if you were going to scrap
the NULL details optimization altogether; but I think that (for
the original optimization reasons) you're right to force it to NULL
in the end, so then why initialize the block at all those call sites?

> ---
>  include/linux/mm.h |    8 ++++++++
>  mm/madvise.c       |   15 +++++++++------
>  mm/memory.c        |   14 ++++++++------
>  mm/mmap.c          |    6 ++++--
>  4 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e097df6..6522ae4 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -773,6 +773,14 @@ struct zap_details {
>  	unsigned long truncate_count;		/* Compare vm_truncate_count */
>  };
>  
> +#define __ZAP_DETAILS_INITIALIZER(name) \
> +		{ .nonlinear_vma = NULL \
> +		, .check_mapping = NULL \
> +		, .i_mmap_lock = NULL }
> +
> +#define DEFINE_ZAP_DETAILS(name)		\
> +	struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)

Okay.

> +
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>  		pte_t pte);
>  
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 319528b..bfa17aa 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -162,18 +162,21 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>  			     struct vm_area_struct ** prev,
>  			     unsigned long start, unsigned long end)
>  {
> +	DEFINE_ZAP_DETAILS(details);
> +
>  	*prev = vma;
>  	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>  		return -EINVAL;
>  
>  	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
> -		struct zap_details details = {
> -			.nonlinear_vma = vma,
> -			.last_index = ULONG_MAX,
> -		};
> +		details.nonlinear_vma = vma;
> +		details.last_index = ULONG_MAX;
> +
>  		zap_page_range(vma, start, end - start, &details);
> -	} else
> -		zap_page_range(vma, start, end - start, NULL);
> +	} else {
> +
> +		zap_page_range(vma, start, end - start, &details);
> +	}

You end up with two identical zap_page_range() lines:
better have one after the if {} without an else.
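Something like this, just to sketch the shape I mean:

	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
		details.nonlinear_vma = vma;
		details.last_index = ULONG_MAX;
	}
	zap_page_range(vma, start, end - start, &details);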

>  	return 0;
>  }
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index ebfeedf..c0879bb 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -900,6 +900,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  
>  	init_rss_vec(rss);
>  
> +	if (!details->check_mapping && !details->nonlinear_vma)
> +		details = NULL;
> +

Aside from its necessity in the next patch, I thoroughly approve of
your moving this optimization here: it is confusing, and better that
it be done near where the fields are used, than off at the higher level.

>  	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
>  	arch_enter_lazy_mmu_mode();
>  	do {
> @@ -1038,9 +1041,6 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
>  	pgd_t *pgd;
>  	unsigned long next;
>  
> -	if (details && !details->check_mapping && !details->nonlinear_vma)
> -		details = NULL;
> -

Yes, I put it there because that was the highest point at which
it could then be done, so it was optimal from a do-it-fewest-times
point of view; but not at all helpful in understanding what's going
on, much better as you have it.

>  	BUG_ON(addr >= end);
>  	mem_cgroup_uncharge_start();
>  	tlb_start_vma(tlb, vma);
> @@ -1102,7 +1102,7 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>  	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
>  	int tlb_start_valid = 0;
>  	unsigned long start = start_addr;
> -	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
> +	spinlock_t *i_mmap_lock = details->i_mmap_lock;

This appears to be the sole improvement from insisting that everywhere
sets up an initialized details block.  I don't think this is worth it.

>  	int fullmm = (*tlbp)->fullmm;
>  	struct mm_struct *mm = vma->vm_mm;
>  
> @@ -1217,10 +1217,11 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
>  int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>  		unsigned long size)
>  {
> +	DEFINE_ZAP_DETAILS(details);

Overhead.

>  	if (address < vma->vm_start || address + size > vma->vm_end ||
>  	    		!(vma->vm_flags & VM_PFNMAP))
>  		return -1;
> -	zap_page_range(vma, address, size, NULL);
> +	zap_page_range(vma, address, size, &details);
>  	return 0;
>  }
>  EXPORT_SYMBOL_GPL(zap_vma_ptes);
> @@ -2577,7 +2578,8 @@ restart:
>  void unmap_mapping_range(struct address_space *mapping,
>  		loff_t const holebegin, loff_t const holelen, int even_cows)
>  {
> -	struct zap_details details;
> +	DEFINE_ZAP_DETAILS(details);
> +
>  	pgoff_t hba = holebegin >> PAGE_SHIFT;
>  	pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
>  
> diff --git a/mm/mmap.c b/mm/mmap.c
> index b179abb..31d2594 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1900,11 +1900,12 @@ static void unmap_region(struct mm_struct *mm,
>  	struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
>  	struct mmu_gather *tlb;
>  	unsigned long nr_accounted = 0;
> +	DEFINE_ZAP_DETAILS(details);

Overhead.

>  
>  	lru_add_drain();
>  	tlb = tlb_gather_mmu(mm, 0);
>  	update_hiwater_rss(mm);
> -	unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
> +	unmap_vmas(&tlb, vma, start, end, &nr_accounted, &details);
>  	vm_unacct_memory(nr_accounted);
>  	free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
>  				 next? next->vm_start: 0);
> @@ -2254,6 +2255,7 @@ void exit_mmap(struct mm_struct *mm)
>  	struct vm_area_struct *vma;
>  	unsigned long nr_accounted = 0;
>  	unsigned long end;
> +	DEFINE_ZAP_DETAILS(details);

Overhead.

>  
>  	/* mm's last user has gone, and its about to be pulled down */
>  	mmu_notifier_release(mm);
> @@ -2278,7 +2280,7 @@ void exit_mmap(struct mm_struct *mm)
>  	tlb = tlb_gather_mmu(mm, 1);
>  	/* update_hiwater_rss(mm) here? but nobody should be looking */
>  	/* Use -1 here to ensure all VMAs in the mm are unmapped */
> -	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
> +	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, &details);
>  	vm_unacct_memory(nr_accounted);
>  
>  	free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
> -- 

Am I being too fussy?

Hugh

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-07  4:48     ` Hugh Dickins
  -1 siblings, 0 replies; 109+ messages in thread
From: Hugh Dickins @ 2010-12-07  4:48 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Mon, 6 Dec 2010, Minchan Kim wrote:

> Now zap_pte_range always activates pages which are pte_young &&
> !VM_SequentialReadHint(vma). But in the case of MADV_DONTNEED,
> it's unnecessary since the page won't be used any more.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> 
> Changelog since v3:
>  - Change variable name - suggested by Johannes
>  - Union ignore_references with zap_details - suggested by Hugh
> 
> Changelog since v2:
>  - remove unnecessary description
> 
> Changelog since v1:
>  - change word from promote to activate
>  - add activate argument to zap_pte_range and family function
> ---
>  include/linux/mm.h |    4 +++-
>  mm/madvise.c       |    6 +++---
>  mm/memory.c        |    5 ++++-
>  3 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 6522ae4..e57190f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -771,12 +771,14 @@ struct zap_details {
>  	pgoff_t last_index;			/* Highest page->index to unmap */
>  	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
>  	unsigned long truncate_count;		/* Compare vm_truncate_count */
> +	bool ignore_references;			/* For page activation */
>  };
>  
>  #define __ZAP_DETAILS_INITIALIZER(name) \
>                  { .nonlinear_vma = NULL \
>  		, .check_mapping = NULL \
> -		, .i_mmap_lock = NULL }
> +		, .i_mmap_lock = NULL	\
> +		, .ignore_references = false }

Okay.

>  
>  #define DEFINE_ZAP_DETAILS(name)		\
>  	struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
> diff --git a/mm/madvise.c b/mm/madvise.c
> index bfa17aa..8e7aba3 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -163,6 +163,7 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>  			     unsigned long start, unsigned long end)
>  {
>  	DEFINE_ZAP_DETAILS(details);
> +	details.ignore_references = true;
>  
>  	*prev = vma;
>  	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
> @@ -173,10 +174,9 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>  		details.last_index = ULONG_MAX;
>  
>  		zap_page_range(vma, start, end - start, &details);
> -	} else {
> -
> +	} else
>  		zap_page_range(vma, start, end - start, &details);
> -	}
> +

As in the previous patch, you have the same in the if {} and the else.

>  	return 0;
>  }
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index c0879bb..44d87e1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -897,6 +897,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  	pte_t *pte;
>  	spinlock_t *ptl;
>  	int rss[NR_MM_COUNTERS];
> +	bool ignore_references = details->ignore_references;
>  
>  	init_rss_vec(rss);
>  
> @@ -952,7 +953,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  				if (pte_dirty(ptent))
>  					set_page_dirty(page);
>  				if (pte_young(ptent) &&
> -				    likely(!VM_SequentialReadHint(vma)))
> +				    likely(!VM_SequentialReadHint(vma)) &&
> +					!ignore_references)

I think ignore_references is about as likely as VM_SequentialReadHint:
I'd probably just omit that "likely()" nowadays, but you might prefer
to put your "|| !ignore_references" inside.

Hmm, actually it would probably be better to say something like

	mark_accessed = true;
	if (VM_SequentialReadHint(vma) ||
            (details && details->ignore_references))
		mark_accessed = false;

on entry to zap_pte_range().
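The test inside the loop would then just be (untested sketch):

	if (pte_young(ptent) && mark_accessed)
		mark_page_accessed(page);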

>  					mark_page_accessed(page);
>  				rss[MM_FILEPAGES]--;
>  			}
> @@ -1218,6 +1220,7 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>  		unsigned long size)
>  {
>  	DEFINE_ZAP_DETAILS(details);
> +	details.ignore_references = true;
>  	if (address < vma->vm_start || address + size > vma->vm_end ||
>  	    		!(vma->vm_flags & VM_PFNMAP))
>  		return -1;
> -- 

Unnecessary here (would make more sense in the truncation case,
but not necessary there either): zap_vma_ptes() is only being used on
GRU's un-cowable VM_PFNMAP area, so vm_normal_page() won't even give
you a non-NULL page to mark.

Hugh

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 6/7] Remove zap_details NULL dependency
  2010-12-07  4:26     ` Hugh Dickins
@ 2010-12-07  5:30       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  5:30 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

Hi Hugh,

On Tue, Dec 7, 2010 at 1:26 PM, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 6 Dec 2010, Minchan Kim wrote:
>
>> Some functions that use zap_details depend on the assumption that the
>> zap_details parameter should be NULLed if some fields are 0.
>>
>> This patch removes that dependency to make the next patch easier to
>> review/merge. It should not change behavior.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>
> Sorry, while I do like that you're now using the details block,
> you seem to be adding overhead in various places without actually
> simplifying anything - you insist that everything passes down an
> initialized details block, and then in the end force the pointer
> to NULL again in all the common cases.
>
> Which seems odd.  I could understand if you were going to scrap
> the NULL details optimization altogether; but I think that (for
> the original optimization reasons) you're right to force it to NULL
> in the end, so then why initialize the block at all those call sites?
>
>> ---
>>  include/linux/mm.h |    8 ++++++++
>>  mm/madvise.c       |   15 +++++++++------
>>  mm/memory.c        |   14 ++++++++------
>>  mm/mmap.c          |    6 ++++--
>>  4 files changed, 29 insertions(+), 14 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index e097df6..6522ae4 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -773,6 +773,14 @@ struct zap_details {
>>       unsigned long truncate_count;           /* Compare vm_truncate_count */
>>  };
>>
>> +#define __ZAP_DETAILS_INITIALIZER(name) \
>> +             { .nonlinear_vma = NULL \
>> +             , .check_mapping = NULL \
>> +             , .i_mmap_lock = NULL }
>> +
>> +#define DEFINE_ZAP_DETAILS(name)             \
>> +     struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
>
> Okay.
>
>> +
>>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>>               pte_t pte);
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 319528b..bfa17aa 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -162,18 +162,21 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>>                            struct vm_area_struct ** prev,
>>                            unsigned long start, unsigned long end)
>>  {
>> +     DEFINE_ZAP_DETAILS(details);
>> +
>>       *prev = vma;
>>       if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>>               return -EINVAL;
>>
>>       if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
>> -             struct zap_details details = {
>> -                     .nonlinear_vma = vma,
>> -                     .last_index = ULONG_MAX,
>> -             };
>> +             details.nonlinear_vma = vma;
>> +             details.last_index = ULONG_MAX;
>> +
>>               zap_page_range(vma, start, end - start, &details);
>> -     } else
>> -             zap_page_range(vma, start, end - start, NULL);
>> +     } else {
>> +
>> +             zap_page_range(vma, start, end - start, &details);
>> +     }
>
> You end up with two identical zap_page_range() lines:
> better have one after the if {} without an else.
>

Okay. Will fix.

>>       return 0;
>>  }
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ebfeedf..c0879bb 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -900,6 +900,9 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>
>>       init_rss_vec(rss);
>>
>> +     if (!details->check_mapping && !details->nonlinear_vma)
>> +             details = NULL;
>> +
>
> Aside from its necessity in the next patch, I thoroughly approve of
> your moving this optimization here: it is confusing, and better that
> it be done near where the fields are used, than off at the higher level.

Thanks.

>
>>       pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
>>       arch_enter_lazy_mmu_mode();
>>       do {
>> @@ -1038,9 +1041,6 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
>>       pgd_t *pgd;
>>       unsigned long next;
>>
>> -     if (details && !details->check_mapping && !details->nonlinear_vma)
>> -             details = NULL;
>> -
>
> Yes, I put it there because that was the highest point at which
> it could then be done, so it was optimal from a do-it-fewest-times
> point of view; but not at all helpful in understanding what's going
> on, much better as you have it.
>
>>       BUG_ON(addr >= end);
>>       mem_cgroup_uncharge_start();
>>       tlb_start_vma(tlb, vma);
>> @@ -1102,7 +1102,7 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>>       unsigned long tlb_start = 0;    /* For tlb_finish_mmu */
>>       int tlb_start_valid = 0;
>>       unsigned long start = start_addr;
>> -     spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
>> +     spinlock_t *i_mmap_lock = details->i_mmap_lock;
>
> This appears to be the sole improvement from insisting that everywhere
> sets up an initialized details block.  I don't think this is worth it.
>
>>       int fullmm = (*tlbp)->fullmm;
>>       struct mm_struct *mm = vma->vm_mm;
>>
>> @@ -1217,10 +1217,11 @@ unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address,
>>  int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>>               unsigned long size)
>>  {
>> +     DEFINE_ZAP_DETAILS(details);
>
> Overhead.
>
>>       if (address < vma->vm_start || address + size > vma->vm_end ||
>>                       !(vma->vm_flags & VM_PFNMAP))
>>               return -1;
>> -     zap_page_range(vma, address, size, NULL);
>> +     zap_page_range(vma, address, size, &details);
>>       return 0;
>>  }
>>  EXPORT_SYMBOL_GPL(zap_vma_ptes);
>> @@ -2577,7 +2578,8 @@ restart:
>>  void unmap_mapping_range(struct address_space *mapping,
>>               loff_t const holebegin, loff_t const holelen, int even_cows)
>>  {
>> -     struct zap_details details;
>> +     DEFINE_ZAP_DETAILS(details);
>> +
>>       pgoff_t hba = holebegin >> PAGE_SHIFT;
>>       pgoff_t hlen = (holelen + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index b179abb..31d2594 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1900,11 +1900,12 @@ static void unmap_region(struct mm_struct *mm,
>>       struct vm_area_struct *next = prev? prev->vm_next: mm->mmap;
>>       struct mmu_gather *tlb;
>>       unsigned long nr_accounted = 0;
>> +     DEFINE_ZAP_DETAILS(details);
>
> Overhead.
>
>>
>>       lru_add_drain();
>>       tlb = tlb_gather_mmu(mm, 0);
>>       update_hiwater_rss(mm);
>> -     unmap_vmas(&tlb, vma, start, end, &nr_accounted, NULL);
>> +     unmap_vmas(&tlb, vma, start, end, &nr_accounted, &details);
>>       vm_unacct_memory(nr_accounted);
>>       free_pgtables(tlb, vma, prev? prev->vm_end: FIRST_USER_ADDRESS,
>>                                next? next->vm_start: 0);
>> @@ -2254,6 +2255,7 @@ void exit_mmap(struct mm_struct *mm)
>>       struct vm_area_struct *vma;
>>       unsigned long nr_accounted = 0;
>>       unsigned long end;
>> +     DEFINE_ZAP_DETAILS(details);
>
> Overhead.
>
>>
>>       /* mm's last user has gone, and its about to be pulled down */
>>       mmu_notifier_release(mm);
>> @@ -2278,7 +2280,7 @@ void exit_mmap(struct mm_struct *mm)
>>       tlb = tlb_gather_mmu(mm, 1);
>>       /* update_hiwater_rss(mm) here? but nobody should be looking */
>>       /* Use -1 here to ensure all VMAs in the mm are unmapped */
>> -     end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
>> +     end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, &details);
>>       vm_unacct_memory(nr_accounted);
>>
>>       free_pgtables(tlb, vma, FIRST_USER_ADDRESS, 0);
>> --
>
> Am I being too fussy?

Never. It's a good comment.
I don't want to add unnecessary overhead.

Will fix and resend.

Thanks for the review, Hugh.

>
> Hugh
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
  2010-12-07  4:48     ` Hugh Dickins
  (?)
@ 2010-12-07  5:44     ` Minchan Kim
  2010-12-08  7:26         ` Hugh Dickins
  -1 siblings, 1 reply; 109+ messages in thread
From: Minchan Kim @ 2010-12-07  5:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

[-- Attachment #1: Type: text/plain, Size: 9656 bytes --]

On Tue, Dec 7, 2010 at 1:48 PM, Hugh Dickins <hughd@google.com> wrote:
> On Mon, 6 Dec 2010, Minchan Kim wrote:
>
>> Now zap_pte_range always activates pages which are pte_young &&
>> !VM_SequentialReadHint(vma). But in the case of MADV_DONTNEED,
>> it's unnecessary since the page won't be used any more.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: Hugh Dickins <hughd@google.com>
>>
>> Changelog since v3:
>>  - Change variable name - suggested by Johannes
>>  - Union ignore_references with zap_details - suggested by Hugh
>>
>> Changelog since v2:
>>  - remove unnecessary description
>>
>> Changelog since v1:
>>  - change word from promote to activate
>>  - add activate argument to zap_pte_range and family function
>> ---
>>  include/linux/mm.h |    4 +++-
>>  mm/madvise.c       |    6 +++---
>>  mm/memory.c        |    5 ++++-
>>  3 files changed, 10 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 6522ae4..e57190f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -771,12 +771,14 @@ struct zap_details {
>>       pgoff_t last_index;                     /* Highest page->index to unmap */
>>       spinlock_t *i_mmap_lock;                /* For unmap_mapping_range: */
>>       unsigned long truncate_count;           /* Compare vm_truncate_count */
>> +     bool ignore_references;                 /* For page activation */
>>  };
>>
>>  #define __ZAP_DETAILS_INITIALIZER(name) \
>>                  { .nonlinear_vma = NULL \
>>               , .check_mapping = NULL \
>> -             , .i_mmap_lock = NULL }
>> +             , .i_mmap_lock = NULL   \
>> +             , .ignore_references = false }
>
> Okay.
>
>>
>>  #define DEFINE_ZAP_DETAILS(name)             \
>>       struct zap_details name = __ZAP_DETAILS_INITIALIZER(name)
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index bfa17aa..8e7aba3 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -163,6 +163,7 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>>                            unsigned long start, unsigned long end)
>>  {
>>       DEFINE_ZAP_DETAILS(details);
>> +     details.ignore_references = true;
>>
>>       *prev = vma;
>>       if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>> @@ -173,10 +174,9 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>>               details.last_index = ULONG_MAX;
>>
>>               zap_page_range(vma, start, end - start, &details);
>> -     } else {
>> -
>> +     } else
>>               zap_page_range(vma, start, end - start, &details);
>> -     }
>> +
>
> As in the previous patch, you have the same in the if {} and the else.
>
>>       return 0;
>>  }
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index c0879bb..44d87e1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -897,6 +897,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>       pte_t *pte;
>>       spinlock_t *ptl;
>>       int rss[NR_MM_COUNTERS];
>> +     bool ignore_references = details->ignore_references;
>>
>>       init_rss_vec(rss);
>>
>> @@ -952,7 +953,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>                               if (pte_dirty(ptent))
>>                                       set_page_dirty(page);
>>                               if (pte_young(ptent) &&
>> -                                 likely(!VM_SequentialReadHint(vma)))
>> +                                 likely(!VM_SequentialReadHint(vma)) &&
>> +                                     !ignore_references)
>
> I think ignore_references is about as likely as VM_SequentialReadHint:
> I'd probably just omit that "likely()" nowadays, but you might prefer
> to put your "|| !ignore_references" inside.
>
> Hmm, actually it would probably be better to say something like
>
>        mark_accessed = true;
>        if (VM_SequentialReadHint(vma) ||
>            (details && details->ignore_references))
>                mark_accessed = false;
>
> on entry to zap_pte_range().
>
>>                                       mark_page_accessed(page);
>>                               rss[MM_FILEPAGES]--;
>>                       }
>> @@ -1218,6 +1220,7 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
>>               unsigned long size)
>>  {
>>       DEFINE_ZAP_DETAILS(details);
>> +     details.ignore_references = true;
>>       if (address < vma->vm_start || address + size > vma->vm_end ||
>>                       !(vma->vm_flags & VM_PFNMAP))
>>               return -1;
>> --
>
> Unnecessary here (would make more sense in the truncation case,
> but not necessary there either): zap_vma_ptes() is only being used on
> GRU's un-cowable VM_PFNMAP area, so vm_normal_page() won't even give
> you a non-NULL page to mark.

Thanks for the notice.

How about this? Although it doesn't remove the NULL dependency, it meets my
goal without much overhead.
It's just a quick patch. If you agree, I will resend this version as a formal patch.
(If the word-wrapped source below is hard to read, see the attachment. I have
asked Google twice to support text/plain mode in the Gmail web client but have
received no response so far. ;( There are lots of kernel developers at Google;
please support this mode for those of us who can't use SMTP, even though it's
a very small request.)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e097df6..14ae918 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -771,6 +771,7 @@ struct zap_details {
        pgoff_t last_index;                     /* Highest page->index to unmap */
        spinlock_t *i_mmap_lock;                /* For unmap_mapping_range: */
        unsigned long truncate_count;           /* Compare vm_truncate_count */
+       int ignore_reference;
 };

 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b..fdb0253 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -162,18 +162,22 @@ static long madvise_dontneed(struct vm_area_struct * vma,
                             struct vm_area_struct ** prev,
                             unsigned long start, unsigned long end)
 {
+       struct zap_details details ;
+
        *prev = vma;
        if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
                return -EINVAL;

        if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
-               struct zap_details details = {
-                       .nonlinear_vma = vma,
-                       .last_index = ULONG_MAX,
-               };
-               zap_page_range(vma, start, end - start, &details);
-       } else
-               zap_page_range(vma, start, end - start, NULL);
+               details.nonlinear_vma = vma;
+               details.last_index = ULONG_MAX;
+       } else {
+               details.nonlinear_vma = NULL;
+               details.last_index = NULL;
+       }
+
+       details.ignore_reference = true;
+       zap_page_range(vma, start, end - start, &details);
        return 0;
 }

diff --git a/mm/memory.c b/mm/memory.c
index ebfeedf..d46ac42 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -897,9 +897,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
        pte_t *pte;
        spinlock_t *ptl;
        int rss[NR_MM_COUNTERS];
-
+       bool ignore_reference = false;
        init_rss_vec(rss);

+       if (details && ((!details->check_mapping && !details->nonlinear_vma)
+                                        || !details->ignore_reference))
+               details = NULL;
+
        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
        arch_enter_lazy_mmu_mode();
        do {
@@ -949,7 +955,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
                                if (pte_dirty(ptent))
                                        set_page_dirty(page);
                                if (pte_young(ptent) &&
-                                   likely(!VM_SequentialReadHint(vma)))
+                                   likely(!VM_SequentialReadHint(vma)) &&
+                                   likely(!ignore_reference))
                                        mark_page_accessed(page);
                                rss[MM_FILEPAGES]--;
                        }
@@ -1038,8 +1045,6 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
        pgd_t *pgd;
        unsigned long next;

-       if (details && !details->check_mapping && !details->nonlinear_vma)
-               details = NULL;

        BUG_ON(addr >= end);
        mem_cgroup_uncharge_start();
@@ -1102,7 +1107,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
        unsigned long tlb_start = 0;    /* For tlb_finish_mmu */
        int tlb_start_valid = 0;
        unsigned long start = start_addr;
-       spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
+       spinlock_t *i_mmap_lock = details ?
+               (details->check_mapping ? details->i_mmap_lock: NULL) : NULL;
        int fullmm = (*tlbp)->fullmm;
        struct mm_struct *mm = vma->vm_mm;



>
> Hugh
>



-- 
Kind regards,
Minchan Kim

[-- Attachment #2: madvise.patch --]
[-- Type: text/x-diff, Size: 2952 bytes --]

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e097df6..14ae918 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -771,6 +771,7 @@ struct zap_details {
 	pgoff_t last_index;			/* Highest page->index to unmap */
 	spinlock_t *i_mmap_lock;		/* For unmap_mapping_range: */
 	unsigned long truncate_count;		/* Compare vm_truncate_count */
+	int ignore_reference;
 };
 
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/madvise.c b/mm/madvise.c
index 319528b..fdb0253 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -162,18 +162,22 @@ static long madvise_dontneed(struct vm_area_struct * vma,
 			     struct vm_area_struct ** prev,
 			     unsigned long start, unsigned long end)
 {
+	struct zap_details details;
+
 	*prev = vma;
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
 	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
-		struct zap_details details = {
-			.nonlinear_vma = vma,
-			.last_index = ULONG_MAX,
-		};
-		zap_page_range(vma, start, end - start, &details);
-	} else
-		zap_page_range(vma, start, end - start, NULL);
+		details.nonlinear_vma = vma;
+		details.last_index = ULONG_MAX;
+	} else {
+		details.nonlinear_vma = NULL;
+	details.last_index = 0;
+	}
+
+	details.ignore_reference = true;
+	zap_page_range(vma, start, end - start, &details);
 	return 0;
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index ebfeedf..8aa0190 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -897,9 +897,13 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 	pte_t *pte;
 	spinlock_t *ptl;
 	int rss[NR_MM_COUNTERS];
-
+	bool ignore_reference = false;
 	init_rss_vec(rss);
 
+	if (details && ((!details->check_mapping && !details->nonlinear_vma)
+					 || !details->ignore_reference))
+		details = NULL;
+
 	pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	arch_enter_lazy_mmu_mode();
 	do {
@@ -949,7 +953,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_dirty(ptent))
 					set_page_dirty(page);
 				if (pte_young(ptent) &&
-				    likely(!VM_SequentialReadHint(vma)))
+				    likely(!VM_SequentialReadHint(vma)) &&
+				    likely(!ignore_reference))
 					mark_page_accessed(page);
 				rss[MM_FILEPAGES]--;
 			}
@@ -1038,8 +1043,6 @@ static unsigned long unmap_page_range(struct mmu_gather *tlb,
 	pgd_t *pgd;
 	unsigned long next;
 
-	if (details && !details->check_mapping && !details->nonlinear_vma)
-		details = NULL;
 
 	BUG_ON(addr >= end);
 	mem_cgroup_uncharge_start();
@@ -1102,7 +1105,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
 	unsigned long tlb_start = 0;	/* For tlb_finish_mmu */
 	int tlb_start_valid = 0;
 	unsigned long start = start_addr;
-	spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
+	spinlock_t *i_mmap_lock = details ?
+		(details->check_mapping ? details->i_mmap_lock : NULL) : NULL;
 	int fullmm = (*tlbp)->fullmm;
 	struct mm_struct *mm = vma->vm_mm;
 

^ permalink raw reply related	[flat|nested] 109+ messages in thread
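
A note on the attached draft above: in zap_pte_range() the local ignore_reference flag is declared but never assigned from the passed-in details, so the new pte_young() test can never trigger, and madvise_dontneed() leaves the remaining zap_details fields (check_mapping, i_mmap_lock, truncate_count) uninitialized on the stack. A minimal sketch of what is presumably intended (hypothetical, not part of the posted patch) could look like:

	/* In madvise_dontneed(): start from a zeroed zap_details so every
	 * field zap_pte_range() may look at has a defined value. */
	struct zap_details details = { .ignore_reference = true };

	if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
		details.nonlinear_vma = vma;
		details.last_index = ULONG_MAX;
	}
	zap_page_range(vma, start, end - start, &details);

	/* In zap_pte_range(): latch the hint into the local flag before
	 * details is possibly dropped, so the pte_young() test can see it. */
	if (details) {
		ignore_reference = details->ignore_reference;
		if (!details->check_mapping && !details->nonlinear_vma)
			details = NULL;
	}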

* Re: [PATCH v4 1/7] Fix checkpatch's report in swap.c
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-07 14:37     ` Johannes Weiner
  -1 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 14:37 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML

On Mon, Dec 06, 2010 at 02:29:09AM +0900, Minchan Kim wrote:
> checkpatch reports the following problems.
> It's very annoying. This patch fixes them.
>
> barrios@barrios-desktop:~/linux-2.6$ ./scripts/checkpatch.pl -f mm/swap.c
> WARNING: line over 80 characters
> +		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(mark_page_accessed);
> 
> ERROR: code indent should use tabs where possible
> +  ^I^I}$
> 
> WARNING: please, no space before tabs
> +  ^I^I}$
> 
> WARNING: please, no spaces at the start of a line
> +  ^I^I}$
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(__pagevec_release);
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(____pagevec_lru_add);
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(pagevec_lookup);
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(pagevec_lookup_tag);
> 
> total: 1 errors, 8 warnings, 517 lines checked
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
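
As a side note, the EXPORT_SYMBOL warnings above simply ask for the export to sit directly after the definition it belongs to; a sketch of the expected layout (function body elided, not the actual mm/swap.c hunk):

	void mark_page_accessed(struct page *page)
	{
		/* ... existing body unchanged ... */
	}
	EXPORT_SYMBOL(mark_page_accessed);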

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 1/7] Fix checkpatch's report in swap.c
@ 2010-12-07 14:37     ` Johannes Weiner
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 14:37 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML

On Mon, Dec 06, 2010 at 02:29:09AM +0900, Minchan Kim wrote:
> checkpatch reports the following problems.
> It's very annoying. This patch fixes them.
>
> barrios@barrios-desktop:~/linux-2.6$ ./scripts/checkpatch.pl -f mm/swap.c
> WARNING: line over 80 characters
> +		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(mark_page_accessed);
> 
> ERROR: code indent should use tabs where possible
> +  ^I^I}$
> 
> WARNING: please, no space before tabs
> +  ^I^I}$
> 
> WARNING: please, no spaces at the start of a line
> +  ^I^I}$
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(__pagevec_release);
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(____pagevec_lru_add);
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(pagevec_lookup);
> 
> WARNING: EXPORT_SYMBOL(foo); should immediately follow its function/variable
> +EXPORT_SYMBOL(pagevec_lookup_tag);
> 
> total: 1 errors, 8 warnings, 517 lines checked
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-07 14:49     ` Johannes Weiner
  -1 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 14:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> Changelog since v3:
>  - Change function comments - suggested by Johannes
>  - Change function name - suggested by Johannes
>  - add only dirty/writeback pages to deactive pagevec

Why the extra check?

> @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>  			if (lock_failed)
>  				continue;
>  
> -			ret += invalidate_inode_page(page);
> -
> +			ret = invalidate_inode_page(page);
> +			/*
> +			 * If the page is dirty or under writeback, we can not
> +			 * invalidate it now.  But we assume that attempted
> +			 * invalidation is a hint that the page is no longer
> +			 * of interest and try to speed up its reclaim.
> +			 */
> +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> +				deactivate_page(page);

The writeback completion handler does not take the page lock, so you
can still miss pages that finish writeback before this test, no?

Can you explain why you felt the need to add these checks?

Thanks!

	Hannes

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-07 14:49     ` Johannes Weiner
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 14:49 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> Changelog since v3:
>  - Change function comments - suggested by Johannes
>  - Change function name - suggested by Johannes
>  - add only dirty/writeback pages to deactive pagevec

Why the extra check?

> @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>  			if (lock_failed)
>  				continue;
>  
> -			ret += invalidate_inode_page(page);
> -
> +			ret = invalidate_inode_page(page);
> +			/*
> +			 * If the page is dirty or under writeback, we can not
> +			 * invalidate it now.  But we assume that attempted
> +			 * invalidation is a hint that the page is no longer
> +			 * of interest and try to speed up its reclaim.
> +			 */
> +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> +				deactivate_page(page);

The writeback completion handler does not take the page lock, so you
can still miss pages that finish writeback before this test, no?

Can you explain why you felt the need to add these checks?

Thanks!

	Hannes


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-07 14:52     ` Johannes Weiner
  -1 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 14:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Balbir Singh, KAMEZAWA Hiroyuki

On Mon, Dec 06, 2010 at 02:29:11AM +0900, Minchan Kim wrote:
> Global page reclaim moves reclaimable pages to the inactive list
> to reclaim them asap. This patch applies the same rule to memcg.
> It can help prevent unnecessary eviction of a memcg's working set pages.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-07 14:52     ` Johannes Weiner
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 14:52 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Balbir Singh, KAMEZAWA Hiroyuki

On Mon, Dec 06, 2010 at 02:29:11AM +0900, Minchan Kim wrote:
> Global page reclaim moves reclaimable pages to the inactive list
> to reclaim them asap. This patch applies the same rule to memcg.
> It can help prevent unnecessary eviction of a memcg's working set pages.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>

Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-07 15:05     ` Johannes Weiner
  -1 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 15:05 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Wu Fengguang, Nick Piggin

On Mon, Dec 06, 2010 at 02:29:12AM +0900, Minchan Kim wrote:
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -275,26 +275,59 @@ void add_page_to_unevictable_list(struct page *page)
>   * head of the list, rather than the tail, to give the flusher
>   * threads some time to write it out, as this is much more
>   * effective than the single-page writeout from reclaim.
> + *
> + * If the page isn't page_mapped and dirty/writeback, the page
> + * could reclaim asap using PG_reclaim.
> + *
> + * 1. active, mapped page -> none
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim

         inactive, clean -> inactive, tail

> + * 5. Others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be write it out by flusher threads as this is much more effective
> + * than the single-page writeout from reclaim.
>   */
>  static void lru_deactivate(struct page *page, struct zone *zone)
>  {
>  	int lru, file;
> +	bool active;
>  
> -	if (!PageLRU(page) || !PageActive(page))
> +	if (!PageLRU(page))
>  		return;
>  
>  	/* Some processes are using the page */
>  	if (page_mapped(page))
>  		return;
>  
> +	active = PageActive(page);
> +
>  	file = page_is_file_cache(page);
>  	lru = page_lru_base_type(page);
> -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> +	del_page_from_lru_list(zone, page, lru + active);
>  	ClearPageActive(page);
>  	ClearPageReferenced(page);
>  	add_page_to_lru_list(zone, page, lru);
> -	__count_vm_event(PGDEACTIVATE);
>
> +	if (PageWriteback(page) || PageDirty(page)) {
> +		/*
> +		 * PG_reclaim could be raced with end_page_writeback
> +		 * It can make readahead confusing.  But race window
> +		 * is _really_ small and  it's non-critical problem.
> +		 */
> +		SetPageReclaim(page);
> +	} else {
> +		/*
> +		 * The page's writeback ends up during pagevec
> +		 * We moves tha page into tail of inactive.
> +		 */
> +		list_move_tail(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_rotate_reclaimable_page(page);

I think you also need to increase PGROTATED here.

Other than that,
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
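
For illustration, the rotate branch with that counter added would look roughly like the following (a sketch only; it assumes the plain __count_vm_event() form used for other single-page events in mm/swap.c):

	} else {
		/*
		 * Writeback already completed, so rotate the page to the
		 * tail of the inactive list for immediate reclaim, and
		 * account the rotation.
		 */
		list_move_tail(&page->lru, &zone->lru[lru].list);
		mem_cgroup_rotate_reclaimable_page(page);
		__count_vm_event(PGROTATED);
	}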

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-07 15:05     ` Johannes Weiner
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 15:05 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Wu Fengguang, Nick Piggin

On Mon, Dec 06, 2010 at 02:29:12AM +0900, Minchan Kim wrote:
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -275,26 +275,59 @@ void add_page_to_unevictable_list(struct page *page)
>   * head of the list, rather than the tail, to give the flusher
>   * threads some time to write it out, as this is much more
>   * effective than the single-page writeout from reclaim.
> + *
> + * If the page isn't page_mapped and dirty/writeback, the page
> + * could reclaim asap using PG_reclaim.
> + *
> + * 1. active, mapped page -> none
> + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> + * 3. inactive, mapped page -> none
> + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim

         inactive, clean -> inactive, tail

> + * 5. Others -> none
> + *
> + * In 4, why it moves inactive's head, the VM expects the page would
> + * be write it out by flusher threads as this is much more effective
> + * than the single-page writeout from reclaim.
>   */
>  static void lru_deactivate(struct page *page, struct zone *zone)
>  {
>  	int lru, file;
> +	bool active;
>  
> -	if (!PageLRU(page) || !PageActive(page))
> +	if (!PageLRU(page))
>  		return;
>  
>  	/* Some processes are using the page */
>  	if (page_mapped(page))
>  		return;
>  
> +	active = PageActive(page);
> +
>  	file = page_is_file_cache(page);
>  	lru = page_lru_base_type(page);
> -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> +	del_page_from_lru_list(zone, page, lru + active);
>  	ClearPageActive(page);
>  	ClearPageReferenced(page);
>  	add_page_to_lru_list(zone, page, lru);
> -	__count_vm_event(PGDEACTIVATE);
>
> +	if (PageWriteback(page) || PageDirty(page)) {
> +		/*
> +		 * PG_reclaim could be raced with end_page_writeback
> +		 * It can make readahead confusing.  But race window
> +		 * is _really_ small and  it's non-critical problem.
> +		 */
> +		SetPageReclaim(page);
> +	} else {
> +		/*
> +		 * The page's writeback ends up during pagevec
> +		 * We moves tha page into tail of inactive.
> +		 */
> +		list_move_tail(&page->lru, &zone->lru[lru].list);
> +		mem_cgroup_rotate_reclaimable_page(page);

I think you also need to increase PGROTATED here.

Other than that,
Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-07 14:49     ` Johannes Weiner
@ 2010-12-07 15:07       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 15:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > Changelog since v3:
> >  - Change function comments - suggested by Johannes
> >  - Change function name - suggested by Johannes
> >  - add only dirty/writeback pages to deactive pagevec
> 
> Why the extra check?
> 
> > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> >  			if (lock_failed)
> >  				continue;
> >  
> > -			ret += invalidate_inode_page(page);
> > -
> > +			ret = invalidate_inode_page(page);
> > +			/*
> > +			 * If the page is dirty or under writeback, we can not
> > +			 * invalidate it now.  But we assume that attempted
> > +			 * invalidation is a hint that the page is no longer
> > +			 * of interest and try to speed up its reclaim.
> > +			 */
> > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > +				deactivate_page(page);
> 
> The writeback completion handler does not take the page lock, so you
> can still miss pages that finish writeback before this test, no?

Yes. but I think it's rare and even though it happens, it's not critical.
> 
> Can you explain why you felt the need to add these checks?

invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
Look invalidate_complete_page. As easiest example, if the page has buffer and
try_to_release_page can't release the buffer, it could return 0.

I want to check this.
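
For reference, the path in question looks roughly like this (a simplified sketch of mm/truncate.c from this era, details elided); a clean, unmapped page whose buffers cannot be released still makes the function return 0:

	static int invalidate_complete_page(struct address_space *mapping,
					    struct page *page)
	{
		if (page->mapping != mapping)
			return 0;
		/* Clean page, but its private data (buffers) may be busy. */
		if (page_has_private(page) && !try_to_release_page(page, 0))
			return 0;
		/* Can also fail if someone else still holds a reference. */
		return remove_mapping(mapping, page);
	}

	int invalidate_inode_page(struct page *page)
	{
		struct address_space *mapping = page_mapping(page);

		if (!mapping)
			return 0;
		if (PageDirty(page) || PageWriteback(page))
			return 0;
		if (page_mapped(page))
			return 0;
		return invalidate_complete_page(mapping, page);
	}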

> 
> Thanks!
> 
> 	Hannes

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-07 15:07       ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 15:07 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > Changelog since v3:
> >  - Change function comments - suggested by Johannes
> >  - Change function name - suggested by Johannes
> >  - add only dirty/writeback pages to deactive pagevec
> 
> Why the extra check?
> 
> > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> >  			if (lock_failed)
> >  				continue;
> >  
> > -			ret += invalidate_inode_page(page);
> > -
> > +			ret = invalidate_inode_page(page);
> > +			/*
> > +			 * If the page is dirty or under writeback, we can not
> > +			 * invalidate it now.  But we assume that attempted
> > +			 * invalidation is a hint that the page is no longer
> > +			 * of interest and try to speed up its reclaim.
> > +			 */
> > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > +				deactivate_page(page);
> 
> The writeback completion handler does not take the page lock, so you
> can still miss pages that finish writeback before this test, no?

Yes. but I think it's rare and even though it happens, it's not critical.
> 
> Can you explain why you felt the need to add these checks?

invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
Look invalidate_complete_page. As easiest example, if the page has buffer and
try_to_release_page can't release the buffer, it could return 0.

I want to check this.

> 
> Thanks!
> 
> 	Hannes

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-07 15:07       ` Minchan Kim
@ 2010-12-07 15:19         ` Johannes Weiner
  -1 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 15:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > > Changelog since v3:
> > >  - Change function comments - suggested by Johannes
> > >  - Change function name - suggested by Johannes
> > >  - add only dirty/writeback pages to deactive pagevec
> > 
> > Why the extra check?
> > 
> > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > >  			if (lock_failed)
> > >  				continue;
> > >  
> > > -			ret += invalidate_inode_page(page);
> > > -
> > > +			ret = invalidate_inode_page(page);
> > > +			/*
> > > +			 * If the page is dirty or under writeback, we can not
> > > +			 * invalidate it now.  But we assume that attempted
> > > +			 * invalidation is a hint that the page is no longer
> > > +			 * of interest and try to speed up its reclaim.
> > > +			 */
> > > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > > +				deactivate_page(page);
> > 
> > The writeback completion handler does not take the page lock, so you
> > can still miss pages that finish writeback before this test, no?
> 
> Yes. but I think it's rare and even though it happens, it's not critical.
> > 
> > Can you explain why you felt the need to add these checks?
> 
> invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> Look invalidate_complete_page. As easiest example, if the page has buffer and
> try_to_release_page can't release the buffer, it could return 0.

Ok, but somebody still tried to truncate the page, so why shouldn't we
try to reclaim it?  The reason for deactivating at this location is
that truncation is a strong hint for reclaim, not that it failed due
to dirty/writeback pages.

What's the problem with deactivating pages where try_to_release_page()
failed?

I don't think we should add more logic than necessary.  If there is a
good reason for it, it needs to get a code comment at least.

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-07 15:19         ` Johannes Weiner
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 15:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > > Changelog since v3:
> > >  - Change function comments - suggested by Johannes
> > >  - Change function name - suggested by Johannes
> > >  - add only dirty/writeback pages to deactive pagevec
> > 
> > Why the extra check?
> > 
> > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > >  			if (lock_failed)
> > >  				continue;
> > >  
> > > -			ret += invalidate_inode_page(page);
> > > -
> > > +			ret = invalidate_inode_page(page);
> > > +			/*
> > > +			 * If the page is dirty or under writeback, we can not
> > > +			 * invalidate it now.  But we assume that attempted
> > > +			 * invalidation is a hint that the page is no longer
> > > +			 * of interest and try to speed up its reclaim.
> > > +			 */
> > > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > > +				deactivate_page(page);
> > 
> > The writeback completion handler does not take the page lock, so you
> > can still miss pages that finish writeback before this test, no?
> 
> Yes. but I think it's rare and even though it happens, it's not critical.
> > 
> > Can you explain why you felt the need to add these checks?
> 
> invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> Look invalidate_complete_page. As easiest example, if the page has buffer and
> try_to_release_page can't release the buffer, it could return 0.

Ok, but somebody still tried to truncate the page, so why shouldn't we
try to reclaim it?  The reason for deactivating at this location is
that truncation is a strong hint for reclaim, not that it failed due
to dirty/writeback pages.

What's the problem with deactivating pages where try_to_release_page()
failed?

I don't think we should add more logic than necessary.  If there is a
good reason for it, it needs to get a code comment at least.


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-07 15:05     ` Johannes Weiner
@ 2010-12-07 15:21       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 15:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Wu Fengguang, Nick Piggin

On Tue, Dec 07, 2010 at 04:05:25PM +0100, Johannes Weiner wrote:
> On Mon, Dec 06, 2010 at 02:29:12AM +0900, Minchan Kim wrote:
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -275,26 +275,59 @@ void add_page_to_unevictable_list(struct page *page)
> >   * head of the list, rather than the tail, to give the flusher
> >   * threads some time to write it out, as this is much more
> >   * effective than the single-page writeout from reclaim.
> > + *
> > + * If the page isn't page_mapped and dirty/writeback, the page
> > + * could reclaim asap using PG_reclaim.
> > + *
> > + * 1. active, mapped page -> none
> > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 3. inactive, mapped page -> none
> > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> 
>          inactive, clean -> inactive, tail

Indeed. I missed it.

> 
> > + * 5. Others -> none
> > + *
> > + * In 4, why it moves inactive's head, the VM expects the page would
> > + * be write it out by flusher threads as this is much more effective
> > + * than the single-page writeout from reclaim.
> >   */
> >  static void lru_deactivate(struct page *page, struct zone *zone)
> >  {
> >  	int lru, file;
> > +	bool active;
> >  
> > -	if (!PageLRU(page) || !PageActive(page))
> > +	if (!PageLRU(page))
> >  		return;
> >  
> >  	/* Some processes are using the page */
> >  	if (page_mapped(page))
> >  		return;
> >  
> > +	active = PageActive(page);
> > +
> >  	file = page_is_file_cache(page);
> >  	lru = page_lru_base_type(page);
> > -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> > +	del_page_from_lru_list(zone, page, lru + active);
> >  	ClearPageActive(page);
> >  	ClearPageReferenced(page);
> >  	add_page_to_lru_list(zone, page, lru);
> > -	__count_vm_event(PGDEACTIVATE);
> >
> > +	if (PageWriteback(page) || PageDirty(page)) {
> > +		/*
> > +		 * PG_reclaim could be raced with end_page_writeback
> > +		 * It can make readahead confusing.  But race window
> > +		 * is _really_ small and  it's non-critical problem.
> > +		 */
> > +		SetPageReclaim(page);
> > +	} else {
> > +		/*
> > +		 * The page's writeback ends up during pagevec
> > +		 * We moves tha page into tail of inactive.
> > +		 */
> > +		list_move_tail(&page->lru, &zone->lru[lru].list);
> > +		mem_cgroup_rotate_reclaimable_page(page);
> 
> I think you also need to increase PGROTATED here.

Absolutely.

Thanks, Hannes. :)
> 
> Other than that,
> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-07 15:21       ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 15:21 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Wu Fengguang, Nick Piggin

On Tue, Dec 07, 2010 at 04:05:25PM +0100, Johannes Weiner wrote:
> On Mon, Dec 06, 2010 at 02:29:12AM +0900, Minchan Kim wrote:
> > --- a/mm/swap.c
> > +++ b/mm/swap.c
> > @@ -275,26 +275,59 @@ void add_page_to_unevictable_list(struct page *page)
> >   * head of the list, rather than the tail, to give the flusher
> >   * threads some time to write it out, as this is much more
> >   * effective than the single-page writeout from reclaim.
> > + *
> > + * If the page isn't page_mapped and dirty/writeback, the page
> > + * could reclaim asap using PG_reclaim.
> > + *
> > + * 1. active, mapped page -> none
> > + * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
> > + * 3. inactive, mapped page -> none
> > + * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
> 
>          inactive, clean -> inactive, tail

Indeed. I missed it.

> 
> > + * 5. Others -> none
> > + *
> > + * In 4, why it moves inactive's head, the VM expects the page would
> > + * be write it out by flusher threads as this is much more effective
> > + * than the single-page writeout from reclaim.
> >   */
> >  static void lru_deactivate(struct page *page, struct zone *zone)
> >  {
> >  	int lru, file;
> > +	bool active;
> >  
> > -	if (!PageLRU(page) || !PageActive(page))
> > +	if (!PageLRU(page))
> >  		return;
> >  
> >  	/* Some processes are using the page */
> >  	if (page_mapped(page))
> >  		return;
> >  
> > +	active = PageActive(page);
> > +
> >  	file = page_is_file_cache(page);
> >  	lru = page_lru_base_type(page);
> > -	del_page_from_lru_list(zone, page, lru + LRU_ACTIVE);
> > +	del_page_from_lru_list(zone, page, lru + active);
> >  	ClearPageActive(page);
> >  	ClearPageReferenced(page);
> >  	add_page_to_lru_list(zone, page, lru);
> > -	__count_vm_event(PGDEACTIVATE);
> >
> > +	if (PageWriteback(page) || PageDirty(page)) {
> > +		/*
> > +		 * PG_reclaim could be raced with end_page_writeback
> > +		 * It can make readahead confusing.  But race window
> > +		 * is _really_ small and  it's non-critical problem.
> > +		 */
> > +		SetPageReclaim(page);
> > +	} else {
> > +		/*
> > +		 * The page's writeback ends up during pagevec
> > +		 * We moves tha page into tail of inactive.
> > +		 */
> > +		list_move_tail(&page->lru, &zone->lru[lru].list);
> > +		mem_cgroup_rotate_reclaimable_page(page);
> 
> I think you also need to increase PGROTATED here.

Absolutely.

Thanks, Hannes. :)
> 
> Other than that,
> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-07 15:19         ` Johannes Weiner
@ 2010-12-07 15:26           ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 15:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
> On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > > > Changelog since v3:
> > > >  - Change function comments - suggested by Johannes
> > > >  - Change function name - suggested by Johannes
> > > >  - add only dirty/writeback pages to deactive pagevec
> > > 
> > > Why the extra check?
> > > 
> > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > > >  			if (lock_failed)
> > > >  				continue;
> > > >  
> > > > -			ret += invalidate_inode_page(page);
> > > > -
> > > > +			ret = invalidate_inode_page(page);
> > > > +			/*
> > > > +			 * If the page is dirty or under writeback, we can not
> > > > +			 * invalidate it now.  But we assume that attempted
> > > > +			 * invalidation is a hint that the page is no longer
> > > > +			 * of interest and try to speed up its reclaim.
> > > > +			 */
> > > > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > > > +				deactivate_page(page);
> > > 
> > > The writeback completion handler does not take the page lock, so you
> > > can still miss pages that finish writeback before this test, no?
> > 
> > Yes. but I think it's rare and even though it happens, it's not critical.
> > > 
> > > Can you explain why you felt the need to add these checks?
> > 
> > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> > Look invalidate_complete_page. As easiest example, if the page has buffer and
> > try_to_release_page can't release the buffer, it could return 0.
> 
> Ok, but somebody still tried to truncate the page, so why shouldn't we
> try to reclaim it?  The reason for deactivating at this location is
> that truncation is a strong hint for reclaim, not that it failed due
> to dirty/writeback pages.
> 
> What's the problem with deactivating pages where try_to_release_page()
> failed?

If try_to_release_page fails and the such pages stay long time in pagevec,
pagevec drain often happens. I think such pages are rare so skip such pages doesn't
hurt goal of this patch.

> 
> I don't think we should add more logic than necessary.  If there is a
> good reason for it, it needs to get a code comment at least.

Is my comment above enough to justify it? If you agree, I can add the comment.

Thanks for careful review, Hannes.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-07 15:26           ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 15:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
> On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > > > Changelog since v3:
> > > >  - Change function comments - suggested by Johannes
> > > >  - Change function name - suggested by Johannes
> > > >  - add only dirty/writeback pages to deactive pagevec
> > > 
> > > Why the extra check?
> > > 
> > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > > >  			if (lock_failed)
> > > >  				continue;
> > > >  
> > > > -			ret += invalidate_inode_page(page);
> > > > -
> > > > +			ret = invalidate_inode_page(page);
> > > > +			/*
> > > > +			 * If the page is dirty or under writeback, we can not
> > > > +			 * invalidate it now.  But we assume that attempted
> > > > +			 * invalidation is a hint that the page is no longer
> > > > +			 * of interest and try to speed up its reclaim.
> > > > +			 */
> > > > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > > > +				deactivate_page(page);
> > > 
> > > The writeback completion handler does not take the page lock, so you
> > > can still miss pages that finish writeback before this test, no?
> > 
> > Yes. but I think it's rare and even though it happens, it's not critical.
> > > 
> > > Can you explain why you felt the need to add these checks?
> > 
> > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> > Look invalidate_complete_page. As easiest example, if the page has buffer and
> > try_to_release_page can't release the buffer, it could return 0.
> 
> Ok, but somebody still tried to truncate the page, so why shouldn't we
> try to reclaim it?  The reason for deactivating at this location is
> that truncation is a strong hint for reclaim, not that it failed due
> to dirty/writeback pages.
> 
> What's the problem with deactivating pages where try_to_release_page()
> failed?

If try_to_release_page fails and the such pages stay long time in pagevec,
pagevec drain often happens. I think such pages are rare so skip such pages doesn't
hurt goal of this patch.

> 
> I don't think we should add more logic than necessary.  If there is a
> good reason for it, it needs to get a code comment at least.

Is my comment above enough to justify it? If you agree, I can add the comment.

Thanks for careful review, Hannes.

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-07 15:26           ` Minchan Kim
@ 2010-12-07 15:56             ` Johannes Weiner
  -1 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 15:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Wed, Dec 08, 2010 at 12:26:25AM +0900, Minchan Kim wrote:
> On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
> > On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> > > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> > > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > > > > Changelog since v3:
> > > > >  - Change function comments - suggested by Johannes
> > > > >  - Change function name - suggested by Johannes
> > > > >  - add only dirty/writeback pages to deactive pagevec
> > > > 
> > > > Why the extra check?
> > > > 
> > > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > > > >  			if (lock_failed)
> > > > >  				continue;
> > > > >  
> > > > > -			ret += invalidate_inode_page(page);
> > > > > -
> > > > > +			ret = invalidate_inode_page(page);
> > > > > +			/*
> > > > > +			 * If the page is dirty or under writeback, we can not
> > > > > +			 * invalidate it now.  But we assume that attempted
> > > > > +			 * invalidation is a hint that the page is no longer
> > > > > +			 * of interest and try to speed up its reclaim.
> > > > > +			 */
> > > > > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > > > > +				deactivate_page(page);
> > > > 
> > > > The writeback completion handler does not take the page lock, so you
> > > > can still miss pages that finish writeback before this test, no?
> > > 
> > > Yes. but I think it's rare and even though it happens, it's not critical.
> > > > 
> > > > Can you explain why you felt the need to add these checks?
> > > 
> > > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> > > Look invalidate_complete_page. As easiest example, if the page has buffer and
> > > try_to_release_page can't release the buffer, it could return 0.
> > 
> > Ok, but somebody still tried to truncate the page, so why shouldn't we
> > try to reclaim it?  The reason for deactivating at this location is
> > that truncation is a strong hint for reclaim, not that it failed due
> > to dirty/writeback pages.
> > 
> > What's the problem with deactivating pages where try_to_release_page()
> > failed?
> 
> If try_to_release_page fails and the such pages stay long time in pagevec,
> pagevec drain often happens.

You mean because the pagevec becomes full more often?  These are not
many pages you get extra without the checks, the race window is very
small after all.

> I think such pages are rare so skip such pages doesn't hurt goal of
> this patch.

Well, you add extra checks, extra detail to this mechanism.  Instead
of just saying 'tried to truncate, failed, deactivate the page', you
add more ifs and buts.

There should be a real justification for it.  'It can not hurt' is not
a good justification for extra code and making a simple model more
complex.

'It will hurt without treating these pages differently' is a good
justification.  Remember that we have to understand and maintain all
this.  The less checks and operations we need to implement a certain
idea, the better.

Sorry for being so adamant about this, but I think these random checks
are a really sore point of mm code already.

[ For example, we tried discussing lumpy reclaim mode recently and
  none of us could reliably remember how it actually behaved.  There
  are so many special conditions in there that we already end up with
  some of them being dead code and the checks even contradicting each
  other. ]

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-07 15:56             ` Johannes Weiner
  0 siblings, 0 replies; 109+ messages in thread
From: Johannes Weiner @ 2010-12-07 15:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Wed, Dec 08, 2010 at 12:26:25AM +0900, Minchan Kim wrote:
> On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
> > On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> > > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> > > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> > > > > Changelog since v3:
> > > > >  - Change function comments - suggested by Johannes
> > > > >  - Change function name - suggested by Johannes
> > > > >  - add only dirty/writeback pages to deactive pagevec
> > > > 
> > > > Why the extra check?
> > > > 
> > > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> > > > >  			if (lock_failed)
> > > > >  				continue;
> > > > >  
> > > > > -			ret += invalidate_inode_page(page);
> > > > > -
> > > > > +			ret = invalidate_inode_page(page);
> > > > > +			/*
> > > > > +			 * If the page is dirty or under writeback, we can not
> > > > > +			 * invalidate it now.  But we assume that attempted
> > > > > +			 * invalidation is a hint that the page is no longer
> > > > > +			 * of interest and try to speed up its reclaim.
> > > > > +			 */
> > > > > +			if (!ret && (PageDirty(page) || PageWriteback(page)))
> > > > > +				deactivate_page(page);
> > > > 
> > > > The writeback completion handler does not take the page lock, so you
> > > > can still miss pages that finish writeback before this test, no?
> > > 
> > > Yes. but I think it's rare and even though it happens, it's not critical.
> > > > 
> > > > Can you explain why you felt the need to add these checks?
> > > 
> > > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> > > Look invalidate_complete_page. As easiest example, if the page has buffer and
> > > try_to_release_page can't release the buffer, it could return 0.
> > 
> > Ok, but somebody still tried to truncate the page, so why shouldn't we
> > try to reclaim it?  The reason for deactivating at this location is
> > that truncation is a strong hint for reclaim, not that it failed due
> > to dirty/writeback pages.
> > 
> > What's the problem with deactivating pages where try_to_release_page()
> > failed?
> 
> If try_to_release_page fails and the such pages stay long time in pagevec,
> pagevec drain often happens.

You mean because the pagevec becomes full more often?  These are not
many pages you get extra without the checks, the race window is very
small after all.

> I think such pages are rare so skip such pages doesn't hurt goal of
> this patch.

Well, you add extra checks, extra detail to this mechanism.  Instead
of just saying 'tried to truncate, failed, deactivate the page', you
add more ifs and buts.

There should be a real justification for it.  'It can not hurt' is not
a good justification for extra code and making a simple model more
complex.

'It will hurt without treating these pages differently' is a good
justification.  Remember that we have to understand and maintain all
this.  The less checks and operations we need to implement a certain
idea, the better.

Sorry for being so adamant about this, but I think these random checks
are a really sore point of mm code already.

[ For example, we tried discussing lumpy reclaim mode recently and
  none of us could reliably remember how it actually behaved.  There
  are so many special conditions in there that we already end up with
  some of them being dead code and the checks even contradicting each
  other. ]


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-07 15:56             ` Johannes Weiner
@ 2010-12-07 22:51               ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 22:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Wed, Dec 8, 2010 at 12:56 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Wed, Dec 08, 2010 at 12:26:25AM +0900, Minchan Kim wrote:
>> On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
>> > On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
>> > > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
>> > > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
>> > > > > Changelog since v3:
>> > > > >  - Change function comments - suggested by Johannes
>> > > > >  - Change function name - suggested by Johannes
>> > > > >  - add only dirty/writeback pages to deactive pagevec
>> > > >
>> > > > Why the extra check?
>> > > >
>> > > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>> > > > >                       if (lock_failed)
>> > > > >                               continue;
>> > > > >
>> > > > > -                     ret += invalidate_inode_page(page);
>> > > > > -
>> > > > > +                     ret = invalidate_inode_page(page);
>> > > > > +                     /*
>> > > > > +                      * If the page is dirty or under writeback, we can not
>> > > > > +                      * invalidate it now.  But we assume that attempted
>> > > > > +                      * invalidation is a hint that the page is no longer
>> > > > > +                      * of interest and try to speed up its reclaim.
>> > > > > +                      */
>> > > > > +                     if (!ret && (PageDirty(page) || PageWriteback(page)))
>> > > > > +                             deactivate_page(page);
>> > > >
>> > > > The writeback completion handler does not take the page lock, so you
>> > > > can still miss pages that finish writeback before this test, no?
>> > >
>> > > Yes. but I think it's rare and even though it happens, it's not critical.
>> > > >
>> > > > Can you explain why you felt the need to add these checks?
>> > >
>> > > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
>> > > Look invalidate_complete_page. As easiest example, if the page has buffer and
>> > > try_to_release_page can't release the buffer, it could return 0.
>> >
>> > Ok, but somebody still tried to truncate the page, so why shouldn't we
>> > try to reclaim it?  The reason for deactivating at this location is
>> > that truncation is a strong hint for reclaim, not that it failed due
>> > to dirty/writeback pages.
>> >
>> > What's the problem with deactivating pages where try_to_release_page()
>> > failed?
>>
>> If try_to_release_page fails and the such pages stay long time in pagevec,
>> pagevec drain often happens.
>
> You mean because the pagevec becomes full more often?  These are not
> many pages you get extra without the checks, the race window is very
> small after all.

Right.
That was a bad answer. Working in the middle of the night muddles my mind. :)

Another point: without the check we would move such pages (where
try_to_release_page failed, or someone else still holds a reference) to the
tail of the inactive list. We can't expect those pages to be freed any time
soon, so it can stir the LRU unnecessarily.
On the other hand, such pages are _really_ rare, so couldn't we move them to the tail anyway?
If that can be justified, I will remove the check.
What do you think?

>
>> I think such pages are rare so skip such pages doesn't hurt goal of
>> this patch.
>
> Well, you add extra checks, extra detail to this mechanism.  Instead
> of just saying 'tried to truncate, failed, deactivate the page', you
> add more ifs and buts.
>
> There should be a real justification for it.  'It can not hurt' is not
> a good justification for extra code and making a simple model more
> complex.
>
> 'It will hurt without treating these pages differently' is a good
> justification.  Remember that we have to understand and maintain all
> this.  The less checks and operations we need to implement a certain
> idea, the better.
>
> Sorry for being so adamant about this, but I think these random checks
> are a really sore point of mm code already.

Never mind. I totally support your opinion.
Reviewing the mm code always confuses me.
Nowadays, many reviewers want detailed comments and descriptions. Given
that mm code changes are getting bigger, I believe that's the way to go.

>
> [ For example, we tried discussing lumpy reclaim mode recently and
>  none of us could reliably remember how it actually behaved.  There
>  are so many special conditions in there that we already end up with
>  some of them being dead code and the checks even contradicting each
>  other. ]
>

Thanks for good comment, Hannes.




-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-07 22:51               ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-07 22:51 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Peter Zijlstra, Wu Fengguang, Nick Piggin, Mel Gorman

On Wed, Dec 8, 2010 at 12:56 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> On Wed, Dec 08, 2010 at 12:26:25AM +0900, Minchan Kim wrote:
>> On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
>> > On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
>> > > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
>> > > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
>> > > > > Changelog since v3:
>> > > > >  - Change function comments - suggested by Johannes
>> > > > >  - Change function name - suggested by Johannes
>> > > > >  - add only dirty/writeback pages to deactive pagevec
>> > > >
>> > > > Why the extra check?
>> > > >
>> > > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>> > > > >                       if (lock_failed)
>> > > > >                               continue;
>> > > > >
>> > > > > -                     ret += invalidate_inode_page(page);
>> > > > > -
>> > > > > +                     ret = invalidate_inode_page(page);
>> > > > > +                     /*
>> > > > > +                      * If the page is dirty or under writeback, we can not
>> > > > > +                      * invalidate it now.  But we assume that attempted
>> > > > > +                      * invalidation is a hint that the page is no longer
>> > > > > +                      * of interest and try to speed up its reclaim.
>> > > > > +                      */
>> > > > > +                     if (!ret && (PageDirty(page) || PageWriteback(page)))
>> > > > > +                             deactivate_page(page);
>> > > >
>> > > > The writeback completion handler does not take the page lock, so you
>> > > > can still miss pages that finish writeback before this test, no?
>> > >
>> > > Yes. but I think it's rare and even though it happens, it's not critical.
>> > > >
>> > > > Can you explain why you felt the need to add these checks?
>> > >
>> > > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
>> > > Look invalidate_complete_page. As easiest example, if the page has buffer and
>> > > try_to_release_page can't release the buffer, it could return 0.
>> >
>> > Ok, but somebody still tried to truncate the page, so why shouldn't we
>> > try to reclaim it?  The reason for deactivating at this location is
>> > that truncation is a strong hint for reclaim, not that it failed due
>> > to dirty/writeback pages.
>> >
>> > What's the problem with deactivating pages where try_to_release_page()
>> > failed?
>>
>> If try_to_release_page fails and the such pages stay long time in pagevec,
>> pagevec drain often happens.
>
> You mean because the pagevec becomes full more often?  These are not
> many pages you get extra without the checks, the race window is very
> small after all.

Right.
That was a bad answer. Working in the middle of the night muddles my mind. :)

Another point: without the check we would move such pages (where
try_to_release_page failed, or someone else still holds a reference) to the
tail of the inactive list. We can't expect those pages to be freed any time
soon, so it can stir the LRU unnecessarily.
On the other hand, such pages are _really_ rare, so couldn't we move them to the tail anyway?
If that can be justified, I will remove the check.
What do you think?

>
>> I think such pages are rare so skip such pages doesn't hurt goal of
>> this patch.
>
> Well, you add extra checks, extra detail to this mechanism.  Instead
> of just saying 'tried to truncate, failed, deactivate the page', you
> add more ifs and buts.
>
> There should be a real justification for it.  'It can not hurt' is not
> a good justification for extra code and making a simple model more
> complex.
>
> 'It will hurt without treating these pages differently' is a good
> justification.  Remember that we have to understand and maintain all
> this.  The less checks and operations we need to implement a certain
> idea, the better.
>
> Sorry for being so adamant about this, but I think these random checks
> are a really sore point of mm code already.

Never mind. I fully agree with you.
Checks like that always make the mm code confusing to review.
Nowadays many reviewers want detailed comments and descriptions. Given
that mm changes keep getting bigger, I believe that is the way to go.

>
> [ For example, we tried discussing lumpy reclaim mode recently and
>  none of us could reliably remember how it actually behaved.  There
>  are so many special conditions in there that we already end up with
>  some of them being dead code and the checks even contradicting each
>  other. ]
>

Thanks for the good comments, Hannes.




-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-07 22:51               ` Minchan Kim
@ 2010-12-08  0:56                 ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 109+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-08  0:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, KOSAKI Motohiro,
	linux-mm, LKML, Peter Zijlstra, Wu Fengguang, Nick Piggin,
	Mel Gorman

On Wed, 8 Dec 2010 07:51:25 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> On Wed, Dec 8, 2010 at 12:56 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > On Wed, Dec 08, 2010 at 12:26:25AM +0900, Minchan Kim wrote:
> >> On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
> >> > On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
> >> > > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
> >> > > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
> >> > > > > Changelog since v3:
> >> > > > >  - Change function comments - suggested by Johannes
> >> > > > >  - Change function name - suggested by Johannes
> >> > > > >  - add only dirty/writeback pages to deactive pagevec
> >> > > >
> >> > > > Why the extra check?
> >> > > >
> >> > > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
> >> > > > >                       if (lock_failed)
> >> > > > >                               continue;
> >> > > > >
> >> > > > > -                     ret += invalidate_inode_page(page);
> >> > > > > -
> >> > > > > +                     ret = invalidate_inode_page(page);
> >> > > > > +                     /*
> >> > > > > +                      * If the page is dirty or under writeback, we can not
> >> > > > > +                      * invalidate it now.  But we assume that attempted
> >> > > > > +                      * invalidation is a hint that the page is no longer
> >> > > > > +                      * of interest and try to speed up its reclaim.
> >> > > > > +                      */
> >> > > > > +                     if (!ret && (PageDirty(page) || PageWriteback(page)))
> >> > > > > +                             deactivate_page(page);
> >> > > >
> >> > > > The writeback completion handler does not take the page lock, so you
> >> > > > can still miss pages that finish writeback before this test, no?
> >> > >
> >> > > Yes. but I think it's rare and even though it happens, it's not critical.
> >> > > >
> >> > > > Can you explain why you felt the need to add these checks?
> >> > >
> >> > > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
> >> > > Look invalidate_complete_page. As easiest example, if the page has buffer and
> >> > > try_to_release_page can't release the buffer, it could return 0.
> >> >
> >> > Ok, but somebody still tried to truncate the page, so why shouldn't we
> >> > try to reclaim it?  The reason for deactivating at this location is
> >> > that truncation is a strong hint for reclaim, not that it failed due
> >> > to dirty/writeback pages.
> >> >
> >> > What's the problem with deactivating pages where try_to_release_page()
> >> > failed?
> >>
> >> If try_to_release_page fails and the such pages stay long time in pagevec,
> >> pagevec drain often happens.
> >
> > You mean because the pagevec becomes full more often?  These are not
> > many pages you get extra without the checks, the race window is very
> > small after all.
> 
> Right.
> It was a totally bad answer. The work in midnight makes my mind to be hurt. :)
> 
> Another point is that we can move such pages(!try_to_release_page,
> someone else holding the ref) into tail of inactive.
> We can't expect such pages will be freed sooner or later and it can
> stir lru pages unnecessary.
> On the other hand it's a _really_ rare so couldn't we move the pages into tail?
> If it can be justified, I will remove the check.
> What do you think about it?
> 

I wonder... how about adding a "victim" list for "Reclaim" pages? Then we don't need
the extra LRU rotation.

Thanks,
-Kame



^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-08  0:56                 ` KAMEZAWA Hiroyuki
@ 2010-12-08  1:43                   ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  1:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, KOSAKI Motohiro,
	linux-mm, LKML, Peter Zijlstra, Wu Fengguang, Nick Piggin,
	Mel Gorman

Hi Kame,

On Wed, Dec 8, 2010 at 9:56 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 8 Dec 2010 07:51:25 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> On Wed, Dec 8, 2010 at 12:56 AM, Johannes Weiner <hannes@cmpxchg.org> wrote:
>> > On Wed, Dec 08, 2010 at 12:26:25AM +0900, Minchan Kim wrote:
>> >> On Tue, Dec 07, 2010 at 04:19:39PM +0100, Johannes Weiner wrote:
>> >> > On Wed, Dec 08, 2010 at 12:07:10AM +0900, Minchan Kim wrote:
>> >> > > On Tue, Dec 07, 2010 at 03:49:24PM +0100, Johannes Weiner wrote:
>> >> > > > On Mon, Dec 06, 2010 at 02:29:10AM +0900, Minchan Kim wrote:
>> >> > > > > Changelog since v3:
>> >> > > > >  - Change function comments - suggested by Johannes
>> >> > > > >  - Change function name - suggested by Johannes
>> >> > > > >  - add only dirty/writeback pages to deactive pagevec
>> >> > > >
>> >> > > > Why the extra check?
>> >> > > >
>> >> > > > > @@ -359,8 +360,16 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
>> >> > > > >                       if (lock_failed)
>> >> > > > >                               continue;
>> >> > > > >
>> >> > > > > -                     ret += invalidate_inode_page(page);
>> >> > > > > -
>> >> > > > > +                     ret = invalidate_inode_page(page);
>> >> > > > > +                     /*
>> >> > > > > +                      * If the page is dirty or under writeback, we can not
>> >> > > > > +                      * invalidate it now.  But we assume that attempted
>> >> > > > > +                      * invalidation is a hint that the page is no longer
>> >> > > > > +                      * of interest and try to speed up its reclaim.
>> >> > > > > +                      */
>> >> > > > > +                     if (!ret && (PageDirty(page) || PageWriteback(page)))
>> >> > > > > +                             deactivate_page(page);
>> >> > > >
>> >> > > > The writeback completion handler does not take the page lock, so you
>> >> > > > can still miss pages that finish writeback before this test, no?
>> >> > >
>> >> > > Yes. but I think it's rare and even though it happens, it's not critical.
>> >> > > >
>> >> > > > Can you explain why you felt the need to add these checks?
>> >> > >
>> >> > > invalidate_inode_page can return 0 although the pages is !{dirty|writeback}.
>> >> > > Look invalidate_complete_page. As easiest example, if the page has buffer and
>> >> > > try_to_release_page can't release the buffer, it could return 0.
>> >> >
>> >> > Ok, but somebody still tried to truncate the page, so why shouldn't we
>> >> > try to reclaim it?  The reason for deactivating at this location is
>> >> > that truncation is a strong hint for reclaim, not that it failed due
>> >> > to dirty/writeback pages.
>> >> >
>> >> > What's the problem with deactivating pages where try_to_release_page()
>> >> > failed?
>> >>
>> >> If try_to_release_page fails and the such pages stay long time in pagevec,
>> >> pagevec drain often happens.
>> >
>> > You mean because the pagevec becomes full more often?  These are not
>> > many pages you get extra without the checks, the race window is very
>> > small after all.
>>
>> Right.
>> It was a totally bad answer. The work in midnight makes my mind to be hurt. :)
>>
>> Another point is that we can move such pages(!try_to_release_page,
>> someone else holding the ref) into tail of inactive.
>> We can't expect such pages will be freed sooner or later and it can
>> stir lru pages unnecessary.
>> On the other hand it's a _really_ rare so couldn't we move the pages into tail?
>> If it can be justified, I will remove the check.
>> What do you think about it?
>>
>
> I wonder ...how about adding "victim" list for "Reclaim" pages ? Then, we don't need
> extra LRU rotation.

That could make the code cleaner.
As far as I can see, a victim list would work roughly as follows (a rough sketch
follows the list):

1. Select victim pages based on a strong hint.
2. Move the page from the LRU to the victim list.
3. The reclaimer always checks the victim list before diving into the LRU lists.
4-1. If a victim page is still used by others or is dirty, it can be moved back
onto the LRU, or it can stay on the victim list.
If it stays a victim, when do we move it back onto the LRU if the reclaimer
keeps failing on it?
We would have to define a new rule for that.
4-2. If a victim page is not used by others and is clean, we can reclaim it
right away.
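
A rough, compile-untested sketch of that flow (victim_list, page_still_busy(),
putback_to_lru() and try_to_free_victim() are all hypothetical names, only meant
to illustrate the steps above, not a real implementation):

	/* 1-2. on a strong hint, move the page from the LRU onto the victim list */
	static void move_to_victim(struct zone *zone, struct page *page)
	{
		/* caller holds zone->lru_lock and the page sits on an LRU list */
		list_move(&page->lru, &zone->victim_list);
	}

	/* 3, 4-1, 4-2. reclaim checks the victim list before scanning the LRUs */
	static unsigned long shrink_victim_list(struct zone *zone)
	{
		unsigned long nr_reclaimed = 0;
		struct page *page, *next;

		list_for_each_entry_safe(page, next, &zone->victim_list, lru) {
			if (page_still_busy(page) || PageDirty(page)) {
				/* 4-1. still referenced or dirty: put it back
				 * (or leave it here -- that is the open rule) */
				putback_to_lru(zone, page);
				continue;
			}
			/* 4-2. clean and unused: reclaim right away */
			nr_reclaimed += try_to_free_victim(page);
		}
		return nr_reclaimed;
	}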

AFAIK, there are only two strong hints (invalidation and the readahead max-window
heuristic).
I am not sure it is worth adding a new hierarchy (i.e. LRU, victim, unevictable)
just to clean up minor code.
In addition, we would have to define that new rule, which would make the LRU code
more complicated.
I remember how hard it was to get the unevictable feature merged.

But I am not against it if we have more use cases. In that case it would be worth
implementing, even though it is not easy.

Thanks, Kame.

>
> Thanks,
> -Kame
>
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-08  1:43                   ` Minchan Kim
@ 2010-12-08  1:56                     ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 109+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-08  1:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, KOSAKI Motohiro,
	linux-mm, LKML, Peter Zijlstra, Wu Fengguang, Nick Piggin,
	Mel Gorman

On Wed, 8 Dec 2010 10:43:08 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Hi Kame,
> 
Hi,

> > I wonder ...how about adding "victim" list for "Reclaim" pages ? Then, we don't need
> > extra LRU rotation.
> 
> It can make the code clean.
> As far as I think, victim list does following as.
> 
> 1. select victim pages by strong hint
> 2. move the page from LRU to victim
> 3. reclaimer always peeks victim list before diving into LRU list.
> 4-1. If the victim pages is used by others or dirty, it can be moved
> into LRU, again or remain the page in victim list.
> If the page is remained victim, when do we move it into LRU again if
> the reclaimer continues to fail the page?
When someone touches it.

> We have to put the new rule.
> 4-2. If the victim pages isn't used by others and clean, we can
> reclaim the page asap.
> 
> AFAIK, strong hints are just two(invalidation, readahead max window heuristic).
> I am not sure it's valuable to add new hierarchy(ie, LRU, victim,
> unevictable) for cleaning the minor codes.
> In addition, we have to put the new rule so it would make the LRU code
> complicated.
> I remember how unevictable feature merge is hard.
> 
yes, it was hard.

> But I am not against if we have more usecases. In this case, it's
> valuable to implement it although it's not easy.
> 

I wonder "victim list" can be used for something like Cleancache, when
we have very-low-latency backend devices.
And we may able to have page-cache-limit, which Balbir proposed as.

  - kvictimed? will move unmappedd page caches to victim list
This may work like a InactiveClean list which we had before and make
sizing easy.
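
Purely as illustration (every name below is hypothetical; unmapped_cache_pages(),
cache_limit and move_unmapped_cache_to_victim() do not exist today), such a
daemon could be little more than a kthread that feeds the victim list whenever
unmapped page cache exceeds the configured limit:

	static int kvictimd(void *data)
	{
		struct zone *zone = data;

		while (!kthread_should_stop()) {
			if (unmapped_cache_pages(zone) > zone->cache_limit)
				move_unmapped_cache_to_victim(zone);
			schedule_timeout_interruptible(HZ);
		}
		return 0;
	}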

Thanks,
-Kame




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-08  1:56                     ` KAMEZAWA Hiroyuki
@ 2010-12-08  2:15                       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  2:15 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Johannes Weiner, Andrew Morton, Rik van Riel, KOSAKI Motohiro,
	linux-mm, LKML, Peter Zijlstra, Wu Fengguang, Nick Piggin,
	Mel Gorman, Balbir Singh

On Wed, Dec 8, 2010 at 10:56 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Wed, 8 Dec 2010 10:43:08 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Hi Kame,
>>
> Hi,
>
>> > I wonder ...how about adding "victim" list for "Reclaim" pages ? Then, we don't need
>> > extra LRU rotation.
>>
>> It can make the code clean.
>> As far as I think, victim list does following as.
>>
>> 1. select victim pages by strong hint
>> 2. move the page from LRU to victim
>> 3. reclaimer always peeks victim list before diving into LRU list.
>> 4-1. If the victim pages is used by others or dirty, it can be moved
>> into LRU, again or remain the page in victim list.
>> If the page is remained victim, when do we move it into LRU again if
>> the reclaimer continues to fail the page?
> When sometone touches it.
>
>> We have to put the new rule.
>> 4-2. If the victim pages isn't used by others and clean, we can
>> reclaim the page asap.
>>
>> AFAIK, strong hints are just two(invalidation, readahead max window heuristic).
>> I am not sure it's valuable to add new hierarchy(ie, LRU, victim,
>> unevictable) for cleaning the minor codes.
>> In addition, we have to put the new rule so it would make the LRU code
>> complicated.
>> I remember how unevictable feature merge is hard.
>>
> yes, it was hard.
>
>> But I am not against if we have more usecases. In this case, it's
>> valuable to implement it although it's not easy.
>>
>
> I wonder "victim list" can be used for something like Cleancache, when
> we have very-low-latency backend devices.
> And we may able to have page-cache-limit, which Balbir proposed as.

Yes, I thought that too. I think it would be a good feature for embedded systems.

>
>  - kvictimed? will move unmappedd page caches to victim list
> This may work like a InactiveClean list which we had before and make
> sizing easy.
>

Before we discuss this further, we need confirmation from actual users.
We know very well that something is very hard to merge if nobody uses it.

Balbir, what do you think about it?


> Thanks,
> -Kame
>
>
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-08  2:15                       ` Minchan Kim
@ 2010-12-08  6:56                         ` Balbir Singh
  -1 siblings, 0 replies; 109+ messages in thread
From: Balbir Singh @ 2010-12-08  6:56 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	KOSAKI Motohiro, linux-mm, LKML, Peter Zijlstra, Wu Fengguang,
	Nick Piggin, Mel Gorman

* MinChan Kim <minchan.kim@gmail.com> [2010-12-08 11:15:19]:

> On Wed, Dec 8, 2010 at 10:56 AM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > On Wed, 8 Dec 2010 10:43:08 +0900
> > Minchan Kim <minchan.kim@gmail.com> wrote:
> >
> >> Hi Kame,
> >>
> > Hi,
> >
> >> > I wonder ...how about adding "victim" list for "Reclaim" pages ? Then, we don't need
> >> > extra LRU rotation.
> >>
> >> It can make the code clean.
> >> As far as I think, victim list does following as.
> >>
> >> 1. select victim pages by strong hint
> >> 2. move the page from LRU to victim
> >> 3. reclaimer always peeks victim list before diving into LRU list.
> >> 4-1. If the victim pages is used by others or dirty, it can be moved
> >> into LRU, again or remain the page in victim list.
> >> If the page is remained victim, when do we move it into LRU again if
> >> the reclaimer continues to fail the page?
> > When sometone touches it.
> >
> >> We have to put the new rule.
> >> 4-2. If the victim pages isn't used by others and clean, we can
> >> reclaim the page asap.
> >>
> >> AFAIK, strong hints are just two(invalidation, readahead max window heuristic).
> >> I am not sure it's valuable to add new hierarchy(ie, LRU, victim,
> >> unevictable) for cleaning the minor codes.
> >> In addition, we have to put the new rule so it would make the LRU code
> >> complicated.
> >> I remember how unevictable feature merge is hard.
> >>
> > yes, it was hard.
> >
> >> But I am not against if we have more usecases. In this case, it's
> >> valuable to implement it although it's not easy.
> >>
> >
> > I wonder "victim list" can be used for something like Cleancache, when
> > we have very-low-latency backend devices.
> > And we may able to have page-cache-limit, which Balbir proposed as.
> 
> Yes, I thought that, too. I think it would be a good feature in embedded system.
> 
> >
> >  - kvictimed? will move unmappedd page caches to victim list
> > This may work like a InactiveClean list which we had before and make
> > sizing easy.
> >
> 
> Before further discuss, we need customer's confirm.
> We know very well it is very hard to merge if anyone doesn't use.
> 
> Balbir, What do think about it?
> 

The idea seems interesting. I am in the process of refreshing my
patches for unmapped page cache control. I presume the process of
filling the victim list will be similar to what I have for unmapped
page cache isolation.

> 
> > Thanks,
> > -Kame
> >
> >
> >
> >
> 
> 
> 
> -- 
> Kind regards,
> Minchan Kim

-- 
	Three Cheers,
	Balbir

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
  2010-12-07  5:44     ` Minchan Kim
@ 2010-12-08  7:26         ` Hugh Dickins
  0 siblings, 0 replies; 109+ messages in thread
From: Hugh Dickins @ 2010-12-08  7:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Tue, 7 Dec 2010, Minchan Kim wrote:
> 
> How about this? Although it doesn't remove null dependency, it meet my
> goal without big overhead.
> It's just quick patch.

Roughly, yes; by "just quick patch" I take you to mean that I should
not waste time on all the minor carelessnesses scattered through it.

> If you agree, I will resend this version as formal patch.
> (If you suffered from seeing below word-wrapped source, see the
> attachment. I asked to google two time to support text-plain mode in
> gmail web but I can't receive any response until now. ;(. Lots of
> kernel developer in google. Please support this mode for us who can't
> use SMTP although it's a very small VOC)

Tiresome.  Seems not to be high on gmail's priorities.
It's sad to see even Linus attaching patches these days.

> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e097df6..14ae918 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -771,6 +771,7 @@ struct zap_details {
>         pgoff_t last_index;                     /* Highest page->index
> to unmap */
>         spinlock_t *i_mmap_lock;                /* For unmap_mapping_range: */
>         unsigned long truncate_count;           /* Compare vm_truncate_count */
> +       int ignore_reference;
>  };
> 
>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 319528b..fdb0253 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -162,18 +162,22 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>                              struct vm_area_struct ** prev,
>                              unsigned long start, unsigned long end)
>  {
> +       struct zap_details details ;
> +
>         *prev = vma;
>         if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>                 return -EINVAL;
> 
>         if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
> -               struct zap_details details = {
> -                       .nonlinear_vma = vma,
> -                       .last_index = ULONG_MAX,
> -               };
> -               zap_page_range(vma, start, end - start, &details);
> -       } else
> -               zap_page_range(vma, start, end - start, NULL);
> +               details.nonlinear_vma = vma;
> +               details.last_index = ULONG_MAX;
> +       } else {
> +               details.nonlinear_vma = NULL;
> +               details.last_index = NULL;
> +       }
> +
> +       details.ignore_references = true;
> +       zap_page_range(vma, start, end - start, &details);
>         return 0;
>  }
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index ebfeedf..d46ac42 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -897,9 +897,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>         pte_t *pte;
>         spinlock_t *ptl;
>         int rss[NR_MM_COUNTERS];
> -
> +       bool ignore_reference = false;
>         init_rss_vec(rss);
> 
> +       if (details && ((!details->check_mapping && !details->nonlinear_vma)
> +                                        || !details->ignore_reference))
> +               details = NULL;
> +

	bool mark_accessed = true;

	if (VM_SequentialReadHint(vma) ||
	    (details && details->ignore_reference))
		mark_accessed = false;
	if (details && !details->check_mapping && !details->nonlinear_vma)
		details = NULL;
     

>         pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
>         arch_enter_lazy_mmu_mode();
>         do {
> @@ -949,7 +955,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>                                 if (pte_dirty(ptent))
>                                         set_page_dirty(page);
>                                 if (pte_young(ptent) &&
> -                                   likely(!VM_SequentialReadHint(vma)))
> +                                   likely(!VM_SequentialReadHint(vma)) &&
> +                                   likely(!ignore_reference))
>                                         mark_page_accessed(page);

				if (pte_young(ptent) && mark_accessed)
					mark_page_accessed(page);


>                                 rss[MM_FILEPAGES]--;
>                         }
> @@ -1038,8 +1045,6 @@ static unsigned long unmap_page_range(struct
> mmu_gather *tlb,
>         pgd_t *pgd;
>         unsigned long next;
> 
> -       if (details && !details->check_mapping && !details->nonlinear_vma)
> -               details = NULL;
> 
>         BUG_ON(addr >= end);
>         mem_cgroup_uncharge_start();
> @@ -1102,7 +1107,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>         unsigned long tlb_start = 0;    /* For tlb_finish_mmu */
>         int tlb_start_valid = 0;
>         unsigned long start = start_addr;
> -       spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
> +       spinlock_t *i_mmap_lock = details ?
> +               (detais->check_mapping ? details->i_mmap_lock: NULL) : NULL;

Why that change?

Hugh

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
  2010-12-08  7:26         ` Hugh Dickins
@ 2010-12-08  7:55           ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  7:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Wed, Dec 8, 2010 at 4:26 PM, Hugh Dickins <hughd@google.com> wrote:
> On Tue, 7 Dec 2010, Minchan Kim wrote:
>>
>> How about this? Although it doesn't remove null dependency, it meet my
>> goal without big overhead.
>> It's just quick patch.
>
> Roughly, yes; by "just quick patch" I take you to mean that I should
> not waste time on all the minor carelessnesses scattered through it.
>
>> If you agree, I will resend this version as formal patch.
>> (If you suffered from seeing below word-wrapped source, see the
>> attachment. I asked to google two time to support text-plain mode in
>> gmail web but I can't receive any response until now. ;(. Lots of
>> kernel developer in google. Please support this mode for us who can't
>> use SMTP although it's a very small VOC)
>
> Tiresome.  Seems not to be high on gmail's priorities.
> It's sad to see even Linus attaching patches these days.

That encourages me (but I don't mean I will use attachments again. :)).

>
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index e097df6..14ae918 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -771,6 +771,7 @@ struct zap_details {
>>         pgoff_t last_index;                     /* Highest page->index
>> to unmap */
>>         spinlock_t *i_mmap_lock;                /* For unmap_mapping_range: */
>>         unsigned long truncate_count;           /* Compare vm_truncate_count */
>> +       int ignore_reference;
>>  };
>>
>>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 319528b..fdb0253 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -162,18 +162,22 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>>                              struct vm_area_struct ** prev,
>>                              unsigned long start, unsigned long end)
>>  {
>> +       struct zap_details details ;
>> +
>>         *prev = vma;
>>         if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>>                 return -EINVAL;
>>
>>         if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
>> -               struct zap_details details = {
>> -                       .nonlinear_vma = vma,
>> -                       .last_index = ULONG_MAX,
>> -               };
>> -               zap_page_range(vma, start, end - start, &details);
>> -       } else
>> -               zap_page_range(vma, start, end - start, NULL);
>> +               details.nonlinear_vma = vma;
>> +               details.last_index = ULONG_MAX;
>> +       } else {
>> +               details.nonlinear_vma = NULL;
>> +               details.last_index = NULL;
>> +       }
>> +
>> +       details.ignore_references = true;
>> +       zap_page_range(vma, start, end - start, &details);
>>         return 0;
>>  }
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ebfeedf..d46ac42 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -897,9 +897,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>         pte_t *pte;
>>         spinlock_t *ptl;
>>         int rss[NR_MM_COUNTERS];
>> -
>> +       bool ignore_reference = false;
>>         init_rss_vec(rss);
>>
>> +       if (details && ((!details->check_mapping && !details->nonlinear_vma)
>> +                                        || !details->ignore_reference))
>> +               details = NULL;
>> +
>
>        bool mark_accessed = true;
>
>        if (VM_SequentialReadHint(vma) ||
>            (details && details->ignore_reference))
>                mark_accessed = false;
>        if (details && !details->check_mapping && !details->nonlinear_vma)
>                details = NULL;
>
>
>>         pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
>>         arch_enter_lazy_mmu_mode();
>>         do {
>> @@ -949,7 +955,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>                                 if (pte_dirty(ptent))
>>                                         set_page_dirty(page);
>>                                 if (pte_young(ptent) &&
>> -                                   likely(!VM_SequentialReadHint(vma)))
>> +                                   likely(!VM_SequentialReadHint(vma)) &&
>> +                                   likely(!ignore_reference))
>>                                         mark_page_accessed(page);
>
>                                if (pte_young(ptent) && mark_accessed)
>                                        mark_page_accessed(page);
>
>

Much cleaner.

>>                                 rss[MM_FILEPAGES]--;
>>                         }
>> @@ -1038,8 +1045,6 @@ static unsigned long unmap_page_range(struct
>> mmu_gather *tlb,
>>         pgd_t *pgd;
>>         unsigned long next;
>>
>> -       if (details && !details->check_mapping && !details->nonlinear_vma)
>> -               details = NULL;
>>
>>         BUG_ON(addr >= end);
>>         mem_cgroup_uncharge_start();
>> @@ -1102,7 +1107,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>>         unsigned long tlb_start = 0;    /* For tlb_finish_mmu */
>>         int tlb_start_valid = 0;
>>         unsigned long start = start_addr;
>> -       spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
>> +       spinlock_t *i_mmap_lock = details ?
>> +               (detais->check_mapping ? details->i_mmap_lock: NULL) : NULL;
>
> Why that change?

That was done very carelessly. Sorry for that. I thought i_mmap_lock
was always used together with check_mapping. Clearly wrong!
My concern is that, without such a routine, a caller that only uses
ignore_reference would have to initialize i_mmap_lock to NULL itself.
That's bad.

Hmm...
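
Thinking about it more: with a C designated initializer every field that is not
named is zero-initialized, so a caller that only cares about the reference hint
would not have to mention i_mmap_lock explicitly anyway. Untested sketch, using
the field as declared in the mm.h hunk of the quick patch:

	struct zap_details details = {
		.ignore_reference = 1,	/* all other fields, including
					 * i_mmap_lock, start out zero/NULL */
	};

	zap_page_range(vma, start, end - start, &details);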

>
> Hugh
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 7/7] Prevent activation of page in madvise_dontneed
@ 2010-12-08  7:55           ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  7:55 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrew Morton, Rik van Riel, KOSAKI Motohiro, linux-mm, LKML,
	Johannes Weiner, Nick Piggin, Mel Gorman, Wu Fengguang

On Wed, Dec 8, 2010 at 4:26 PM, Hugh Dickins <hughd@google.com> wrote:
> On Tue, 7 Dec 2010, Minchan Kim wrote:
>>
>> How about this? Although it doesn't remove null dependency, it meet my
>> goal without big overhead.
>> It's just quick patch.
>
> Roughly, yes; by "just quick patch" I take you to mean that I should
> not waste time on all the minor carelessnesses scattered through it.
>
>> If you agree, I will resend this version as formal patch.
>> (If you suffered from seeing below word-wrapped source, see the
>> attachment. I asked to google two time to support text-plain mode in
>> gmail web but I can't receive any response until now. ;(. Lots of
>> kernel developer in google. Please support this mode for us who can't
>> use SMTP although it's a very small VOC)
>
> Tiresome.  Seems not to be high on gmail's priorities.
> It's sad to see even Linus attaching patches these days.

That encourages me(But I don't mean I will use attachment again. :)).

>
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index e097df6..14ae918 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -771,6 +771,7 @@ struct zap_details {
>>         pgoff_t last_index;                     /* Highest page->index
>> to unmap */
>>         spinlock_t *i_mmap_lock;                /* For unmap_mapping_range: */
>>         unsigned long truncate_count;           /* Compare vm_truncate_count */
>> +       int ignore_reference;
>>  };
>>
>>  struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 319528b..fdb0253 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -162,18 +162,22 @@ static long madvise_dontneed(struct vm_area_struct * vma,
>>                              struct vm_area_struct ** prev,
>>                              unsigned long start, unsigned long end)
>>  {
>> +       struct zap_details details ;
>> +
>>         *prev = vma;
>>         if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>>                 return -EINVAL;
>>
>>         if (unlikely(vma->vm_flags & VM_NONLINEAR)) {
>> -               struct zap_details details = {
>> -                       .nonlinear_vma = vma,
>> -                       .last_index = ULONG_MAX,
>> -               };
>> -               zap_page_range(vma, start, end - start, &details);
>> -       } else
>> -               zap_page_range(vma, start, end - start, NULL);
>> +               details.nonlinear_vma = vma;
>> +               details.last_index = ULONG_MAX;
>> +       } else {
>> +               details.nonlinear_vma = NULL;
>> +               details.last_index = NULL;
>> +       }
>> +
>> +       details.ignore_references = true;
>> +       zap_page_range(vma, start, end - start, &details);
>>         return 0;
>>  }
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ebfeedf..d46ac42 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -897,9 +897,15 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>         pte_t *pte;
>>         spinlock_t *ptl;
>>         int rss[NR_MM_COUNTERS];
>> -
>> +       bool ignore_reference = false;
>>         init_rss_vec(rss);
>>
>> +       if (details && ((!details->check_mapping && !details->nonlinear_vma)
>> +                                        || !details->ignore_reference))
>> +               details = NULL;
>> +
>
>        bool mark_accessed = true;
>
>        if (VM_SequentialReadHint(vma) ||
>            (details && details->ignore_reference))
>                mark_accessed = false;
>        if (details && !details->check_mapping && !details->nonlinear_vma)
>                details = NULL;
>
>
>>         pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
>>         arch_enter_lazy_mmu_mode();
>>         do {
>> @@ -949,7 +955,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>>                                 if (pte_dirty(ptent))
>>                                         set_page_dirty(page);
>>                                 if (pte_young(ptent) &&
>> -                                   likely(!VM_SequentialReadHint(vma)))
>> +                                   likely(!VM_SequentialReadHint(vma)) &&
>> +                                   likely(!ignore_reference))
>>                                         mark_page_accessed(page);
>
>                                if (pte_young(ptent) && mark_accessed)
>                                        mark_page_accessed(page);
>
>

Much cleaner.

>>                                 rss[MM_FILEPAGES]--;
>>                         }
>> @@ -1038,8 +1045,6 @@ static unsigned long unmap_page_range(struct
>> mmu_gather *tlb,
>>         pgd_t *pgd;
>>         unsigned long next;
>>
>> -       if (details && !details->check_mapping && !details->nonlinear_vma)
>> -               details = NULL;
>>
>>         BUG_ON(addr >= end);
>>         mem_cgroup_uncharge_start();
>> @@ -1102,7 +1107,8 @@ unsigned long unmap_vmas(struct mmu_gather **tlbp,
>>         unsigned long tlb_start = 0;    /* For tlb_finish_mmu */
>>         int tlb_start_valid = 0;
>>         unsigned long start = start_addr;
>> -       spinlock_t *i_mmap_lock = details? details->i_mmap_lock: NULL;
>> +       spinlock_t *i_mmap_lock = details ?
>> +               (detais->check_mapping ? details->i_mmap_lock: NULL) : NULL;
>
> Why that change?

That was very careless of me. Sorry for that. I thought i_mmap_lock
was always used together with check_mapping. Clearly wrong!
My concern was that without such a check, a caller that only uses
ignore_reference would have to initialize i_mmap_lock to NULL.
That's bad.

Hmm...

>
> Hugh
>



-- 
Kind regards,
Minchan Kim
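
(For readers following the thread: the path being discussed is the one taken by a
plain userspace madvise(MADV_DONTNEED) call. Below is a minimal, self-contained
sketch of that usage; the file name /tmp/demo.dat is invented for illustration and
is assumed to exist and be non-empty.)

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	struct stat st;
	int fd = open("/tmp/demo.dat", O_RDONLY);

	if (fd < 0 || fstat(fd, &st) < 0) {
		perror("open/fstat");
		return EXIT_FAILURE;
	}
	if (st.st_size == 0)
		return EXIT_SUCCESS;	/* nothing to map */

	char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return EXIT_FAILURE;
	}

	/* Fault the pages in; the hardware marks the PTEs accessed. */
	volatile char sum = 0;
	for (off_t i = 0; i < st.st_size; i += 4096)
		sum += p[i];

	/*
	 * Zap the range.  With patch 7/7 the zap path no longer calls
	 * mark_page_accessed() for these pages, so the discard hint is
	 * not turned into an LRU promotion.
	 */
	if (madvise(p, st.st_size, MADV_DONTNEED) < 0)
		perror("madvise");

	munmap(p, st.st_size);
	close(fd);
	return EXIT_SUCCESS;
}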


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-08  8:02     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin, Mel Gorman

> This patch adds profile information about invalidated page reclaim.
> It's just for profiling for test so it would be discard when the series
> are merged.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/linux/vmstat.h |    4 ++--
>  mm/swap.c              |    3 +++
>  mm/vmstat.c            |    3 +++
>  3 files changed, 8 insertions(+), 2 deletions(-)

Today we have tracepoints. A tracepoint has no overhead if it's unused,
but a vmstat counter has overhead even when unused.

So every new vmstat proposal should explain why you think administrators
will use it frequently.
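
(For illustration only: a rough skeleton of the tracepoint alternative being
suggested here. The event name and fields are invented, the usual
include/trace/events/*.h boilerplate is omitted, and nothing like this is part of
the posted series.)

#include <linux/tracepoint.h>

TRACE_EVENT(mm_deactivate_invalidated_page,

	TP_PROTO(struct page *page, int file),

	TP_ARGS(page, file),

	TP_STRUCT__entry(
		__field(unsigned long,	pfn)
		__field(int,		file)
	),

	TP_fast_assign(
		__entry->pfn	= page_to_pfn(page);
		__entry->file	= file;
	),

	TP_printk("pfn=%lu file=%d", __entry->pfn, __entry->file)
);

/*
 * The call site in lru_deactivate() would then be a single
 * trace_mm_deactivate_invalidated_page(page, file), which costs
 * essentially nothing when the event is not enabled.
 */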




> 
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 833e676..c38ad95 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -30,8 +30,8 @@
>  
>  enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		FOR_ALL_ZONES(PGALLOC),
> -		PGFREE, PGACTIVATE, PGDEACTIVATE,
> -		PGFAULT, PGMAJFAULT,
> +		PGFREE, PGACTIVATE, PGDEACTIVATE, PGINVALIDATE,
> +		PGRECLAIM, PGFAULT, PGMAJFAULT,
>  		FOR_ALL_ZONES(PGREFILL),
>  		FOR_ALL_ZONES(PGSTEAL),
>  		FOR_ALL_ZONES(PGSCAN_KSWAPD),
> diff --git a/mm/swap.c b/mm/swap.c
> index 0f23998..2f21e6e 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -317,6 +317,7 @@ static void lru_deactivate(struct page *page, struct zone *zone)
>  		 * is _really_ small and  it's non-critical problem.
>  		 */
>  		SetPageReclaim(page);
> +		__count_vm_event(PGRECLAIM);

Um. No.
This is not a reclaim operation anyway. Userland folks shouldn't need to
know that you override PG_reclaim. That's implementation-internal information.



>  	} else {
>  		/*
>  		 * The page's writeback ends up during pagevec
> @@ -328,6 +329,8 @@ static void lru_deactivate(struct page *page, struct zone *zone)
>  
>  	if (active)
>  		__count_vm_event(PGDEACTIVATE);
> +
> +	__count_vm_event(PGINVALIDATE);
>  	update_page_reclaim_stat(zone, page, file, 0);

I have similar complaints as above.
If you use PGINVALIDATE, the other invalidate paths should update this counter too.


>  }
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 3555636..ef6102d 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -818,6 +818,9 @@ static const char * const vmstat_text[] = {
>  	"pgactivate",
>  	"pgdeactivate",
>  
> +	"pginvalidate",
> +	"pgreclaim",
> +
>  	"pgfault",
>  	"pgmajfault",
>  
> -- 
> 1.7.0.4
> 




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
@ 2010-12-08  8:02     ` KOSAKI Motohiro
  0 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin, Mel Gorman

> This patch adds profile information about invalidated page reclaim.
> It's just for profiling for test so it would be discard when the series
> are merged.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>
> Cc: Mel Gorman <mel@csn.ul.ie>
> ---
>  include/linux/vmstat.h |    4 ++--
>  mm/swap.c              |    3 +++
>  mm/vmstat.c            |    3 +++
>  3 files changed, 8 insertions(+), 2 deletions(-)

Today we have tracepoints. A tracepoint has no overhead if it's unused,
but a vmstat counter has overhead even when unused.

So every new vmstat proposal should explain why you think administrators
will use it frequently.




> 
> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> index 833e676..c38ad95 100644
> --- a/include/linux/vmstat.h
> +++ b/include/linux/vmstat.h
> @@ -30,8 +30,8 @@
>  
>  enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>  		FOR_ALL_ZONES(PGALLOC),
> -		PGFREE, PGACTIVATE, PGDEACTIVATE,
> -		PGFAULT, PGMAJFAULT,
> +		PGFREE, PGACTIVATE, PGDEACTIVATE, PGINVALIDATE,
> +		PGRECLAIM, PGFAULT, PGMAJFAULT,
>  		FOR_ALL_ZONES(PGREFILL),
>  		FOR_ALL_ZONES(PGSTEAL),
>  		FOR_ALL_ZONES(PGSCAN_KSWAPD),
> diff --git a/mm/swap.c b/mm/swap.c
> index 0f23998..2f21e6e 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -317,6 +317,7 @@ static void lru_deactivate(struct page *page, struct zone *zone)
>  		 * is _really_ small and  it's non-critical problem.
>  		 */
>  		SetPageReclaim(page);
> +		__count_vm_event(PGRECLAIM);

Um. No.
This is not a reclaim operation anyway. Userland folks shouldn't need to
know that you override PG_reclaim. That's implementation-internal information.



>  	} else {
>  		/*
>  		 * The page's writeback ends up during pagevec
> @@ -328,6 +329,8 @@ static void lru_deactivate(struct page *page, struct zone *zone)
>  
>  	if (active)
>  		__count_vm_event(PGDEACTIVATE);
> +
> +	__count_vm_event(PGINVALIDATE);
>  	update_page_reclaim_stat(zone, page, file, 0);

I have similar complaints as above.
If you use PGINVALIDATE, the other invalidate paths should update this counter too.


>  }
>  
> diff --git a/mm/vmstat.c b/mm/vmstat.c
> index 3555636..ef6102d 100644
> --- a/mm/vmstat.c
> +++ b/mm/vmstat.c
> @@ -818,6 +818,9 @@ static const char * const vmstat_text[] = {
>  	"pgactivate",
>  	"pgdeactivate",
>  
> +	"pginvalidate",
> +	"pgreclaim",
> +
>  	"pgfault",
>  	"pgmajfault",
>  
> -- 
> 1.7.0.4
> 




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-08  8:04     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

> invalidate_mapping_pages is very big hint to reclaimer.
> It means user doesn't want to use the page any more.
> So in order to prevent working set page eviction, this patch
> move the page into tail of inactive list by PG_reclaim.
> 
> Please, remember that pages in inactive list are working set
> as well as active list. If we don't move pages into inactive list's
> tail, pages near by tail of inactive list can be evicted although
> we have a big clue about useless pages. It's totally bad.
> 
> Now PG_readahead/PG_reclaim is shared.
> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> preventing fast reclaiming readahead marker page.
> 
> In this series, PG_reclaim is used by invalidated page, too.
> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> to reclaim asap. Then, when the dirty page will be writeback,
> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> It disturbs this serie's goal.
> 
> I think it's okay to clear PG_readahead when the page is dirty, not
> writeback time. So this patch moves ClearPageReadahead.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>

Until anyone shows data, I will not ack this. This patch increases
VM state, but the benefit is dubious.





^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-08  8:04     ` KOSAKI Motohiro
  0 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

> invalidate_mapping_pages is very big hint to reclaimer.
> It means user doesn't want to use the page any more.
> So in order to prevent working set page eviction, this patch
> move the page into tail of inactive list by PG_reclaim.
> 
> Please, remember that pages in inactive list are working set
> as well as active list. If we don't move pages into inactive list's
> tail, pages near by tail of inactive list can be evicted although
> we have a big clue about useless pages. It's totally bad.
> 
> Now PG_readahead/PG_reclaim is shared.
> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
> preventing fast reclaiming readahead marker page.
> 
> In this series, PG_reclaim is used by invalidated page, too.
> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
> to reclaim asap. Then, when the dirty page will be writeback,
> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
> It disturbs this serie's goal.
> 
> I think it's okay to clear PG_readahead when the page is dirty, not
> writeback time. So this patch moves ClearPageReadahead.
> 
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> Acked-by: Rik van Riel <riel@redhat.com>
> Acked-by: Mel Gorman <mel@csn.ul.ie>
> Cc: Wu Fengguang <fengguang.wu@intel.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Nick Piggin <npiggin@kernel.dk>

Until anyone shows data, I will not ack this. This patch increases
VM state, but the benefit is dubious.





^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
  2010-12-05 17:29   ` Minchan Kim
@ 2010-12-08  8:08     ` KOSAKI Motohiro
  -1 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Balbir Singh, KAMEZAWA Hiroyuki

> Golbal page reclaim moves reclaimalbe pages into inactive list
> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>

Good catch!
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list
@ 2010-12-08  8:08     ` KOSAKI Motohiro
  0 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Balbir Singh, KAMEZAWA Hiroyuki

> Golbal page reclaim moves reclaimalbe pages into inactive list
> to reclaim asap. This patch apply the rule in memcg.
> It can help to prevent unnecessary working page eviction of memcg.
> 
> Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>

Good catch!
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
  2010-12-08  8:02     ` KOSAKI Motohiro
@ 2010-12-08  8:13       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  8:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin, Mel Gorman

Hi KOSAKI,

On Wed, Dec 8, 2010 at 5:02 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> This patch adds profile information about invalidated page reclaim.
>> It's just for profiling for test so it would be discard when the series
>> are merged.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> ---
>>  include/linux/vmstat.h |    4 ++--
>>  mm/swap.c              |    3 +++
>>  mm/vmstat.c            |    3 +++
>>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> Today, we have tracepoint. tracepoint has no overhead if it's unused.
> but vmstat has a overhead even if unused.
>
> Then, all new vmstat proposal should be described why you think it is
> frequently used from administrators.

It's just to make gathering the data easy when Ben tests.
I never intended to merge it upstream, or even into mmotm.

If you don't like it even just for testing, I am happy to switch it to a tracepoint.

Thanks.
-- 
Kind regards,
Minchan Kim
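
(A hypothetical helper in the spirit of "easy data gathering": snapshot /proc/vmstat
around a test run and print the deltas. The counter names are taken from this
thread; pginvalidate and pgreclaim only show up with patch 5/7 applied, and any
counter that is missing simply reports 0. Nothing here is from the series itself.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NKEYS 4

static const char *keys[NKEYS] = {
	"pgactivate", "pgdeactivate", "pginvalidate", "pgreclaim"
};

/* Read /proc/vmstat and record the counters we care about. */
static void snapshot(long long vals[NKEYS])
{
	char name[64];
	long long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f) {
		perror("/proc/vmstat");
		exit(EXIT_FAILURE);
	}
	while (fscanf(f, "%63s %lld", name, &val) == 2) {
		for (int i = 0; i < NKEYS; i++)
			if (!strcmp(name, keys[i]))
				vals[i] = val;
	}
	fclose(f);
}

int main(int argc, char **argv)
{
	long long before[NKEYS] = {0}, after[NKEYS] = {0};

	snapshot(before);
	/* Run the workload under test, e.g. ./vmstat-delta 'rsync src dst'. */
	if (argc > 1 && system(argv[1]) == -1)
		perror("system");
	snapshot(after);

	for (int i = 0; i < NKEYS; i++)
		printf("%-14s %lld\n", keys[i], after[i] - before[i]);
	return 0;
}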

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
@ 2010-12-08  8:13       ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  8:13 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin, Mel Gorman

Hi KOSAKI,

On Wed, Dec 8, 2010 at 5:02 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> This patch adds profile information about invalidated page reclaim.
>> It's just for profiling for test so it would be discard when the series
>> are merged.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Cc: Rik van Riel <riel@redhat.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> ---
>>  include/linux/vmstat.h |    4 ++--
>>  mm/swap.c              |    3 +++
>>  mm/vmstat.c            |    3 +++
>>  3 files changed, 8 insertions(+), 2 deletions(-)
>
> Today, we have tracepoint. tracepoint has no overhead if it's unused.
> but vmstat has a overhead even if unused.
>
> Then, all new vmstat proposal should be described why you think it is
> frequently used from administrators.

It's just to make gathering the data easy when Ben tests.
I never intended to merge it upstream, or even into mmotm.

If you don't like it even just for testing, I am happy to switch it to a tracepoint.

Thanks.
-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-08  8:04     ` KOSAKI Motohiro
@ 2010-12-08  8:16       ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  8:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin

On Wed, Dec 8, 2010 at 5:04 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> invalidate_mapping_pages is very big hint to reclaimer.
>> It means user doesn't want to use the page any more.
>> So in order to prevent working set page eviction, this patch
>> move the page into tail of inactive list by PG_reclaim.
>>
>> Please, remember that pages in inactive list are working set
>> as well as active list. If we don't move pages into inactive list's
>> tail, pages near by tail of inactive list can be evicted although
>> we have a big clue about useless pages. It's totally bad.
>>
>> Now PG_readahead/PG_reclaim is shared.
>> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
>> preventing fast reclaiming readahead marker page.
>>
>> In this series, PG_reclaim is used by invalidated page, too.
>> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
>> to reclaim asap. Then, when the dirty page will be writeback,
>> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
>> It disturbs this serie's goal.
>>
>> I think it's okay to clear PG_readahead when the page is dirty, not
>> writeback time. So this patch moves ClearPageReadahead.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Acked-by: Mel Gorman <mel@csn.ul.ie>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>
> Until anyone should data, I will not ack this. This patch increase
> VM state, but benefit is doubious.

Make sense to me. If Ben is busy, I will measure it and send the result.
Thanks!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-08  8:16       ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08  8:16 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin

On Wed, Dec 8, 2010 at 5:04 PM, KOSAKI Motohiro
<kosaki.motohiro@jp.fujitsu.com> wrote:
>> invalidate_mapping_pages is very big hint to reclaimer.
>> It means user doesn't want to use the page any more.
>> So in order to prevent working set page eviction, this patch
>> move the page into tail of inactive list by PG_reclaim.
>>
>> Please, remember that pages in inactive list are working set
>> as well as active list. If we don't move pages into inactive list's
>> tail, pages near by tail of inactive list can be evicted although
>> we have a big clue about useless pages. It's totally bad.
>>
>> Now PG_readahead/PG_reclaim is shared.
>> fe3cba17 added ClearPageReclaim into clear_page_dirty_for_io for
>> preventing fast reclaiming readahead marker page.
>>
>> In this series, PG_reclaim is used by invalidated page, too.
>> If VM find the page is invalidated and it's dirty, it sets PG_reclaim
>> to reclaim asap. Then, when the dirty page will be writeback,
>> clear_page_dirty_for_io will clear PG_reclaim unconditionally.
>> It disturbs this serie's goal.
>>
>> I think it's okay to clear PG_readahead when the page is dirty, not
>> writeback time. So this patch moves ClearPageReadahead.
>>
>> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
>> Acked-by: Rik van Riel <riel@redhat.com>
>> Acked-by: Mel Gorman <mel@csn.ul.ie>
>> Cc: Wu Fengguang <fengguang.wu@intel.com>
>> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Nick Piggin <npiggin@kernel.dk>
>
> Until anyone should data, I will not ack this. This patch increase
> VM state, but benefit is doubious.

Make sense to me. If Ben is busy, I will measure it and send the result.
Thanks!

-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
  2010-12-08  8:13       ` Minchan Kim
@ 2010-12-08  8:36         ` KOSAKI Motohiro
  -1 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin, Mel Gorman

> Hi KOSAKI,
> 
> On Wed, Dec 8, 2010 at 5:02 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> This patch adds profile information about invalidated page reclaim.
> >> It's just for profiling for test so it would be discard when the series
> >> are merged.
> >>
> >> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> >> Cc: Rik van Riel <riel@redhat.com>
> >> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> >> Cc: Wu Fengguang <fengguang.wu@intel.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Nick Piggin <npiggin@kernel.dk>
> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> ---
> >>  include/linux/vmstat.h |    4 ++--
> >>  mm/swap.c              |    3 +++
> >>  mm/vmstat.c            |    3 +++
> >>  3 files changed, 8 insertions(+), 2 deletions(-)
> >
> > Today, we have tracepoint. tracepoint has no overhead if it's unused.
> > but vmstat has a overhead even if unused.
> >
> > Then, all new vmstat proposal should be described why you think it is
> > frequently used from administrators.
> 
> It's just for easy gathering the data when Ben will test.
> I never want to merge it in upstream and even mmtom.

Ok, I had not understood your intention. Thank you.



> If you don't like it for just testing, I am happy to change it with tracepoint.






^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 5/7] add profile information for invalidated page reclaim
@ 2010-12-08  8:36         ` KOSAKI Motohiro
  0 siblings, 0 replies; 109+ messages in thread
From: KOSAKI Motohiro @ 2010-12-08  8:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: kosaki.motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin, Mel Gorman

> Hi KOSAKI,
> 
> On Wed, Dec 8, 2010 at 5:02 PM, KOSAKI Motohiro
> <kosaki.motohiro@jp.fujitsu.com> wrote:
> >> This patch adds profile information about invalidated page reclaim.
> >> It's just for profiling for test so it would be discard when the series
> >> are merged.
> >>
> >> Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
> >> Cc: Rik van Riel <riel@redhat.com>
> >> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> >> Cc: Wu Fengguang <fengguang.wu@intel.com>
> >> Cc: Johannes Weiner <hannes@cmpxchg.org>
> >> Cc: Nick Piggin <npiggin@kernel.dk>
> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> ---
> >>  include/linux/vmstat.h |    4 ++--
> >>  mm/swap.c              |    3 +++
> >>  mm/vmstat.c            |    3 +++
> >>  3 files changed, 8 insertions(+), 2 deletions(-)
> >
> > Today, we have tracepoint. tracepoint has no overhead if it's unused.
> > but vmstat has a overhead even if unused.
> >
> > Then, all new vmstat proposal should be described why you think it is
> > frequently used from administrators.
> 
> It's just for easy gathering the data when Ben will test.
> I never want to merge it in upstream and even mmtom.

Ok, I had not understood your intention. Thank you.



> If you don't like it for just testing, I am happy to change it with tracepoint.






^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-08  8:16       ` Minchan Kim
@ 2010-12-08 13:01         ` Ben Gamari
  -1 siblings, 0 replies; 109+ messages in thread
From: Ben Gamari @ 2010-12-08 13:01 UTC (permalink / raw)
  To: Minchan Kim, KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin

> Make sense to me. If Ben is busy, I will measure it and send the result.

I've done measurements on the patched kernel. All that remains is to do
measurements on the baseline unpatched case. To summarize the results
thus far,

Times:
=======
                       user    sys     %cpu    inputs           outputs
Patched, drop          142     64      46      13557744         14052744
Patched, nodrop        55      57      33      13557936         13556680

vmstat:
========
                        free_pages      inact_anon      act_anon        inact_file      act_file        dirtied      written  reclaim
Patched, drop, pre      306043          37541           185463          276266          153955          3689674      3604959  1550641
Patched, drop, post     13233           38462           175252          536346          178792          5527564      5371563  3169155

Patched, nodrop, pre    475211          38602           175242          81979           178820          5527592      5371554  3169155
Patched, nodrop, post   7697            38959           176986          547984          180855          7324836      7132158  3169155

Altogether, it seems that something is horribly wrong, most likely with
my test (or rsync patch). I'll do the baseline benchmarks today.

Thoughts?

Thanks,

- Ben
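
(For context, a rough sketch of the pattern an fadvise-aware copy uses after
finishing a file. This is an illustration only, not Ben's actual rsync change;
the fsync-before-DONTNEED step matters because dropping dirty or writeback pages
is otherwise skipped, which is exactly what this series is working around.)

#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Flush a just-written file and drop its page cache. */
static int drop_file_cache(int fd)
{
	int err;

	if (fsync(fd) < 0) {
		perror("fsync");
		return -1;
	}

	/* posix_fadvise() returns an errno value rather than setting errno. */
	err = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
	if (err) {
		fprintf(stderr, "posix_fadvise: %s\n", strerror(err));
		return -1;
	}
	return 0;
}

int main(int argc, char **argv)
{
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* ... the copy loop that writes the file would go here ... */
	drop_file_cache(fd);
	close(fd);
	return 0;
}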


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-08 13:01         ` Ben Gamari
  0 siblings, 0 replies; 109+ messages in thread
From: Ben Gamari @ 2010-12-08 13:01 UTC (permalink / raw)
  To: Minchan Kim, KOSAKI Motohiro
  Cc: Andrew Morton, Rik van Riel, linux-mm, LKML, Wu Fengguang,
	Johannes Weiner, Nick Piggin

> Make sense to me. If Ben is busy, I will measure it and send the result.

I've done measurements on the patched kernel. All that remains is to do
measurements on the baseline unpatched case. To summarize the results
thus far,

Times:
=======
                       user    sys     %cpu    inputs           outputs
Patched, drop          142     64      46      13557744         14052744
Patched, nodrop        55      57      33      13557936         13556680

vmstat:
========
                        free_pages      inact_anon      act_anon        inact_file      act_file        dirtied      written  reclaim
Patched, drop, pre      306043          37541           185463          276266          153955          3689674      3604959  1550641
Patched, drop, post     13233           38462           175252          536346          178792          5527564      5371563  3169155

Patched, nodrop, pre    475211          38602           175242          81979           178820          5527592      5371554  3169155
Patched, nodrop, post   7697            38959           176986          547984          180855          7324836      7132158  3169155

Altogether, it seems that something is horribly wrong, most likely with
my test (or rsync patch). I'll do the baseline benchmarks today.

Thoughts?

Thanks,

- Ben


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-08 13:01         ` Ben Gamari
@ 2010-12-08 23:10           ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08 23:10 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Wed, Dec 8, 2010 at 10:01 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
>> Make sense to me. If Ben is busy, I will measure it and send the result.
>
> I've done measurements on the patched kernel. All that remains is to do
> measurements on the baseline unpached case. To summarize the results
> thusfar,
>
> Times:
> =======
>                       user    sys     %cpu    inputs           outputs
> Patched, drop          142     64      46      13557744         14052744
> Patched, nodrop        55      57      33      13557936         13556680
>
> vmstat:
> ========
>                        free_pages      inact_anon      act_anon        inact_file      act_file        dirtied      written  reclaim
> Patched, drop, pre      306043          37541           185463          276266          153955          3689674      3604959  1550641
> Patched, drop, post     13233           38462           175252          536346          178792          5527564      5371563  3169155
>
> Patched, nodrop, pre    475211          38602           175242          81979           178820          5527592      5371554  3169155
> Patched, nodrop, post   7697            38959           176986          547984          180855          7324836      7132158  3169155
>
> Altogether, it seems that something is horribly wrong, most likely with
> my test (or rsync patch). I'll do the baseline benchmarks today.
>
> Thoughts?


How do you test it?
I think the patch's effect would show best under heavy memory pressure.

I quickly ran it on my desktop environment (2G DRAM),
so it's not a complete result. I will test more when out of the office.

Used kernel : mmotm-12-02 + my patch series
Used rsync :
1. rsync_normal : v3.0.7 vanilla
2. rsync_patch : v3.0.7 + Ben's patch(fadvise)

Test scenario :
* kernel full compile
* git clone linux-kernel
* rsync local host directory to local host dst directory


1) rsync_normal : 89.08user 127.48system 33:22.24elapsed
2) rsync_patch : 88.42user 135.26system 31:30.56elapsed

1) rsync_normal vmstat :
pgfault : 45538203
pgmajfault : 4181

pgactivate 377416
pgdeactivate 34183
pginvalidate 0
pgreclaim 0

pgsteal_dma 0
pgsteal_normal 2144469
pgsteal_high 2884412
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_normal 2149739
pgscan_kswapd_high 2909140
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_normal 647
pgscan_direct_high 716
pgscan_direct_movable 0
pginodesteal 0
slabs_scanned 1737344
kswapd_steal 5028353
kswapd_inodesteal 438910
pageoutrun 81208
allocstall 9
pgrotated 1642

2) rsync_patch vmstat:

pgfault : 47570231
pgmajfault : 2669

pgactivate 391806
pgdeactivate 36861
pginvalidate 1685065
pgreclaim 1685065

pgrefill_dma 0
pgrefill_normal 32025
pgrefill_high 9619
pgrefill_movable 0
pgsteal_dma 0
pgsteal_normal 744904
pgsteal_high 1079709
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_normal 745017
pgscan_kswapd_high 1096660
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_normal 0
pgscan_direct_high 0
pgscan_direct_movable 0
pginodesteal 0
slabs_scanned 1896960
kswapd_steal 1824613
kswapd_inodesteal 703499
pageoutrun 26828
allocstall 0
pgrotated 1681570

In summary,
unfortunately the number of faults increased (45538203 -> 47570231),
but pgmajfault is reduced (4181 -> 2669).

The amount of scanning is much reduced: 2149739 -> 745017, 2909140 ->
1096660, and there is even no direct reclaim in the patched rsync.

The number of stolen pages is much reduced: 2144469 -> 744904, 2884412 ->
1079709, 5028353 -> 1824613.

The elapsed time is reduced by 2 minutes.

I think the result is good. The reduced steal count could imply that we
avoid evicting working set pages.

It's a good result for a small effort (less scanning).

I will resend with more exact measurements after repeated tests.

> Thanks,
>
> - Ben
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-08 23:10           ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-08 23:10 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Wed, Dec 8, 2010 at 10:01 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
>> Make sense to me. If Ben is busy, I will measure it and send the result.
>
> I've done measurements on the patched kernel. All that remains is to do
> measurements on the baseline unpached case. To summarize the results
> thusfar,
>
> Times:
> =======
>                       user    sys     %cpu    inputs           outputs
> Patched, drop          142     64      46      13557744         14052744
> Patched, nodrop        55      57      33      13557936         13556680
>
> vmstat:
> ========
>                        free_pages      inact_anon      act_anon        inact_file      act_file        dirtied      written  reclaim
> Patched, drop, pre      306043          37541           185463          276266          153955          3689674      3604959  1550641
> Patched, drop, post     13233           38462           175252          536346          178792          5527564      5371563  3169155
>
> Patched, nodrop, pre    475211          38602           175242          81979           178820          5527592      5371554  3169155
> Patched, nodrop, post   7697            38959           176986          547984          180855          7324836      7132158  3169155
>
> Altogether, it seems that something is horribly wrong, most likely with
> my test (or rsync patch). I'll do the baseline benchmarks today.
>
> Thoughts?


How do you test it?
I think the patch's effect would show best under heavy memory pressure.

I quickly ran it on my desktop environment (2G DRAM),
so it's not a complete result. I will test more when out of the office.

Used kernel : mmotm-12-02 + my patch series
Used rsync :
1. rsync_normal : v3.0.7 vanilla
2. rsync_patch : v3.0.7 + Ben's patch(fadvise)

Test scenario :
* kernel full compile
* git clone linux-kernel
* rsync local host directory to local host dst directory


1) rsync_normal : 89.08user 127.48system 33:22.24elapsed
2) rsync_patch : 88.42user 135.26system 31:30.56elapsed

1) rsync_normal vmstat :
pgfault : 45538203
pgmajfault : 4181

pgactivate 377416
pgdeactivate 34183
pginvalidate 0
pgreclaim 0

pgsteal_dma 0
pgsteal_normal 2144469
pgsteal_high 2884412
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_normal 2149739
pgscan_kswapd_high 2909140
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_normal 647
pgscan_direct_high 716
pgscan_direct_movable 0
pginodesteal 0
slabs_scanned 1737344
kswapd_steal 5028353
kswapd_inodesteal 438910
pageoutrun 81208
allocstall 9
pgrotated 1642

2) rsync_patch vmstat:

pgfault : 47570231
pgmajfault : 2669

pgactivate 391806
pgdeactivate 36861
pginvalidate 1685065
pgreclaim 1685065

pgrefill_dma 0
pgrefill_normal 32025
pgrefill_high 9619
pgrefill_movable 0
pgsteal_dma 0
pgsteal_normal 744904
pgsteal_high 1079709
pgsteal_movable 0
pgscan_kswapd_dma 0
pgscan_kswapd_normal 745017
pgscan_kswapd_high 1096660
pgscan_kswapd_movable 0
pgscan_direct_dma 0
pgscan_direct_normal 0
pgscan_direct_high 0
pgscan_direct_movable 0
pginodesteal 0
slabs_scanned 1896960
kswapd_steal 1824613
kswapd_inodesteal 703499
pageoutrun 26828
allocstall 0
pgrotated 1681570

In summary,
unfortunately the number of faults increased (45538203 -> 47570231),
but pgmajfault is reduced (4181 -> 2669).

The amount of scanning is much reduced: 2149739 -> 745017, 2909140 ->
1096660, and there is even no direct reclaim in the patched rsync.

The number of stolen pages is much reduced: 2144469 -> 744904, 2884412 ->
1079709, 5028353 -> 1824613.

The elapsed time is reduced by 2 minutes.

I think the result is good. The reduced steal count could imply that we
avoid evicting working set pages.

It's a good result for a small effort (less scanning).

I will resend with more exact measurements after repeated tests.

> Thanks,
>
> - Ben
>
>



-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
  2010-12-08  6:56                         ` Balbir Singh
@ 2010-12-09  0:19                           ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-09  0:19 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	KOSAKI Motohiro, linux-mm, LKML, Peter Zijlstra, Wu Fengguang,
	Nick Piggin, Mel Gorman

On Wed, Dec 8, 2010 at 3:56 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * MinChan Kim <minchan.kim@gmail.com> [2010-12-08 11:15:19]:
>
>> On Wed, Dec 8, 2010 at 10:56 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Wed, 8 Dec 2010 10:43:08 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> Hi Kame,
>> >>
>> > Hi,
>> >
>> >> > I wonder ...how about adding "victim" list for "Reclaim" pages ? Then, we don't need
>> >> > extra LRU rotation.
>> >>
>> >> It can make the code clean.
>> >> As far as I think, victim list does following as.
>> >>
>> >> 1. select victim pages by strong hint
>> >> 2. move the page from LRU to victim
>> >> 3. reclaimer always peeks victim list before diving into LRU list.
>> >> 4-1. If the victim pages is used by others or dirty, it can be moved
>> >> into LRU, again or remain the page in victim list.
>> >> If the page is remained victim, when do we move it into LRU again if
>> >> the reclaimer continues to fail the page?
>> > When sometone touches it.
>> >
>> >> We have to put the new rule.
>> >> 4-2. If the victim pages isn't used by others and clean, we can
>> >> reclaim the page asap.
>> >>
>> >> AFAIK, strong hints are just two(invalidation, readahead max window heuristic).
>> >> I am not sure it's valuable to add new hierarchy(ie, LRU, victim,
>> >> unevictable) for cleaning the minor codes.
>> >> In addition, we have to put the new rule so it would make the LRU code
>> >> complicated.
>> >> I remember how unevictable feature merge is hard.
>> >>
>> > yes, it was hard.
>> >
>> >> But I am not against if we have more usecases. In this case, it's
>> >> valuable to implement it although it's not easy.
>> >>
>> >
>> > I wonder "victim list" can be used for something like Cleancache, when
>> > we have very-low-latency backend devices.
>> > And we may able to have page-cache-limit, which Balbir proposed as.
>>
>> Yes, I thought that, too. I think it would be a good feature in embedded system.
>>
>> >
>> >  - kvictimed? will move unmappedd page caches to victim list
>> > This may work like a InactiveClean list which we had before and make
>> > sizing easy.
>> >
>>
>> Before further discuss, we need customer's confirm.
>> We know very well it is very hard to merge if anyone doesn't use.
>>
>> Balbir, What do think about it?
>>
>
> The idea seems interesting, I am in the process of refreshing my
> patches for unmapped page cache control. I presume the process of
> filling the victim list will be similar to what I have or unmapped
> page cache isolation.

I saw your previous implementation. It doesn't gain any benefit from a
victim list: it still needs to scan pfns, select unmapped pages, and move
them onto the victim list.
I think we might need a kvictimd as Kame said, but I am not convinced.
If I run into trouble implementing my series, I might consider it. But
for now I think my approach isn't bad, and the rough test results aren't bad either.

To be honest, I think the victim list (or clean list) should be a separate
project. Once it is completed, maybe we can use it to simplify my patches.
I agree with the page-cache-limit-control point of view, but it should be
another project.
So I want to merge this series first, and if we really do need a victim list,
let's consider it at that time.

Anyway, I will look at your next version to see whether it needs a victim list.

>
>>
>> > Thanks,
>> > -Kame
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Kind regards,
>> Minchan Kim
>
> --
>        Three Cheers,
>        Balbir
>



-- 
Kind regards,
Minchan Kim
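
(Purely to illustrate the idea being batted around above: a "victim" list that
strong hints feed and that reclaim checks before touching the LRUs. None of this
exists in any posted patch, the names are invented, and locking is ignored.)

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mm_inline.h>

static LIST_HEAD(victim_list);

/* 1-2. A strong hint (e.g. invalidation) moves the page onto the victim list. */
static void victim_add(struct page *page)
{
	list_move(&page->lru, &victim_list);
}

/* 3-4. Reclaim peeks here first; busy or dirty pages go back to the LRU. */
static struct page *victim_pop(struct zone *zone)
{
	struct page *page, *next;

	list_for_each_entry_safe(page, next, &victim_list, lru) {
		if (page_mapped(page) || PageDirty(page)) {
			list_move(&page->lru, &zone->lru[LRU_INACTIVE_FILE].list);
			continue;
		}
		list_del(&page->lru);
		return page;
	}
	return NULL;
}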

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 2/7] deactivate invalidated pages
@ 2010-12-09  0:19                           ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-09  0:19 UTC (permalink / raw)
  To: balbir
  Cc: KAMEZAWA Hiroyuki, Johannes Weiner, Andrew Morton, Rik van Riel,
	KOSAKI Motohiro, linux-mm, LKML, Peter Zijlstra, Wu Fengguang,
	Nick Piggin, Mel Gorman

On Wed, Dec 8, 2010 at 3:56 PM, Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> * MinChan Kim <minchan.kim@gmail.com> [2010-12-08 11:15:19]:
>
>> On Wed, Dec 8, 2010 at 10:56 AM, KAMEZAWA Hiroyuki
>> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
>> > On Wed, 8 Dec 2010 10:43:08 +0900
>> > Minchan Kim <minchan.kim@gmail.com> wrote:
>> >
>> >> Hi Kame,
>> >>
>> > Hi,
>> >
>> >> > I wonder ...how about adding "victim" list for "Reclaim" pages ? Then, we don't need
>> >> > extra LRU rotation.
>> >>
>> >> It can make the code clean.
>> >> As far as I think, victim list does following as.
>> >>
>> >> 1. select victim pages by strong hint
>> >> 2. move the page from LRU to victim
>> >> 3. reclaimer always peeks victim list before diving into LRU list.
>> >> 4-1. If the victim pages is used by others or dirty, it can be moved
>> >> into LRU, again or remain the page in victim list.
>> >> If the page is remained victim, when do we move it into LRU again if
>> >> the reclaimer continues to fail the page?
>> > When sometone touches it.
>> >
>> >> We have to put the new rule.
>> >> 4-2. If the victim pages isn't used by others and clean, we can
>> >> reclaim the page asap.
>> >>
>> >> AFAIK, strong hints are just two(invalidation, readahead max window heuristic).
>> >> I am not sure it's valuable to add new hierarchy(ie, LRU, victim,
>> >> unevictable) for cleaning the minor codes.
>> >> In addition, we have to put the new rule so it would make the LRU code
>> >> complicated.
>> >> I remember how unevictable feature merge is hard.
>> >>
>> > yes, it was hard.
>> >
>> >> But I am not against if we have more usecases. In this case, it's
>> >> valuable to implement it although it's not easy.
>> >>
>> >
>> > I wonder "victim list" can be used for something like Cleancache, when
>> > we have very-low-latency backend devices.
>> > And we may able to have page-cache-limit, which Balbir proposed as.
>>
>> Yes, I thought that, too. I think it would be a good feature in embedded system.
>>
>> >
>> >  - kvictimed? will move unmappedd page caches to victim list
>> > This may work like a InactiveClean list which we had before and make
>> > sizing easy.
>> >
>>
>> Before further discuss, we need customer's confirm.
>> We know very well it is very hard to merge if anyone doesn't use.
>>
>> Balbir, What do think about it?
>>
>
> The idea seems interesting, I am in the process of refreshing my
> patches for unmapped page cache control. I presume the process of
> filling the victim list will be similar to what I have or unmapped
> page cache isolation.

I saw your previous implementation. It doesn't gain any benefit from a
victim list: it still needs to scan pfns, select unmapped pages, and move
them onto the victim list.
I think we might need a kvictimd as Kame said, but I am not convinced.
If I run into trouble implementing my series, I might consider it. But
for now I think my approach isn't bad, and the rough test results aren't bad either.

To be honest, I think the victim list (or clean list) should be a separate
project. Once it is completed, maybe we can use it to simplify my patches.
I agree with the page-cache-limit-control point of view, but it should be
another project.
So I want to merge this series first, and if we really do need a victim list,
let's consider it at that time.

Anyway, I will look at your next version to see whether it needs a victim list.

>
>>
>> > Thanks,
>> > -Kame
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Kind regards,
>> Minchan Kim
>
> --
>        Three Cheers,
>        Balbir
>



-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-08 23:10           ` Minchan Kim
@ 2010-12-13 15:31             ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-13 15:31 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Thu, Dec 09, 2010 at 08:10:17AM +0900, Minchan Kim wrote:
> On Wed, Dec 8, 2010 at 10:01 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> >> Make sense to me. If Ben is busy, I will measure it and send the result.
> >
> > I've done measurements on the patched kernel. All that remains is to do
> > measurements on the baseline unpached case. To summarize the results
> > thusfar,
> >
> > Times:
> > =======
> >                        user    sys     %cpu    inputs           outputs
> > Patched, drop          142     64      46      13557744         14052744
> > Patched, nodrop        55      57      33      13557936         13556680
> >
> > vmstat:
> > ========
> >                         free_pages      inact_anon      act_anon        inact_file      act_file        dirtied      written  reclaim
> > Patched, drop, pre      306043          37541           185463          276266          153955          3689674      3604959  1550641
> > Patched, drop, post     13233           38462           175252          536346          178792          5527564      5371563  3169155
> >
> > Patched, nodrop, pre    475211          38602           175242          81979           178820          5527592      5371554  3169155
> > Patched, nodrop, post   7697            38959           176986          547984          180855          7324836      7132158  3169155
> >
> > Altogether, it seems that something is horribly wrong, most likely with
> > my test (or rsync patch). I'll do the baseline benchmarks today.
> >
> > Thoughts?
> 
> 
> How do you test it?
> I think patch's effect would be good in big memory pressure environment.
> 
> Quickly I did it on my desktop environment.(2G DRAM)
> So it's not completed result. I will test more when out of office.
> 
> Used kernel : mmotm-12-02 + my patch series
> Used rsync :
> 1. rsync_normal : v3.0.7 vanilla
> 2. rsync_patch : v3.0.7 + Ben's patch(fadvise)
> 
> Test scenario :
> * kernel full compile
> * git clone linux-kernel
> * rsync local host directory to local host dst directory
> 
> 
> 1) rsync_normal : 89.08user 127.48system 33:22.24elapsed
> 2) rsync_patch : 88.42user 135.26system 31:30.56elapsed
> 
> 1) rsync_normal vmstat :
> pgfault : 45538203
> pgmajfault : 4181
> 
> pgactivate 377416
> pgdeactivate 34183
> pginvalidate 0
> pgreclaim 0
> 
> pgsteal_dma 0
> pgsteal_normal 2144469
> pgsteal_high 2884412
> pgsteal_movable 0
> pgscan_kswapd_dma 0
> pgscan_kswapd_normal 2149739
> pgscan_kswapd_high 2909140
> pgscan_kswapd_movable 0
> pgscan_direct_dma 0
> pgscan_direct_normal 647
> pgscan_direct_high 716
> pgscan_direct_movable 0
> pginodesteal 0
> slabs_scanned 1737344
> kswapd_steal 5028353
> kswapd_inodesteal 438910
> pageoutrun 81208
> allocstall 9
> pgrotated 1642
> 
> 2) rsync_patch vmstat:
> 
> pgfault : 47570231
> pgmajfault : 2669
> 
> pgactivate 391806
> pgdeactivate 36861
> pginvalidate 1685065
> pgreclaim 1685065
> 
> pgrefill_dma 0
> pgrefill_normal 32025
> pgrefill_high 9619
> pgrefill_movable 0
> pgsteal_dma 0
> pgsteal_normal 744904
> pgsteal_high 1079709
> pgsteal_movable 0
> pgscan_kswapd_dma 0
> pgscan_kswapd_normal 745017
> pgscan_kswapd_high 1096660
> pgscan_kswapd_movable 0
> pgscan_direct_dma 0
> pgscan_direct_normal 0
> pgscan_direct_high 0
> pgscan_direct_movable 0
> pginodesteal 0
> slabs_scanned 1896960
> kswapd_steal 1824613
> kswapd_inodesteal 703499
> pageoutrun 26828
> allocstall 0
> pgrotated 1681570
> 
> In summary,
> Unfortunately, the number of fault is increased (47570231 - 45538203)
> but pgmajfault is reduced (4181 - 2669).
> 
> The number of scanning is much reduced. 2149739 -> 745017, 2909140 ->
> 1096660 and even no direct reclaim in patched rsync.
> 
> The number of steal is much reduced. 2144469 -> 744904, 2884412 ->
> 1079709, 5028353 -> 1824613.
> 
> The elapsed time is reduced 2 minutes.
> 
> I think result is good. Reduced the steal number could imply prevent
> eviction of working set pages.
> 
> It has a good result with small effort(small scanning).
> 
> I will resend with more exact measurement after repeated test.
> 
> > Thanks,
> >
> > - Ben
> >
> >

Test Environment :
DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
Rsync backup directory size : 16G

The rsync version is 3.0.7.
The rsync patch is Ben's fadvise patch.
The stress scenario runs the following jobs in parallel.

1. make all -j4 linux, git clone linux-kernel
2. git clone linux-kernel
3. rsync src dst

nrns : no-patched rsync + no stress
prns : patched rsync + no stress
nrs  : no-patched rsync + stress
prs  : patched rsync + stress

pginvalidate : the number of dirty/writeback pages invalidated by fadvise
pgreclaim : the number of pages moved to the inactive list's tail by the PG_reclaim trick

In summary, my patch improves the elapsed time a little in the memory
pressure environment and roughly doubles reclaim effectiveness.
That means reclaim latency is shorter and working set pages are not
evicted because of invalidated pages.

Look at the reclaim effectiveness: the patched rsync roughly doubles it,
and compared to mmotm-12-03, mmotm-12-03-fadvise improves the elapsed
time by about 3 minutes in the stress environment.
I think that is due to the reduced scanning and reclaim overhead.

In the no-stress environment, fadvise makes the program a little bit slower.
I think that is because there are many more page faults, but I don't know
why that happens. Could you guess why?

Before further work, I would like to hear opinions.
Any comment is welcome.

Thanks.

== CUT_HERE ==

mmotm-12-03-fadvise

nrns 				prns 				nrs 				prs

27:29.49			28:29.31			41:19.64			40:31.80
pginvalidate 0			pginvalidate 1941654		pginvalidate 0			pginvalidate 1773948
pgreclaim 0			pgreclaim 1941654		pgreclaim 0			pgreclaim 1773947
pgfault 254691			pgfault 462927			pgfault 61865725		pgfault 61015497
pgmajfault 206			pgmajfault 234			pgmajfault 3552			pgmajfault 2240
pgsteal_normal 3047828		pgsteal_normal 1581276		pgsteal_normal 3142791		pgsteal_normal 1845419
pgsteal_high 4757751		pgsteal_high 2419625		pgsteal_high 5351655		pgsteal_high 2731594
pgscan_kswapd_dma 0		pgscan_kswapd_dma 0		pgscan_kswapd_dma 0		pgscan_kswapd_dma 0
pgscan_kswapd_normal 3047960	pgscan_kswapd_normal 1581780	pgscan_kswapd_normal 3146659	pgscan_kswapd_normal 1846071
pgscan_kswapd_high 4758492	pgscan_kswapd_high 2428427	pgscan_kswapd_high 5359960	pgscan_kswapd_high 2732502
pgscan_kswapd_movable 0		pgscan_kswapd_movable 0		pgscan_kswapd_movable 0		pgscan_kswapd_movable 0
pgscan_direct_normal 0		pgscan_direct_normal 0		pgscan_direct_normal 0		pgscan_direct_normal 0
pgscan_direct_high 0		pgscan_direct_high 0		pgscan_direct_high 0		pgscan_direct_high 0
slabs_scanned 1408512		slabs_scanned 1672704		slabs_scanned 1839360		slabs_scanned 2049792
kswapd_steal 7805579		kswapd_steal 4000901		kswapd_steal 8494446		kswapd_steal 4577013
kswapd_inodesteal 146398	kswapd_inodesteal 502321	kswapd_inodesteal 280462	kswapd_inodesteal 493576
pageoutrun 96384		pageoutrun 49907		pageoutrun 139530		pageoutrun 62029
allocstall 0			allocstall 0			allocstall 0			allocstall 0
pgrotated 0			pgrotated 1935875		pgrotated 2511			pgrotated 1768188

mmotm-12-03
nrns 				prns 				nrs 				prs

27:38.84			29:50.22			41:12.10			43:46.63
pgfault 256793			pgfault 415539			pgfault 61205117		pgfault 66495840
pgmajfault 216			pgmajfault 276			pgmajfault 3577			pgmajfault 3383
pgsteal_normal 2963119		pgsteal_normal 1743418		pgsteal_normal 3247443		pgsteal_normal 1904633
pgsteal_high 4832522		pgsteal_high 2718769		pgsteal_high 5202506		pgsteal_high 3124215
pgscan_kswapd_normal 2963164	pgscan_kswapd_normal 1743717	pgscan_kswapd_normal 3252646	pgscan_kswapd_normal 1908354
pgscan_kswapd_high 4832815	pgscan_kswapd_high 2727233	pgscan_kswapd_high 5213874	pgscan_kswapd_high 3132365
pgscan_direct_normal 0		pgscan_direct_normal 0		pgscan_direct_normal 70		pgscan_direct_normal 0
pgscan_direct_high 0		pgscan_direct_high 0		pgscan_direct_high 98		pgscan_direct_high 0
slabs_scanned 1409408		slabs_scanned 1360512		slabs_scanned 1837440		slabs_scanned 1897856
kswapd_steal 7795641		kswapd_steal 4462187		kswapd_steal 8449781		kswapd_steal 5028848
kswapd_inodesteal 142039	kswapd_inodesteal 178		kswapd_inodesteal 311758	kswapd_inodesteal 255974
pageoutrun 97538		pageoutrun 55949		pageoutrun 130136		pageoutrun 84663
allocstall 0			allocstall 0			allocstall 2			allocstall 0
pgrotated 0			pgrotated 0			pgrotated 0			pgrotated 0
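
Apart from the elapsed times in the first row, the numbers above come from
/proc/vmstat (pginvalidate and pgreclaim exist only with the profiling patch
[5/7] applied); diffing a snapshot taken before and after each run gives the
per-run figures. A minimal snapshot program, assuming nothing about the
scripts actually used, could look like:

#include <stdio.h>
#include <string.h>

int main(void)
{
	static const char *keys[] = {
		"pginvalidate", "pgreclaim", "pgfault", "pgmajfault",
		"pgsteal_", "pgscan_", "slabs_scanned", "kswapd_",
		"pageoutrun", "allocstall", "pgrotated",
	};
	char line[256];
	FILE *f = fopen("/proc/vmstat", "r");
	size_t i;

	if (!f) {
		perror("/proc/vmstat");
		return 1;
	}
	/* Print only the counters quoted in the tables above. */
	while (fgets(line, sizeof(line), f))
		for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
			if (!strncmp(line, keys[i], strlen(keys[i]))) {
				fputs(line, stdout);
				break;
			}
	fclose(f);
	return 0;
}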

> 
> -- 
> Kind regards,
> Minchan Kim

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-13 15:31             ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-13 15:31 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Thu, Dec 09, 2010 at 08:10:17AM +0900, Minchan Kim wrote:
> On Wed, Dec 8, 2010 at 10:01 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> >> Make sense to me. If Ben is busy, I will measure it and send the result.
> >
> > I've done measurements on the patched kernel. All that remains is to do
> > measurements on the baseline unpached case. To summarize the results
> > thusfar,
> >
> > Times:
> > =======
> >                        user    sys     %cpu    inputs           outputs
> > Patched, drop          142     64      46      13557744         14052744
> > Patched, nodrop        55      57      33      13557936         13556680
> >
> > vmstat:
> > ========
> >                         free_pages      inact_anon      act_anon        inact_file      act_file        dirtied         written         reclaim
> > Patched, drop, pre      306043          37541           185463          276266          153955          3689674         3604959         1550641
> > Patched, drop, post     13233           38462           175252          536346          178792          5527564         5371563         3169155
> >
> > Patched, nodrop, pre    475211          38602           175242          81979           178820          5527592         5371554         3169155
> > Patched, nodrop, post   7697            38959           176986          547984          180855          7324836         7132158         3169155
> >
> > Altogether, it seems that something is horribly wrong, most likely with
> > my test (or rsync patch). I'll do the baseline benchmarks today.
> >
> > Thoughts?
> 
> 
> How do you test it?
> I think patch's effect would be good in big memory pressure environment.
> 
> Quickly I did it on my desktop environment.(2G DRAM)
> So it's not completed result. I will test more when out of office.
> 
> Used kernel : mmotm-12-02 + my patch series
> Used rsync :
> 1. rsync_normal : v3.0.7 vanilla
> 2. rsync_patch : v3.0.7 + Ben's patch(fadvise)
> 
> Test scenario :
> * kernel full compile
> * git clone linux-kernel
> * rsync local host directory to local host dst directory
> 
> 
> 1) rsync_normal : 89.08user 127.48system 33:22.24elapsed
> 2) rsync_patch : 88.42user 135.26system 31:30.56elapsed
> 
> 1) rsync_normal vmstat :
> pgfault : 45538203
> pgmajfault : 4181
> 
> pgactivate 377416
> pgdeactivate 34183
> pginvalidate 0
> pgreclaim 0
> 
> pgsteal_dma 0
> pgsteal_normal 2144469
> pgsteal_high 2884412
> pgsteal_movable 0
> pgscan_kswapd_dma 0
> pgscan_kswapd_normal 2149739
> pgscan_kswapd_high 2909140
> pgscan_kswapd_movable 0
> pgscan_direct_dma 0
> pgscan_direct_normal 647
> pgscan_direct_high 716
> pgscan_direct_movable 0
> pginodesteal 0
> slabs_scanned 1737344
> kswapd_steal 5028353
> kswapd_inodesteal 438910
> pageoutrun 81208
> allocstall 9
> pgrotated 1642
> 
> 2) rsync_patch vmstat:
> 
> pgfault : 47570231
> pgmajfault : 2669
> 
> pgactivate 391806
> pgdeactivate 36861
> pginvalidate 1685065
> pgreclaim 1685065
> 
> pgrefill_dma 0
> pgrefill_normal 32025
> pgrefill_high 9619
> pgrefill_movable 0
> pgsteal_dma 0
> pgsteal_normal 744904
> pgsteal_high 1079709
> pgsteal_movable 0
> pgscan_kswapd_dma 0
> pgscan_kswapd_normal 745017
> pgscan_kswapd_high 1096660
> pgscan_kswapd_movable 0
> pgscan_direct_dma 0
> pgscan_direct_normal 0
> pgscan_direct_high 0
> pgscan_direct_movable 0
> pginodesteal 0
> slabs_scanned 1896960
> kswapd_steal 1824613
> kswapd_inodesteal 703499
> pageoutrun 26828
> allocstall 0
> pgrotated 1681570
> 
> In summary,
> Unfortunately, the number of fault is increased (47570231 - 45538203)
> but pgmajfault is reduced (4181 - 2669).
> 
> The number of scanning is much reduced. 2149739 -> 745017, 2909140 ->
> 1096660 and even no direct reclaim in patched rsync.
> 
> The number of steal is much reduced. 2144469 -> 744904, 2884412 ->
> 1079709, 5028353 -> 1824613.
> 
> The elapsed time is reduced 2 minutes.
> 
> I think result is good. Reduced the steal number could imply prevent
> eviction of working set pages.
> 
> It has a good result with small effort(small scanning).
> 
> I will resend with more exact measurement after repeated test.
> 
> > Thanks,
> >
> > - Ben
> >
> >

Test Environment :
DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
Rsync backup directory size : 16G

rsync version is 3.0.7.
The rsync patch is Ben's fadvise patch.
The stress scenario runs the following jobs in parallel.

1. make all -j4 linux, git clone linux-kernel
2. git clone linux-kernel
3. rsync src dst

nrns : no-patched rsync + no stress
prns : patched rsync + no stress
nrs  : no-patched rsync + stress
prs  : patched rsync + stress

pginvalidate : the number of dirty/writeback pages invalidated by fadvise
pgreclaim : pages moved to the tail of the inactive list with the PG_reclaim trick

In summary, my patch improves elapsed time a little in the memory-pressure
environment and roughly doubles reclaim effectiveness (about half as many
pages are scanned and stolen for the same workload). That means reclaim
latency is shorter and working set pages are not evicted because of
invalidated pages.

Look at reclaim effectiveness. The patched rsync roughly doubles it, and
compared to mmotm-12-03, mmotm-12-03-fadvise improves elapsed time by about
3 minutes in the stress environment (43:46.63 -> 40:31.80).
I think that is due to the reduced scanning and reclaim overhead.

In the no-stress environment, fadvise makes the program a little slower.
I think it is because there are many more page faults, but I don't know why
that happens. Could you guess why?

Before doing further work, I would like to hear your opinions.
Any comment is welcome.

Thanks.

== CUT_HERE ==

mmotm-12-03-fadvise

nrns 				prns 				nrs 				prs

27:29.49			28:29.31			41:19.64			40:31.80
pginvalidate 0			pginvalidate 1941654		pginvalidate 0			pginvalidate 1773948
pgreclaim 0			pgreclaim 1941654		pgreclaim 0			pgreclaim 1773947
pgfault 254691			pgfault 462927			pgfault 61865725		pgfault 61015497
pgmajfault 206			pgmajfault 234			pgmajfault 3552			pgmajfault 2240
pgsteal_normal 3047828		pgsteal_normal 1581276		pgsteal_normal 3142791		pgsteal_normal 1845419
pgsteal_high 4757751		pgsteal_high 2419625		pgsteal_high 5351655		pgsteal_high 2731594
pgscan_kswapd_dma 0		pgscan_kswapd_dma 0		pgscan_kswapd_dma 0		pgscan_kswapd_dma 0
pgscan_kswapd_normal 3047960	pgscan_kswapd_normal 1581780	pgscan_kswapd_normal 3146659	pgscan_kswapd_normal 1846071
pgscan_kswapd_high 4758492	pgscan_kswapd_high 2428427	pgscan_kswapd_high 5359960	pgscan_kswapd_high 2732502
pgscan_kswapd_movable 0		pgscan_kswapd_movable 0		pgscan_kswapd_movable 0		pgscan_kswapd_movable 0
pgscan_direct_normal 0		pgscan_direct_normal 0		pgscan_direct_normal 0		pgscan_direct_normal 0
pgscan_direct_high 0		pgscan_direct_high 0		pgscan_direct_high 0		pgscan_direct_high 0
slabs_scanned 1408512		slabs_scanned 1672704		slabs_scanned 1839360		slabs_scanned 2049792
kswapd_steal 7805579		kswapd_steal 4000901		kswapd_steal 8494446		kswapd_steal 4577013
kswapd_inodesteal 146398	kswapd_inodesteal 502321	kswapd_inodesteal 280462	kswapd_inodesteal 493576
pageoutrun 96384		pageoutrun 49907		pageoutrun 139530		pageoutrun 62029
allocstall 0			allocstall 0			allocstall 0			allocstall 0
pgrotated 0			pgrotated 1935875		pgrotated 2511			pgrotated 1768188

mmotm-12-03
nrns 				prns 				nrs 				prs

27:38.84			29:50.22			41:12.10			43:46.63
pgfault 256793			pgfault 415539			pgfault 61205117		pgfault 66495840
pgmajfault 216			pgmajfault 276			pgmajfault 3577			pgmajfault 3383
pgsteal_normal 2963119		pgsteal_normal 1743418		pgsteal_normal 3247443		pgsteal_normal 1904633
pgsteal_high 4832522		pgsteal_high 2718769		pgsteal_high 5202506		pgsteal_high 3124215
pgscan_kswapd_normal 2963164	pgscan_kswapd_normal 1743717	pgscan_kswapd_normal 3252646	pgscan_kswapd_normal 1908354
pgscan_kswapd_high 4832815	pgscan_kswapd_high 2727233	pgscan_kswapd_high 5213874	pgscan_kswapd_high 3132365
pgscan_direct_normal 0		pgscan_direct_normal 0		pgscan_direct_normal 70		pgscan_direct_normal 0
pgscan_direct_high 0		pgscan_direct_high 0		pgscan_direct_high 98		pgscan_direct_high 0
slabs_scanned 1409408		slabs_scanned 1360512		slabs_scanned 1837440		slabs_scanned 1897856
kswapd_steal 7795641		kswapd_steal 4462187		kswapd_steal 8449781		kswapd_steal 5028848
kswapd_inodesteal 142039	kswapd_inodesteal 178		kswapd_inodesteal 311758	kswapd_inodesteal 255974
pageoutrun 97538		pageoutrun 55949		pageoutrun 130136		pageoutrun 84663
allocstall 0			allocstall 0			allocstall 2			allocstall 0
pgrotated 0			pgrotated 0			pgrotated 0			pgrotated 0

> 
> -- 
> Kind regards,
> Minchan Kim

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-13 15:31             ` Minchan Kim
@ 2010-12-13 20:06               ` Ben Gamari
  -1 siblings, 0 replies; 109+ messages in thread
From: Ben Gamari @ 2010-12-13 20:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, 14 Dec 2010 00:31:05 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
> In summary, my patch enhances a littie bit about elapsed time in
> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
> with x2. It means reclaim latency is short and doesn't evict working set
> pages due to invalidated pages.
> 
Thank you very much for this testing! I'm very sorry I've been unable to
contribute more recently. My last exam is on Wednesday and besides some
grading that is the end of the semester.  Is there anything you would
like me to do? Perhaps reproducing these results on my setup would be
useful?

> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
> 3 minute about elapsed time in stress environment. 
> I think it's due to reduce scanning, reclaim overhead.
> 
Good good. This looks quite promising.

> In no-stress enviroment, fadivse makes program little bit slow.
> I think because there are many pgfault. I don't know why it happens.
> Could you guess why it happens?
> 
Hmm, nothing comes to mind. As I've said in the past, rsync should
require each page only once. Perhaps perf might offer some insight into
where this time is being spent?

- Ben

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-13 20:06               ` Ben Gamari
  0 siblings, 0 replies; 109+ messages in thread
From: Ben Gamari @ 2010-12-13 20:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, 14 Dec 2010 00:31:05 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
> In summary, my patch enhances a littie bit about elapsed time in
> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
> with x2. It means reclaim latency is short and doesn't evict working set
> pages due to invalidated pages.
> 
Thank you very much for this testing! I'm very sorry I've been unable to
contribute more recently. My last exam is on Wednesday and besides some
grading that is the end of the semester.  Is there anything you would
like me to do? Perhaps reproducing these results on my setup would be
useful?

> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
> 3 minute about elapsed time in stress environment. 
> I think it's due to reduce scanning, reclaim overhead.
> 
Good good. This looks quite promising.

> In no-stress enviroment, fadivse makes program little bit slow.
> I think because there are many pgfault. I don't know why it happens.
> Could you guess why it happens?
> 
Hmm, nothing comes to mind. As I've said in the past, rsync should
require each page only once. Perhaps perf might offer some insight into
where this time is being spent?

- Ben

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-13 15:31             ` Minchan Kim
@ 2010-12-14  2:07               ` KAMEZAWA Hiroyuki
  -1 siblings, 0 replies; 109+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-14  2:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Ben Gamari, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, LKML, Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, 14 Dec 2010 00:31:05 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Test Environment :
> DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
> Rsync backup directory size : 16G
> 
> rsync version is 3.0.7.
> rsync patch is Ben's fadivse.
> stress scenario do following jobs with parallel.
> 
> 1. make all -j4 linux, git clone linux-kernel
> 2. git clone linux-kernel
> 3. rsync src dst
> 
> nrns : no-patched rsync + no stress
> prns : patched rsync + no stress
> nrs  : no-patched rsync + stress
> prs  : patched rsync + stress
> 
> pginvalidate : the number of dirty/writeback pages which is invalidated by fadvise
> pgreclaim : pages moved PG_reclaim trick in inactive's tail
> 
> In summary, my patch enhances a littie bit about elapsed time in
> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
> with x2. It means reclaim latency is short and doesn't evict working set
> pages due to invalidated pages.
> 
> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
> 3 minute about elapsed time in stress environment. 
> I think it's due to reduce scanning, reclaim overhead.
> 
> In no-stress enviroment, fadivse makes program little bit slow.
> I think because there are many pgfault. I don't know why it happens.
> Could you guess why it happens?
> 
> Before futher work, I hope listen opinions.
> Any comment is welcome.
> 

First of all, the improvement seems great. Thank you for your effort.

> In no-stress enviroment, fadivse makes program little bit slow.
> I think because there are many pgfault. I don't know why it happens.
> Could you guess why it happens?
>

Is there no program that accesses the rsync'ed directory?

Thanks,
-Kame


^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-14  2:07               ` KAMEZAWA Hiroyuki
  0 siblings, 0 replies; 109+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-12-14  2:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Ben Gamari, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, LKML, Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, 14 Dec 2010 00:31:05 +0900
Minchan Kim <minchan.kim@gmail.com> wrote:

> Test Environment :
> DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
> Rsync backup directory size : 16G
> 
> rsync version is 3.0.7.
> rsync patch is Ben's fadivse.
> stress scenario do following jobs with parallel.
> 
> 1. make all -j4 linux, git clone linux-kernel
> 2. git clone linux-kernel
> 3. rsync src dst
> 
> nrns : no-patched rsync + no stress
> prns : patched rsync + no stress
> nrs  : no-patched rsync + stress
> prs  : patched rsync + stress
> 
> pginvalidate : the number of dirty/writeback pages which is invalidated by fadvise
> pgreclaim : pages moved PG_reclaim trick in inactive's tail
> 
> In summary, my patch enhances a littie bit about elapsed time in
> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
> with x2. It means reclaim latency is short and doesn't evict working set
> pages due to invalidated pages.
> 
> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
> 3 minute about elapsed time in stress environment. 
> I think it's due to reduce scanning, reclaim overhead.
> 
> In no-stress enviroment, fadivse makes program little bit slow.
> I think because there are many pgfault. I don't know why it happens.
> Could you guess why it happens?
> 
> Before futher work, I hope listen opinions.
> Any comment is welcome.
> 

First of all, the improvement seems great. Thank you for your effort.

> In no-stress enviroment, fadivse makes program little bit slow.
> I think because there are many pgfault. I don't know why it happens.
> Could you guess why it happens?
>

Is there no program that accesses the rsync'ed directory?

Thanks,
-Kame

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-14  2:07               ` KAMEZAWA Hiroyuki
@ 2010-12-14  2:34                 ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-14  2:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Ben Gamari, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, LKML, Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, Dec 14, 2010 at 11:07 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 14 Dec 2010 00:31:05 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Test Environment :
>> DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
>> Rsync backup directory size : 16G
>>
>> rsync version is 3.0.7.
>> rsync patch is Ben's fadivse.
>> stress scenario do following jobs with parallel.
>>
>> 1. make all -j4 linux, git clone linux-kernel
>> 2. git clone linux-kernel
>> 3. rsync src dst
>>
>> nrns : no-patched rsync + no stress
>> prns : patched rsync + no stress
>> nrs  : no-patched rsync + stress
>> prs  : patched rsync + stress
>>
>> pginvalidate : the number of dirty/writeback pages which is invalidated by fadvise
>> pgreclaim : pages moved PG_reclaim trick in inactive's tail
>>
>> In summary, my patch enhances a littie bit about elapsed time in
>> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
>> with x2. It means reclaim latency is short and doesn't evict working set
>> pages due to invalidated pages.
>>
>> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
>> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
>> 3 minute about elapsed time in stress environment.
>> I think it's due to reduce scanning, reclaim overhead.
>>
>> In no-stress enviroment, fadivse makes program little bit slow.
>> I think because there are many pgfault. I don't know why it happens.
>> Could you guess why it happens?
>>
>> Before futher work, I hope listen opinions.
>> Any comment is welcome.
>>
>
> At first, the improvement seems great. Thank you for your effort.

Thanks, Kame.

>
>> In no-stress enviroment, fadivse makes program little bit slow.
>> I think because there are many pgfault. I don't know why it happens.
>> Could you guess why it happens?
>>
>
> Are there no program which accesses a directory rsync'ed ?

Maybe. Some programs might have mmaped the files.
But even if that happens, deactivate_page notices it and does nothing,
so such a page cannot be reclaimed.
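
For illustration, a hypothetical bystander program along the lines below,
which keeps a file from the backup tree mapped and referenced, is the kind
of case deactivate_page() is expected to skip (the scenario and file name
handling are assumptions, not taken from the actual test):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	volatile char sum = 0;
	char *p;
	off_t i;
	int fd;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file-in-backup-dir>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0 || st.st_size == 0)
		return 1;

	/* Map the file and touch every page so it stays mapped and referenced. */
	p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED)
		return 1;
	for (i = 0; i < st.st_size; i += 4096)
		sum += p[i];

	/* Hold the mapping while rsync + fadvise(DONTNEED) runs on the same tree. */
	pause();
	return 0;
}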

>
> Thanks,
> -Kame
>
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-14  2:34                 ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-14  2:34 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Ben Gamari, KOSAKI Motohiro, Andrew Morton, Rik van Riel,
	linux-mm, LKML, Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, Dec 14, 2010 at 11:07 AM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> On Tue, 14 Dec 2010 00:31:05 +0900
> Minchan Kim <minchan.kim@gmail.com> wrote:
>
>> Test Environment :
>> DRAM : 2G, CPU : Intel(R) Core(TM)2 CPU
>> Rsync backup directory size : 16G
>>
>> rsync version is 3.0.7.
>> rsync patch is Ben's fadivse.
>> stress scenario do following jobs with parallel.
>>
>> 1. make all -j4 linux, git clone linux-kernel
>> 2. git clone linux-kernel
>> 3. rsync src dst
>>
>> nrns : no-patched rsync + no stress
>> prns : patched rsync + no stress
>> nrs  : no-patched rsync + stress
>> prs  : patched rsync + stress
>>
>> pginvalidate : the number of dirty/writeback pages which is invalidated by fadvise
>> pgreclaim : pages moved PG_reclaim trick in inactive's tail
>>
>> In summary, my patch enhances a littie bit about elapsed time in
>> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
>> with x2. It means reclaim latency is short and doesn't evict working set
>> pages due to invalidated pages.
>>
>> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
>> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
>> 3 minute about elapsed time in stress environment.
>> I think it's due to reduce scanning, reclaim overhead.
>>
>> In no-stress enviroment, fadivse makes program little bit slow.
>> I think because there are many pgfault. I don't know why it happens.
>> Could you guess why it happens?
>>
>> Before futher work, I hope listen opinions.
>> Any comment is welcome.
>>
>
> At first, the improvement seems great. Thank you for your effort.

Thanks, Kame.

>
>> In no-stress enviroment, fadivse makes program little bit slow.
>> I think because there are many pgfault. I don't know why it happens.
>> Could you guess why it happens?
>>
>
> Are there no program which accesses a directory rsync'ed ?

Maybe. Some programs might have mmaped the files.
But even if that happens, deactivate_page notices it and does nothing,
so such a page cannot be reclaimed.

>
> Thanks,
> -Kame
>
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-13 20:06               ` Ben Gamari
@ 2010-12-14  2:36                 ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-14  2:36 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

Hi Ben,

On Tue, Dec 14, 2010 at 5:06 AM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Tue, 14 Dec 2010 00:31:05 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
>> In summary, my patch enhances a littie bit about elapsed time in
>> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
>> with x2. It means reclaim latency is short and doesn't evict working set
>> pages due to invalidated pages.
>>
> Thank you very much for this testing! I'm very sorry I've been unable to
> contribute more recently. My last exam is on Wednesday and besides some
> grading that is the end of the semester.  Is there anything you would

No worries. I hope you get a great grade on your exam. :)

> like me to do? Perhaps reproducing these results on my setup would be
> useful?

That would be very helpful, thank you.

>
>> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
>> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
>> 3 minute about elapsed time in stress environment.
>> I think it's due to reduce scanning, reclaim overhead.
>>
> Good good. This looks quite promising.

Thanks, Ben.

>
>> In no-stress enviroment, fadivse makes program little bit slow.
>> I think because there are many pgfault. I don't know why it happens.
>> Could you guess why it happens?
>>
> Hmm, nothing comes to mind. As I've said in the past, rsync should
> require each page only once. Perhaps perf might offer some insight into
> where this time is being spent?

Maybe. I plan to look into that.

>
> - Ben
>



-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2010-12-14  2:36                 ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2010-12-14  2:36 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

Hi Ben,

On Tue, Dec 14, 2010 at 5:06 AM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Tue, 14 Dec 2010 00:31:05 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
>> In summary, my patch enhances a littie bit about elapsed time in
>> memory pressure environment and enhance reclaim effectivness(reclaim/reclaim)
>> with x2. It means reclaim latency is short and doesn't evict working set
>> pages due to invalidated pages.
>>
> Thank you very much for this testing! I'm very sorry I've been unable to
> contribute more recently. My last exam is on Wednesday and besides some
> grading that is the end of the semester.  Is there anything you would

No worries. I hope you get a great grade on your exam. :)

> like me to do? Perhaps reproducing these results on my setup would be
> useful?

That would be very helpful, thank you.

>
>> Look at reclaim effectivness. Patched rsync enhances x2 about reclaim
>> effectiveness and compared to mmotm-12-03, mmotm-12-03-fadvise enhances
>> 3 minute about elapsed time in stress environment.
>> I think it's due to reduce scanning, reclaim overhead.
>>
> Good good. This looks quite promising.

Thanks, Ben.

>
>> In no-stress enviroment, fadivse makes program little bit slow.
>> I think because there are many pgfault. I don't know why it happens.
>> Could you guess why it happens?
>>
> Hmm, nothing comes to mind. As I've said in the past, rsync should
> require each page only once. Perhaps perf might offer some insight into
> where this time is being spent?

Maybe. I plan to look into that.

>
> - Ben
>



-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2010-12-14  2:36                 ` Minchan Kim
@ 2011-07-25  3:08                   ` Ben Gamari
  -1 siblings, 0 replies; 109+ messages in thread
From: Ben Gamari @ 2011-07-25  3:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, 14 Dec 2010 11:36:12 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi Ben,
> 
> [snipped]

What exactly happened to this set? It seems it dropped off my radar and
otherwise never was merged. Was it not tested enough, the benefit not
great enough, or did it simply slip through the cracks?

Cheers,

- Ben

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2011-07-25  3:08                   ` Ben Gamari
  0 siblings, 0 replies; 109+ messages in thread
From: Ben Gamari @ 2011-07-25  3:08 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

On Tue, 14 Dec 2010 11:36:12 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi Ben,
> 
> [snipped]

What exactly happened to this set? It seems it dropped off my radar and
otherwise never was merged. Was it not tested enough, the benefit not
great enough, or did it simply slip through the cracks?

Cheers,

- Ben

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
  2011-07-25  3:08                   ` Ben Gamari
@ 2011-07-25  3:47                     ` Minchan Kim
  -1 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2011-07-25  3:47 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

Hi Ben,

On Mon, Jul 25, 2011 at 12:08 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Tue, 14 Dec 2010 11:36:12 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
>> Hi Ben,
>>
>> [snipped]
>
> What exactly happened to this set? It seems it dropped off my radar and
> otherwise never was merged. Was it not tested enough, the benefit not
> great enough, or did it simply slip through the cracks?

It is already merged.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=315601809d124d046abd6c3ffa346d0dbd7aa29d

Thanks.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 109+ messages in thread

* Re: [PATCH v4 4/7] Reclaim invalidated page ASAP
@ 2011-07-25  3:47                     ` Minchan Kim
  0 siblings, 0 replies; 109+ messages in thread
From: Minchan Kim @ 2011-07-25  3:47 UTC (permalink / raw)
  To: Ben Gamari
  Cc: KOSAKI Motohiro, Andrew Morton, Rik van Riel, linux-mm, LKML,
	Wu Fengguang, Johannes Weiner, Nick Piggin

Hi Ben,

On Mon, Jul 25, 2011 at 12:08 PM, Ben Gamari <bgamari.foss@gmail.com> wrote:
> On Tue, 14 Dec 2010 11:36:12 +0900, Minchan Kim <minchan.kim@gmail.com> wrote:
>> Hi Ben,
>>
>> [snipped]
>
> What exactly happened to this set? It seems it dropped off my radar and
> otherwise never was merged. Was it not tested enough, the benefit not
> great enough, or did it simply slip through the cracks?

It is already merged.
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=315601809d124d046abd6c3ffa346d0dbd7aa29d

Thanks.

-- 
Kind regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 109+ messages in thread

end of thread, other threads:[~2011-07-25  3:47 UTC | newest]

Thread overview: 109+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-05 17:29 [PATCH v4 0/7] f/madivse(DONTNEED) support Minchan Kim
2010-12-05 17:29 ` Minchan Kim
2010-12-05 17:29 ` [PATCH v4 1/7] Fix checkpatch's report in swap.c Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-06  1:47   ` Rik van Riel
2010-12-06  1:47     ` Rik van Riel
2010-12-07 14:37   ` Johannes Weiner
2010-12-07 14:37     ` Johannes Weiner
2010-12-05 17:29 ` [PATCH v4 2/7] deactivate invalidated pages Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-06 14:53   ` Mel Gorman
2010-12-06 14:53     ` Mel Gorman
2010-12-07 14:49   ` Johannes Weiner
2010-12-07 14:49     ` Johannes Weiner
2010-12-07 15:07     ` Minchan Kim
2010-12-07 15:07       ` Minchan Kim
2010-12-07 15:19       ` Johannes Weiner
2010-12-07 15:19         ` Johannes Weiner
2010-12-07 15:26         ` Minchan Kim
2010-12-07 15:26           ` Minchan Kim
2010-12-07 15:56           ` Johannes Weiner
2010-12-07 15:56             ` Johannes Weiner
2010-12-07 22:51             ` Minchan Kim
2010-12-07 22:51               ` Minchan Kim
2010-12-08  0:56               ` KAMEZAWA Hiroyuki
2010-12-08  0:56                 ` KAMEZAWA Hiroyuki
2010-12-08  1:43                 ` Minchan Kim
2010-12-08  1:43                   ` Minchan Kim
2010-12-08  1:56                   ` KAMEZAWA Hiroyuki
2010-12-08  1:56                     ` KAMEZAWA Hiroyuki
2010-12-08  2:15                     ` Minchan Kim
2010-12-08  2:15                       ` Minchan Kim
2010-12-08  6:56                       ` Balbir Singh
2010-12-08  6:56                         ` Balbir Singh
2010-12-09  0:19                         ` Minchan Kim
2010-12-09  0:19                           ` Minchan Kim
2010-12-05 17:29 ` [PATCH v4 3/7] move memcg reclaimable page into tail of inactive list Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-06  0:04   ` KAMEZAWA Hiroyuki
2010-12-06  0:04     ` KAMEZAWA Hiroyuki
2010-12-06  3:04   ` Rik van Riel
2010-12-06  3:04     ` Rik van Riel
2010-12-07  0:17     ` Minchan Kim
2010-12-07  0:17       ` Minchan Kim
2010-12-06  3:34   ` Balbir Singh
2010-12-06  3:34     ` Balbir Singh
2010-12-07  0:20     ` Minchan Kim
2010-12-07  0:20       ` Minchan Kim
2010-12-07 14:52   ` Johannes Weiner
2010-12-07 14:52     ` Johannes Weiner
2010-12-08  8:08   ` KOSAKI Motohiro
2010-12-08  8:08     ` KOSAKI Motohiro
2010-12-05 17:29 ` [PATCH v4 4/7] Reclaim invalidated page ASAP Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-07 15:05   ` Johannes Weiner
2010-12-07 15:05     ` Johannes Weiner
2010-12-07 15:21     ` Minchan Kim
2010-12-07 15:21       ` Minchan Kim
2010-12-08  8:04   ` KOSAKI Motohiro
2010-12-08  8:04     ` KOSAKI Motohiro
2010-12-08  8:16     ` Minchan Kim
2010-12-08  8:16       ` Minchan Kim
2010-12-08 13:01       ` Ben Gamari
2010-12-08 13:01         ` Ben Gamari
2010-12-08 23:10         ` Minchan Kim
2010-12-08 23:10           ` Minchan Kim
2010-12-13 15:31           ` Minchan Kim
2010-12-13 15:31             ` Minchan Kim
2010-12-13 20:06             ` Ben Gamari
2010-12-13 20:06               ` Ben Gamari
2010-12-14  2:36               ` Minchan Kim
2010-12-14  2:36                 ` Minchan Kim
2011-07-25  3:08                 ` Ben Gamari
2011-07-25  3:08                   ` Ben Gamari
2011-07-25  3:47                   ` Minchan Kim
2011-07-25  3:47                     ` Minchan Kim
2010-12-14  2:07             ` KAMEZAWA Hiroyuki
2010-12-14  2:07               ` KAMEZAWA Hiroyuki
2010-12-14  2:34               ` Minchan Kim
2010-12-14  2:34                 ` Minchan Kim
2010-12-05 17:29 ` [PATCH v4 5/7] add profile information for invalidated page reclaim Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-06  3:24   ` Rik van Riel
2010-12-06  3:24     ` Rik van Riel
2010-12-08  8:02   ` KOSAKI Motohiro
2010-12-08  8:02     ` KOSAKI Motohiro
2010-12-08  8:13     ` Minchan Kim
2010-12-08  8:13       ` Minchan Kim
2010-12-08  8:36       ` KOSAKI Motohiro
2010-12-08  8:36         ` KOSAKI Motohiro
2010-12-05 17:29 ` [PATCH v4 6/7] Remove zap_details NULL dependency Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-06  3:25   ` Rik van Riel
2010-12-06  3:25     ` Rik van Riel
2010-12-07  4:26   ` Hugh Dickins
2010-12-07  4:26     ` Hugh Dickins
2010-12-07  5:30     ` Minchan Kim
2010-12-07  5:30       ` Minchan Kim
2010-12-05 17:29 ` [PATCH v4 7/7] Prevent activation of page in madvise_dontneed Minchan Kim
2010-12-05 17:29   ` Minchan Kim
2010-12-07  4:48   ` Hugh Dickins
2010-12-07  4:48     ` Hugh Dickins
2010-12-07  5:44     ` Minchan Kim
2010-12-08  7:26       ` Hugh Dickins
2010-12-08  7:26         ` Hugh Dickins
2010-12-08  7:55         ` Minchan Kim
2010-12-08  7:55           ` Minchan Kim
     [not found] ` <AANLkTim71krrCcmhTTCZTzxeUDkvOdBTOkeYQu6EXt32@mail.gmail.com>
2010-12-07  1:52   ` [PATCH v4 0/7] f/madivse(DONTNEED) support Ben Gamari
2010-12-07  2:16     ` Minchan Kim
