All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-24 13:49 ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-24 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

There were some reported problems in the past about trashing page cache
when a backup software (i.e., rsync) touches a huge amount of pages (see
for example [1]).

This problem has been almost fixed by the Minchan Kim's patch [2] and a
proper use of fadvise() in the backup software. For example this patch
set [3] has been proposed for inclusion in rsync.

However, there can be still other similar trashing problems: when the
backup software reads all the source files, some of them may be part of
the actual working set of the system. When a POSIX_FADV_DONTNEED is
performed _all_ pages are evicted from pagecache, both the working set
and the use-once pages touched only by the backup software.

A previous proposal [4] tried to resolve this problem being less
agressive in invalidating active pages, moving them to the inactive list
intead of just evict them from the page cache.

However, this approach changed completely the old behavior of
invalidate_mapping_pages(), that is not only used by fadvise.

The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
invalidation policy.

With POSIX_FADV_NOREUSE active pages are moved to the tail of the
inactive list, and pages in the inactive list are just removed from page
cache. Pages mapped by other processes or unevictable pages are not
touched at all.

In this way if the backup was the only user of a page, that page will be
immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
If the page was also touched by other tasks it'll be moved to the
inactive list, having another chance of being re-added to the working
set, or simply reclaimed when memory is needed.

In conclusion, now userspace applications that want to drop some page
cache pages can choose between the following advices:

 POSIX_FADV_DONTNEED = drop page cache if possible
 POSIX_FADV_NOREUSE = reduce page cache eligibility

Testcase:

  - create a 1GB file called "zero"
  - run md5sum zero to read all the pages in page cache (this is to
    simulate the user activity on this file)
  - run rsync zero zero_copy
  - re-run md5sum zero (user activity on the working set) and measure
    the time to complete this command

rsync has been patched with [3], using POSIX_FADV_DONTNEED in one case and
POSIX_FADV_NOREUSE in the other case.

Results:

 - after the backup run:
   # perf stat -e block:block_bio_queue md5sum zero

                  avg elapsed time      block:block_bio_queue
 rsync-dontneed              4.24s                      2,072
 rsync-noreuse               2.22s                          0

[1] http://marc.info/?l=rsync&m=128885034930933&w=2
[2] https://lkml.org/lkml/2011/2/20/57
[3] http://lists.samba.org/archive/rsync/2010-November/025827.html
[4] https://lkml.org/lkml/2011/6/23/35

ChangeLog v2 -> v3:
 - do not change the old POSIX_FADV_DONTNEED behavior and implement the less
   aggressive page cache invalidation policy using POSIX_FADV_NOREUSE
 - fix comments in the code

[PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
[PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE

 include/linux/fs.h |    7 +++++--
 mm/fadvise.c       |   11 ++++++-----
 mm/swap.c          |    2 +-
 mm/truncate.c      |   40 ++++++++++++++++++++++++++++++----------
 4 files changed, 42 insertions(+), 18 deletions(-)

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-24 13:49 ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-24 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

There were some reported problems in the past about trashing page cache
when a backup software (i.e., rsync) touches a huge amount of pages (see
for example [1]).

This problem has been almost fixed by the Minchan Kim's patch [2] and a
proper use of fadvise() in the backup software. For example this patch
set [3] has been proposed for inclusion in rsync.

However, there can be still other similar trashing problems: when the
backup software reads all the source files, some of them may be part of
the actual working set of the system. When a POSIX_FADV_DONTNEED is
performed _all_ pages are evicted from pagecache, both the working set
and the use-once pages touched only by the backup software.

A previous proposal [4] tried to resolve this problem being less
agressive in invalidating active pages, moving them to the inactive list
intead of just evict them from the page cache.

However, this approach changed completely the old behavior of
invalidate_mapping_pages(), that is not only used by fadvise.

The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
invalidation policy.

With POSIX_FADV_NOREUSE active pages are moved to the tail of the
inactive list, and pages in the inactive list are just removed from page
cache. Pages mapped by other processes or unevictable pages are not
touched at all.

In this way if the backup was the only user of a page, that page will be
immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
If the page was also touched by other tasks it'll be moved to the
inactive list, having another chance of being re-added to the working
set, or simply reclaimed when memory is needed.

In conclusion, now userspace applications that want to drop some page
cache pages can choose between the following advices:

 POSIX_FADV_DONTNEED = drop page cache if possible
 POSIX_FADV_NOREUSE = reduce page cache eligibility

Testcase:

  - create a 1GB file called "zero"
  - run md5sum zero to read all the pages in page cache (this is to
    simulate the user activity on this file)
  - run rsync zero zero_copy
  - re-run md5sum zero (user activity on the working set) and measure
    the time to complete this command

rsync has been patched with [3], using POSIX_FADV_DONTNEED in one case and
POSIX_FADV_NOREUSE in the other case.

Results:

 - after the backup run:
   # perf stat -e block:block_bio_queue md5sum zero

                  avg elapsed time      block:block_bio_queue
 rsync-dontneed              4.24s                      2,072
 rsync-noreuse               2.22s                          0

[1] http://marc.info/?l=rsync&m=128885034930933&w=2
[2] https://lkml.org/lkml/2011/2/20/57
[3] http://lists.samba.org/archive/rsync/2010-November/025827.html
[4] https://lkml.org/lkml/2011/6/23/35

ChangeLog v2 -> v3:
 - do not change the old POSIX_FADV_DONTNEED behavior and implement the less
   aggressive page cache invalidation policy using POSIX_FADV_NOREUSE
 - fix comments in the code

[PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
[PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE

 include/linux/fs.h |    7 +++++--
 mm/fadvise.c       |   11 ++++++-----
 mm/swap.c          |    2 +-
 mm/truncate.c      |   40 ++++++++++++++++++++++++++++++----------
 4 files changed, 42 insertions(+), 18 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
  2011-06-24 13:49 ` Andrea Righi
@ 2011-06-24 13:49   ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-24 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

This new function accepts an additional parameter respect to the old
invalidate_mapping_pages() that allows to specify when we want to apply
an aggressive policy to drop file cache pages or when we just want to
reduce cache eligibility.

The new prototype is the following:

 unsigned long __invalidate_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t end, bool force)

When force is true pages are always dropped if possible. When force is
false inactive pages are dropped and active pages are moved to the
inactive list.

This can be used to apply different levels of page cache invalidation
(e.g, by fadvise).

The old invalidate_mapping_pages() behavior can be mapped to
__invalidate_mapping_pages(..., true) using a C-preprocessor macro.

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 include/linux/fs.h |    7 +++++--
 mm/swap.c          |    2 +-
 mm/truncate.c      |   40 ++++++++++++++++++++++++++++++----------
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e73e2e..33beefe 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2149,8 +2149,11 @@ extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *, bool);
 extern int invalidate_partition(struct gendisk *, int);
 #endif
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-					pgoff_t start, pgoff_t end);
+
+#define invalidate_mapping_pages(__mapping, __start, __end)	\
+		__invalidate_mapping_pages(__mapping, __start, __end, true)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+					pgoff_t start, pgoff_t end, bool force);
 
 static inline void invalidate_remote_inode(struct inode *inode)
 {
diff --git a/mm/swap.c b/mm/swap.c
index 3a442f1..a8fe6ac 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -413,7 +413,7 @@ void add_page_to_unevictable_list(struct page *page)
  * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
  * 3. inactive, mapped page -> none
  * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
- * 5. inactive, clean -> inactive, tail
+ * 5. [in]active, clean -> inactive, tail
  * 6. Others -> none
  *
  * In 4, why it moves inactive's head, the VM expects the page would
diff --git a/mm/truncate.c b/mm/truncate.c
index 3a29a61..90f3a97 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
 EXPORT_SYMBOL(truncate_inode_pages);
 
 /**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
  * @mapping: the address_space which holds the pages to invalidate
  * @start: the offset 'from' which to invalidate
  * @end: the offset 'to' which to invalidate (inclusive)
+ * @force: always drop pages when true (otherwise, reduce cache eligibility)
  *
  * This function only removes the unlocked pages, if you want to
  * remove all the pages of one inode, you must call truncate_inode_pages.
  *
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
+ * The @force parameter can be used to apply a more aggressive policy (when
+ * true) that will always drop pages from page cache when possible, or to just
+ * reduce cache eligibility (when false). In the last case active pages will be
+ * moved to the tail of the inactive list by deactivate_page(); inactive pages
+ * will be dropped in both cases.
+ *
+ * __invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback, mapped into
+ * pagetables, or on active lru when @force is false.
  */
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-		pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, bool force)
 {
 	struct pagevec pvec;
 	pgoff_t next = start;
@@ -356,11 +363,24 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			next++;
 			if (lock_failed)
 				continue;
-
-			ret = invalidate_inode_page(page);
+			/*
+			 * Invalidation of active page is rather aggressive as
+			 * we can't make sure it's not a working set of other
+			 * processes.
+			 *
+			 * When force is false, deactivate_page() would move
+			 * active page into inactive's tail so the page will
+			 * have a chance to activate again if other processes
+			 * touch it.
+			 */
+			if (!force && PageActive(page))
+				ret = 0;
+			else
+				ret = invalidate_inode_page(page);
 			unlock_page(page);
 			/*
-			 * Invalidation is a hint that the page is no longer
+			 * Invalidation of an inactive page (or any page when
+			 * force is true) is a hint that the page is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
 			if (!ret)
@@ -375,7 +395,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	}
 	return count;
 }
-EXPORT_SYMBOL(invalidate_mapping_pages);
+EXPORT_SYMBOL(__invalidate_mapping_pages);
 
 /*
  * This is like invalidate_complete_page(), except it ignores the page's
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
@ 2011-06-24 13:49   ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-24 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

This new function accepts an additional parameter respect to the old
invalidate_mapping_pages() that allows to specify when we want to apply
an aggressive policy to drop file cache pages or when we just want to
reduce cache eligibility.

The new prototype is the following:

 unsigned long __invalidate_mapping_pages(struct address_space *mapping,
 		pgoff_t start, pgoff_t end, bool force)

When force is true pages are always dropped if possible. When force is
false inactive pages are dropped and active pages are moved to the
inactive list.

This can be used to apply different levels of page cache invalidation
(e.g, by fadvise).

The old invalidate_mapping_pages() behavior can be mapped to
__invalidate_mapping_pages(..., true) using a C-preprocessor macro.

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 include/linux/fs.h |    7 +++++--
 mm/swap.c          |    2 +-
 mm/truncate.c      |   40 ++++++++++++++++++++++++++++++----------
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 6e73e2e..33beefe 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2149,8 +2149,11 @@ extern int check_disk_change(struct block_device *);
 extern int __invalidate_device(struct block_device *, bool);
 extern int invalidate_partition(struct gendisk *, int);
 #endif
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-					pgoff_t start, pgoff_t end);
+
+#define invalidate_mapping_pages(__mapping, __start, __end)	\
+		__invalidate_mapping_pages(__mapping, __start, __end, true)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+					pgoff_t start, pgoff_t end, bool force);
 
 static inline void invalidate_remote_inode(struct inode *inode)
 {
diff --git a/mm/swap.c b/mm/swap.c
index 3a442f1..a8fe6ac 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -413,7 +413,7 @@ void add_page_to_unevictable_list(struct page *page)
  * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
  * 3. inactive, mapped page -> none
  * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
- * 5. inactive, clean -> inactive, tail
+ * 5. [in]active, clean -> inactive, tail
  * 6. Others -> none
  *
  * In 4, why it moves inactive's head, the VM expects the page would
diff --git a/mm/truncate.c b/mm/truncate.c
index 3a29a61..90f3a97 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
 EXPORT_SYMBOL(truncate_inode_pages);
 
 /**
- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
+ * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
  * @mapping: the address_space which holds the pages to invalidate
  * @start: the offset 'from' which to invalidate
  * @end: the offset 'to' which to invalidate (inclusive)
+ * @force: always drop pages when true (otherwise, reduce cache eligibility)
  *
  * This function only removes the unlocked pages, if you want to
  * remove all the pages of one inode, you must call truncate_inode_pages.
  *
- * invalidate_mapping_pages() will not block on IO activity. It will not
- * invalidate pages which are dirty, locked, under writeback or mapped into
- * pagetables.
+ * The @force parameter can be used to apply a more aggressive policy (when
+ * true) that will always drop pages from page cache when possible, or to just
+ * reduce cache eligibility (when false). In the last case active pages will be
+ * moved to the tail of the inactive list by deactivate_page(); inactive pages
+ * will be dropped in both cases.
+ *
+ * __invalidate_mapping_pages() will not block on IO activity. It will not
+ * invalidate pages which are dirty, locked, under writeback, mapped into
+ * pagetables, or on active lru when @force is false.
  */
-unsigned long invalidate_mapping_pages(struct address_space *mapping,
-		pgoff_t start, pgoff_t end)
+unsigned long __invalidate_mapping_pages(struct address_space *mapping,
+		pgoff_t start, pgoff_t end, bool force)
 {
 	struct pagevec pvec;
 	pgoff_t next = start;
@@ -356,11 +363,24 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 			next++;
 			if (lock_failed)
 				continue;
-
-			ret = invalidate_inode_page(page);
+			/*
+			 * Invalidation of active page is rather aggressive as
+			 * we can't make sure it's not a working set of other
+			 * processes.
+			 *
+			 * When force is false, deactivate_page() would move
+			 * active page into inactive's tail so the page will
+			 * have a chance to activate again if other processes
+			 * touch it.
+			 */
+			if (!force && PageActive(page))
+				ret = 0;
+			else
+				ret = invalidate_inode_page(page);
 			unlock_page(page);
 			/*
-			 * Invalidation is a hint that the page is no longer
+			 * Invalidation of an inactive page (or any page when
+			 * force is true) is a hint that the page is no longer
 			 * of interest and try to speed up its reclaim.
 			 */
 			if (!ret)
@@ -375,7 +395,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	}
 	return count;
 }
-EXPORT_SYMBOL(invalidate_mapping_pages);
+EXPORT_SYMBOL(__invalidate_mapping_pages);
 
 /*
  * This is like invalidate_complete_page(), except it ignores the page's
-- 
1.7.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE
  2011-06-24 13:49 ` Andrea Righi
@ 2011-06-24 13:49   ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-24 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

There were some reported problems in the past about trashing page cache
when a backup software (i.e., rsync) touches a huge amount of pages (see
for example [1]).

This problem has been almost fixed by the Minchan Kim's patch [2] and a
proper use of fadvise() in the backup software. For example this patch
set [3] has been proposed for inclusion in rsync.

However, there can be still other similar trashing problems: when the
backup software reads all the source files, some of them may be part of
the actual working set of the system. When a POSIX_FADV_DONTNEED is
performed _all_ pages are evicted from pagecache, both the working set
and the use-once pages touched only by the backup software.

A previous proposal [4] tried to resolve this problem being less
agressive in invalidating active pages, moving them to the inactive list
intead of just evict them from the page cache.

However, this approach changed completely the old behavior of
invalidate_mapping_pages(), that is not only used by fadvise.

The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
invalidation policy.

With POSIX_FADV_NOREUSE active pages are moved to the tail of the
inactive list, and pages in the inactive list are just removed from page
cache. Pages mapped by other processes or unevictable pages are not
touched at all.

In this way if the backup was the only user of a page, that page will be
immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
If the page was also touched by other tasks it'll be moved to the
inactive list, having another chance of being re-added to the working
set, or simply reclaimed when memory is needed.

In conclusion, now userspace applications that want to drop some page
cache pages can choose between the following advices:

 POSIX_FADV_DONTNEED = drop page cache if possible
 POSIX_FADV_NOREUSE = reduce page cache eligibility

[1] http://marc.info/?l=rsync&m=128885034930933&w=2
[2] https://lkml.org/lkml/2011/2/20/57
[3] http://lists.samba.org/archive/rsync/2010-November/025827.html
[4] https://lkml.org/lkml/2011/6/23/35

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 mm/fadvise.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 8d723c9..bcc79ac 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -33,7 +33,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 	pgoff_t start_index;
 	pgoff_t end_index;
 	unsigned long nrpages;
-	int ret = 0;
+	int ret = 0, force = true;
 
 	if (!file)
 		return -EBADF;
@@ -106,7 +106,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		nrpages = end_index - start_index + 1;
 		if (!nrpages)
 			nrpages = ~0UL;
-		
+
 		ret = force_page_cache_readahead(mapping, file,
 				start_index,
 				nrpages);
@@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 			ret = 0;
 		break;
 	case POSIX_FADV_NOREUSE:
-		break;
+		/* Reduce cache eligibility */
+		force = false;
 	case POSIX_FADV_DONTNEED:
 		if (!bdi_write_congested(mapping->backing_dev_info))
 			filemap_flush(mapping);
@@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		end_index = (endbyte >> PAGE_CACHE_SHIFT);
 
 		if (end_index >= start_index)
-			invalidate_mapping_pages(mapping, start_index,
-						end_index);
+			__invalidate_mapping_pages(mapping, start_index,
+						end_index, force);
 		break;
 	default:
 		ret = -EINVAL;
-- 
1.7.4.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE
@ 2011-06-24 13:49   ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-24 13:49 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Minchan Kim, Rik van Riel, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

There were some reported problems in the past about trashing page cache
when a backup software (i.e., rsync) touches a huge amount of pages (see
for example [1]).

This problem has been almost fixed by the Minchan Kim's patch [2] and a
proper use of fadvise() in the backup software. For example this patch
set [3] has been proposed for inclusion in rsync.

However, there can be still other similar trashing problems: when the
backup software reads all the source files, some of them may be part of
the actual working set of the system. When a POSIX_FADV_DONTNEED is
performed _all_ pages are evicted from pagecache, both the working set
and the use-once pages touched only by the backup software.

A previous proposal [4] tried to resolve this problem being less
agressive in invalidating active pages, moving them to the inactive list
intead of just evict them from the page cache.

However, this approach changed completely the old behavior of
invalidate_mapping_pages(), that is not only used by fadvise.

The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
invalidation policy.

With POSIX_FADV_NOREUSE active pages are moved to the tail of the
inactive list, and pages in the inactive list are just removed from page
cache. Pages mapped by other processes or unevictable pages are not
touched at all.

In this way if the backup was the only user of a page, that page will be
immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
If the page was also touched by other tasks it'll be moved to the
inactive list, having another chance of being re-added to the working
set, or simply reclaimed when memory is needed.

In conclusion, now userspace applications that want to drop some page
cache pages can choose between the following advices:

 POSIX_FADV_DONTNEED = drop page cache if possible
 POSIX_FADV_NOREUSE = reduce page cache eligibility

[1] http://marc.info/?l=rsync&m=128885034930933&w=2
[2] https://lkml.org/lkml/2011/2/20/57
[3] http://lists.samba.org/archive/rsync/2010-November/025827.html
[4] https://lkml.org/lkml/2011/6/23/35

Signed-off-by: Andrea Righi <andrea@betterlinux.com>
---
 mm/fadvise.c |   11 ++++++-----
 1 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/mm/fadvise.c b/mm/fadvise.c
index 8d723c9..bcc79ac 100644
--- a/mm/fadvise.c
+++ b/mm/fadvise.c
@@ -33,7 +33,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 	pgoff_t start_index;
 	pgoff_t end_index;
 	unsigned long nrpages;
-	int ret = 0;
+	int ret = 0, force = true;
 
 	if (!file)
 		return -EBADF;
@@ -106,7 +106,7 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		nrpages = end_index - start_index + 1;
 		if (!nrpages)
 			nrpages = ~0UL;
-		
+
 		ret = force_page_cache_readahead(mapping, file,
 				start_index,
 				nrpages);
@@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 			ret = 0;
 		break;
 	case POSIX_FADV_NOREUSE:
-		break;
+		/* Reduce cache eligibility */
+		force = false;
 	case POSIX_FADV_DONTNEED:
 		if (!bdi_write_congested(mapping->backing_dev_info))
 			filemap_flush(mapping);
@@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
 		end_index = (endbyte >> PAGE_CACHE_SHIFT);
 
 		if (end_index >= start_index)
-			invalidate_mapping_pages(mapping, start_index,
-						end_index);
+			__invalidate_mapping_pages(mapping, start_index,
+						end_index, force);
 		break;
 	default:
 		ret = -EINVAL;
-- 
1.7.4.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
  2011-06-24 13:49   ` Andrea Righi
@ 2011-06-26 22:46     ` Rik van Riel
  -1 siblings, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2011-06-26 22:46 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On 06/24/2011 09:49 AM, Andrea Righi wrote:

> diff --git a/mm/truncate.c b/mm/truncate.c
> index 3a29a61..90f3a97 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
>   EXPORT_SYMBOL(truncate_inode_pages);
>
>   /**
> - * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> + * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
>    * @mapping: the address_space which holds the pages to invalidate
>    * @start: the offset 'from' which to invalidate
>    * @end: the offset 'to' which to invalidate (inclusive)
> + * @force: always drop pages when true (otherwise, reduce cache eligibility)

I don't like the parameter name "force".

The parameter determines whether or not pages actually get
invalidated, so I'm guessing the parameter name should
reflect the function...

Maybe something like "invalidate"?

-- 
All rights reversed

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
@ 2011-06-26 22:46     ` Rik van Riel
  0 siblings, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2011-06-26 22:46 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On 06/24/2011 09:49 AM, Andrea Righi wrote:

> diff --git a/mm/truncate.c b/mm/truncate.c
> index 3a29a61..90f3a97 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
>   EXPORT_SYMBOL(truncate_inode_pages);
>
>   /**
> - * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> + * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
>    * @mapping: the address_space which holds the pages to invalidate
>    * @start: the offset 'from' which to invalidate
>    * @end: the offset 'to' which to invalidate (inclusive)
> + * @force: always drop pages when true (otherwise, reduce cache eligibility)

I don't like the parameter name "force".

The parameter determines whether or not pages actually get
invalidated, so I'm guessing the parameter name should
reflect the function...

Maybe something like "invalidate"?

-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE
  2011-06-24 13:49   ` Andrea Righi
@ 2011-06-26 22:47     ` Rik van Riel
  -1 siblings, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2011-06-26 22:47 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On 06/24/2011 09:49 AM, Andrea Righi wrote:

> @@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>   			ret = 0;
>   		break;
>   	case POSIX_FADV_NOREUSE:
> -		break;
> +		/* Reduce cache eligibility */
> +		force = false;
>   	case POSIX_FADV_DONTNEED:
>   		if (!bdi_write_congested(mapping->backing_dev_info))
>   			filemap_flush(mapping);

And the same is true here.  "force" is just not a very
descriptive name.

> @@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>   		end_index = (endbyte>>  PAGE_CACHE_SHIFT);
>
>   		if (end_index>= start_index)
> -			invalidate_mapping_pages(mapping, start_index,
> -						end_index);
> +			__invalidate_mapping_pages(mapping, start_index,
> +						end_index, force);
>   		break;
>   	default:
>   		ret = -EINVAL;


-- 
All rights reversed

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE
@ 2011-06-26 22:47     ` Rik van Riel
  0 siblings, 0 replies; 31+ messages in thread
From: Rik van Riel @ 2011-06-26 22:47 UTC (permalink / raw)
  To: Andrea Righi
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On 06/24/2011 09:49 AM, Andrea Righi wrote:

> @@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>   			ret = 0;
>   		break;
>   	case POSIX_FADV_NOREUSE:
> -		break;
> +		/* Reduce cache eligibility */
> +		force = false;
>   	case POSIX_FADV_DONTNEED:
>   		if (!bdi_write_congested(mapping->backing_dev_info))
>   			filemap_flush(mapping);

And the same is true here.  "force" is just not a very
descriptive name.

> @@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
>   		end_index = (endbyte>>  PAGE_CACHE_SHIFT);
>
>   		if (end_index>= start_index)
> -			invalidate_mapping_pages(mapping, start_index,
> -						end_index);
> +			__invalidate_mapping_pages(mapping, start_index,
> +						end_index, force);
>   		break;
>   	default:
>   		ret = -EINVAL;


-- 
All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-24 13:49 ` Andrea Righi
@ 2011-06-27  3:04   ` KOSAKI Motohiro
  -1 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2011-06-27  3:04 UTC (permalink / raw)
  To: andrea
  Cc: akpm, minchan.kim, riel, peterz, hannes, kamezawa.hiroyu,
	aarcange, hughd, jamesjer, marcus, matt, tytso, shaohua.li, P,
	linux-mm, linux-kernel

(2011/06/24 22:49), Andrea Righi wrote:
> There were some reported problems in the past about trashing page cache
> when a backup software (i.e., rsync) touches a huge amount of pages (see
> for example [1]).
> 
> This problem has been almost fixed by the Minchan Kim's patch [2] and a
> proper use of fadvise() in the backup software. For example this patch
> set [3] has been proposed for inclusion in rsync.
> 
> However, there can be still other similar trashing problems: when the
> backup software reads all the source files, some of them may be part of
> the actual working set of the system. When a POSIX_FADV_DONTNEED is
> performed _all_ pages are evicted from pagecache, both the working set
> and the use-once pages touched only by the backup software.
> 
> A previous proposal [4] tried to resolve this problem being less
> agressive in invalidating active pages, moving them to the inactive list
> intead of just evict them from the page cache.
> 
> However, this approach changed completely the old behavior of
> invalidate_mapping_pages(), that is not only used by fadvise.
> 
> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> invalidation policy.
> 
> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> inactive list, and pages in the inactive list are just removed from page
> cache. Pages mapped by other processes or unevictable pages are not
> touched at all.
> 
> In this way if the backup was the only user of a page, that page will be
> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> If the page was also touched by other tasks it'll be moved to the
> inactive list, having another chance of being re-added to the working
> set, or simply reclaimed when memory is needed.
> 
> In conclusion, now userspace applications that want to drop some page
> cache pages can choose between the following advices:
> 
>  POSIX_FADV_DONTNEED = drop page cache if possible
>  POSIX_FADV_NOREUSE = reduce page cache eligibility

Eeek.

Your POSIX_FADV_NOREUSE is very different from POSIX definition.
POSIX says,

       POSIX_FADV_NOREUSE
              Specifies that the application expects to access the specified data once  and  then
              not reuse it thereafter.

IfI understand correctly, it designed for calling _before_ data access
and to be expected may prevent lru activation. But your NORESE is designed
for calling _after_ data access. Big difference might makes a chance of
portability issue.






^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27  3:04   ` KOSAKI Motohiro
  0 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2011-06-27  3:04 UTC (permalink / raw)
  To: andrea
  Cc: akpm, minchan.kim, riel, peterz, hannes, kamezawa.hiroyu,
	aarcange, hughd, jamesjer, marcus, matt, tytso, shaohua.li, P,
	linux-mm, linux-kernel

(2011/06/24 22:49), Andrea Righi wrote:
> There were some reported problems in the past about trashing page cache
> when a backup software (i.e., rsync) touches a huge amount of pages (see
> for example [1]).
> 
> This problem has been almost fixed by the Minchan Kim's patch [2] and a
> proper use of fadvise() in the backup software. For example this patch
> set [3] has been proposed for inclusion in rsync.
> 
> However, there can be still other similar trashing problems: when the
> backup software reads all the source files, some of them may be part of
> the actual working set of the system. When a POSIX_FADV_DONTNEED is
> performed _all_ pages are evicted from pagecache, both the working set
> and the use-once pages touched only by the backup software.
> 
> A previous proposal [4] tried to resolve this problem being less
> agressive in invalidating active pages, moving them to the inactive list
> intead of just evict them from the page cache.
> 
> However, this approach changed completely the old behavior of
> invalidate_mapping_pages(), that is not only used by fadvise.
> 
> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> invalidation policy.
> 
> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> inactive list, and pages in the inactive list are just removed from page
> cache. Pages mapped by other processes or unevictable pages are not
> touched at all.
> 
> In this way if the backup was the only user of a page, that page will be
> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> If the page was also touched by other tasks it'll be moved to the
> inactive list, having another chance of being re-added to the working
> set, or simply reclaimed when memory is needed.
> 
> In conclusion, now userspace applications that want to drop some page
> cache pages can choose between the following advices:
> 
>  POSIX_FADV_DONTNEED = drop page cache if possible
>  POSIX_FADV_NOREUSE = reduce page cache eligibility

Eeek.

Your POSIX_FADV_NOREUSE is very different from POSIX definition.
POSIX says,

       POSIX_FADV_NOREUSE
              Specifies that the application expects to access the specified data once  and  then
              not reuse it thereafter.

IfI understand correctly, it designed for calling _before_ data access
and to be expected may prevent lru activation. But your NORESE is designed
for calling _after_ data access. Big difference might makes a chance of
portability issue.





--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
  2011-06-26 22:46     ` Rik van Riel
@ 2011-06-27  7:05       ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27  7:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On Sun, Jun 26, 2011 at 06:46:46PM -0400, Rik van Riel wrote:
> On 06/24/2011 09:49 AM, Andrea Righi wrote:
> 
> >diff --git a/mm/truncate.c b/mm/truncate.c
> >index 3a29a61..90f3a97 100644
> >--- a/mm/truncate.c
> >+++ b/mm/truncate.c
> >@@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
> >  EXPORT_SYMBOL(truncate_inode_pages);
> >
> >  /**
> >- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> >+ * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> >   * @mapping: the address_space which holds the pages to invalidate
> >   * @start: the offset 'from' which to invalidate
> >   * @end: the offset 'to' which to invalidate (inclusive)
> >+ * @force: always drop pages when true (otherwise, reduce cache eligibility)
> 
> I don't like the parameter name "force".

Agreed.

> 
> The parameter determines whether or not pages actually get
> invalidated, so I'm guessing the parameter name should
> reflect the function...
> 
> Maybe something like "invalidate"?

Sounds better.

-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages()
@ 2011-06-27  7:05       ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27  7:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On Sun, Jun 26, 2011 at 06:46:46PM -0400, Rik van Riel wrote:
> On 06/24/2011 09:49 AM, Andrea Righi wrote:
> 
> >diff --git a/mm/truncate.c b/mm/truncate.c
> >index 3a29a61..90f3a97 100644
> >--- a/mm/truncate.c
> >+++ b/mm/truncate.c
> >@@ -312,20 +312,27 @@ void truncate_inode_pages(struct address_space *mapping, loff_t lstart)
> >  EXPORT_SYMBOL(truncate_inode_pages);
> >
> >  /**
> >- * invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> >+ * __invalidate_mapping_pages - Invalidate all the unlocked pages of one inode
> >   * @mapping: the address_space which holds the pages to invalidate
> >   * @start: the offset 'from' which to invalidate
> >   * @end: the offset 'to' which to invalidate (inclusive)
> >+ * @force: always drop pages when true (otherwise, reduce cache eligibility)
> 
> I don't like the parameter name "force".

Agreed.

> 
> The parameter determines whether or not pages actually get
> invalidated, so I'm guessing the parameter name should
> reflect the function...
> 
> Maybe something like "invalidate"?

Sounds better.

-Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE
  2011-06-26 22:47     ` Rik van Riel
@ 2011-06-27  7:05       ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27  7:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On Sun, Jun 26, 2011 at 06:47:37PM -0400, Rik van Riel wrote:
> On 06/24/2011 09:49 AM, Andrea Righi wrote:
> 
> >@@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  			ret = 0;
> >  		break;
> >  	case POSIX_FADV_NOREUSE:
> >-		break;
> >+		/* Reduce cache eligibility */
> >+		force = false;
> >  	case POSIX_FADV_DONTNEED:
> >  		if (!bdi_write_congested(mapping->backing_dev_info))
> >  			filemap_flush(mapping);
> 
> And the same is true here.  "force" is just not a very
> descriptive name.

OK, I'll change the name to "invalidate" in the next version of the
patch.

Thanks,
-Andrea

> 
> >@@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  		end_index = (endbyte>>  PAGE_CACHE_SHIFT);
> >
> >  		if (end_index>= start_index)
> >-			invalidate_mapping_pages(mapping, start_index,
> >-						end_index);
> >+			__invalidate_mapping_pages(mapping, start_index,
> >+						end_index, force);
> >  		break;
> >  	default:
> >  		ret = -EINVAL;
> 
> 
> -- 
> All rights reversed

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE
@ 2011-06-27  7:05       ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27  7:05 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Andrew Morton, Minchan Kim, Peter Zijlstra, Johannes Weiner,
	KAMEZAWA Hiroyuki, Andrea Arcangeli, Hugh Dickins, Jerry James,
	Marcus Sorensen, Matt Heaton, KOSAKI Motohiro, Theodore Tso,
	Shaohua Li, Pádraig Brady, linux-mm, LKML

On Sun, Jun 26, 2011 at 06:47:37PM -0400, Rik van Riel wrote:
> On 06/24/2011 09:49 AM, Andrea Righi wrote:
> 
> >@@ -114,7 +114,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  			ret = 0;
> >  		break;
> >  	case POSIX_FADV_NOREUSE:
> >-		break;
> >+		/* Reduce cache eligibility */
> >+		force = false;
> >  	case POSIX_FADV_DONTNEED:
> >  		if (!bdi_write_congested(mapping->backing_dev_info))
> >  			filemap_flush(mapping);
> 
> And the same is true here.  "force" is just not a very
> descriptive name.

OK, I'll change the name to "invalidate" in the next version of the
patch.

Thanks,
-Andrea

> 
> >@@ -124,8 +125,8 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
> >  		end_index = (endbyte>>  PAGE_CACHE_SHIFT);
> >
> >  		if (end_index>= start_index)
> >-			invalidate_mapping_pages(mapping, start_index,
> >-						end_index);
> >+			__invalidate_mapping_pages(mapping, start_index,
> >+						end_index, force);
> >  		break;
> >  	default:
> >  		ret = -EINVAL;
> 
> 
> -- 
> All rights reversed

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27  3:04   ` KOSAKI Motohiro
@ 2011-06-27  7:11     ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27  7:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: akpm, minchan.kim, riel, peterz, hannes, kamezawa.hiroyu,
	aarcange, hughd, jamesjer, marcus, matt, tytso, shaohua.li, P,
	linux-mm, linux-kernel

On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
> (2011/06/24 22:49), Andrea Righi wrote:
> > There were some reported problems in the past about trashing page cache
> > when a backup software (i.e., rsync) touches a huge amount of pages (see
> > for example [1]).
> > 
> > This problem has been almost fixed by the Minchan Kim's patch [2] and a
> > proper use of fadvise() in the backup software. For example this patch
> > set [3] has been proposed for inclusion in rsync.
> > 
> > However, there can be still other similar trashing problems: when the
> > backup software reads all the source files, some of them may be part of
> > the actual working set of the system. When a POSIX_FADV_DONTNEED is
> > performed _all_ pages are evicted from pagecache, both the working set
> > and the use-once pages touched only by the backup software.
> > 
> > A previous proposal [4] tried to resolve this problem being less
> > agressive in invalidating active pages, moving them to the inactive list
> > intead of just evict them from the page cache.
> > 
> > However, this approach changed completely the old behavior of
> > invalidate_mapping_pages(), that is not only used by fadvise.
> > 
> > The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> > invalidation policy.
> > 
> > With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> > inactive list, and pages in the inactive list are just removed from page
> > cache. Pages mapped by other processes or unevictable pages are not
> > touched at all.
> > 
> > In this way if the backup was the only user of a page, that page will be
> > immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> > If the page was also touched by other tasks it'll be moved to the
> > inactive list, having another chance of being re-added to the working
> > set, or simply reclaimed when memory is needed.
> > 
> > In conclusion, now userspace applications that want to drop some page
> > cache pages can choose between the following advices:
> > 
> >  POSIX_FADV_DONTNEED = drop page cache if possible
> >  POSIX_FADV_NOREUSE = reduce page cache eligibility
> 
> Eeek.
> 
> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
> POSIX says,
> 
>        POSIX_FADV_NOREUSE
>               Specifies that the application expects to access the specified data once  and  then
>               not reuse it thereafter.
> 
> IfI understand correctly, it designed for calling _before_ data access
> and to be expected may prevent lru activation. But your NORESE is designed
> for calling _after_ data access. Big difference might makes a chance of
> portability issue.

You're right. NOREUSE is designed to implement drop behind policy.

I'll post a new patch that will plug this logic in DONTNEED (like the
presious version), but without breaking the old /proc/sys/vm/drop_caches
behavior.

Thanks,
-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27  7:11     ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27  7:11 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: akpm, minchan.kim, riel, peterz, hannes, kamezawa.hiroyu,
	aarcange, hughd, jamesjer, marcus, matt, tytso, shaohua.li, P,
	linux-mm, linux-kernel

On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
> (2011/06/24 22:49), Andrea Righi wrote:
> > There were some reported problems in the past about trashing page cache
> > when a backup software (i.e., rsync) touches a huge amount of pages (see
> > for example [1]).
> > 
> > This problem has been almost fixed by the Minchan Kim's patch [2] and a
> > proper use of fadvise() in the backup software. For example this patch
> > set [3] has been proposed for inclusion in rsync.
> > 
> > However, there can be still other similar trashing problems: when the
> > backup software reads all the source files, some of them may be part of
> > the actual working set of the system. When a POSIX_FADV_DONTNEED is
> > performed _all_ pages are evicted from pagecache, both the working set
> > and the use-once pages touched only by the backup software.
> > 
> > A previous proposal [4] tried to resolve this problem being less
> > agressive in invalidating active pages, moving them to the inactive list
> > intead of just evict them from the page cache.
> > 
> > However, this approach changed completely the old behavior of
> > invalidate_mapping_pages(), that is not only used by fadvise.
> > 
> > The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> > invalidation policy.
> > 
> > With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> > inactive list, and pages in the inactive list are just removed from page
> > cache. Pages mapped by other processes or unevictable pages are not
> > touched at all.
> > 
> > In this way if the backup was the only user of a page, that page will be
> > immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> > If the page was also touched by other tasks it'll be moved to the
> > inactive list, having another chance of being re-added to the working
> > set, or simply reclaimed when memory is needed.
> > 
> > In conclusion, now userspace applications that want to drop some page
> > cache pages can choose between the following advices:
> > 
> >  POSIX_FADV_DONTNEED = drop page cache if possible
> >  POSIX_FADV_NOREUSE = reduce page cache eligibility
> 
> Eeek.
> 
> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
> POSIX says,
> 
>        POSIX_FADV_NOREUSE
>               Specifies that the application expects to access the specified data once  and  then
>               not reuse it thereafter.
> 
> IfI understand correctly, it designed for calling _before_ data access
> and to be expected may prevent lru activation. But your NORESE is designed
> for calling _after_ data access. Big difference might makes a chance of
> portability issue.

You're right. NOREUSE is designed to implement drop behind policy.

I'll post a new patch that will plug this logic in DONTNEED (like the
presious version), but without breaking the old /proc/sys/vm/drop_caches
behavior.

Thanks,
-Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27  7:11     ` Andrea Righi
@ 2011-06-27  7:42       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2011-06-27  7:42 UTC (permalink / raw)
  To: andrea
  Cc: akpm, minchan.kim, riel, peterz, hannes, kamezawa.hiroyu,
	aarcange, hughd, jamesjer, marcus, matt, tytso, shaohua.li, P,
	linux-mm, linux-kernel

>>>  POSIX_FADV_DONTNEED = drop page cache if possible
>>>  POSIX_FADV_NOREUSE = reduce page cache eligibility
>>
>> Eeek.
>>
>> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
>> POSIX says,
>>
>>        POSIX_FADV_NOREUSE
>>               Specifies that the application expects to access the specified data once  and  then
>>               not reuse it thereafter.
>>
>> IfI understand correctly, it designed for calling _before_ data access
>> and to be expected may prevent lru activation. But your NORESE is designed
>> for calling _after_ data access. Big difference might makes a chance of
>> portability issue.
> 
> You're right. NOREUSE is designed to implement drop behind policy.
> 
> I'll post a new patch that will plug this logic in DONTNEED (like the
> presious version), but without breaking the old /proc/sys/vm/drop_caches
> behavior.

Great!

thanks.




^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27  7:42       ` KOSAKI Motohiro
  0 siblings, 0 replies; 31+ messages in thread
From: KOSAKI Motohiro @ 2011-06-27  7:42 UTC (permalink / raw)
  To: andrea
  Cc: akpm, minchan.kim, riel, peterz, hannes, kamezawa.hiroyu,
	aarcange, hughd, jamesjer, marcus, matt, tytso, shaohua.li, P,
	linux-mm, linux-kernel

>>>  POSIX_FADV_DONTNEED = drop page cache if possible
>>>  POSIX_FADV_NOREUSE = reduce page cache eligibility
>>
>> Eeek.
>>
>> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
>> POSIX says,
>>
>>        POSIX_FADV_NOREUSE
>>               Specifies that the application expects to access the specified data once  and  then
>>               not reuse it thereafter.
>>
>> IfI understand correctly, it designed for calling _before_ data access
>> and to be expected may prevent lru activation. But your NORESE is designed
>> for calling _after_ data access. Big difference might makes a chance of
>> portability issue.
> 
> You're right. NOREUSE is designed to implement drop behind policy.
> 
> I'll post a new patch that will plug this logic in DONTNEED (like the
> presious version), but without breaking the old /proc/sys/vm/drop_caches
> behavior.

Great!

thanks.



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27  7:11     ` Andrea Righi
@ 2011-06-27 10:17       ` Pádraig Brady
  -1 siblings, 0 replies; 31+ messages in thread
From: Pádraig Brady @ 2011-06-27 10:17 UTC (permalink / raw)
  To: Andrea Righi
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On 27/06/11 08:11, Andrea Righi wrote:
> On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
>> (2011/06/24 22:49), Andrea Righi wrote:
>>> There were some reported problems in the past about trashing page cache
>>> when a backup software (i.e., rsync) touches a huge amount of pages (see
>>> for example [1]).
>>>
>>> This problem has been almost fixed by the Minchan Kim's patch [2] and a
>>> proper use of fadvise() in the backup software. For example this patch
>>> set [3] has been proposed for inclusion in rsync.
>>>
>>> However, there can be still other similar trashing problems: when the
>>> backup software reads all the source files, some of them may be part of
>>> the actual working set of the system. When a POSIX_FADV_DONTNEED is
>>> performed _all_ pages are evicted from pagecache, both the working set
>>> and the use-once pages touched only by the backup software.
>>>
>>> A previous proposal [4] tried to resolve this problem being less
>>> agressive in invalidating active pages, moving them to the inactive list
>>> intead of just evict them from the page cache.
>>>
>>> However, this approach changed completely the old behavior of
>>> invalidate_mapping_pages(), that is not only used by fadvise.
>>>
>>> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
>>> invalidation policy.
>>>
>>> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
>>> inactive list, and pages in the inactive list are just removed from page
>>> cache. Pages mapped by other processes or unevictable pages are not
>>> touched at all.
>>>
>>> In this way if the backup was the only user of a page, that page will be
>>> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
>>> If the page was also touched by other tasks it'll be moved to the
>>> inactive list, having another chance of being re-added to the working
>>> set, or simply reclaimed when memory is needed.
>>>
>>> In conclusion, now userspace applications that want to drop some page
>>> cache pages can choose between the following advices:
>>>
>>>  POSIX_FADV_DONTNEED = drop page cache if possible
>>>  POSIX_FADV_NOREUSE = reduce page cache eligibility
>>
>> Eeek.
>>
>> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
>> POSIX says,
>>
>>        POSIX_FADV_NOREUSE
>>               Specifies that the application expects to access the specified data once  and  then
>>               not reuse it thereafter.
>>
>> IfI understand correctly, it designed for calling _before_ data access
>> and to be expected may prevent lru activation. But your NORESE is designed
>> for calling _after_ data access. Big difference might makes a chance of
>> portability issue.
> 
> You're right. NOREUSE is designed to implement drop behind policy.

Hmm fair enough.
NOREUSE is meant for specifying you _will_ need the data _once_

Isn't this what rsync actually wants though?
I.E. to specify NOREUSE for the file up front
so it would drop from cache automatically as processed,
(if not already in cache).

I realize that would be a more invasive patch.

> I'll post a new patch that will plug this logic in DONTNEED (like the
> presious version), but without breaking the old /proc/sys/vm/drop_caches
> behavior.

But will that break existing apps (running as root) that expect DONTNEED
to drop cache for a _file_.  Perhaps posix_fadvise() is meant to have
process rather than system scope, but that has not been the case until now.

cheers,
Pádraig.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27 10:17       ` Pádraig Brady
  0 siblings, 0 replies; 31+ messages in thread
From: Pádraig Brady @ 2011-06-27 10:17 UTC (permalink / raw)
  To: Andrea Righi
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On 27/06/11 08:11, Andrea Righi wrote:
> On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
>> (2011/06/24 22:49), Andrea Righi wrote:
>>> There were some reported problems in the past about trashing page cache
>>> when a backup software (i.e., rsync) touches a huge amount of pages (see
>>> for example [1]).
>>>
>>> This problem has been almost fixed by the Minchan Kim's patch [2] and a
>>> proper use of fadvise() in the backup software. For example this patch
>>> set [3] has been proposed for inclusion in rsync.
>>>
>>> However, there can be still other similar trashing problems: when the
>>> backup software reads all the source files, some of them may be part of
>>> the actual working set of the system. When a POSIX_FADV_DONTNEED is
>>> performed _all_ pages are evicted from pagecache, both the working set
>>> and the use-once pages touched only by the backup software.
>>>
>>> A previous proposal [4] tried to resolve this problem being less
>>> agressive in invalidating active pages, moving them to the inactive list
>>> intead of just evict them from the page cache.
>>>
>>> However, this approach changed completely the old behavior of
>>> invalidate_mapping_pages(), that is not only used by fadvise.
>>>
>>> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
>>> invalidation policy.
>>>
>>> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
>>> inactive list, and pages in the inactive list are just removed from page
>>> cache. Pages mapped by other processes or unevictable pages are not
>>> touched at all.
>>>
>>> In this way if the backup was the only user of a page, that page will be
>>> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
>>> If the page was also touched by other tasks it'll be moved to the
>>> inactive list, having another chance of being re-added to the working
>>> set, or simply reclaimed when memory is needed.
>>>
>>> In conclusion, now userspace applications that want to drop some page
>>> cache pages can choose between the following advices:
>>>
>>>  POSIX_FADV_DONTNEED = drop page cache if possible
>>>  POSIX_FADV_NOREUSE = reduce page cache eligibility
>>
>> Eeek.
>>
>> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
>> POSIX says,
>>
>>        POSIX_FADV_NOREUSE
>>               Specifies that the application expects to access the specified data once  and  then
>>               not reuse it thereafter.
>>
>> IfI understand correctly, it designed for calling _before_ data access
>> and to be expected may prevent lru activation. But your NORESE is designed
>> for calling _after_ data access. Big difference might makes a chance of
>> portability issue.
> 
> You're right. NOREUSE is designed to implement drop behind policy.

Hmm fair enough.
NOREUSE is meant for specifying you _will_ need the data _once_

Isn't this what rsync actually wants though?
I.E. to specify NOREUSE for the file up front
so it would drop from cache automatically as processed,
(if not already in cache).

I realize that would be a more invasive patch.

> I'll post a new patch that will plug this logic in DONTNEED (like the
> presious version), but without breaking the old /proc/sys/vm/drop_caches
> behavior.

But will that break existing apps (running as root) that expect DONTNEED
to drop cache for a _file_.  Perhaps posix_fadvise() is meant to have
process rather than system scope, but that has not been the case until now.

cheers,
Padraig.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27 10:17       ` Pádraig Brady
@ 2011-06-27 10:29         ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27 10:29 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On Mon, Jun 27, 2011 at 11:17:51AM +0100, Pádraig Brady wrote:
> On 27/06/11 08:11, Andrea Righi wrote:
> > On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
> >> (2011/06/24 22:49), Andrea Righi wrote:
> >>> There were some reported problems in the past about trashing page cache
> >>> when a backup software (i.e., rsync) touches a huge amount of pages (see
> >>> for example [1]).
> >>>
> >>> This problem has been almost fixed by the Minchan Kim's patch [2] and a
> >>> proper use of fadvise() in the backup software. For example this patch
> >>> set [3] has been proposed for inclusion in rsync.
> >>>
> >>> However, there can be still other similar trashing problems: when the
> >>> backup software reads all the source files, some of them may be part of
> >>> the actual working set of the system. When a POSIX_FADV_DONTNEED is
> >>> performed _all_ pages are evicted from pagecache, both the working set
> >>> and the use-once pages touched only by the backup software.
> >>>
> >>> A previous proposal [4] tried to resolve this problem being less
> >>> agressive in invalidating active pages, moving them to the inactive list
> >>> intead of just evict them from the page cache.
> >>>
> >>> However, this approach changed completely the old behavior of
> >>> invalidate_mapping_pages(), that is not only used by fadvise.
> >>>
> >>> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> >>> invalidation policy.
> >>>
> >>> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> >>> inactive list, and pages in the inactive list are just removed from page
> >>> cache. Pages mapped by other processes or unevictable pages are not
> >>> touched at all.
> >>>
> >>> In this way if the backup was the only user of a page, that page will be
> >>> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> >>> If the page was also touched by other tasks it'll be moved to the
> >>> inactive list, having another chance of being re-added to the working
> >>> set, or simply reclaimed when memory is needed.
> >>>
> >>> In conclusion, now userspace applications that want to drop some page
> >>> cache pages can choose between the following advices:
> >>>
> >>>  POSIX_FADV_DONTNEED = drop page cache if possible
> >>>  POSIX_FADV_NOREUSE = reduce page cache eligibility
> >>
> >> Eeek.
> >>
> >> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
> >> POSIX says,
> >>
> >>        POSIX_FADV_NOREUSE
> >>               Specifies that the application expects to access the specified data once  and  then
> >>               not reuse it thereafter.
> >>
> >> IfI understand correctly, it designed for calling _before_ data access
> >> and to be expected may prevent lru activation. But your NORESE is designed
> >> for calling _after_ data access. Big difference might makes a chance of
> >> portability issue.
> > 
> > You're right. NOREUSE is designed to implement drop behind policy.
> 
> Hmm fair enough.
> NOREUSE is meant for specifying you _will_ need the data _once_
> 
> Isn't this what rsync actually wants though?
> I.E. to specify NOREUSE for the file up front
> so it would drop from cache automatically as processed,
> (if not already in cache).
> 
> I realize that would be a more invasive patch.
> 
> > I'll post a new patch that will plug this logic in DONTNEED (like the
> > presious version), but without breaking the old /proc/sys/vm/drop_caches
> > behavior.
> 
> But will that break existing apps (running as root) that expect DONTNEED
> to drop cache for a _file_.  Perhaps posix_fadvise() is meant to have
> process rather than system scope, but that has not been the case until now.

The actual problem I think is that apps expect that DONTNEED can be used
to drop cache, but this is not written anywhere in the POSIX standard.

I would also like to have both functionalities: 1) be sure to drop page
cache pages (now there's only a system-wide knob to do this:
/proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
not reuse some pages in the future.

The standard can only provide 2). If we also want 1) at the file
granularity, I think we'd need to introduce something linux specific to
avoid having portability problems.

-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27 10:29         ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27 10:29 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On Mon, Jun 27, 2011 at 11:17:51AM +0100, Padraig Brady wrote:
> On 27/06/11 08:11, Andrea Righi wrote:
> > On Mon, Jun 27, 2011 at 12:04:41PM +0900, KOSAKI Motohiro wrote:
> >> (2011/06/24 22:49), Andrea Righi wrote:
> >>> There were some reported problems in the past about trashing page cache
> >>> when a backup software (i.e., rsync) touches a huge amount of pages (see
> >>> for example [1]).
> >>>
> >>> This problem has been almost fixed by the Minchan Kim's patch [2] and a
> >>> proper use of fadvise() in the backup software. For example this patch
> >>> set [3] has been proposed for inclusion in rsync.
> >>>
> >>> However, there can be still other similar trashing problems: when the
> >>> backup software reads all the source files, some of them may be part of
> >>> the actual working set of the system. When a POSIX_FADV_DONTNEED is
> >>> performed _all_ pages are evicted from pagecache, both the working set
> >>> and the use-once pages touched only by the backup software.
> >>>
> >>> A previous proposal [4] tried to resolve this problem being less
> >>> agressive in invalidating active pages, moving them to the inactive list
> >>> intead of just evict them from the page cache.
> >>>
> >>> However, this approach changed completely the old behavior of
> >>> invalidate_mapping_pages(), that is not only used by fadvise.
> >>>
> >>> The new solution maps POSIX_FADV_NOREUSE to the less-agressive page
> >>> invalidation policy.
> >>>
> >>> With POSIX_FADV_NOREUSE active pages are moved to the tail of the
> >>> inactive list, and pages in the inactive list are just removed from page
> >>> cache. Pages mapped by other processes or unevictable pages are not
> >>> touched at all.
> >>>
> >>> In this way if the backup was the only user of a page, that page will be
> >>> immediately removed from the page cache by calling POSIX_FADV_NOREUSE.
> >>> If the page was also touched by other tasks it'll be moved to the
> >>> inactive list, having another chance of being re-added to the working
> >>> set, or simply reclaimed when memory is needed.
> >>>
> >>> In conclusion, now userspace applications that want to drop some page
> >>> cache pages can choose between the following advices:
> >>>
> >>>  POSIX_FADV_DONTNEED = drop page cache if possible
> >>>  POSIX_FADV_NOREUSE = reduce page cache eligibility
> >>
> >> Eeek.
> >>
> >> Your POSIX_FADV_NOREUSE is very different from POSIX definition.
> >> POSIX says,
> >>
> >>        POSIX_FADV_NOREUSE
> >>               Specifies that the application expects to access the specified data once  and  then
> >>               not reuse it thereafter.
> >>
> >> IfI understand correctly, it designed for calling _before_ data access
> >> and to be expected may prevent lru activation. But your NORESE is designed
> >> for calling _after_ data access. Big difference might makes a chance of
> >> portability issue.
> > 
> > You're right. NOREUSE is designed to implement drop behind policy.
> 
> Hmm fair enough.
> NOREUSE is meant for specifying you _will_ need the data _once_
> 
> Isn't this what rsync actually wants though?
> I.E. to specify NOREUSE for the file up front
> so it would drop from cache automatically as processed,
> (if not already in cache).
> 
> I realize that would be a more invasive patch.
> 
> > I'll post a new patch that will plug this logic in DONTNEED (like the
> > presious version), but without breaking the old /proc/sys/vm/drop_caches
> > behavior.
> 
> But will that break existing apps (running as root) that expect DONTNEED
> to drop cache for a _file_.  Perhaps posix_fadvise() is meant to have
> process rather than system scope, but that has not been the case until now.

The actual problem I think is that apps expect that DONTNEED can be used
to drop cache, but this is not written anywhere in the POSIX standard.

I would also like to have both functionalities: 1) be sure to drop page
cache pages (now there's only a system-wide knob to do this:
/proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
not reuse some pages in the future.

The standard can only provide 2). If we also want 1) at the file
granularity, I think we'd need to introduce something linux specific to
avoid having portability problems.

-Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27 10:29         ` Andrea Righi
@ 2011-06-27 11:53           ` Pádraig Brady
  -1 siblings, 0 replies; 31+ messages in thread
From: Pádraig Brady @ 2011-06-27 11:53 UTC (permalink / raw)
  To: Andrea Righi
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On 27/06/11 11:29, Andrea Righi wrote:
> The actual problem I think is that apps expect that DONTNEED can be used
> to drop cache, but this is not written anywhere in the POSIX standard.
> 
> I would also like to have both functionalities: 1) be sure to drop page
> cache pages (now there's only a system-wide knob to do this:
> /proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
> not reuse some pages in the future.
> 
> The standard can only provide 2). If we also want 1) at the file
> granularity, I think we'd need to introduce something linux specific to
> avoid having portability problems.

True, though Linux is the reference for posix_fadvise() implementations,
given its lack of support on other platforms.

So just to summarize for _my_ reference.
You're changing DONTNEED to mean "drop if !PageActive()".
I.E. according to http://linux-mm.org/PageReplacementDesign
"drop if files only accessed once".

This will mean that there is no way currently to
remove a particular file from the cache on linux.
Hopefully that won't affect any of:
http://codesearch.google.com/#search/&q=POSIX_FADV_DONTNEED

Ideally I'd like cache functions for:
 DROP, ADD, ADD if space¹
which could correspond to:
 DONTNEED, WILLNEED, NOREUSE
but what we're going for are these somewhat overlapping functions:
 DROP if used once², ADD, ADD if space

cheers,
Pádraig.

¹ Not implemented yet.

² Hopefully there are no access patterns a single
process can do to make a PageActive as that would
probably not be desired in relation to "Drop if used once"
functionality.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27 11:53           ` Pádraig Brady
  0 siblings, 0 replies; 31+ messages in thread
From: Pádraig Brady @ 2011-06-27 11:53 UTC (permalink / raw)
  To: Andrea Righi
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On 27/06/11 11:29, Andrea Righi wrote:
> The actual problem I think is that apps expect that DONTNEED can be used
> to drop cache, but this is not written anywhere in the POSIX standard.
> 
> I would also like to have both functionalities: 1) be sure to drop page
> cache pages (now there's only a system-wide knob to do this:
> /proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
> not reuse some pages in the future.
> 
> The standard can only provide 2). If we also want 1) at the file
> granularity, I think we'd need to introduce something linux specific to
> avoid having portability problems.

True, though Linux is the reference for posix_fadvise() implementations,
given its lack of support on other platforms.

So just to summarize for _my_ reference.
You're changing DONTNEED to mean "drop if !PageActive()".
I.E. according to http://linux-mm.org/PageReplacementDesign
"drop if files only accessed once".

This will mean that there is no way currently to
remove a particular file from the cache on linux.
Hopefully that won't affect any of:
http://codesearch.google.com/#search/&q=POSIX_FADV_DONTNEED

Ideally I'd like cache functions for:
 DROP, ADD, ADD if space1
which could correspond to:
 DONTNEED, WILLNEED, NOREUSE
but what we're going for are these somewhat overlapping functions:
 DROP if used once2, ADD, ADD if space

cheers,
Padraig.

1 Not implemented yet.

2 Hopefully there are no access patterns a single
process can do to make a PageActive as that would
probably not be desired in relation to "Drop if used once"
functionality.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27 11:53           ` Pádraig Brady
@ 2011-06-27 12:39             ` Andrea Righi
  -1 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27 12:39 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On Mon, Jun 27, 2011 at 12:53:53PM +0100, Pádraig Brady wrote:
> On 27/06/11 11:29, Andrea Righi wrote:
> > The actual problem I think is that apps expect that DONTNEED can be used
> > to drop cache, but this is not written anywhere in the POSIX standard.
> > 
> > I would also like to have both functionalities: 1) be sure to drop page
> > cache pages (now there's only a system-wide knob to do this:
> > /proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
> > not reuse some pages in the future.
> > 
> > The standard can only provide 2). If we also want 1) at the file
> > granularity, I think we'd need to introduce something linux specific to
> > avoid having portability problems.
> 
> True, though Linux is the reference for posix_fadvise() implementations,
> given its lack of support on other platforms.
> 
> So just to summarize for _my_ reference.
> You're changing DONTNEED to mean "drop if !PageActive()".
> I.E. according to http://linux-mm.org/PageReplacementDesign
> "drop if files only accessed once".

Drop if pages were only accessed once, they're not mapped by any other
process and they're not unevictable.

> 
> This will mean that there is no way currently to
> remove a particular file from the cache on linux.

Correct. There's not a way to do this for a single file (except running
POSIX_FADV_DONTNEED twice...).

> Hopefully that won't affect any of:
> http://codesearch.google.com/#search/&q=POSIX_FADV_DONTNEED
> 
> Ideally I'd like cache functions for:
>  DROP, ADD, ADD if space¹
> which could correspond to:
>  DONTNEED, WILLNEED, NOREUSE
> but what we're going for are these somewhat overlapping functions:
>  DROP if used once², ADD, ADD if space

IIUC, NOREUSE means "the application will use this range of the file
once". It's something that we do _before_ accessing the file.  And the
kernel needs to remember the ranges of NOREUSE data for each file, so
that page cache can be immediately dropped after the data has been
accessed (if possible).

-Andrea

> 
> cheers,
> Pádraig.
> 
> ¹ Not implemented yet.
> 
> ² Hopefully there are no access patterns a single
> process can do to make a PageActive as that would
> probably not be desired in relation to "Drop if used once"
> functionality.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
@ 2011-06-27 12:39             ` Andrea Righi
  0 siblings, 0 replies; 31+ messages in thread
From: Andrea Righi @ 2011-06-27 12:39 UTC (permalink / raw)
  To: Pádraig Brady
  Cc: KOSAKI Motohiro, akpm, minchan.kim, riel, peterz, hannes,
	kamezawa.hiroyu, aarcange, hughd, jamesjer, marcus, matt, tytso,
	shaohua.li, linux-mm, linux-kernel

On Mon, Jun 27, 2011 at 12:53:53PM +0100, Padraig Brady wrote:
> On 27/06/11 11:29, Andrea Righi wrote:
> > The actual problem I think is that apps expect that DONTNEED can be used
> > to drop cache, but this is not written anywhere in the POSIX standard.
> > 
> > I would also like to have both functionalities: 1) be sure to drop page
> > cache pages (now there's only a system-wide knob to do this:
> > /proc/sys/vm/drop_caches), 2) give an advice to the kernel that I will
> > not reuse some pages in the future.
> > 
> > The standard can only provide 2). If we also want 1) at the file
> > granularity, I think we'd need to introduce something linux specific to
> > avoid having portability problems.
> 
> True, though Linux is the reference for posix_fadvise() implementations,
> given its lack of support on other platforms.
> 
> So just to summarize for _my_ reference.
> You're changing DONTNEED to mean "drop if !PageActive()".
> I.E. according to http://linux-mm.org/PageReplacementDesign
> "drop if files only accessed once".

Drop if pages were only accessed once, they're not mapped by any other
process and they're not unevictable.

> 
> This will mean that there is no way currently to
> remove a particular file from the cache on linux.

Correct. There's not a way to do this for a single file (except running
POSIX_FADV_DONTNEED twice...).

> Hopefully that won't affect any of:
> http://codesearch.google.com/#search/&q=POSIX_FADV_DONTNEED
> 
> Ideally I'd like cache functions for:
>  DROP, ADD, ADD if space1
> which could correspond to:
>  DONTNEED, WILLNEED, NOREUSE
> but what we're going for are these somewhat overlapping functions:
>  DROP if used once2, ADD, ADD if space

IIUC, NOREUSE means "the application will use this range of the file
once". It's something that we do _before_ accessing the file.  And the
kernel needs to remember the ranges of NOREUSE data for each file, so
that page cache can be immediately dropped after the data has been
accessed (if possible).

-Andrea

> 
> cheers,
> Padraig.
> 
> 1 Not implemented yet.
> 
> 2 Hopefully there are no access patterns a single
> process can do to make a PageActive as that would
> probably not be desired in relation to "Drop if used once"
> functionality.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-06-27 12:39             ` Andrea Righi
  (?)
@ 2011-07-12 21:52             ` Patrick J. LoPresti
  2011-07-12 22:22               ` Andrea Righi
  -1 siblings, 1 reply; 31+ messages in thread
From: Patrick J. LoPresti @ 2011-07-12 21:52 UTC (permalink / raw)
  To: Andrea Righi; +Cc: linux-kernel, akpm


> IIUC, NOREUSE means "the application will use this range of the file
> once". It's something that we do _before_ accessing the file.  And the
> kernel needs to remember the ranges of NOREUSE data for each file, so
> that page cache can be immediately dropped after the data has been
> accessed (if possible).

I am no expert on the Linux page cache, but my applications have a great
interest in exercising some control over it...

Could NOREUSE be as simple as setting a bit on the page that means
"never mark this page active"?

Or more conservatively, "clear this bit before marking the page active"?

So POSIX_FADV_NOREUSE would set the bit on the page.  Then any operation
that would normally mark the page active would instead merely clear the
bit.  This would keep the page on the inactive list _after_ the first
read and allow it to be reclaimed, which is at least in the "spirit" of
NOREUSE.

 - Pat

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-07-12 21:52             ` Patrick J. LoPresti
@ 2011-07-12 22:22               ` Andrea Righi
  2011-07-13  0:36                 ` Patrick J. LoPresti
  0 siblings, 1 reply; 31+ messages in thread
From: Andrea Righi @ 2011-07-12 22:22 UTC (permalink / raw)
  To: Patrick J. LoPresti; +Cc: linux-kernel, akpm

On Tue, Jul 12, 2011 at 02:52:49PM -0700, Patrick J. LoPresti wrote:
> 
> > IIUC, NOREUSE means "the application will use this range of the file
> > once". It's something that we do _before_ accessing the file.  And the
> > kernel needs to remember the ranges of NOREUSE data for each file, so
> > that page cache can be immediately dropped after the data has been
> > accessed (if possible).
> 
> I am no expert on the Linux page cache, but my applications have a great
> interest in exercising some control over it...
> 
> Could NOREUSE be as simple as setting a bit on the page that means
> "never mark this page active"?
> 
> Or more conservatively, "clear this bit before marking the page active"?
> 
> So POSIX_FADV_NOREUSE would set the bit on the page.  Then any operation
> that would normally mark the page active would instead merely clear the
> bit.  This would keep the page on the inactive list _after_ the first
> read and allow it to be reclaimed, which is at least in the "spirit" of
> NOREUSE.

If the file data is not in memory you can't set a bit in any struct
page. You could even open a file and execute POSIX_FADV_NOREUSE without
reading any page from the file.

I think it would be better to maintain a list of file offset/length
structures per file descriptor or (as a starting solution) even mark the
entire file as non-cacheable without considering the ranges.

-Andrea

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE
  2011-07-12 22:22               ` Andrea Righi
@ 2011-07-13  0:36                 ` Patrick J. LoPresti
  0 siblings, 0 replies; 31+ messages in thread
From: Patrick J. LoPresti @ 2011-07-13  0:36 UTC (permalink / raw)
  To: Andrea Righi; +Cc: linux-kernel, akpm

On Tue, Jul 12, 2011 at 3:22 PM, Andrea Righi <andrea@betterlinux.com> wrote:
>
> If the file data is not in memory you can't set a bit in any struct
> page. You could even open a file and execute POSIX_FADV_NOREUSE without
> reading any page from the file.

Mmm, too bad.

> I think it would be better to maintain a list of file offset/length
> structures per file descriptor or (as a starting solution) even mark the
> entire file as non-cacheable without considering the ranges.

I recommend against the "whole file" approach.  In my application (for
instance), I deal with files in the hundreds of gigabytes.  Marking
the whole file non-cacheable, when I was just trying to give a hint
about a few hundred megs, would be a total disaster.

I would suggest implementing it right or leaving it as a no-op.  Just my $0.02.

Thanks for working on this.  Better support for these page cache hints
will be great to see.

 - Pat

^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2011-07-13  0:36 UTC | newest]

Thread overview: 31+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-06-24 13:49 [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE Andrea Righi
2011-06-24 13:49 ` Andrea Righi
2011-06-24 13:49 ` [PATCH v3 1/2] mm: introduce __invalidate_mapping_pages() Andrea Righi
2011-06-24 13:49   ` Andrea Righi
2011-06-26 22:46   ` Rik van Riel
2011-06-26 22:46     ` Rik van Riel
2011-06-27  7:05     ` Andrea Righi
2011-06-27  7:05       ` Andrea Righi
2011-06-24 13:49 ` [PATCH v3 2/2] fadvise: implement POSIX_FADV_NOREUSE Andrea Righi
2011-06-24 13:49   ` Andrea Righi
2011-06-26 22:47   ` Rik van Riel
2011-06-26 22:47     ` Rik van Riel
2011-06-27  7:05     ` Andrea Righi
2011-06-27  7:05       ` Andrea Righi
2011-06-27  3:04 ` [PATCH v3 0/2] fadvise: support POSIX_FADV_NOREUSE KOSAKI Motohiro
2011-06-27  3:04   ` KOSAKI Motohiro
2011-06-27  7:11   ` Andrea Righi
2011-06-27  7:11     ` Andrea Righi
2011-06-27  7:42     ` KOSAKI Motohiro
2011-06-27  7:42       ` KOSAKI Motohiro
2011-06-27 10:17     ` Pádraig Brady
2011-06-27 10:17       ` Pádraig Brady
2011-06-27 10:29       ` Andrea Righi
2011-06-27 10:29         ` Andrea Righi
2011-06-27 11:53         ` Pádraig Brady
2011-06-27 11:53           ` Pádraig Brady
2011-06-27 12:39           ` Andrea Righi
2011-06-27 12:39             ` Andrea Righi
2011-07-12 21:52             ` Patrick J. LoPresti
2011-07-12 22:22               ` Andrea Righi
2011-07-13  0:36                 ` Patrick J. LoPresti

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.