linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] [RFC] Fallocate Volatile Ranges
@ 2012-05-25 19:17 John Stultz
  2012-05-25 19:17 ` [PATCH 1/4] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE John Stultz
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: John Stultz @ 2012-05-25 19:17 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

So here's the first pass at re-implementing the volatile range
idea using the fallocate interface, as suggested by Dave Chinner.

The range tree code is mostly unchanged.

The volatile range management code has been abstracted out into
helper functions that let filesystems reuse the same range-tracking
logic while leaving the actual page purging to the filesystem.

Finally there is the tmpfs enablement for FALLOC_FL_MARK_VOLATILE.

This is all done on top of Hugh/Cong's tmpfs FALLOC_FL_PUNCH_HOLE
enablement.

I've only managed to do minimal testing at this point, but I wanted
to get this out before the long-weekend holiday in the US.

A few of the things still on my todo list:
* Move the volatile mapping hash to be per volatile_fs_head
* Kill the extra hash locking
* Keep track of unpurged pages in volatile_fs_head, to avoid
running through the lru at shrink time.

Your thoughts and feedback will be appreciated!

thanks
-john


CC: Andrew Morton <akpm@linux-foundation.org>
CC: Android Kernel Team <kernel-team@android.com>
CC: Robert Love <rlove@google.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Hugh Dickins <hughd@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Dave Chinner <david@fromorbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Andrea Righi <andrea@betterlinux.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Taras Glek <tgek@mozilla.com>
CC: Mike Hommey <mh@glandium.org>


Hugh Dickins (1):
  tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE

John Stultz (3):
  [RFC] Range tree implementation
  [RFC] Add volatile range management code
  [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers

 fs/open.c                 |    3 +-
 include/linux/falloc.h    |    7 +-
 include/linux/rangetree.h |   53 +++++
 include/linux/volatile.h  |   43 ++++
 lib/Makefile              |    2 +-
 lib/rangetree.c           |  107 ++++++++++
 mm/Makefile               |    2 +-
 mm/shmem.c                |  175 +++++++++++++++-
 mm/volatile.c             |  493 +++++++++++++++++++++++++++++++++++++++++++++
 9 files changed, 868 insertions(+), 17 deletions(-)
 create mode 100644 include/linux/rangetree.h
 create mode 100644 include/linux/volatile.h
 create mode 100644 lib/rangetree.c
 create mode 100644 mm/volatile.c

-- 
1.7.3.2.146.gca209


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE
  2012-05-25 19:17 [PATCH 0/4] [RFC] Fallocate Volatile Ranges John Stultz
@ 2012-05-25 19:17 ` John Stultz
  2012-05-25 19:17 ` [PATCH 2/4] [RFC] Range tree implementation John Stultz
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2012-05-25 19:17 UTC (permalink / raw)
  To: LKML
  Cc: Hugh Dickins, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Dave Hansen, Rik van Riel, Dmitry Adamushko,
	Dave Chinner, Neil Brown, Andrea Righi, Aneesh Kumar K.V,
	Taras Glek, Mike Hommey, John Stultz

From: Hugh Dickins <hughd@google.com>

tmpfs has supported hole-punching since 2.6.16, via madvise(,,MADV_REMOVE).
But nowadays fallocate(,FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,,) is the
agreed way to punch holes.

So add shmem_fallocate() to support that, and tweak shmem_truncate_range()
to support partial pages at both the beginning and end of range (never
needed for madvise, which demands rounded addr and rounds up length).
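
For reference, here is a minimal (untested) userspace sketch of the
call this enables; depending on the glibc version the FALLOC_FL_*
flags may need to come from <linux/falloc.h> rather than <fcntl.h>:

	#define _GNU_SOURCE
	#include <fcntl.h>	/* fallocate() */
	#include <stdio.h>	/* perror() */

	/* Deallocate [offset, offset + len) without changing the file size */
	static int punch_hole(int fd, off_t offset, off_t len)
	{
		if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
			      offset, len) < 0) {
			perror("fallocate");
			return -1;
		}
		return 0;
	}

Subsequent reads of the punched range then return zeroes, as with any
other hole in a sparse file.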

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Android Kernel Team <kernel-team@android.com>
CC: Robert Love <rlove@google.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Hugh Dickins <hughd@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Dave Chinner <david@fromorbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Andrea Righi <andrea@betterlinux.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Taras Glek <tgek@mozilla.com>
CC: Mike Hommey <mh@glandium.org>

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 mm/shmem.c |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 files changed, 57 insertions(+), 11 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index d7b433a..9b1c6b4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -53,6 +53,7 @@ static struct vfsmount *shm_mnt;
 #include <linux/blkdev.h>
 #include <linux/pagevec.h>
 #include <linux/percpu_counter.h>
+#include <linux/falloc.h>
 #include <linux/splice.h>
 #include <linux/security.h>
 #include <linux/swapops.h>
@@ -429,21 +430,23 @@ void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 	struct address_space *mapping = inode->i_mapping;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	pgoff_t start = (lstart + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
-	unsigned partial = lstart & (PAGE_CACHE_SIZE - 1);
-	pgoff_t end = (lend >> PAGE_CACHE_SHIFT);
+	pgoff_t end = (lend + 1) >> PAGE_CACHE_SHIFT;
+	unsigned int partial_start = lstart & (PAGE_CACHE_SIZE - 1);
+	unsigned int partial_end = (lend + 1) & (PAGE_CACHE_SIZE - 1);
 	struct pagevec pvec;
 	pgoff_t indices[PAGEVEC_SIZE];
 	long nr_swaps_freed = 0;
 	pgoff_t index;
 	int i;
 
-	BUG_ON((lend & (PAGE_CACHE_SIZE - 1)) != (PAGE_CACHE_SIZE - 1));
+	if (lend == -1)
+		end = -1;	/* unsigned, so actually very big */
 
 	pagevec_init(&pvec, 0);
 	index = start;
-	while (index <= end) {
+	while (index < end) {
 		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
+				min(end - index, (pgoff_t)PAGEVEC_SIZE),
 							pvec.pages, indices);
 		if (!pvec.nr)
 			break;
@@ -452,7 +455,7 @@ void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 			struct page *page = pvec.pages[i];
 
 			index = indices[i];
-			if (index > end)
+			if (index >= end)
 				break;
 
 			if (radix_tree_exceptional_entry(page)) {
@@ -476,22 +479,39 @@ void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 		index++;
 	}
 
-	if (partial) {
+	if (partial_start) {
 		struct page *page = NULL;
 		shmem_getpage(inode, start - 1, &page, SGP_READ, NULL);
 		if (page) {
-			zero_user_segment(page, partial, PAGE_CACHE_SIZE);
+			unsigned int top = PAGE_CACHE_SIZE;
+			if (start > end) {
+				top = partial_end;
+				partial_end = 0;
+			}
+			zero_user_segment(page, partial_start, top);
+			set_page_dirty(page);
+			unlock_page(page);
+			page_cache_release(page);
+		}
+	}
+	if (partial_end) {
+		struct page *page = NULL;
+		shmem_getpage(inode, end, &page, SGP_READ, NULL);
+		if (page) {
+			zero_user_segment(page, 0, partial_end);
 			set_page_dirty(page);
 			unlock_page(page);
 			page_cache_release(page);
 		}
 	}
+	if (start >= end)
+		return;
 
 	index = start;
 	for ( ; ; ) {
 		cond_resched();
 		pvec.nr = shmem_find_get_pages_and_swap(mapping, index,
-			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
+				min(end - index, (pgoff_t)PAGEVEC_SIZE),
 							pvec.pages, indices);
 		if (!pvec.nr) {
 			if (index == start)
@@ -499,7 +519,7 @@ void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 			index = start;
 			continue;
 		}
-		if (index == start && indices[0] > end) {
+		if (index == start && indices[0] >= end) {
 			shmem_deswap_pagevec(&pvec);
 			pagevec_release(&pvec);
 			break;
@@ -509,7 +529,7 @@ void shmem_truncate_range(struct inode *inode, loff_t lstart, loff_t lend)
 			struct page *page = pvec.pages[i];
 
 			index = indices[i];
-			if (index > end)
+			if (index >= end)
 				break;
 
 			if (radix_tree_exceptional_entry(page)) {
@@ -1462,6 +1482,31 @@ static ssize_t shmem_file_splice_read(struct file *in, loff_t *ppos,
 	return error;
 }
 
+static long shmem_fallocate(struct file *file, int mode, loff_t offset,
+							 loff_t len)
+{
+	struct inode *inode = file->f_path.dentry->d_inode;
+	int error = -EOPNOTSUPP;
+
+	mutex_lock(&inode->i_mutex);
+
+	if (mode & FALLOC_FL_PUNCH_HOLE) {
+		struct address_space *mapping = file->f_mapping;
+		loff_t unmap_start = round_up(offset, PAGE_SIZE);
+		loff_t unmap_end = round_down(offset + len, PAGE_SIZE) - 1;
+
+		if ((u64)unmap_end > (u64)unmap_start)
+			unmap_mapping_range(mapping, unmap_start,
+					    1 + unmap_end - unmap_start, 0);
+		shmem_truncate_range(inode, offset, offset + len - 1);
+		/* No need to unmap again: hole-punching leaves COWed pages */
+		error = 0;
+	}
+
+	mutex_unlock(&inode->i_mutex);
+	return error;
+}
+
 static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
 {
 	struct shmem_sb_info *sbinfo = SHMEM_SB(dentry->d_sb);
@@ -2372,6 +2417,7 @@ static const struct file_operations shmem_file_operations = {
 	.fsync		= noop_fsync,
 	.splice_read	= shmem_file_splice_read,
 	.splice_write	= generic_file_splice_write,
+	.fallocate	= shmem_fallocate,
 #endif
 };
 
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] [RFC] Range tree implementation
  2012-05-25 19:17 [PATCH 0/4] [RFC] Fallocate Volatile Ranges John Stultz
  2012-05-25 19:17 ` [PATCH 1/4] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE John Stultz
@ 2012-05-25 19:17 ` John Stultz
  2012-05-31 20:48   ` Jan Kara
  2012-05-25 19:17 ` [PATCH 3/4] [RFC] Add volatile range management code John Stultz
  2012-05-25 19:17 ` [PATCH 4/4] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
  3 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2012-05-25 19:17 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

After Andrew suggested something like his mumbletree idea
to better store a list of ranges, I worked on a few different
approaches, and this is what I've finally managed to get working.

I suspect range-tree isn't a totally accurate name, but I
couldn't quite make out the difference between range trees
and interval trees, so I just picked one to call it. Do
let me know if you have a better name.

The idea of storing ranges in a tree is nice, but has a number
of complications. When adding a range, it's possible that a
large range will consume and merge a number of smaller ranges.
When removing a range, it's possible you may end up splitting an
existing range, causing one range to become two. This makes it
very difficult to provide generic list_head-like behavior, as
the parent structures would need to be duplicated and removed,
and that has lots of memory ownership issues.

So, this is a much simplified and more list_head-like
implementation. You can add a node to a tree, or remove a node
from a tree, but the generic implementation doesn't do the
merging or splitting for you. It does, however, provide helpers
to find overlapping and adjacent ranges (see the sketch below).
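
For example, a caller that does want coalescing could build it on top
of these helpers along the lines of the sketch below (not part of this
patch; locking, error handling and freeing of the removed nodes are
left to the caller, and add_merged is just an illustrative name):

	/*
	 * Merge [start, end] with every range it overlaps, then insert
	 * the (already initialized) node 'new' covering the merged extent.
	 */
	static void add_merged(struct range_tree_root *root,
			       struct range_tree_node *new, u64 start, u64 end)
	{
		struct range_tree_node *node, *next;

		node = range_tree_in_range(root, start, end);
		while (node) {
			/* Grow the new range to cover the overlapping one */
			start = min_t(u64, start, node->start);
			end = max_t(u64, end, node->end);

			/* Fetch the next overlap before removing this node */
			next = range_tree_next_in_range(node, start, end);
			range_tree_remove(root, node);
			node = next;
		}

		new->start = start;
		new->end = end;
		range_tree_add(root, new);
	}

Adjacent but non-overlapping ranges could be folded in the same way by
also querying [start-1, start] and [end, end+1].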

Andrew also really wanted this range-tree implementation to be
reusable so we don't duplicate the file locking logic. I'm not
totally convinced that the requirements between the volatile
ranges and file locking are really equivalent, but this reduced
implementation may make it possible.

Changelog:
v2:
* Reworked code to use an rbtree instead of splaying

v3:
* Added range_tree_next_in_range() to avoid having to start
  lookups from the root every time.
* Fixed some comments and return NULL instead of 0, as suggested
  by Aneesh Kumar K.V

v6:
* Fixed range_tree_in_range() so that it finds the earliest range,
  rather than the first. This allows the next_in_range() function
  to properly cover all the ranges in the tree.
* Minor cleanups to simplify some of the functions

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Android Kernel Team <kernel-team@android.com>
CC: Robert Love <rlove@google.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Hugh Dickins <hughd@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Dave Chinner <david@fromorbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Andrea Righi <andrea@betterlinux.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Taras Glek <tgek@mozilla.com>
CC: Mike Hommey <mh@glandium.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/rangetree.h |   53 ++++++++++++++++++++++
 lib/Makefile              |    2 +-
 lib/rangetree.c           |  107 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/rangetree.h
 create mode 100644 lib/rangetree.c

diff --git a/include/linux/rangetree.h b/include/linux/rangetree.h
new file mode 100644
index 0000000..9fa24de
--- /dev/null
+++ b/include/linux/rangetree.h
@@ -0,0 +1,53 @@
+#ifndef _LINUX_RANGETREE_H
+#define _LINUX_RANGETREE_H
+
+#include <linux/types.h>
+#include <linux/rbtree.h>
+
+struct range_tree_node {
+	struct rb_node rb;
+	u64 start;
+	u64 end;
+};
+
+struct range_tree_root {
+	struct rb_root head;
+};
+
+static inline void range_tree_init(struct range_tree_root *root)
+{
+	root->head = RB_ROOT;
+}
+
+static inline void range_tree_node_init(struct range_tree_node *node)
+{
+	rb_init_node(&node->rb);
+	node->start = 0;
+	node->end = 0;
+}
+
+static inline int range_tree_empty(struct range_tree_root *root)
+{
+	return RB_EMPTY_ROOT(&root->head);
+}
+
+static inline
+struct range_tree_node *range_tree_root_node(struct range_tree_root *root)
+{
+	struct range_tree_node *ret;
+	ret = container_of(root->head.rb_node, struct range_tree_node, rb);
+	return ret;
+}
+
+extern struct range_tree_node *range_tree_in_range(struct range_tree_root *root,
+							 u64 start, u64 end);
+extern struct range_tree_node *range_tree_next_in_range(
+						struct range_tree_node *node,
+						u64 start, u64 end);
+extern void range_tree_add(struct range_tree_root *root,
+						struct range_tree_node *node);
+extern void range_tree_remove(struct range_tree_root *root,
+						struct range_tree_node *node);
+#endif
+
+
diff --git a/lib/Makefile b/lib/Makefile
index 74290c9..a636a2d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -12,7 +12,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 idr.o int_sqrt.o extable.o prio_tree.o \
 	 sha1.o md5.o irq_regs.o reciprocal_div.o argv_split.o \
 	 proportions.o prio_heap.o ratelimit.o show_mem.o \
-	 is_single_threaded.o plist.o decompress.o
+	 is_single_threaded.o plist.o decompress.o rangetree.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/rangetree.c b/lib/rangetree.c
new file mode 100644
index 0000000..7586d37
--- /dev/null
+++ b/lib/rangetree.c
@@ -0,0 +1,107 @@
+#include <linux/rangetree.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+
+/**
+ * range_tree_in_range - Returns the first node that overlaps with the
+ *                       given range
+ * @root: range_tree root
+ * @start: range start
+ * @end: range end
+ *
+ */
+struct range_tree_node *range_tree_in_range(struct range_tree_root *root,
+						u64 start, u64 end)
+{
+	struct rb_node *p = root->head.rb_node;
+	struct range_tree_node *candidate, *match = NULL;
+
+	while (p) {
+		candidate = rb_entry(p, struct range_tree_node, rb);
+		if (end < candidate->start)
+			p = p->rb_left;
+		else if (start > candidate->end)
+			p = p->rb_right;
+		else {
+			/* We found one, but try to find an earlier match */
+			match = candidate;
+			p = p->rb_left;
+		}
+	}
+
+	return match;
+}
+
+
+/**
+ * range_tree_next_in_range - Return the next range in a range_tree still
+ *                            contained within a specified range.
+ * @root: range_tree root
+ * @start: range start
+ * @end: range end
+ *
+ */
+struct range_tree_node *range_tree_next_in_range(struct range_tree_node *node,
+							u64 start, u64 end)
+{
+	struct rb_node *next;
+	struct range_tree_node *candidate;
+	if (!node)
+		return NULL;
+	next = rb_next(&node->rb);
+	if (!next)
+		return NULL;
+
+	candidate = container_of(next, struct range_tree_node, rb);
+
+	if ((candidate->start > end) || (candidate->end < start))
+		return NULL;
+
+	return candidate;
+}
+
+/**
+ * range_tree_add - Add a node to a range tree
+ * @root: range tree to be added to
+ * @node: range_tree_node to be added
+ *
+ * Adds a node to the range tree.
+ */
+void range_tree_add(struct range_tree_root *root,
+					struct range_tree_node *node)
+{
+	struct rb_node **p = &root->head.rb_node;
+	struct rb_node *parent = NULL;
+	struct range_tree_node *ptr;
+
+	WARN_ON_ONCE(!RB_EMPTY_NODE(&node->rb));
+
+	while (*p) {
+		parent = *p;
+		ptr = rb_entry(parent, struct range_tree_node, rb);
+		if (node->start < ptr->start)
+			p = &(*p)->rb_left;
+		else
+			p = &(*p)->rb_right;
+	}
+	rb_link_node(&node->rb, parent, p);
+	rb_insert_color(&node->rb, &root->head);
+
+}
+
+
+/**
+ * range_tree_remove: Removes a given node from the tree
+ * @root: root of tree
+ * @node: Node to be removed
+ *
+ * Removes a node from the range tree
+ */
+void range_tree_remove(struct range_tree_root *root,
+						struct range_tree_node *node)
+{
+	WARN_ON_ONCE(RB_EMPTY_NODE(&node->rb));
+
+	rb_erase(&node->rb, &root->head);
+	RB_CLEAR_NODE(&node->rb);
+}
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] [RFC] Add volatile range management code
  2012-05-25 19:17 [PATCH 0/4] [RFC] Fallocate Volatile Ranges John Stultz
  2012-05-25 19:17 ` [PATCH 1/4] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE John Stultz
  2012-05-25 19:17 ` [PATCH 2/4] [RFC] Range tree implementation John Stultz
@ 2012-05-25 19:17 ` John Stultz
  2012-05-25 19:17 ` [PATCH 4/4] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
  3 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2012-05-25 19:17 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

This patch provides the volatile range management code
that filesystems can utilize when implementing
FALLOC_FL_MARK_VOLATILE.

It tracks a collection of page ranges against a mapping,
storing them in a range-tree. This code handles coalescing
overlapping and adjacent ranges, as well as splitting
ranges when sub-chunks are removed.

Ranges can be marked purged or unpurged, and there is a
per-fs LRU list that tracks all the unpurged ranges for
that fs.
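
For context, the calling convention a filesystem is expected to follow
(mirroring what the tmpfs patch later in this series does; "myfs" is
just a placeholder name) looks roughly like:

	static DEFINE_VOLATILE_FS_HEAD(myfs_volatile_head);

	/* Mark pages [start_index, end_index] of this mapping volatile */
	static int myfs_mark_volatile(struct address_space *mapping,
				      pgoff_t start_index, pgoff_t end_index)
	{
		long ret;

		volatile_range_lock(&myfs_volatile_head);
		ret = volatile_range_add(&myfs_volatile_head, mapping,
					 start_index, end_index);
		if (ret > 0) {
			/*
			 * Coalesced with an already-purged range, so the
			 * filesystem should purge these pages right away
			 * (e.g. via its truncate_range helper).
			 */
			ret = 0;
		}
		volatile_range_unlock(&myfs_volatile_head);

		return ret;
	}

The unmark side is symmetric via volatile_range_remove(), and the
filesystem's shrinker walks the per-fs LRU with
volatile_ranges_get_last_used() to pick ranges to purge.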

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Android Kernel Team <kernel-team@android.com>
CC: Robert Love <rlove@google.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Hugh Dickins <hughd@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Dave Chinner <david@fromorbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Andrea Righi <andrea@betterlinux.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Taras Glek <tgek@mozilla.com>
CC: Mike Hommey <mh@glandium.org>

Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/volatile.h |   43 ++++
 mm/Makefile              |    2 +-
 mm/volatile.c            |  493 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 537 insertions(+), 1 deletions(-)
 create mode 100644 include/linux/volatile.h
 create mode 100644 mm/volatile.c

diff --git a/include/linux/volatile.h b/include/linux/volatile.h
new file mode 100644
index 0000000..69148d0
--- /dev/null
+++ b/include/linux/volatile.h
@@ -0,0 +1,43 @@
+#ifndef _LINUX_VOLATILE_H
+#define _LINUX_VOLATILE_H
+
+#include <linux/fs.h>
+
+struct volatile_fs_head {
+	struct mutex lock;
+	struct list_head lru_head;
+};
+
+
+#define DEFINE_VOLATILE_FS_HEAD(name) struct volatile_fs_head name = {	\
+	.lock = __MUTEX_INITIALIZER(name.lock),				\
+	.lru_head = LIST_HEAD_INIT(name.lru_head)			\
+}
+
+
+static inline void volatile_range_lock(struct volatile_fs_head *head)
+{
+	mutex_lock(&head->lock);
+}
+
+static inline void volatile_range_unlock(struct volatile_fs_head *head)
+{
+	mutex_unlock(&head->lock);
+}
+
+extern long volatile_range_add(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index);
+extern long volatile_range_remove(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index);
+
+extern s64 volatile_range_lru_size(struct volatile_fs_head *head);
+
+extern void volatile_range_clear(struct volatile_fs_head *head,
+					struct address_space *mapping);
+
+extern s64 volatile_ranges_get_last_used(struct volatile_fs_head *head,
+				struct address_space **mapping,
+				loff_t *start, loff_t *end);
+#endif /* _LINUX_VOLATILE_H */
diff --git a/mm/Makefile b/mm/Makefile
index 50ec00e..7b6c7a8 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -13,7 +13,7 @@ obj-y			:= filemap.o mempool.o oom_kill.o fadvise.o \
 			   readahead.o swap.o truncate.o vmscan.o shmem.o \
 			   prio_tree.o util.o mmzone.o vmstat.o backing-dev.o \
 			   page_isolation.o mm_init.o mmu_context.o percpu.o \
-			   $(mmu-y)
+			   volatile.o $(mmu-y)
 obj-y += init-mm.o
 
 ifdef CONFIG_NO_BOOTMEM
diff --git a/mm/volatile.c b/mm/volatile.c
new file mode 100644
index 0000000..13f6a88
--- /dev/null
+++ b/mm/volatile.c
@@ -0,0 +1,493 @@
+/* mm/volatile.c
+ *
+ * Volatile page range management.
+ *      Copyright 2011 Linaro
+ *
+ * Based on mm/ashmem.c
+ *      by Robert Love <rlove@google.com>
+ *      Copyright (C) 2008 Google, Inc.
+ *
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * The volatile range management is a helper layer on top of the range tree
+ * code, which is used to help filesystems manage page ranges that are volatile.
+ *
+ * These ranges are stored in a per-mapping range tree, which holds both purged
+ * and unpurged ranges connected to that address_space. Unpurged ranges are also
+ * linked together in an lru list that is per-volatile-fs-head (basically
+ * per-filesystem).
+ *
+ * The goal behind volatile ranges is to allow applications to interact
+ * with the kernel's cache management infrastructure.  In particular an
+ * application can say "this memory contains data that might be useful in
+ * the future, but can be reconstructed if necessary, so if the kernel
+ * needs to, it can zap and reclaim this memory without having to swap it out."
+ *
+ * The proposed mechanism - at a high level - is for user-space to be able
+ * to say "This memory is volatile" and then later "this memory is no longer
+ * volatile".  If the content of the memory is still available the second
+ * request succeeds.  If not, the memory is marked non-volatile and an
+ * error is returned to denote that the contents have been lost.
+ *
+ * Credits to Neil Brown for the above description.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/fs.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/pagemap.h>
+#include <linux/volatile.h>
+#include <linux/rangetree.h>
+#include <linux/hash.h>
+#include <linux/shmem_fs.h>
+
+
+struct volatile_range {
+	struct list_head	lru;
+	struct range_tree_node	range_node;
+	unsigned int		purged;
+	struct address_space	*mapping;
+};
+
+
+/*
+ * To avoid bloating the address_space structure, we use
+ * a hash structure to map from address_space mappings to
+ * the range_tree root that stores volatile ranges
+ */
+static DEFINE_MUTEX(hash_mutex);
+static struct hlist_head *mapping_hash;
+static long mapping_hash_shift = 8;
+struct mapping_hash_entry {
+	struct range_tree_root root;
+	struct address_space *mapping;
+	struct hlist_node hnode;
+};
+
+
+static inline
+struct range_tree_root *__mapping_to_root(struct address_space *mapping)
+{
+	struct hlist_node *elem;
+	struct mapping_hash_entry *entry;
+	struct range_tree_root *ret = NULL;
+
+	hlist_for_each_entry_rcu(entry, elem,
+			&mapping_hash[hash_ptr(mapping, mapping_hash_shift)],
+				hnode)
+		if (entry->mapping == mapping)
+			ret =  &entry->root;
+
+	return ret;
+}
+
+
+static inline
+struct range_tree_root *mapping_to_root(struct address_space *mapping)
+{
+	struct range_tree_root *ret;
+
+	mutex_lock(&hash_mutex);
+	ret =  __mapping_to_root(mapping);
+	mutex_unlock(&hash_mutex);
+	return ret;
+}
+
+
+static inline
+struct range_tree_root *mapping_allocate_root(struct address_space *mapping)
+{
+	struct mapping_hash_entry *entry;
+	struct range_tree_root *dblchk;
+	struct range_tree_root *ret = NULL;
+
+	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+	if (!entry)
+		return NULL;
+
+	mutex_lock(&hash_mutex);
+	/* Since we dropped the lock, double check that no one has
+	 * created the same hash entry.
+	 */
+	dblchk = __mapping_to_root(mapping);
+	if (dblchk) {
+		kfree(entry);
+		ret = dblchk;
+		goto out;
+	}
+
+	INIT_HLIST_NODE(&entry->hnode);
+	entry->mapping = mapping;
+	range_tree_init(&entry->root);
+
+	hlist_add_head_rcu(&entry->hnode,
+		&mapping_hash[hash_ptr(mapping, mapping_hash_shift)]);
+
+	ret = &entry->root;
+out:
+	mutex_unlock(&hash_mutex);
+	return ret;
+}
+
+
+static inline void mapping_free_root(struct range_tree_root *root)
+{
+	struct mapping_hash_entry *entry;
+
+	mutex_lock(&hash_mutex);
+	entry = container_of(root, struct mapping_hash_entry, root);
+
+	hlist_del_rcu(&entry->hnode);
+	kfree(entry);
+	mutex_unlock(&hash_mutex);
+}
+
+
+/* volatile range helpers */
+static inline void volatile_range_resize(struct volatile_range *range,
+				pgoff_t start_index, pgoff_t end_index)
+{
+	range->range_node.start = start_index;
+	range->range_node.end = end_index;
+}
+
+static struct volatile_range *vrange_alloc(void)
+{
+	struct volatile_range *new;
+
+	new = kzalloc(sizeof(struct volatile_range), GFP_KERNEL);
+	if (!new)
+		return NULL;
+	range_tree_node_init(&new->range_node);
+	return new;
+}
+
+static void vrange_del(struct range_tree_root *root,
+				struct volatile_range *vrange)
+{
+	if (!vrange->purged)
+		list_del(&vrange->lru);
+	range_tree_remove(root, &vrange->range_node);
+	kfree(vrange);
+}
+
+
+/**
+ * volatile_range_add: Marks a page interval as volatile
+ * @head: per-fs volatile head
+ * @mapping: address space whose range is being marked volatile
+ * @start_index: Starting page in range to be marked volatile
+ * @end_index: Ending page in range to be marked volatile
+ *
+ * Mark a region as volatile. Coalesces overlapping and neighboring regions.
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ * Returns 1 if the range was coalesced with any purged ranges,
+ * 0 otherwise, or -ENOMEM if allocation fails.
+ */
+long volatile_range_add(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index)
+{
+	struct volatile_range *new;
+	struct range_tree_node *node;
+	struct volatile_range *vrange;
+	struct range_tree_root *root;
+	int purged = 0;
+	u64 start = (u64)start_index;
+	u64 end = (u64)end_index;
+
+	/* Make sure we're properly locked */
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	/*
+	 * Because the lock might be held in a shrinker, release
+	 * it during allocation.
+	 */
+	mutex_unlock(&head->lock);
+	new = vrange_alloc();
+	mutex_lock(&head->lock);
+	if (!new)
+		return -ENOMEM;
+
+	root = mapping_to_root(mapping);
+	if (!root) {
+		mutex_unlock(&head->lock);
+		root = mapping_allocate_root(mapping);
+		mutex_lock(&head->lock);
+		if (!root) {
+			kfree(new);
+			return -ENOMEM;
+		}
+	}
+
+	/* First, find any existing ranges that overlap */
+	node = range_tree_in_range(root, start, end);
+	while (node) {
+		/* Already entirely marked volatile, so we're done */
+		if (node->start < start && node->end > end) {
+			/* don't need the allocated value */
+			kfree(new);
+			return purged;
+		}
+
+		/* Grab containing volatile range */
+		vrange = container_of(node, struct volatile_range, range_node);
+
+		/* Resize the new range to cover all overlapping ranges */
+		start = min_t(u64, start, node->start);
+		end = max_t(u64, end, node->end);
+
+		/* Inherit purged state from overlapping ranges */
+		purged |= vrange->purged;
+
+		node = range_tree_next_in_range(&vrange->range_node,
+								start, end);
+		/* Delete the old range, as we consume it */
+		vrange_del(root, vrange);
+	}
+
+	/* Coalesce left-adjacent ranges */
+	node = range_tree_in_range(root, start-1, start);
+	if (node) {
+		vrange = container_of(node, struct volatile_range, range_node);
+		/* Only coalesce if both are either purged or unpurged */
+		if (vrange->purged == purged) {
+			/* resize new range */
+			start = min_t(u64, start, node->start);
+			end = max_t(u64, end, node->end);
+			/* delete old range */
+			vrange_del(root, vrange);
+		}
+	}
+
+	/* Coalesce right-adjacent ranges */
+	node = range_tree_in_range(root, end, end+1);
+	if (node) {
+		vrange = container_of(node, struct volatile_range, range_node);
+		/* Only coalesce if both are either purged or unpurged */
+		if (vrange->purged == purged) {
+			/* resize new range */
+			start = min_t(u64, start, node->start);
+			end = max_t(u64, end, node->end);
+			/* delete old range */
+			vrange_del(root, vrange);
+		}
+	}
+	/* Assign and store the new range in the range tree */
+	new->mapping = mapping;
+	new->range_node.start = start;
+	new->range_node.end = end;
+	new->purged = purged;
+	range_tree_add(root, &new->range_node);
+
+	/* Only add unpurged ranges to LRU */
+	if (!purged)
+		list_add_tail(&new->lru, &head->lru_head);
+
+	return purged;
+}
+
+
+/**
+ * volatile_range_remove: Marks a page interval as nonvolatile
+ * @head: per-fs volatile head
+ * @mapping: address space whose range is being marked nonvolatile
+ * @start_index: Starting page in range to be marked nonvolatile
+ * @end_index: Ending page in range to be marked nonvolatile
+ *
+ * Mark a region as nonvolatile, removing any contained pages
+ * from the volatile range tree.
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ * Returns 1 if any portion of the range had been purged,
+ * 0 otherwise, or -ENOMEM if allocation fails.
+ */
+long volatile_range_remove(struct volatile_fs_head *head,
+				struct address_space *mapping,
+				pgoff_t start_index, pgoff_t end_index)
+{
+	struct volatile_range *new;
+	struct range_tree_node *node;
+	struct range_tree_root *root;
+	int ret		= 0;
+	int used_new	= 0;
+	u64 start	= (u64)start_index;
+	u64 end		= (u64)end_index;
+
+	/* Make sure we're properly locked */
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	/*
+	 * Because the lock might be held in a shrinker, release
+	 * it during allocation.
+	 */
+	mutex_unlock(&head->lock);
+	new = vrange_alloc();
+	mutex_lock(&head->lock);
+	if (!new)
+		return -ENOMEM;
+
+	root = mapping_to_root(mapping);
+	if (!root)
+		goto out; /* if no range tree root, there's nothing to unmark */
+
+
+	/* Find any overlapping ranges */
+	node = range_tree_in_range(root, start, end);
+	while (node) {
+		struct volatile_range *vrange;
+		vrange = container_of(node, struct volatile_range, range_node);
+
+		ret |= vrange->purged;
+
+		if (start <= node->start && end >= node->end) {
+			/* delete: volatile range is totally within range */
+			node = range_tree_next_in_range(&vrange->range_node,
+								start, end);
+			vrange_del(root, vrange);
+		} else if (node->start >= start) {
+			/* resize: volatile range right-overlaps range */
+			volatile_range_resize(vrange, end+1, node->end);
+			node = range_tree_next_in_range(&vrange->range_node,
+								start, end);
+
+		} else if (node->end <= end) {
+			/* resize: volatile range left-overlaps range */
+			volatile_range_resize(vrange, node->start, start-1);
+			node = range_tree_next_in_range(&vrange->range_node,
+								start, end);
+		} else {
+			/* split: range is totally within a volatile range */
+			used_new = 1; /* we only do this once */
+			new->mapping = mapping;
+			new->range_node.start = end + 1;
+			new->range_node.end = node->end;
+			new->purged = vrange->purged;
+			range_tree_add(root, &new->range_node);
+			if (!new->purged)
+				list_add_tail(&new->lru, &head->lru_head);
+			volatile_range_resize(vrange, node->start, start-1);
+
+			break;
+		}
+	}
+
+out:
+	if (!used_new)
+		kfree(new);
+
+	return ret;
+}
+
+/**
+ * volatile_range_lru_size: Returns the number of unpurged pages on the lru
+ * @head: per-fs volatile head
+ *
+ * Returns the number of unpurged pages on the LRU
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ */
+s64 volatile_range_lru_size(struct volatile_fs_head *head)
+{
+	struct volatile_range *range, *next;
+	s64 size = 0;
+
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	list_for_each_entry_safe(range, next, &head->lru_head, lru) {
+		size += range->range_node.end - range->range_node.start;
+	}
+	return size;
+}
+
+
+/**
+ * volatile_ranges_get_last_used: Returns mapping and extent of LRU unpurged range
+ * @head: per-fs volatile head
+ * @mapping: double pointer to the mapping whose range is being purged
+ * @start: Pointer to starting address of range being purged
+ * @end: Pointer to ending address of range being purged
+ *
+ * Returns the mapping, start and end values of the least recently used
+ * range. Marks the range as purged and removes it from the LRU.
+ *
+ * Must lock the volatile_fs_head before calling!
+ *
+ * Returns 1 if a range was returned.
+ * Returns 0 if no unpurged ranges were found.
+ */
+s64 volatile_ranges_get_last_used(struct volatile_fs_head *head,
+				struct address_space **mapping,
+				loff_t *start, loff_t *end)
+{
+	struct volatile_range *range;
+
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	if (list_empty(&head->lru_head))
+		return 0;
+
+	range = list_first_entry(&head->lru_head, struct volatile_range, lru);
+
+	*start = range->range_node.start << PAGE_CACHE_SHIFT;
+	*end = ((range->range_node.end + 1) << PAGE_CACHE_SHIFT) - 1;
+	*mapping = range->mapping;
+
+	list_del(&range->lru);
+	range->purged = 1;
+
+	return 1;
+}
+
+
+/*
+ * Cleans up any volatile ranges.
+ */
+void volatile_range_clear(struct volatile_fs_head *head,
+				struct address_space *mapping)
+{
+	struct volatile_range *tozap;
+	struct range_tree_root *root;
+
+	WARN_ON(!mutex_is_locked(&head->lock));
+
+	root = mapping_to_root(mapping);
+	if (!root)
+		return;
+
+	while (!range_tree_empty(root)) {
+		struct range_tree_node *tmp;
+		tmp = range_tree_root_node(root);
+		tozap = container_of(tmp, struct volatile_range, range_node);
+		vrange_del(root, tozap);
+	}
+	mapping_free_root(root);
+}
+
+
+static int __init volatile_init(void)
+{
+	int i, size;
+
+	size = 1U << mapping_hash_shift;
+	mapping_hash = kzalloc(sizeof(*mapping_hash)*size, GFP_KERNEL);
+	for (i = 0; i < size; i++)
+		INIT_HLIST_HEAD(&mapping_hash[i]);
+
+	return 0;
+}
+arch_initcall(volatile_init);
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers
  2012-05-25 19:17 [PATCH 0/4] [RFC] Fallocate Volatile Ranges John Stultz
                   ` (2 preceding siblings ...)
  2012-05-25 19:17 ` [PATCH 3/4] [RFC] Add volatile range management code John Stultz
@ 2012-05-25 19:17 ` John Stultz
  3 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2012-05-25 19:17 UTC (permalink / raw)
  To: LKML
  Cc: John Stultz, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

This patch enables FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE
functionality for tmpfs making use of the volatile range
management code.

Conceptually, FALLOC_FL_MARK_VOLATILE is like a delayed
FALLOC_FL_PUNCH_HOLE.  It allows applications with data caches
that can be regenerated to tell the kernel that some memory
contains data that is useful in the future but can be recreated
if needed, so if the kernel needs the memory, it can zap it
without having to swap it out.

In use, applications use FALLOC_FL_MARK_VOLATILE to mark
page ranges as volatile when they are not in use. Later, if
they want to reuse the data, they use
FALLOC_FL_UNMARK_VOLATILE, which will return an error if the
data has been purged.
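
For illustration, a userspace cache might use the new flags roughly as
in the sketch below (not part of the patch; the helper names are made
up, and the flag values are the ones this series proposes in
include/linux/falloc.h, pending the usual header updates):

	#define _GNU_SOURCE
	#include <fcntl.h>

	#ifndef FALLOC_FL_MARK_VOLATILE
	#define FALLOC_FL_MARK_VOLATILE		0x04	/* proposed in this series */
	#define FALLOC_FL_UNMARK_VOLATILE	0x08	/* proposed in this series */
	#endif

	/* Cache data in fd is idle: let the kernel reclaim it under pressure */
	static int cache_idle(int fd, off_t offset, off_t len)
	{
		return fallocate(fd, FALLOC_FL_MARK_VOLATILE, offset, len);
	}

	/*
	 * About to reuse the cache: returns 0 if the data survived,
	 * non-zero if it was purged and must be regenerated.
	 */
	static int cache_reuse(int fd, off_t offset, off_t len)
	{
		return fallocate(fd, FALLOC_FL_UNMARK_VOLATILE, offset, len);
	}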

This is very much influenced by the Android Ashmem interface by
Robert Love so credits to him and the Android developers.
In many cases the code & logic come directly from the ashmem patch.
The intent of this patch is to allow for ashmem-like behavior, but
embeds the idea a little deeper into the VM code.

This is a reworked version of the fadvise volatile idea submitted
earlier to the list. Thanks to Dave Chinner for suggesting to
rework the idea in this fashion. Also thanks to Dmitry Adamushko
for continued review and bug reporting, and Dave Hansen for
help with the original design and mentoring me in the VM code.

CC: Andrew Morton <akpm@linux-foundation.org>
CC: Android Kernel Team <kernel-team@android.com>
CC: Robert Love <rlove@google.com>
CC: Mel Gorman <mel@csn.ul.ie>
CC: Hugh Dickins <hughd@google.com>
CC: Dave Hansen <dave@linux.vnet.ibm.com>
CC: Rik van Riel <riel@redhat.com>
CC: Dmitry Adamushko <dmitry.adamushko@gmail.com>
CC: Dave Chinner <david@fromorbit.com>
CC: Neil Brown <neilb@suse.de>
CC: Andrea Righi <andrea@betterlinux.com>
CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
CC: Taras Glek <tgek@mozilla.com>
CC: Mike Hommey <mh@glandium.org>

Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 fs/open.c              |    3 +-
 include/linux/falloc.h |    7 ++-
 mm/shmem.c             |  107 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/fs/open.c b/fs/open.c
index d543012..448ed5a 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -223,7 +223,8 @@ int do_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
 		return -EINVAL;
 
 	/* Return error if mode is not supported */
-	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
+	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE |
+			FALLOC_FL_MARK_VOLATILE | FALLOC_FL_UNMARK_VOLATILE))
 		return -EOPNOTSUPP;
 
 	/* Punch hole must have keep size set */
diff --git a/include/linux/falloc.h b/include/linux/falloc.h
index 73e0b62..3e47ad5 100644
--- a/include/linux/falloc.h
+++ b/include/linux/falloc.h
@@ -1,9 +1,10 @@
 #ifndef _FALLOC_H_
 #define _FALLOC_H_
 
-#define FALLOC_FL_KEEP_SIZE	0x01 /* default is extend size */
-#define FALLOC_FL_PUNCH_HOLE	0x02 /* de-allocates range */
-
+#define FALLOC_FL_KEEP_SIZE		0x01 /* default is extend size */
+#define FALLOC_FL_PUNCH_HOLE		0x02 /* de-allocates range */
+#define FALLOC_FL_MARK_VOLATILE		0x04 /* mark range volatile */
+#define FALLOC_FL_UNMARK_VOLATILE	0x08 /* mark range non-volatile */
 #ifdef __KERNEL__
 
 /*
diff --git a/mm/shmem.c b/mm/shmem.c
index 9b1c6b4..a0911ba 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -64,6 +64,7 @@ static struct vfsmount *shm_mnt;
 #include <linux/highmem.h>
 #include <linux/seq_file.h>
 #include <linux/magic.h>
+#include <linux/volatile.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -594,11 +595,109 @@ static int shmem_setattr(struct dentry *dentry, struct iattr *attr)
 	return error;
 }
 
+static DEFINE_VOLATILE_FS_HEAD(shmem_volatile_head);
+
+static int shmem_mark_volatile(struct inode *inode, loff_t offset, loff_t len)
+{
+	loff_t lstart, lend;
+	int ret;
+
+	lstart = offset >> PAGE_CACHE_SHIFT;
+	lend = (offset+len) >> PAGE_CACHE_SHIFT;
+
+	volatile_range_lock(&shmem_volatile_head);
+	ret = volatile_range_add(&shmem_volatile_head, &inode->i_data,
+					lstart, lend);
+	if (ret > 0) { /* immediately purge */
+		shmem_truncate_range(inode, lstart<<PAGE_CACHE_SHIFT,
+					(lend<<PAGE_CACHE_SHIFT)-1);
+		ret = 0;
+	}
+	volatile_range_unlock(&shmem_volatile_head);
+
+	return ret;
+}
+
+static int shmem_unmark_volatile(struct inode *inode, loff_t offset, loff_t len)
+{
+	loff_t lstart, lend;
+	int ret;
+
+	lstart = offset >> PAGE_CACHE_SHIFT;
+	lend = (offset+len) >> PAGE_CACHE_SHIFT;
+
+	volatile_range_lock(&shmem_volatile_head);
+	ret = volatile_range_remove(&shmem_volatile_head,
+					&inode->i_data,
+					lstart, lend);
+	volatile_range_unlock(&shmem_volatile_head);
+
+	return ret;
+}
+
+static void shmem_clear_volatile(struct inode *inode)
+{
+	volatile_range_lock(&shmem_volatile_head);
+	volatile_range_clear(&shmem_volatile_head, &inode->i_data);
+	volatile_range_unlock(&shmem_volatile_head);
+}
+
+static
+int shmem_volatile_shrink(struct shrinker *ignored, struct shrink_control *sc)
+{
+	s64 nr_to_scan = sc->nr_to_scan;
+	const gfp_t gfp_mask = sc->gfp_mask;
+	struct address_space *mapping;
+	loff_t start, end;
+	int ret;
+	s64 page_count;
+
+	if (nr_to_scan && !(gfp_mask & __GFP_FS))
+		return -1;
+
+	volatile_range_lock(&shmem_volatile_head);
+	page_count = volatile_range_lru_size(&shmem_volatile_head);
+	if (!nr_to_scan)
+		goto out;
+
+	do {
+		ret = volatile_ranges_get_last_used(&shmem_volatile_head,
+							&mapping, &start, &end);
+		if (ret) {
+			shmem_truncate_range(mapping->host,
+						start<<PAGE_CACHE_SHIFT,
+						(end<<PAGE_CACHE_SHIFT)-1);
+			nr_to_scan -= end-start;
+			page_count -= end-start;
+		}
+	} while (ret && (nr_to_scan > 0));
+
+out:
+	volatile_range_unlock(&shmem_volatile_head);
+
+	return page_count;
+}
+
+static struct shrinker shmem_volatile_shrinker = {
+	.shrink = shmem_volatile_shrink,
+	.seeks = DEFAULT_SEEKS,
+};
+
+static int __init shmem_shrinker_init(void)
+{
+	register_shrinker(&shmem_volatile_shrinker);
+	return 0;
+}
+arch_initcall(shmem_shrinker_init);
+
+
 static void shmem_evict_inode(struct inode *inode)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	struct shmem_xattr *xattr, *nxattr;
 
+	shmem_clear_volatile(inode);
+
 	if (inode->i_mapping->a_ops == &shmem_aops) {
 		shmem_unacct_size(info->flags, inode->i_size);
 		inode->i_size = 0;
@@ -1501,10 +1600,18 @@ static long shmem_fallocate(struct file *file, int mode, loff_t offset,
 		shmem_truncate_range(inode, offset, offset + len - 1);
 		/* No need to unmap again: hole-punching leaves COWed pages */
 		error = 0;
+	} else if (mode & FALLOC_FL_MARK_VOLATILE) {
+		/* Mark pages volatile, sort of delayed hole punching */
+		error = shmem_mark_volatile(inode, offset, len);
+	} else if (mode & FALLOC_FL_UNMARK_VOLATILE) {
+		/* Mark pages non-volatile, return error if pages were purged */
+		error = shmem_unmark_volatile(inode, offset, len);
 	}
 
 	mutex_unlock(&inode->i_mutex);
 	return error;
+
+
 }
 
 static int shmem_statfs(struct dentry *dentry, struct kstatfs *buf)
-- 
1.7.3.2.146.gca209


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] [RFC] Range tree implementation
  2012-05-25 19:17 ` [PATCH 2/4] [RFC] Range tree implementation John Stultz
@ 2012-05-31 20:48   ` Jan Kara
  2012-05-31 21:04     ` John Stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2012-05-31 20:48 UTC (permalink / raw)
  To: John Stultz
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

On Fri 25-05-12 12:17:34, John Stultz wrote:
> I suspect range-tree isn't a totally accurate name, but I
> couldn't quite make out the difference between range trees
> and interval trees, so I just picked one to call it. Do
> let me know if you have a better name.
  Well, interval tree is a data structure for tracking a set of
possibly overlapping intervals. Range tree is a data structure tracking
points allowing for fast queries on a set of points contained in a given
range (gets useful and interesting when dimension > 1). Your data structure
is neither so it would be good to have a different name. OTOH there are so
many data structures that it's hard to find a reasonable unused name ;)

> +/**
> + * range_tree_next_in_range - Return the next range in a range_tree still
> + *                            contained within a specified range.
  'within' isn't really correct, right? It should rather be 'intersecting'.

> + * @root: range_tree root
> + * @start: range start
> + * @end: range end
> + *
> + */
> +struct range_tree_node *range_tree_next_in_range(struct range_tree_node *node,
> +							u64 start, u64 end)
> +{
> +	struct rb_node *next;
> +	struct range_tree_node *candidate;
> +	if (!node)
> +		return NULL;
> +	next = rb_next(&node->rb);
> +	if (!next)
> +		return NULL;
> +
> +	candidate = container_of(next, struct range_tree_node, rb);
> +
> +	if ((candidate->start > end) || (candidate->end < start))
> +		return NULL;
> +
> +	return candidate;
> +}
> +
> +/**
> + * range_tree_add - Add a node to a range tree
> + * @root: range tree to be added to
> + * @node: range_tree_node to be added
> + *
> + * Adds a node to the range tree.
  I think you should document here that the added range must not intersect
with any other range in the tree.

> + */
> +void range_tree_add(struct range_tree_root *root,
> +					struct range_tree_node *node)
> +{
> +	struct rb_node **p = &root->head.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct range_tree_node *ptr;
> +
> +	WARN_ON_ONCE(!RB_EMPTY_NODE(&node->rb));
> +
> +	while (*p) {
> +		parent = *p;
> +		ptr = rb_entry(parent, struct range_tree_node, rb);
> +		if (node->start < ptr->start)
> +			p = &(*p)->rb_left;
> +		else
> +			p = &(*p)->rb_right;
> +	}
> +	rb_link_node(&node->rb, parent, p);
> +	rb_insert_color(&node->rb, &root->head);
> +
  ^^ Spurious empty line.

> +}
> +

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] [RFC] Range tree implementation
  2012-05-31 20:48   ` Jan Kara
@ 2012-05-31 21:04     ` John Stultz
  2012-05-31 22:35       ` Jan Kara
  0 siblings, 1 reply; 9+ messages in thread
From: John Stultz @ 2012-05-31 21:04 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

On 05/31/2012 01:48 PM, Jan Kara wrote:
> On Fri 25-05-12 12:17:34, John Stultz wrote:
>> I suspect range-tree isn't a totally accurate name, but I
>> couldn't quite make out the difference between range trees
>> and interval trees, so I just picked one to call it. Do
>> let me know if you have a better name.
>    Well, interval tree is a data structure for tracking a set of
> possibly overlapping intervals. Range tree is a data structure tracking
> points allowing for fast queries on a set of points contained in a given
> range (gets useful and interesting when dimension>  1). Your data structure
> is neither so it would be good to have a different name. OTOH there are so
> many data structures that it's hard to find a reasonable unused name ;)

Although I'm not sure your interval tree description doesn't match what 
I'm trying to provide. Could you clarify why that doesn't match?

>> +/**
>> + * range_tree_next_in_range - Return the next range in a range_tree still
>> + *                            contained within a specified range.
>    'within' isn't really correct, right? It should rather be 'intersecting'.

That is more clear, thanks for the suggestion!

>
>> + * @root: range_tree root
>> + * @start: range start
>> + * @end: range end
>> + *
>> + */
>> +struct range_tree_node *range_tree_next_in_range(struct range_tree_node *node,
>> +							u64 start, u64 end)
>> +{
>> +	struct rb_node *next;
>> +	struct range_tree_node *candidate;
>> +	if (!node)
>> +		return NULL;
>> +	next = rb_next(&node->rb);
>> +	if (!next)
>> +		return NULL;
>> +
>> +	candidate = container_of(next, struct range_tree_node, rb);
>> +
>> +	if ((candidate->start>  end) || (candidate->end<  start))
>> +		return NULL;
>> +
>> +	return candidate;
>> +}
>> +
>> +/**
>> + * range_tree_add - Add a node to a range tree
>> + * @root: range tree to be added to
>> + * @node: range_tree_node to be added
>> + *
>> + * Adds a node to the range tree.
>    I think you should document here that the added range must not intersect
> with any other range in the tree.

So for my usage in the volatile range code, I don't want intersecting or 
overlapping ranges added, but I didn't feel it was necessary to add this 
restriction to my rangetree code as well, since someone might want to 
store overlapping ranges.

thanks for the review!
-john



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] [RFC] Range tree implementation
  2012-05-31 21:04     ` John Stultz
@ 2012-05-31 22:35       ` Jan Kara
  2012-05-31 23:12         ` John Stultz
  0 siblings, 1 reply; 9+ messages in thread
From: Jan Kara @ 2012-05-31 22:35 UTC (permalink / raw)
  To: John Stultz
  Cc: Jan Kara, LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

On Thu 31-05-12 14:04:23, John Stultz wrote:
> On 05/31/2012 01:48 PM, Jan Kara wrote:
> >On Fri 25-05-12 12:17:34, John Stultz wrote:
> >>I suspect range-tree isn't a totally accurate name, but I
> >>couldn't quite make out the difference between range trees
> >>and interval trees, so I just picked one to call it. Do
> >>let me know if you have a better name.
> >   Well, interval tree is a data structure for tracking a set of
> >possibly overlapping intervals. Range tree is a data structure tracking
> >points allowing for fast queries on a set of points contained in a given
> >range (gets useful and interesting when dimension>  1). Your data structure
> >is neither so it would be good to have a different name. OTOH there are so
> >many data structures that it's hard to find a reasonable unused name ;)
> 
> Although I'm not sure your interval tree description doesn't match
> what I'm trying to provide. Could you clarify why that doesn't
> match?
  Wikipedia has a good description of Interval trees:
  http://en.wikipedia.org/wiki/Interval_tree

  For example they are tertiary trees.

> >>+ * @root: range_tree root
> >>+ * @start: range start
> >>+ * @end: range end
> >>+ *
> >>+ */
> >>+struct range_tree_node *range_tree_next_in_range(struct range_tree_node *node,
> >>+							u64 start, u64 end)
> >>+{
> >>+	struct rb_node *next;
> >>+	struct range_tree_node *candidate;
> >>+	if (!node)
> >>+		return NULL;
> >>+	next = rb_next(&node->rb);
> >>+	if (!next)
> >>+		return NULL;
> >>+
> >>+	candidate = container_of(next, struct range_tree_node, rb);
> >>+
> >>+	if ((candidate->start>  end) || (candidate->end<  start))
> >>+		return NULL;
> >>+
> >>+	return candidate;
> >>+}
> >>+
> >>+/**
> >>+ * range_tree_add - Add a node to a range tree
> >>+ * @root: range tree to be added to
> >>+ * @node: range_tree_node to be added
> >>+ *
> >>+ * Adds a node to the range tree.
> >   I think you should document here that the added range must not intersect
> >with any other range in the tree.
> 
> So for my usage in the volatile range code, I don't want
> intersecting or overlapping ranges added, but I didn't feel it was
> necessary to add this restriction to my rangetree code as well,
> since someone might want to store overlapping ranges.
  Ok, but then you should define where an interval that is intersecting
other intervals ends up sorted...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/4] [RFC] Range tree implementation
  2012-05-31 22:35       ` Jan Kara
@ 2012-05-31 23:12         ` John Stultz
  0 siblings, 0 replies; 9+ messages in thread
From: John Stultz @ 2012-05-31 23:12 UTC (permalink / raw)
  To: Jan Kara
  Cc: LKML, Andrew Morton, Android Kernel Team, Robert Love,
	Mel Gorman, Hugh Dickins, Dave Hansen, Rik van Riel,
	Dmitry Adamushko, Dave Chinner, Neil Brown, Andrea Righi,
	Aneesh Kumar K.V, Taras Glek, Mike Hommey

On 05/31/2012 03:35 PM, Jan Kara wrote:
> On Thu 31-05-12 14:04:23, John Stultz wrote:
>> On 05/31/2012 01:48 PM, Jan Kara wrote:
>>> On Fri 25-05-12 12:17:34, John Stultz wrote:
>>>> I suspect range-tree isn't a totally accurate name, but I
>>>> couldn't quite make out the difference between range trees
>>>> and interval trees, so I just picked one to call it. Do
>>>> let me know if you have a better name.
>>>    Well, interval tree is a data structure for tracking a set of
>>> possibly overlapping intervals. Range tree is a data structure tracking
>>> points allowing for fast queries on a set of points contained in a given
>>> range (gets useful and interesting when dimension>   1). Your data structure
>>> is neither so it would be good to have a different name. OTOH there are so
>>> many data structures that it's hard to find a reasonable unused name ;)
>> Although I'm not sure your interval tree description doesn't match
>> what I'm trying to provide. Could you clarify why that doesn't
>> match?
>    Wikipedia has a good description of Interval trees:
>    http://en.wikipedia.org/wiki/Interval_tree
>
>    For example they are tertiary trees.
So roughly the naive approach listed in the wikipedia link is what I'm 
using here.
I'm fine renaming it to interval_tree, but if you really think it should 
be something else,  by all means let me know.

>>>> +
>>>> +/**
>>>> + * range_tree_add - Add a node to a range tree
>>>> + * @root: range tree to be added to
>>>> + * @node: range_tree_node to be added
>>>> + *
>>>> + * Adds a node to the range tree.
>>>    I think you should document here that the added range must not intersect
>>> with any other range in the tree.
>> So for my usage in the volatile range code, I don't want
>> intersecting or overlapping ranges added, but I didn't feel it was
>> necessary to add this restriction to my rangetree code as well,
>> since someone might want to store overlapping ranges.
>    Ok, but then you should define where an interval that is intersecting
> other intervals ends up sorted...
Ah. I see the issue you're concerned about!

Since the ranges are sorted by starting value, you could have a set of 
values: (0,5),(0,100),(0,50) and since they have the same start value, 
they could be connected adjacently in any way possible in the tree.  
Thus, the current search function wouldn't necessarily handle searching 
for (25,30) properly.  Since if it hit (0,5) first, it would move right, 
whereas (0,100) could reasonably be left of (0,5).

Thanks for pointing this out. While I didn't want to needlessly add the 
restriction that added ranges didn't intersect,  it seems by focusing on 
my usage where it didn't intersect, I in effect encoded that restriction 
into the search.  I'll go ahead and clarify this restriction in the 
comment and if folks want to extend the code to handle overlapping 
intervals, we can re-implement the logic then.

thanks
-john


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-05-31 23:12 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-25 19:17 [PATCH 0/4] [RFC] Fallocate Volatile Ranges John Stultz
2012-05-25 19:17 ` [PATCH 1/4] tmpfs: support fallocate FALLOC_FL_PUNCH_HOLE John Stultz
2012-05-25 19:17 ` [PATCH 2/4] [RFC] Range tree implementation John Stultz
2012-05-31 20:48   ` Jan Kara
2012-05-31 21:04     ` John Stultz
2012-05-31 22:35       ` Jan Kara
2012-05-31 23:12         ` John Stultz
2012-05-25 19:17 ` [PATCH 3/4] [RFC] Add volatile range management code John Stultz
2012-05-25 19:17 ` [PATCH 4/4] [RFC] tmpfs: Add FALLOC_FL_MARK_VOLATILE/UNMARK_VOLATILE handlers John Stultz
