* [PATCH 0/3] mm: tmpfs radix_tree swap leftovers
From: Hugh Dickins @ 2011-07-19 22:51 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Here are three miscellaneous patches on top of the tmpfs radix_tree swap
work in 3.0-rc6-mm1: an overdue unrelated cleanup in radix_tree_tag_get(),
an unfortunately necessary speedup to tmpfs swapoff, and an attempt to
address your review feedback on the exceptional cases in filemap.c.

1/3 radix_tree: clean away saw_unset_tag leftovers
2/3 tmpfs radix_tree: locate_item to speed up swapoff
3/3 mm: clarify the radix_tree exceptional cases

 include/linux/radix-tree.h |    1 
 lib/radix-tree.c           |  102 ++++++++++++++++++++++++++++++++---
 mm/filemap.c               |   66 +++++++++++++++-------
 mm/mincore.c               |    1 
 mm/shmem.c                 |   50 +++--------------
 5 files changed, 149 insertions(+), 71 deletions(-)

Hugh

* [PATCH 1/3] radix_tree: clean away saw_unset_tag leftovers
From: Hugh Dickins @ 2011-07-19 22:52 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

radix_tree_tag_get()'s BUG (when it saw a tag after saw_unset_tag) was
unsafe and was removed in 2.6.34, but the now-pointless saw_unset_tag was
left behind.

Remove it now, and return 0 as soon as we see an unset tag: a node's clear
tag bit means that nothing below it has the tag set.  We already rely upon
the root tag being correct, returning 0 immediately if it's not set.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 lib/radix-tree.c |   10 ++--------
 1 file changed, 2 insertions(+), 8 deletions(-)

--- mmotm.orig/lib/radix-tree.c	2011-07-08 18:57:14.810702665 -0700
+++ mmotm/lib/radix-tree.c	2011-07-19 11:11:21.285234139 -0700
@@ -576,7 +576,6 @@ int radix_tree_tag_get(struct radix_tree
 {
 	unsigned int height, shift;
 	struct radix_tree_node *node;
-	int saw_unset_tag = 0;
 
 	/* check the root's tag bit */
 	if (!root_tag_get(root, tag))
@@ -603,15 +602,10 @@ int radix_tree_tag_get(struct radix_tree
 			return 0;
 
 		offset = (index >> shift) & RADIX_TREE_MAP_MASK;
-
-		/*
-		 * This is just a debug check.  Later, we can bale as soon as
-		 * we see an unset tag.
-		 */
 		if (!tag_get(node, tag, offset))
-			saw_unset_tag = 1;
+			return 0;
 		if (height == 1)
-			return !!tag_get(node, tag, offset);
+			return 1;
 		node = rcu_dereference_raw(node->slots[offset]);
 		shift -= RADIX_TREE_MAP_SHIFT;
 		height--;

* [PATCH 2/3] tmpfs radix_tree: locate_item to speed up swapoff
From: Hugh Dickins @ 2011-07-19 22:54 UTC
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

We have already acknowledged that swapoff of a tmpfs file is slower
than it was before conversion to the generic radix_tree: a little
slower there will be acceptable, if the hotter paths are faster.

But it was a shock to find swapoff of a 500MB file 20 times slower
on my laptop, taking 10 minutes; and at that rate it significantly
slows down my testing.

Now, most of that turned out to be overhead from PROVE_LOCKING and
PROVE_RCU: without those it was only 4 times slower than before;
and more realistic tests on other machines don't fare as badly.

I've tried a number of things to improve it, including tagging the
swap entries, then doing lookup by tag: I'd expected that to halve
the time, but in practice it's erratic, and often counter-productive.

The only change I've so far found to make a consistent improvement is to
short-circuit the back-and-forth: rather than gang lookup packing entries
into the array supplied, and shmem then scanning that array for the target
entry, scan for the entry in place.  Scanning in place doubles the speed,
so it's now only twice as slow as before (or three times slower when the
PROVEs are on).

So, add radix_tree_locate_item() as an expedient, once-off, single-caller
hack to do the lookup directly in place.  #ifdef it on CONFIG_SHMEM and
CONFIG_SWAP, as much to document its limited applicability as to save
space in other configurations.  And, sadly, #include sched.h for
cond_resched().
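
For reference, the single intended caller then reduces to the shape below
(condensed from the mm/shmem.c hunk in this patch, not a complete function):

	/* shmem_unuse_inode(): where does this swap entry live in the tree? */
	radswap = swp_to_radix_entry(swap);
	index = radix_tree_locate_item(&mapping->page_tree, radswap);
	if (index == -1)
		return 0;	/* entry not found in this mapping */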

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 include/linux/radix-tree.h |    1 
 lib/radix-tree.c           |   92 +++++++++++++++++++++++++++++++++++
 mm/shmem.c                 |   38 --------------
 3 files changed, 94 insertions(+), 37 deletions(-)

--- mmotm.orig/include/linux/radix-tree.h	2011-07-08 18:57:14.810702665 -0700
+++ mmotm/include/linux/radix-tree.h	2011-07-19 11:11:33.705295709 -0700
@@ -252,6 +252,7 @@ unsigned long radix_tree_range_tag_if_ta
 		unsigned long nr_to_tag,
 		unsigned int fromtag, unsigned int totag);
 int radix_tree_tagged(struct radix_tree_root *root, unsigned int tag);
+unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item);
 
 static inline void radix_tree_preload_end(void)
 {
--- mmotm.orig/lib/radix-tree.c	2011-07-19 11:11:21.285234139 -0700
+++ mmotm/lib/radix-tree.c	2011-07-19 11:13:20.249824040 -0700
@@ -1197,6 +1197,98 @@ radix_tree_gang_lookup_tag_slot(struct r
 }
 EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot);
 
+#if defined(CONFIG_SHMEM) && defined(CONFIG_SWAP)
+#include <linux/sched.h> /* for cond_resched() */
+
+/*
+ * This linear search is at present only useful to shmem_unuse_inode().
+ */
+static unsigned long __locate(struct radix_tree_node *slot, void *item,
+			      unsigned long index, unsigned long *found_index)
+{
+	unsigned int shift, height;
+	unsigned long i;
+
+	height = slot->height;
+	shift = (height-1) * RADIX_TREE_MAP_SHIFT;
+
+	for ( ; height > 1; height--) {
+		i = (index >> shift) & RADIX_TREE_MAP_MASK;
+		for (;;) {
+			if (slot->slots[i] != NULL)
+				break;
+			index &= ~((1UL << shift) - 1);
+			index += 1UL << shift;
+			if (index == 0)
+				goto out;	/* 32-bit wraparound */
+			i++;
+			if (i == RADIX_TREE_MAP_SIZE)
+				goto out;
+		}
+
+		shift -= RADIX_TREE_MAP_SHIFT;
+		slot = rcu_dereference_raw(slot->slots[i]);
+		if (slot == NULL)
+			goto out;
+	}
+
+	/* Bottom level: check items */
+	for (i = 0; i < RADIX_TREE_MAP_SIZE; i++) {
+		if (slot->slots[i] == item) {
+			*found_index = index + i;
+			index = 0;
+			goto out;
+		}
+	}
+	index += RADIX_TREE_MAP_SIZE;
+out:
+	return index;
+}
+
+/**
+ *	radix_tree_locate_item - search through radix tree for item
+ *	@root:		radix tree root
+ *	@item:		item to be found
+ *
+ *	Returns index where item was found, or -1 if not found.
+ *	Caller must hold no lock (since this time-consuming function needs
+ *	to be preemptible), and must check afterwards if item is still there.
+ */
+unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item)
+{
+	struct radix_tree_node *node;
+	unsigned long max_index;
+	unsigned long cur_index = 0;
+	unsigned long found_index = -1;
+
+	do {
+		rcu_read_lock();
+		node = rcu_dereference_raw(root->rnode);
+		if (!radix_tree_is_indirect_ptr(node)) {
+			rcu_read_unlock();
+			if (node == item)
+				found_index = 0;
+			break;
+		}
+
+		node = indirect_to_ptr(node);
+		max_index = radix_tree_maxindex(node->height);
+		if (cur_index > max_index)
+			break;
+
+		cur_index = __locate(node, item, cur_index, &found_index);
+		rcu_read_unlock();
+		cond_resched();
+	} while (cur_index != 0 && cur_index <= max_index);
+
+	return found_index;
+}
+#else
+unsigned long radix_tree_locate_item(struct radix_tree_root *root, void *item)
+{
+	return -1;
+}
+#endif /* CONFIG_SHMEM && CONFIG_SWAP */
 
 /**
  *	radix_tree_shrink    -    shrink height of a radix tree to minimal
--- mmotm.orig/mm/shmem.c	2011-07-08 18:57:15.114704171 -0700
+++ mmotm/mm/shmem.c	2011-07-19 11:11:33.709295729 -0700
@@ -357,42 +357,6 @@ export:
 }
 
 /*
- * Lockless lookup of swap entry in radix tree, avoiding refcount on pages.
- */
-static pgoff_t shmem_find_swap(struct address_space *mapping, void *radswap)
-{
-	void  **slots[PAGEVEC_SIZE];
-	pgoff_t indices[PAGEVEC_SIZE];
-	unsigned int nr_found;
-
-restart:
-	nr_found = 1;
-	indices[0] = -1;
-	while (nr_found) {
-		pgoff_t index = indices[nr_found - 1] + 1;
-		unsigned int i;
-
-		rcu_read_lock();
-		nr_found = radix_tree_gang_lookup_slot(&mapping->page_tree,
-					slots, indices, index, PAGEVEC_SIZE);
-		for (i = 0; i < nr_found; i++) {
-			void *item = radix_tree_deref_slot(slots[i]);
-			if (radix_tree_deref_retry(item)) {
-				rcu_read_unlock();
-				goto restart;
-			}
-			if (item == radswap) {
-				rcu_read_unlock();
-				return indices[i];
-			}
-		}
-		rcu_read_unlock();
-		cond_resched();
-	}
-	return -1;
-}
-
-/*
  * Remove swap entry from radix tree, free the swap and its page cache.
  */
 static int shmem_free_swap(struct address_space *mapping,
@@ -612,7 +576,7 @@ static int shmem_unuse_inode(struct shme
 	int error;
 
 	radswap = swp_to_radix_entry(swap);
-	index = shmem_find_swap(mapping, radswap);
+	index = radix_tree_locate_item(&mapping->page_tree, radswap);
 	if (index == -1)
 		return 0;
 

* [PATCH 3/3] mm: clarify the radix_tree exceptional cases
From: Hugh Dickins @ 2011-07-19 22:56 UTC
  To: Andrew Morton; +Cc: Pekka Enberg, linux-kernel, linux-mm

Make the radix_tree exceptional cases, mostly in filemap.c, clearer.

It's hard to devise a suitable snappy name that illuminates the use
by shmem/tmpfs for swap, while keeping filemap/pagecache/radix_tree
generality.  And akpm points out that /* radix_tree_deref_retry(page) */
comments look like calls that have been commented out for some unknown reason.

Skirt the naming difficulty by rearranging these blocks to handle the
transient radix_tree_deref_retry(page) case first; then just explain
the remaining shmem/tmpfs swap case in a comment.
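
For anyone unfamiliar with the convention these patches lean on: an
"exceptional" entry is a radix_tree slot holding something other than a
page pointer, distinguishable by its low bits, which shmem/tmpfs uses to
record a swap entry.  A rough sketch of the idea (illustrative names only;
the real helpers used in these patches are swp_to_radix_entry() and
radix_to_swp_entry()):

	/* Illustrative sketch, not the mmotm code. */
	static inline void *swap_to_entry(unsigned long swapval)
	{
		/* tag the value in its low bits so it cannot be a page pointer */
		return (void *)((swapval << 2) | 2);
	}

	static inline int entry_is_exceptional(void *entry)
	{
		return (unsigned long)entry & 2;
	}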

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/filemap.c |   66 ++++++++++++++++++++++++++++++++-----------------
 mm/mincore.c |    1 
 mm/shmem.c   |   12 +++++---
 3 files changed, 53 insertions(+), 26 deletions(-)

--- mmotm.orig/mm/filemap.c	2011-07-08 18:57:15.142704312 -0700
+++ mmotm/mm/filemap.c	2011-07-19 11:13:31.945882037 -0700
@@ -703,10 +703,14 @@ repeat:
 		if (unlikely(!page))
 			goto out;
 		if (radix_tree_exception(page)) {
-			if (radix_tree_exceptional_entry(page))
-				goto out;
-			/* radix_tree_deref_retry(page) */
-			goto repeat;
+			if (radix_tree_deref_retry(page))
+				goto repeat;
+			/*
+			 * Otherwise, shmem/tmpfs must be storing a swap entry
+			 * here as an exceptional entry: so return it without
+			 * attempting to raise page count.
+			 */
+			goto out;
 		}
 		if (!page_cache_get_speculative(page))
 			goto repeat;
@@ -841,15 +845,21 @@ repeat:
 			continue;
 
 		if (radix_tree_exception(page)) {
-			if (radix_tree_exceptional_entry(page))
-				continue;
+			if (radix_tree_deref_retry(page)) {
+				/*
+				 * Transient condition which can only trigger
+				 * when entry at index 0 moves out of or back
+				 * to root: none yet gotten, safe to restart.
+				 */
+				WARN_ON(start | i);
+				goto restart;
+			}
 			/*
-			 * radix_tree_deref_retry(page):
-			 * can only trigger when entry at index 0 moves out of
-			 * or back to root: none yet gotten, safe to restart.
+			 * Otherwise, shmem/tmpfs must be storing a swap entry
+			 * here as an exceptional entry: so skip over it -
+			 * we only reach this from invalidate_mapping_pages().
 			 */
-			WARN_ON(start | i);
-			goto restart;
+			continue;
 		}
 
 		if (!page_cache_get_speculative(page))
@@ -907,14 +917,20 @@ repeat:
 			continue;
 
 		if (radix_tree_exception(page)) {
-			if (radix_tree_exceptional_entry(page))
-				break;
+			if (radix_tree_deref_retry(page)) {
+				/*
+				 * Transient condition which can only trigger
+				 * when entry at index 0 moves out of or back
+				 * to root: none yet gotten, safe to restart.
+				 */
+				goto restart;
+			}
 			/*
-			 * radix_tree_deref_retry(page):
-			 * can only trigger when entry at index 0 moves out of
-			 * or back to root: none yet gotten, safe to restart.
+			 * Otherwise, shmem/tmpfs must be storing a swap entry
+			 * here as an exceptional entry: so stop looking for
+			 * contiguous pages.
 			 */
-			goto restart;
+			break;
 		}
 
 		if (!page_cache_get_speculative(page))
@@ -976,13 +992,19 @@ repeat:
 			continue;
 
 		if (radix_tree_exception(page)) {
-			BUG_ON(radix_tree_exceptional_entry(page));
+			if (radix_tree_deref_retry(page)) {
+				/*
+				 * Transient condition which can only trigger
+				 * when entry at index 0 moves out of or back
+				 * to root: none yet gotten, safe to restart.
+				 */
+				goto restart;
+			}
 			/*
-			 * radix_tree_deref_retry(page):
-			 * can only trigger when entry at index 0 moves out of
-			 * or back to root: none yet gotten, safe to restart.
+			 * This function is never used on a shmem/tmpfs
+			 * mapping, so a swap entry won't be found here.
 			 */
-			goto restart;
+			BUG();
 		}
 
 		if (!page_cache_get_speculative(page))
--- mmotm.orig/mm/mincore.c	2011-07-08 18:57:15.174704475 -0700
+++ mmotm/mm/mincore.c	2011-07-19 11:13:31.949882063 -0700
@@ -72,6 +72,7 @@ static unsigned char mincore_page(struct
 	 */
 	page = find_get_page(mapping, pgoff);
 #ifdef CONFIG_SWAP
+	/* shmem/tmpfs may return swap: account for swapcache page too. */
 	if (radix_tree_exceptional_entry(page)) {
 		swp_entry_t swap = radix_to_swp_entry(page);
 		page = find_get_page(&swapper_space, swap.val);
--- mmotm.orig/mm/shmem.c	2011-07-19 11:11:33.709295729 -0700
+++ mmotm/mm/shmem.c	2011-07-19 11:13:31.953882076 -0700
@@ -332,10 +332,14 @@ repeat:
 		if (unlikely(!page))
 			continue;
 		if (radix_tree_exception(page)) {
-			if (radix_tree_exceptional_entry(page))
-				goto export;
-			/* radix_tree_deref_retry(page) */
-			goto restart;
+			if (radix_tree_deref_retry(page))
+				goto restart;
+			/*
+			 * Otherwise, we must be storing a swap entry
+			 * here as an exceptional entry: so return it
+			 * without attempting to raise page count.
+			 */
+			goto export;
 		}
 		if (!page_cache_get_speculative(page))
 			goto repeat;

* Re: [PATCH 2/3] tmpfs radix_tree: locate_item to speed up swapoff
From: Andrew Morton @ 2011-07-27 23:28 UTC
  To: Hugh Dickins; +Cc: linux-kernel, linux-mm

On Tue, 19 Jul 2011 15:54:23 -0700 (PDT)
Hugh Dickins <hughd@google.com> wrote:

> We have already acknowledged that swapoff of a tmpfs file is slower
> than it was before conversion to the generic radix_tree: a little
> slower there will be acceptable, if the hotter paths are faster.
> 
> But it was a shock to find swapoff of a 500MB file 20 times slower
> on my laptop, taking 10 minutes; and at that rate it significantly
> slows down my testing.

So it used to take half a minute?  That was already awful.  Why?  Was
it IO-bound?  It doesn't sound like it.

> Now, most of that turned out to be overhead from PROVE_LOCKING and
> PROVE_RCU: without those it was only 4 times slower than before;
> and more realistic tests on other machines don't fare as badly.

What's unrealistic about doing swapoff of a 500MB tmpfs file?

Also, confused.  You're talking about creating a regular file on tmpfs
and then using that as a swapfile?  If so, that's a
kernel-hacker-curiosity only?

> I've tried a number of things to improve it, including tagging the
> swap entries, then doing lookup by tag: I'd expected that to halve
> the time, but in practice it's erratic, and often counter-productive.
> 
> The only change I've so far found to make a consistent improvement,
> is to short-circuit the way we go back and forth, gang lookup packing
> entries into the array supplied, then shmem scanning that array for the
> target entry.  Scanning in place doubles the speed, so it's now only
> twice as slow as before (or three times slower when the PROVEs are on).
> 
> So, add radix_tree_locate_item() as an expedient, once-off, single-caller
> hack to do the lookup directly in place.  #ifdef it on CONFIG_SHMEM and
> CONFIG_SWAP, as much to document its limited applicability as save space
> in other configurations.  And, sadly, #include sched.h for cond_resched().
> 

How much did that 10 minutes improve?


* Re: [PATCH 2/3] tmpfs radix_tree: locate_item to speed up swapoff
From: Hugh Dickins @ 2011-07-28  1:54 UTC
  To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, linux-mm

On Wed, 27 Jul 2011, Andrew Morton wrote:
> On Tue, 19 Jul 2011 15:54:23 -0700 (PDT)
> Hugh Dickins <hughd@google.com> wrote:
> 
> > We have already acknowledged that swapoff of a tmpfs file is slower
> > than it was before conversion to the generic radix_tree: a little
> > slower there will be acceptable, if the hotter paths are faster.
> > 
> > But it was a shock to find swapoff of a 500MB file 20 times slower
> > on my laptop, taking 10 minutes; and at that rate it significantly
> > slows down my testing.
> 
> So it used to take half a minute?  That was already awful.

Yes, awful, half a minute on 3.0 (and everything before it back to
around 2.4.10 I imagine).  But no complaints have reached my ears for
years: it's as if I'm the only one to notice (pray I'm not opening a
floodgate with that remark).

> Why?  Was it IO-bound?  It doesn't sound like it.

No, not IO-bound at all.  One of the alternatives I did try was to
readahead the IO, but that was just irrelevant: it's the poor cpu
searching through (old or new) radix tree memory to find each
matching swap entry, one after another.

We've always taken the view that, until someone complains, we should leave
swapoff simple and slow, rather than be a little cleverer about it by
using more memory for hashes or batching the lookups.

And I don't think we'd want to get into making it cleverer now.
I expect we shall want to move away from putting swap entries, which
encode final disk destination, in page tables and tmpfs trees: we'll
want another layer of indirection, so we can shuffle swap around
between fast and slow devices, without troubling the rest of mm.

Rik pointed out that swapoff then comes down to just the IO: with a
layer of indirection in there, a swapcache page can be valid without
disk backing, without having to fix up page tables and tmpfs trees.

> 
> > Now, most of that turned out to be overhead from PROVE_LOCKING and
> > PROVE_RCU: without those it was only 4 times slower than before;
> > and more realistic tests on other machines don't fare as badly.
> 
> What's unrealistic about doing swapoff of a 500MB tmpfs file?

It's not unrealistic, but it happened to be a simple artificial test
I did for this; my usual swap testing appeared not to suffer so badly,
though there was still a noticeable and tiresome slowdown.

> 
> Also, confused.  You're talking about creating a regular file on tmpfs
> and then using that as a swapfile?

No, that is and must be prohibited (the lack of bmap used to catch that
case; now the lack of readpage catches it sooner).  With tmpfs pages
pushed out to swap, it's not a good idea to have your swap on tmpfs!

> If so, that's a kernel-hacker-curiosity only?

No, this is just the usual business of the pages of a tmpfs file being
pushed out to swap under memory pressure.  Then later swapoff bringing
them back into memory, and connecting them back to the tmpfs file.

> 
> > I've tried a number of things to improve it, including tagging the
> > swap entries, then doing lookup by tag: I'd expected that to halve
> > the time, but in practice it's erratic, and often counter-productive.
> > 
> > The only change I've so far found to make a consistent improvement,
> > is to short-circuit the way we go back and forth, gang lookup packing
> > entries into the array supplied, then shmem scanning that array for the
> > target entry.  Scanning in place doubles the speed, so it's now only
> > twice as slow as before (or three times slower when the PROVEs are on).
> > 
> > So, add radix_tree_locate_item() as an expedient, once-off, single-caller
> > hack to do the lookup directly in place.  #ifdef it on CONFIG_SHMEM and
> > CONFIG_SWAP, as much to document its limited applicability as save space
> > in other configurations.  And, sadly, #include sched.h for cond_resched().
> > 
> 
> How much did that 10 minutes improve?

To 1 minute: still twice as slow as before.  I believe that's because of
the smaller nodes and greater height of the generic radix tree.  I ought
to experiment with a bigger RADIX_TREE_MAP_SHIFT to verify that belief
(though I don't think tmpfs swapoff would justify raising it): will do.

Hugh

* Re: [PATCH 2/3] tmpfs radix_tree: locate_item to speed up swapoff
From: Hugh Dickins @ 2011-07-28  9:26 UTC
  To: Andrew Morton; +Cc: Rik van Riel, linux-kernel, linux-mm

On Wed, 27 Jul 2011, Hugh Dickins wrote:
> On Wed, 27 Jul 2011, Andrew Morton wrote:
> > On Tue, 19 Jul 2011 15:54:23 -0700 (PDT)
> > Hugh Dickins <hughd@google.com> wrote:
> > 
> > > But it was a shock to find swapoff of a 500MB file 20 times slower
> > > on my laptop, taking 10 minutes; and at that rate it significantly
> > > slows down my testing.
> > 
> > So it used to take half a minute?  That was already awful.
> > Why?  Was it IO-bound?  It doesn't sound like it.
> 
> No, not IO-bound at all.

I oversimplified: about 10 seconds of that was waiting for IO,
the rest (of 10 minutes or of half a minute) was cpu.  It's the cpu
part of it which the change of radix tree has affected, for the worse.

> > How much did that 10 minutes improve?
> 
> To 1 minute: still twice as slow as before.  I believe that's because of
> the smaller nodes and greater height of the generic radix tree.  I ought
> to experiment with a bigger RADIX_TREE_MAP_SHIFT to verify that belief
> (though I don't think tmpfs swapoff would justify raising it): will do.

Yes, raising RADIX_TREE_MAP_SHIFT from 6 to 10 (so on 32-bit the rnode
is just over 4kB, comparable with the old shmem's use of pages for this)
brings the time down considerably: still slower than before, but 12%
slower instead of twice as slow (or 20% slower instead of 3 times as
slow when comparing sys times).
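
(For a rough sense of the arithmetic behind that, using the figures from
this thread: with 4kB pages a 500MB file spans about 128k indices, so

	shift  6:   64 slots/node,  64^2 = 4096 < 128k <= 64^3,  height 3
	shift 10: 1024 slots/node,  128k <= 1024^2,               height 2

and on 32-bit, 1024 slots * 4 bytes is already 4096 bytes of slot pointers
before the node header, hence "just over 4kB" and an order-1 allocation.)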

Not that making a radix_tree_node need an order-1 page would be sensible.

Hugh

Thread overview: 7 messages
2011-07-19 22:51 [PATCH 0/3] mm: tmpfs radix_tree swap leftovers Hugh Dickins
2011-07-19 22:52 ` [PATCH 1/3] radix_tree: clean away saw_unset_tag leftovers Hugh Dickins
2011-07-19 22:54 ` [PATCH 2/3] tmpfs radix_tree: locate_item to speed up swapoff Hugh Dickins
2011-07-27 23:28   ` Andrew Morton
2011-07-28  1:54     ` Hugh Dickins
2011-07-28  9:26       ` Hugh Dickins
2011-07-19 22:56 ` [PATCH 3/3] mm: clarify the radix_tree exceptional cases Hugh Dickins
