All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] Fix races & improve the radix tree iterator patterns
@ 2016-01-27 21:17 ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

The first two patches here are bugfixes, and I would like to see them
make their way into stable ASAP since they can lead to data corruption
(very low probabilty).

The last three patches do not qualify as bugfixes.  They simply improve
the standard pattern used to do radix tree iterations by removing the
'goto restart' part.  Partially this is because this is an ugly &
confusing goto, and partially because with multi-order entries in the
tree, it'll be more likely that we'll see an indirect_ptr bit, and
it's more efficient to kep going from the point of the iteration we're
currently in than restart from the beginning each time.

Matthew Wilcox (5):
  radix-tree: Fix race in gang lookup
  hwspinlock: Fix race between radix tree insertion and lookup
  btrfs: Use radix_tree_iter_retry()
  mm: Use radix_tree_iter_retry()
  radix-tree,shmem: Introduce radix_tree_iter_next()

 drivers/hwspinlock/hwspinlock_core.c |  4 +++
 fs/btrfs/tests/btrfs-tests.c         |  3 +-
 include/linux/radix-tree.h           | 31 +++++++++++++++++++++
 lib/radix-tree.c                     | 12 ++++++--
 mm/filemap.c                         | 53 ++++++++++++------------------------
 mm/shmem.c                           | 30 ++++++++++----------
 6 files changed, 78 insertions(+), 55 deletions(-)

-- 
2.7.0.rc3

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 0/5] Fix races & improve the radix tree iterator patterns
@ 2016-01-27 21:17 ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

The first two patches here are bugfixes, and I would like to see them
make their way into stable ASAP since they can lead to data corruption
(very low probabilty).

The last three patches do not qualify as bugfixes.  They simply improve
the standard pattern used to do radix tree iterations by removing the
'goto restart' part.  Partially this is because this is an ugly &
confusing goto, and partially because with multi-order entries in the
tree, it'll be more likely that we'll see an indirect_ptr bit, and
it's more efficient to kep going from the point of the iteration we're
currently in than restart from the beginning each time.

Matthew Wilcox (5):
  radix-tree: Fix race in gang lookup
  hwspinlock: Fix race between radix tree insertion and lookup
  btrfs: Use radix_tree_iter_retry()
  mm: Use radix_tree_iter_retry()
  radix-tree,shmem: Introduce radix_tree_iter_next()

 drivers/hwspinlock/hwspinlock_core.c |  4 +++
 fs/btrfs/tests/btrfs-tests.c         |  3 +-
 include/linux/radix-tree.h           | 31 +++++++++++++++++++++
 lib/radix-tree.c                     | 12 ++++++--
 mm/filemap.c                         | 53 ++++++++++++------------------------
 mm/shmem.c                           | 30 ++++++++++----------
 6 files changed, 78 insertions(+), 55 deletions(-)

-- 
2.7.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/5] radix-tree: Fix race in gang lookup
  2016-01-27 21:17 ` Matthew Wilcox
@ 2016-01-27 21:17   ` Matthew Wilcox
  -1 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm, stable

From: Matthew Wilcox <willy@linux.intel.com>

If the indirect_ptr bit is set on a slot, that indicates we need to
redo the lookup.  Introduce a new function radix_tree_iter_retry()
which forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.

This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0.  The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.

Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: stable@vger.kernel.org
---
 include/linux/radix-tree.h | 16 ++++++++++++++++
 lib/radix-tree.c           | 12 ++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f9a3da5bf892..db0ed595749b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 			     struct radix_tree_iter *iter, unsigned flags);
 
 /**
+ * radix_tree_iter_retry - retry this chunk of the iteration
+ * @iter:	iterator state
+ *
+ * If we iterate over a tree protected only by the RCU lock, a race
+ * against deletion or creation may result in seeing a slot for which
+ * radix_tree_deref_retry() returns true.  If so, call this function
+ * and continue the iteration.
+ */
+static inline __must_check
+void **radix_tree_iter_retry(struct radix_tree_iter *iter)
+{
+	iter->next_index = iter->index;
+	return NULL;
+}
+
+/**
  * radix_tree_chunk_size - get current chunk size
  *
  * @iter:	pointer to radix tree iterator
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index a25f635dcc56..65422ac17114 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
 		return 0;
 
 	radix_tree_for_each_slot(slot, root, &iter, first_index) {
-		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
+		results[ret] = rcu_dereference_raw(*slot);
 		if (!results[ret])
 			continue;
+		if (radix_tree_is_indirect_ptr(results[ret])) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 		if (++ret == max_items)
 			break;
 	}
@@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
 		return 0;
 
 	radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
-		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
+		results[ret] = rcu_dereference_raw(*slot);
 		if (!results[ret])
 			continue;
+		if (radix_tree_is_indirect_ptr(results[ret])) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 		if (++ret == max_items)
 			break;
 	}
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 1/5] radix-tree: Fix race in gang lookup
@ 2016-01-27 21:17   ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm, stable

From: Matthew Wilcox <willy@linux.intel.com>

If the indirect_ptr bit is set on a slot, that indicates we need to
redo the lookup.  Introduce a new function radix_tree_iter_retry()
which forces the loop to retry the lookup by setting 'slot' to NULL and
turning the iterator back to point at the problematic entry.

This is a pretty rare problem to hit at the moment; the lookup has to
race with a grow of the radix tree from a height of 0.  The consequences
of hitting this race are that gang lookup could return a pointer to a
radix_tree_node instead of a pointer to whatever the user had inserted
in the tree.

Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: stable@vger.kernel.org
---
 include/linux/radix-tree.h | 16 ++++++++++++++++
 lib/radix-tree.c           | 12 ++++++++++--
 2 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index f9a3da5bf892..db0ed595749b 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 			     struct radix_tree_iter *iter, unsigned flags);
 
 /**
+ * radix_tree_iter_retry - retry this chunk of the iteration
+ * @iter:	iterator state
+ *
+ * If we iterate over a tree protected only by the RCU lock, a race
+ * against deletion or creation may result in seeing a slot for which
+ * radix_tree_deref_retry() returns true.  If so, call this function
+ * and continue the iteration.
+ */
+static inline __must_check
+void **radix_tree_iter_retry(struct radix_tree_iter *iter)
+{
+	iter->next_index = iter->index;
+	return NULL;
+}
+
+/**
  * radix_tree_chunk_size - get current chunk size
  *
  * @iter:	pointer to radix tree iterator
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index a25f635dcc56..65422ac17114 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
 		return 0;
 
 	radix_tree_for_each_slot(slot, root, &iter, first_index) {
-		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
+		results[ret] = rcu_dereference_raw(*slot);
 		if (!results[ret])
 			continue;
+		if (radix_tree_is_indirect_ptr(results[ret])) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 		if (++ret == max_items)
 			break;
 	}
@@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
 		return 0;
 
 	radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
-		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
+		results[ret] = rcu_dereference_raw(*slot);
 		if (!results[ret])
 			continue;
+		if (radix_tree_is_indirect_ptr(results[ret])) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 		if (++ret == max_items)
 			break;
 	}
-- 
2.7.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup
  2016-01-27 21:17 ` Matthew Wilcox
@ 2016-01-27 21:17   ` Matthew Wilcox
  -1 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm, stable

From: Matthew Wilcox <willy@linux.intel.com>

of_hwspin_lock_get_id() is protected by the RCU lock, which means that
insertions can occur simultaneously with the lookup.  If the radix tree
transitions from a height of 0, we can see a slot with the indirect_ptr
bit set, which will cause us to at least read random memory, and could
cause other havoc.

Fix this by using the newly introduced radix_tree_iter_retry().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/hwspinlock/hwspinlock_core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/hwspinlock/hwspinlock_core.c b/drivers/hwspinlock/hwspinlock_core.c
index 52f708bcf77f..d50c701b19d6 100644
--- a/drivers/hwspinlock/hwspinlock_core.c
+++ b/drivers/hwspinlock/hwspinlock_core.c
@@ -313,6 +313,10 @@ int of_hwspin_lock_get_id(struct device_node *np, int index)
 		hwlock = radix_tree_deref_slot(slot);
 		if (unlikely(!hwlock))
 			continue;
+		if (radix_tree_is_indirect_ptr(hwlock)) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 
 		if (hwlock->bank->dev->of_node == args.np) {
 			ret = 0;
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup
@ 2016-01-27 21:17   ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm, stable

From: Matthew Wilcox <willy@linux.intel.com>

of_hwspin_lock_get_id() is protected by the RCU lock, which means that
insertions can occur simultaneously with the lookup.  If the radix tree
transitions from a height of 0, we can see a slot with the indirect_ptr
bit set, which will cause us to at least read random memory, and could
cause other havoc.

Fix this by using the newly introduced radix_tree_iter_retry().

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: stable@vger.kernel.org
---
 drivers/hwspinlock/hwspinlock_core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/hwspinlock/hwspinlock_core.c b/drivers/hwspinlock/hwspinlock_core.c
index 52f708bcf77f..d50c701b19d6 100644
--- a/drivers/hwspinlock/hwspinlock_core.c
+++ b/drivers/hwspinlock/hwspinlock_core.c
@@ -313,6 +313,10 @@ int of_hwspin_lock_get_id(struct device_node *np, int index)
 		hwlock = radix_tree_deref_slot(slot);
 		if (unlikely(!hwlock))
 			continue;
+		if (radix_tree_is_indirect_ptr(hwlock)) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 
 		if (hwlock->bank->dev->of_node == args.np) {
 			ret = 0;
-- 
2.7.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/5] btrfs: Use radix_tree_iter_retry()
  2016-01-27 21:17 ` Matthew Wilcox
@ 2016-01-27 21:17   ` Matthew Wilcox
  -1 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

Even though this is a 'can't happen' situation, use the new
radix_tree_iter_retry() pattern to eliminate a goto.
---
 fs/btrfs/tests/btrfs-tests.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index b1d920b30070..0da78c54317a 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -137,7 +137,6 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info)
 	void **slot;
 
 	spin_lock(&fs_info->buffer_lock);
-restart:
 	radix_tree_for_each_slot(slot, &fs_info->buffer_radix, &iter, 0) {
 		struct extent_buffer *eb;
 
@@ -147,7 +146,7 @@ restart:
 		/* Shouldn't happen but that kind of thinking creates CVE's */
 		if (radix_tree_exception(eb)) {
 			if (radix_tree_deref_retry(eb))
-				goto restart;
+				slot = radix_tree_iter_retry(iter);
 			continue;
 		}
 		spin_unlock(&fs_info->buffer_lock);
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/5] btrfs: Use radix_tree_iter_retry()
@ 2016-01-27 21:17   ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

Even though this is a 'can't happen' situation, use the new
radix_tree_iter_retry() pattern to eliminate a goto.
---
 fs/btrfs/tests/btrfs-tests.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/tests/btrfs-tests.c b/fs/btrfs/tests/btrfs-tests.c
index b1d920b30070..0da78c54317a 100644
--- a/fs/btrfs/tests/btrfs-tests.c
+++ b/fs/btrfs/tests/btrfs-tests.c
@@ -137,7 +137,6 @@ static void btrfs_free_dummy_fs_info(struct btrfs_fs_info *fs_info)
 	void **slot;
 
 	spin_lock(&fs_info->buffer_lock);
-restart:
 	radix_tree_for_each_slot(slot, &fs_info->buffer_radix, &iter, 0) {
 		struct extent_buffer *eb;
 
@@ -147,7 +146,7 @@ restart:
 		/* Shouldn't happen but that kind of thinking creates CVE's */
 		if (radix_tree_exception(eb)) {
 			if (radix_tree_deref_retry(eb))
-				goto restart;
+				slot = radix_tree_iter_retry(iter);
 			continue;
 		}
 		spin_unlock(&fs_info->buffer_lock);
-- 
2.7.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/5] mm: Use radix_tree_iter_retry()
  2016-01-27 21:17 ` Matthew Wilcox
@ 2016-01-27 21:17   ` Matthew Wilcox
  -1 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

Instead of a 'goto restart', we can now use radix_tree_iter_retry()
to restart from our current position.  This will make a difference
when there are more ways to happen across an indirect pointer.  And it
eliminates some confusing gotos.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 mm/filemap.c | 53 +++++++++++++++++------------------------------------
 mm/shmem.c   | 18 ++++++++++++------
 2 files changed, 29 insertions(+), 42 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 7705dac561ba..e0c4b905fe1c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1245,7 +1245,6 @@ unsigned find_get_entries(struct address_space *mapping,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		struct page *page;
 repeat:
@@ -1253,8 +1252,10 @@ repeat:
 		if (unlikely(!page))
 			continue;
 		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto restart;
+			if (radix_tree_deref_retry(page)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
 			/*
 			 * A shadow entry of a recently evicted page, a swap
 			 * entry from shmem/tmpfs or a DAX entry.  Return it
@@ -1307,7 +1308,6 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		struct page *page;
 repeat:
@@ -1317,13 +1317,8 @@ repeat:
 
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				WARN_ON(iter.index);
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 			/*
 			 * A shadow entry of a recently evicted page,
@@ -1374,7 +1369,6 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) {
 		struct page *page;
 repeat:
@@ -1385,12 +1379,8 @@ repeat:
 
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 			/*
 			 * A shadow entry of a recently evicted page,
@@ -1450,7 +1440,6 @@ unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_tagged(slot, &mapping->page_tree,
 				   &iter, *index, tag) {
 		struct page *page;
@@ -1461,12 +1450,8 @@ repeat:
 
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 			/*
 			 * A shadow entry of a recently evicted page.
@@ -1529,7 +1514,6 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_tagged(slot, &mapping->page_tree,
 				   &iter, start, tag) {
 		struct page *page;
@@ -1539,12 +1523,8 @@ repeat:
 			continue;
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 
 			/*
@@ -2151,10 +2131,11 @@ repeat:
 		if (unlikely(!page))
 			goto next;
 		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				break;
-			else
-				goto next;
+			if (radix_tree_deref_retry(page)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
+			goto next;
 		}
 
 		if (!page_cache_get_speculative(page))
diff --git a/mm/shmem.c b/mm/shmem.c
index fa2ceb2d2655..6ec14b70d82d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -388,8 +388,10 @@ restart:
 		 * don't need to reset the counter, nor do we risk infinite
 		 * restarts.
 		 */
-		if (radix_tree_deref_retry(page))
-			goto restart;
+		if (radix_tree_deref_retry(page)) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 
 		if (radix_tree_exceptional_entry(page))
 			swapped++;
@@ -1952,8 +1954,10 @@ restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		page = radix_tree_deref_slot(slot);
 		if (!page || radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto restart;
+			if (radix_tree_deref_retry(page)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
 		} else if (page_count(page) - page_mapcount(page) > 1) {
 			spin_lock_irq(&mapping->tree_lock);
 			radix_tree_tag_set(&mapping->page_tree, iter.index,
@@ -2007,8 +2011,10 @@ restart:
 
 			page = radix_tree_deref_slot(slot);
 			if (radix_tree_exception(page)) {
-				if (radix_tree_deref_retry(page))
-					goto restart;
+				if (radix_tree_deref_retry(page)) {
+					slot = radix_tree_iter_retry(&iter);
+					continue;
+				}
 
 				page = NULL;
 			}
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 4/5] mm: Use radix_tree_iter_retry()
@ 2016-01-27 21:17   ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

Instead of a 'goto restart', we can now use radix_tree_iter_retry()
to restart from our current position.  This will make a difference
when there are more ways to happen across an indirect pointer.  And it
eliminates some confusing gotos.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 mm/filemap.c | 53 +++++++++++++++++------------------------------------
 mm/shmem.c   | 18 ++++++++++++------
 2 files changed, 29 insertions(+), 42 deletions(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 7705dac561ba..e0c4b905fe1c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1245,7 +1245,6 @@ unsigned find_get_entries(struct address_space *mapping,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		struct page *page;
 repeat:
@@ -1253,8 +1252,10 @@ repeat:
 		if (unlikely(!page))
 			continue;
 		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto restart;
+			if (radix_tree_deref_retry(page)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
 			/*
 			 * A shadow entry of a recently evicted page, a swap
 			 * entry from shmem/tmpfs or a DAX entry.  Return it
@@ -1307,7 +1308,6 @@ unsigned find_get_pages(struct address_space *mapping, pgoff_t start,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		struct page *page;
 repeat:
@@ -1317,13 +1317,8 @@ repeat:
 
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				WARN_ON(iter.index);
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 			/*
 			 * A shadow entry of a recently evicted page,
@@ -1374,7 +1369,6 @@ unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t index,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_contig(slot, &mapping->page_tree, &iter, index) {
 		struct page *page;
 repeat:
@@ -1385,12 +1379,8 @@ repeat:
 
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 			/*
 			 * A shadow entry of a recently evicted page,
@@ -1450,7 +1440,6 @@ unsigned find_get_pages_tag(struct address_space *mapping, pgoff_t *index,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_tagged(slot, &mapping->page_tree,
 				   &iter, *index, tag) {
 		struct page *page;
@@ -1461,12 +1450,8 @@ repeat:
 
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 			/*
 			 * A shadow entry of a recently evicted page.
@@ -1529,7 +1514,6 @@ unsigned find_get_entries_tag(struct address_space *mapping, pgoff_t start,
 		return 0;
 
 	rcu_read_lock();
-restart:
 	radix_tree_for_each_tagged(slot, &mapping->page_tree,
 				   &iter, start, tag) {
 		struct page *page;
@@ -1539,12 +1523,8 @@ repeat:
 			continue;
 		if (radix_tree_exception(page)) {
 			if (radix_tree_deref_retry(page)) {
-				/*
-				 * Transient condition which can only trigger
-				 * when entry at index 0 moves out of or back
-				 * to root: none yet gotten, safe to restart.
-				 */
-				goto restart;
+				slot = radix_tree_iter_retry(&iter);
+				continue;
 			}
 
 			/*
@@ -2151,10 +2131,11 @@ repeat:
 		if (unlikely(!page))
 			goto next;
 		if (radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				break;
-			else
-				goto next;
+			if (radix_tree_deref_retry(page)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
+			goto next;
 		}
 
 		if (!page_cache_get_speculative(page))
diff --git a/mm/shmem.c b/mm/shmem.c
index fa2ceb2d2655..6ec14b70d82d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -388,8 +388,10 @@ restart:
 		 * don't need to reset the counter, nor do we risk infinite
 		 * restarts.
 		 */
-		if (radix_tree_deref_retry(page))
-			goto restart;
+		if (radix_tree_deref_retry(page)) {
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
 
 		if (radix_tree_exceptional_entry(page))
 			swapped++;
@@ -1952,8 +1954,10 @@ restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		page = radix_tree_deref_slot(slot);
 		if (!page || radix_tree_exception(page)) {
-			if (radix_tree_deref_retry(page))
-				goto restart;
+			if (radix_tree_deref_retry(page)) {
+				slot = radix_tree_iter_retry(&iter);
+				continue;
+			}
 		} else if (page_count(page) - page_mapcount(page) > 1) {
 			spin_lock_irq(&mapping->tree_lock);
 			radix_tree_tag_set(&mapping->page_tree, iter.index,
@@ -2007,8 +2011,10 @@ restart:
 
 			page = radix_tree_deref_slot(slot);
 			if (radix_tree_exception(page)) {
-				if (radix_tree_deref_retry(page))
-					goto restart;
+				if (radix_tree_deref_retry(page)) {
+					slot = radix_tree_iter_retry(&iter);
+					continue;
+				}
 
 				page = NULL;
 			}
-- 
2.7.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next()
  2016-01-27 21:17 ` Matthew Wilcox
@ 2016-01-27 21:17   ` Matthew Wilcox
  -1 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

shmem likes to occasionally drop the lock, schedule, then reacqire
the lock and continue with the iteration from the last place it
left off.  This is currently done with a pretty ugly goto.  Introduce
radix_tree_iter_next() and use it throughout shmem.c.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 include/linux/radix-tree.h | 15 +++++++++++++++
 mm/shmem.c                 | 12 +++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index db0ed595749b..dec2c6c77eea 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
 }
 
 /**
+ * radix_tree_iter_next - resume iterating when the chunk may be invalid
+ * @iter:	iterator state
+ *
+ * If the iterator needs to release then reacquire a lock, the chunk may
+ * have been invalidated by an insertion or deletion.  Call this function
+ * to continue the iteration from the next index.
+ */
+static inline __must_check
+void **radix_tree_iter_next(struct radix_tree_iter *iter)
+{
+	iter->next_index = iter->index + 1;
+	return NULL;
+}
+
+/**
  * radix_tree_chunk_size - get current chunk size
  *
  * @iter:	pointer to radix tree iterator
diff --git a/mm/shmem.c b/mm/shmem.c
index 6ec14b70d82d..438ea8004c26 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 
 	rcu_read_lock();
 
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		if (iter.index >= end)
 			break;
@@ -398,8 +397,7 @@ restart:
 
 		if (need_resched()) {
 			cond_resched_rcu();
-			start = iter.index + 1;
-			goto restart;
+			slot = radix_tree_iter_next(&iter);
 		}
 	}
 
@@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping)
 	start = 0;
 	rcu_read_lock();
 
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		page = radix_tree_deref_slot(slot);
 		if (!page || radix_tree_exception(page)) {
@@ -1967,8 +1964,7 @@ restart:
 
 		if (need_resched()) {
 			cond_resched_rcu();
-			start = iter.index + 1;
-			goto restart;
+			slot = radix_tree_iter_next(&iter);
 		}
 	}
 	rcu_read_unlock();
@@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 		start = 0;
 		rcu_read_lock();
-restart:
 		radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
 					   start, SHMEM_TAG_PINNED) {
 
@@ -2039,8 +2034,7 @@ restart:
 continue_resched:
 			if (need_resched()) {
 				cond_resched_rcu();
-				start = iter.index + 1;
-				goto restart;
+				slot = radix_tree_iter_next(&iter);
 			}
 		}
 		rcu_read_unlock();
-- 
2.7.0.rc3

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next()
@ 2016-01-27 21:17   ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-27 21:17 UTC (permalink / raw)
  To: Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

From: Matthew Wilcox <willy@linux.intel.com>

shmem likes to occasionally drop the lock, schedule, then reacqire
the lock and continue with the iteration from the last place it
left off.  This is currently done with a pretty ugly goto.  Introduce
radix_tree_iter_next() and use it throughout shmem.c.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 include/linux/radix-tree.h | 15 +++++++++++++++
 mm/shmem.c                 | 12 +++---------
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index db0ed595749b..dec2c6c77eea 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
 }
 
 /**
+ * radix_tree_iter_next - resume iterating when the chunk may be invalid
+ * @iter:	iterator state
+ *
+ * If the iterator needs to release then reacquire a lock, the chunk may
+ * have been invalidated by an insertion or deletion.  Call this function
+ * to continue the iteration from the next index.
+ */
+static inline __must_check
+void **radix_tree_iter_next(struct radix_tree_iter *iter)
+{
+	iter->next_index = iter->index + 1;
+	return NULL;
+}
+
+/**
  * radix_tree_chunk_size - get current chunk size
  *
  * @iter:	pointer to radix tree iterator
diff --git a/mm/shmem.c b/mm/shmem.c
index 6ec14b70d82d..438ea8004c26 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 
 	rcu_read_lock();
 
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		if (iter.index >= end)
 			break;
@@ -398,8 +397,7 @@ restart:
 
 		if (need_resched()) {
 			cond_resched_rcu();
-			start = iter.index + 1;
-			goto restart;
+			slot = radix_tree_iter_next(&iter);
 		}
 	}
 
@@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping)
 	start = 0;
 	rcu_read_lock();
 
-restart:
 	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
 		page = radix_tree_deref_slot(slot);
 		if (!page || radix_tree_exception(page)) {
@@ -1967,8 +1964,7 @@ restart:
 
 		if (need_resched()) {
 			cond_resched_rcu();
-			start = iter.index + 1;
-			goto restart;
+			slot = radix_tree_iter_next(&iter);
 		}
 	}
 	rcu_read_unlock();
@@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
 
 		start = 0;
 		rcu_read_lock();
-restart:
 		radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
 					   start, SHMEM_TAG_PINNED) {
 
@@ -2039,8 +2034,7 @@ restart:
 continue_resched:
 			if (need_resched()) {
 				cond_resched_rcu();
-				start = iter.index + 1;
-				goto restart;
+				slot = radix_tree_iter_next(&iter);
 			}
 		}
 		rcu_read_unlock();
-- 
2.7.0.rc3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns
  2016-01-27 21:17 ` Matthew Wilcox
@ 2016-01-28  7:17   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-28  7:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm

On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> The first two patches here are bugfixes, and I would like to see them
> make their way into stable ASAP since they can lead to data corruption
> (very low probabilty).
>
> The last three patches do not qualify as bugfixes.  They simply improve
> the standard pattern used to do radix tree iterations by removing the
> 'goto restart' part.  Partially this is because this is an ugly &
> confusing goto, and partially because with multi-order entries in the
> tree, it'll be more likely that we'll see an indirect_ptr bit, and
> it's more efficient to kep going from the point of the iteration we're
> currently in than restart from the beginning each time.

Ack  whole set.

I think we should go deeper in hide dereference/retry inside iterator.
Something like radix_tree_for_each_data(data, slot, root, iter, start).
I'll prepare patch for that.

>
> Matthew Wilcox (5):
>   radix-tree: Fix race in gang lookup
>   hwspinlock: Fix race between radix tree insertion and lookup
>   btrfs: Use radix_tree_iter_retry()
>   mm: Use radix_tree_iter_retry()
>   radix-tree,shmem: Introduce radix_tree_iter_next()
>
>  drivers/hwspinlock/hwspinlock_core.c |  4 +++
>  fs/btrfs/tests/btrfs-tests.c         |  3 +-
>  include/linux/radix-tree.h           | 31 +++++++++++++++++++++
>  lib/radix-tree.c                     | 12 ++++++--
>  mm/filemap.c                         | 53 ++++++++++++------------------------
>  mm/shmem.c                           | 30 ++++++++++----------
>  6 files changed, 78 insertions(+), 55 deletions(-)
>
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns
@ 2016-01-28  7:17   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-01-28  7:17 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm

On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> The first two patches here are bugfixes, and I would like to see them
> make their way into stable ASAP since they can lead to data corruption
> (very low probabilty).
>
> The last three patches do not qualify as bugfixes.  They simply improve
> the standard pattern used to do radix tree iterations by removing the
> 'goto restart' part.  Partially this is because this is an ugly &
> confusing goto, and partially because with multi-order entries in the
> tree, it'll be more likely that we'll see an indirect_ptr bit, and
> it's more efficient to kep going from the point of the iteration we're
> currently in than restart from the beginning each time.

Ack  whole set.

I think we should go deeper in hide dereference/retry inside iterator.
Something like radix_tree_for_each_data(data, slot, root, iter, start).
I'll prepare patch for that.

>
> Matthew Wilcox (5):
>   radix-tree: Fix race in gang lookup
>   hwspinlock: Fix race between radix tree insertion and lookup
>   btrfs: Use radix_tree_iter_retry()
>   mm: Use radix_tree_iter_retry()
>   radix-tree,shmem: Introduce radix_tree_iter_next()
>
>  drivers/hwspinlock/hwspinlock_core.c |  4 +++
>  fs/btrfs/tests/btrfs-tests.c         |  3 +-
>  include/linux/radix-tree.h           | 31 +++++++++++++++++++++
>  lib/radix-tree.c                     | 12 ++++++--
>  mm/filemap.c                         | 53 ++++++++++++------------------------
>  mm/shmem.c                           | 30 ++++++++++----------
>  6 files changed, 78 insertions(+), 55 deletions(-)
>
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
  2016-01-27 21:17   ` Matthew Wilcox
  (?)
@ 2016-01-29 14:45     ` Vlastimil Babka
  -1 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-01-29 14:45 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

On 01/27/2016 10:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position.  This will make a difference
> when there are more ways to happen across an indirect pointer.  And it
> eliminates some confusing gotos.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

[...]

> diff --git a/mm/shmem.c b/mm/shmem.c
> index fa2ceb2d2655..6ec14b70d82d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -388,8 +388,10 @@ restart:
>  		 * don't need to reset the counter, nor do we risk infinite
>  		 * restarts.
>  		 */
> -		if (radix_tree_deref_retry(page))
> -			goto restart;
> +		if (radix_tree_deref_retry(page)) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  
>  		if (radix_tree_exceptional_entry(page))
>  			swapped++;

This should be applied on top. There are no restarts anymore.

----8<----
>From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 29 Jan 2016 15:41:31 +0100
Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix

Remove now-obsolete-and-misleading comment.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/shmem.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8f89abd4eaee..4d758938340c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 
 		page = radix_tree_deref_slot(slot);
 
-		/*
-		 * This should only be possible to happen at index 0, so we
-		 * don't need to reset the counter, nor do we risk infinite
-		 * restarts.
-		 */
 		if (radix_tree_deref_retry(page)) {
 			slot = radix_tree_iter_retry(&iter);
 			continue;
-- 
2.7.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
@ 2016-01-29 14:45     ` Vlastimil Babka
  0 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-01-29 14:45 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

On 01/27/2016 10:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position.  This will make a difference
> when there are more ways to happen across an indirect pointer.  And it
> eliminates some confusing gotos.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

[...]

> diff --git a/mm/shmem.c b/mm/shmem.c
> index fa2ceb2d2655..6ec14b70d82d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -388,8 +388,10 @@ restart:
>  		 * don't need to reset the counter, nor do we risk infinite
>  		 * restarts.
>  		 */
> -		if (radix_tree_deref_retry(page))
> -			goto restart;
> +		if (radix_tree_deref_retry(page)) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  
>  		if (radix_tree_exceptional_entry(page))
>  			swapped++;

This should be applied on top. There are no restarts anymore.

----8<----
>From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Fri, 29 Jan 2016 15:41:31 +0100
Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix

Remove now-obsolete-and-misleading comment.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/shmem.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 8f89abd4eaee..4d758938340c 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 
 		page = radix_tree_deref_slot(slot);
 
-		/*
-		 * This should only be possible to happen at index 0, so we
-		 * don't need to reset the counter, nor do we risk infinite
-		 * restarts.
-		 */
 		if (radix_tree_deref_retry(page)) {
 			slot = radix_tree_iter_retry(&iter);
 			continue;
-- 
2.7.0



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
@ 2016-01-29 14:45     ` Vlastimil Babka
  0 siblings, 0 replies; 32+ messages in thread
From: Vlastimil Babka @ 2016-01-29 14:45 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

On 01/27/2016 10:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position.  This will make a difference
> when there are more ways to happen across an indirect pointer.  And it
> eliminates some confusing gotos.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

[...]

> diff --git a/mm/shmem.c b/mm/shmem.c
> index fa2ceb2d2655..6ec14b70d82d 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -388,8 +388,10 @@ restart:
>  		 * don't need to reset the counter, nor do we risk infinite
>  		 * restarts.
>  		 */
> -		if (radix_tree_deref_retry(page))
> -			goto restart;
> +		if (radix_tree_deref_retry(page)) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  
>  		if (radix_tree_exceptional_entry(page))
>  			swapped++;

This should be applied on top. There are no restarts anymore.

----8<----

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
  2016-01-29 14:45     ` Vlastimil Babka
@ 2016-01-29 14:50       ` Matthew Wilcox
  -1 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-29 14:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox, Andrew Morton, Hugh Dickins,
	Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm

On Fri, Jan 29, 2016 at 03:45:59PM +0100, Vlastimil Babka wrote:
> This should be applied on top. There are no restarts anymore.

Quite right.  Sorry I missed the comment.

Acked-by: Matthwe Wilcox <willy@linux.intel.com>

> ----8<----
> >From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Fri, 29 Jan 2016 15:41:31 +0100
> Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix
> 
> Remove now-obsolete-and-misleading comment.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/shmem.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8f89abd4eaee..4d758938340c 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>  
>  		page = radix_tree_deref_slot(slot);
>  
> -		/*
> -		 * This should only be possible to happen at index 0, so we
> -		 * don't need to reset the counter, nor do we risk infinite
> -		 * restarts.
> -		 */
>  		if (radix_tree_deref_retry(page)) {
>  			slot = radix_tree_iter_retry(&iter);
>  			continue;
> -- 
> 2.7.0
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
@ 2016-01-29 14:50       ` Matthew Wilcox
  0 siblings, 0 replies; 32+ messages in thread
From: Matthew Wilcox @ 2016-01-29 14:50 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Matthew Wilcox, Andrew Morton, Hugh Dickins,
	Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm

On Fri, Jan 29, 2016 at 03:45:59PM +0100, Vlastimil Babka wrote:
> This should be applied on top. There are no restarts anymore.

Quite right.  Sorry I missed the comment.

Acked-by: Matthwe Wilcox <willy@linux.intel.com>

> ----8<----
> >From 3b0bdd370b57fb6d83b213e140cd1fb0e8962af8 Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Fri, 29 Jan 2016 15:41:31 +0100
> Subject: [PATCH] mm: Use radix_tree_iter_retry()-fix
> 
> Remove now-obsolete-and-misleading comment.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---
>  mm/shmem.c | 5 -----
>  1 file changed, 5 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8f89abd4eaee..4d758938340c 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -382,11 +382,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>  
>  		page = radix_tree_deref_slot(slot);
>  
> -		/*
> -		 * This should only be possible to happen at index 0, so we
> -		 * don't need to reset the counter, nor do we risk infinite
> -		 * restarts.
> -		 */
>  		if (radix_tree_deref_retry(page)) {
>  			slot = radix_tree_iter_retry(&iter);
>  			continue;
> -- 
> 2.7.0
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] btrfs: Use radix_tree_iter_retry()
  2016-01-27 21:17   ` Matthew Wilcox
@ 2016-02-01 14:34     ` David Sterba
  -1 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2016-02-01 14:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Matthew Wilcox,
	Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm

On Wed, Jan 27, 2016 at 04:17:50PM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Even though this is a 'can't happen' situation, use the new
> radix_tree_iter_retry() pattern to eliminate a goto.

Andrew's tree contains a fixup for a build failure

> @@ -147,7 +146,7 @@ restart:
>  		/* Shouldn't happen but that kind of thinking creates CVE's */
>  		if (radix_tree_exception(eb)) {
>  			if (radix_tree_deref_retry(eb))
> -				goto restart;
> +				slot = radix_tree_iter_retry(iter);

				slot = radix_tree_iter_retry(&iter);

http://ozlabs.org/~akpm/mmots/broken-out/btrfs-use-radix_tree_iter_retry-fix.patch

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/5] btrfs: Use radix_tree_iter_retry()
@ 2016-02-01 14:34     ` David Sterba
  0 siblings, 0 replies; 32+ messages in thread
From: David Sterba @ 2016-02-01 14:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Matthew Wilcox,
	Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm

On Wed, Jan 27, 2016 at 04:17:50PM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Even though this is a 'can't happen' situation, use the new
> radix_tree_iter_retry() pattern to eliminate a goto.

Andrew's tree contains a fixup for a build failure

> @@ -147,7 +146,7 @@ restart:
>  		/* Shouldn't happen but that kind of thinking creates CVE's */
>  		if (radix_tree_exception(eb)) {
>  			if (radix_tree_deref_retry(eb))
> -				goto restart;
> +				slot = radix_tree_iter_retry(iter);

				slot = radix_tree_iter_retry(&iter);

http://ozlabs.org/~akpm/mmots/broken-out/btrfs-use-radix_tree_iter_retry-fix.patch

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns
  2016-01-28  7:17   ` Konstantin Khlebnikov
@ 2016-02-03  6:27     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-03  6:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm

On Thu, Jan 28, 2016 at 10:17 AM, Konstantin Khlebnikov
<koct9i@gmail.com> wrote:
> On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
> <matthew.r.wilcox@intel.com> wrote:
>> From: Matthew Wilcox <willy@linux.intel.com>
>>
>> The first two patches here are bugfixes, and I would like to see them
>> make their way into stable ASAP since they can lead to data corruption
>> (very low probabilty).
>>
>> The last three patches do not qualify as bugfixes.  They simply improve
>> the standard pattern used to do radix tree iterations by removing the
>> 'goto restart' part.  Partially this is because this is an ugly &
>> confusing goto, and partially because with multi-order entries in the
>> tree, it'll be more likely that we'll see an indirect_ptr bit, and
>> it's more efficient to kep going from the point of the iteration we're
>> currently in than restart from the beginning each time.
>
> Ack  whole set.
>
> I think we should go deeper in hide dereference/retry inside iterator.
> Something like radix_tree_for_each_data(data, slot, root, iter, start).
> I'll prepare patch for that.

After second thought: there'ra not so many users for new sugar.
This scheme with radix_tree_deref_retry - radix_tree_iter_retry
complicated but fine.

>
>>
>> Matthew Wilcox (5):
>>   radix-tree: Fix race in gang lookup
>>   hwspinlock: Fix race between radix tree insertion and lookup
>>   btrfs: Use radix_tree_iter_retry()
>>   mm: Use radix_tree_iter_retry()
>>   radix-tree,shmem: Introduce radix_tree_iter_next()
>>
>>  drivers/hwspinlock/hwspinlock_core.c |  4 +++
>>  fs/btrfs/tests/btrfs-tests.c         |  3 +-
>>  include/linux/radix-tree.h           | 31 +++++++++++++++++++++
>>  lib/radix-tree.c                     | 12 ++++++--
>>  mm/filemap.c                         | 53 ++++++++++++------------------------
>>  mm/shmem.c                           | 30 ++++++++++----------
>>  6 files changed, 78 insertions(+), 55 deletions(-)
>>
>> --
>> 2.7.0.rc3
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 0/5] Fix races & improve the radix tree iterator patterns
@ 2016-02-03  6:27     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-03  6:27 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm

On Thu, Jan 28, 2016 at 10:17 AM, Konstantin Khlebnikov
<koct9i@gmail.com> wrote:
> On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
> <matthew.r.wilcox@intel.com> wrote:
>> From: Matthew Wilcox <willy@linux.intel.com>
>>
>> The first two patches here are bugfixes, and I would like to see them
>> make their way into stable ASAP since they can lead to data corruption
>> (very low probabilty).
>>
>> The last three patches do not qualify as bugfixes.  They simply improve
>> the standard pattern used to do radix tree iterations by removing the
>> 'goto restart' part.  Partially this is because this is an ugly &
>> confusing goto, and partially because with multi-order entries in the
>> tree, it'll be more likely that we'll see an indirect_ptr bit, and
>> it's more efficient to kep going from the point of the iteration we're
>> currently in than restart from the beginning each time.
>
> Ack  whole set.
>
> I think we should go deeper in hide dereference/retry inside iterator.
> Something like radix_tree_for_each_data(data, slot, root, iter, start).
> I'll prepare patch for that.

After second thought: there'ra not so many users for new sugar.
This scheme with radix_tree_deref_retry - radix_tree_iter_retry
complicated but fine.

>
>>
>> Matthew Wilcox (5):
>>   radix-tree: Fix race in gang lookup
>>   hwspinlock: Fix race between radix tree insertion and lookup
>>   btrfs: Use radix_tree_iter_retry()
>>   mm: Use radix_tree_iter_retry()
>>   radix-tree,shmem: Introduce radix_tree_iter_next()
>>
>>  drivers/hwspinlock/hwspinlock_core.c |  4 +++
>>  fs/btrfs/tests/btrfs-tests.c         |  3 +-
>>  include/linux/radix-tree.h           | 31 +++++++++++++++++++++
>>  lib/radix-tree.c                     | 12 ++++++--
>>  mm/filemap.c                         | 53 ++++++++++++------------------------
>>  mm/shmem.c                           | 30 ++++++++++----------
>>  6 files changed, 78 insertions(+), 55 deletions(-)
>>
>> --
>> 2.7.0.rc3
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
  2016-01-27 21:17   ` Matthew Wilcox
@ 2016-02-03 21:37     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-03 21:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm, Stable

On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> If the indirect_ptr bit is set on a slot, that indicates we need to
> redo the lookup.  Introduce a new function radix_tree_iter_retry()
> which forces the loop to retry the lookup by setting 'slot' to NULL and
> turning the iterator back to point at the problematic entry.
>
> This is a pretty rare problem to hit at the moment; the lookup has to
> race with a grow of the radix tree from a height of 0.  The consequences
> of hitting this race are that gang lookup could return a pointer to a
> radix_tree_node instead of a pointer to whatever the user had inserted
> in the tree.
>
> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  include/linux/radix-tree.h | 16 ++++++++++++++++
>  lib/radix-tree.c           | 12 ++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index f9a3da5bf892..db0ed595749b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
>                              struct radix_tree_iter *iter, unsigned flags);
>
>  /**
> + * radix_tree_iter_retry - retry this chunk of the iteration
> + * @iter:      iterator state
> + *
> + * If we iterate over a tree protected only by the RCU lock, a race
> + * against deletion or creation may result in seeing a slot for which
> + * radix_tree_deref_retry() returns true.  If so, call this function
> + * and continue the iteration.
> + */
> +static inline __must_check
> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> +{
> +       iter->next_index = iter->index;
> +       return NULL;
> +}
> +
> +/**
>   * radix_tree_chunk_size - get current chunk size
>   *
>   * @iter:      pointer to radix tree iterator
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index a25f635dcc56..65422ac17114 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
>                 return 0;
>
>         radix_tree_for_each_slot(slot, root, &iter, first_index) {
> -               results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +               results[ret] = rcu_dereference_raw(*slot);
>                 if (!results[ret])
>                         continue;
> +               if (radix_tree_is_indirect_ptr(results[ret])) {
> +                       slot = radix_tree_iter_retry(&iter);
> +                       continue;
> +               }
>                 if (++ret == max_items)
>                         break;
>         }

Looks like your fix doesn't work.

After radix_tree_iter_retry: radix_tree_for_each_slot will call
radix_tree_next_slot which isn't safe to call for NULL slot.

#define radix_tree_for_each_slot(slot, root, iter, start) \
for (slot = radix_tree_iter_init(iter, start) ; \
    slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
    slot = radix_tree_next_slot(slot, iter, 0))

tagged iterator works becase restart happens only at root - tags
filled with single bit.

quick (untested) fix for that

--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct
radix_tree_iter *iter, unsigned flags)
                        return slot + offset + 1;
                }
        } else {
-               unsigned size = radix_tree_chunk_size(iter) - 1;
+               int size = radix_tree_chunk_size(iter) - 1;

-               while (size--) {
+               while (size-- > 0) {
                        slot++;
                        iter->index++;
                        if (likely(*slot))


> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>                 return 0;
>
>         radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
> -               results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +               results[ret] = rcu_dereference_raw(*slot);
>                 if (!results[ret])
>                         continue;
> +               if (radix_tree_is_indirect_ptr(results[ret])) {
> +                       slot = radix_tree_iter_retry(&iter);
> +                       continue;
> +               }
>                 if (++ret == max_items)
>                         break;
>         }
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
@ 2016-02-03 21:37     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-03 21:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm, Stable

On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> If the indirect_ptr bit is set on a slot, that indicates we need to
> redo the lookup.  Introduce a new function radix_tree_iter_retry()
> which forces the loop to retry the lookup by setting 'slot' to NULL and
> turning the iterator back to point at the problematic entry.
>
> This is a pretty rare problem to hit at the moment; the lookup has to
> race with a grow of the radix tree from a height of 0.  The consequences
> of hitting this race are that gang lookup could return a pointer to a
> radix_tree_node instead of a pointer to whatever the user had inserted
> in the tree.
>
> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  include/linux/radix-tree.h | 16 ++++++++++++++++
>  lib/radix-tree.c           | 12 ++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index f9a3da5bf892..db0ed595749b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
>                              struct radix_tree_iter *iter, unsigned flags);
>
>  /**
> + * radix_tree_iter_retry - retry this chunk of the iteration
> + * @iter:      iterator state
> + *
> + * If we iterate over a tree protected only by the RCU lock, a race
> + * against deletion or creation may result in seeing a slot for which
> + * radix_tree_deref_retry() returns true.  If so, call this function
> + * and continue the iteration.
> + */
> +static inline __must_check
> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> +{
> +       iter->next_index = iter->index;
> +       return NULL;
> +}
> +
> +/**
>   * radix_tree_chunk_size - get current chunk size
>   *
>   * @iter:      pointer to radix tree iterator
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index a25f635dcc56..65422ac17114 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
>                 return 0;
>
>         radix_tree_for_each_slot(slot, root, &iter, first_index) {
> -               results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +               results[ret] = rcu_dereference_raw(*slot);
>                 if (!results[ret])
>                         continue;
> +               if (radix_tree_is_indirect_ptr(results[ret])) {
> +                       slot = radix_tree_iter_retry(&iter);
> +                       continue;
> +               }
>                 if (++ret == max_items)
>                         break;
>         }

Looks like your fix doesn't work.

After radix_tree_iter_retry: radix_tree_for_each_slot will call
radix_tree_next_slot which isn't safe to call for NULL slot.

#define radix_tree_for_each_slot(slot, root, iter, start) \
for (slot = radix_tree_iter_init(iter, start) ; \
    slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
    slot = radix_tree_next_slot(slot, iter, 0))

tagged iterator works becase restart happens only at root - tags
filled with single bit.

quick (untested) fix for that

--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct
radix_tree_iter *iter, unsigned flags)
                        return slot + offset + 1;
                }
        } else {
-               unsigned size = radix_tree_chunk_size(iter) - 1;
+               int size = radix_tree_chunk_size(iter) - 1;

-               while (size--) {
+               while (size-- > 0) {
                        slot++;
                        iter->index++;
                        if (likely(*slot))


> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>                 return 0;
>
>         radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
> -               results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +               results[ret] = rcu_dereference_raw(*slot);
>                 if (!results[ret])
>                         continue;
> +               if (radix_tree_is_indirect_ptr(results[ret])) {
> +                       slot = radix_tree_iter_retry(&iter);
> +                       continue;
> +               }
>                 if (++ret == max_items)
>                         break;
>         }
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
  2016-02-03 21:37     ` Konstantin Khlebnikov
  (?)
@ 2016-02-04  8:44     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-04  8:44 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm, Stable

[-- Attachment #1: Type: text/plain, Size: 5224 bytes --]

On Thu, Feb 4, 2016 at 12:37 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
> <matthew.r.wilcox@intel.com> wrote:
>> From: Matthew Wilcox <willy@linux.intel.com>
>>
>> If the indirect_ptr bit is set on a slot, that indicates we need to
>> redo the lookup.  Introduce a new function radix_tree_iter_retry()
>> which forces the loop to retry the lookup by setting 'slot' to NULL and
>> turning the iterator back to point at the problematic entry.
>>
>> This is a pretty rare problem to hit at the moment; the lookup has to
>> race with a grow of the radix tree from a height of 0.  The consequences
>> of hitting this race are that gang lookup could return a pointer to a
>> radix_tree_node instead of a pointer to whatever the user had inserted
>> in the tree.
>>
>> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
>> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
>> Cc: stable@vger.kernel.org
>> ---
>>  include/linux/radix-tree.h | 16 ++++++++++++++++
>>  lib/radix-tree.c           | 12 ++++++++++--
>>  2 files changed, 26 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
>> index f9a3da5bf892..db0ed595749b 100644
>> --- a/include/linux/radix-tree.h
>> +++ b/include/linux/radix-tree.h
>> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
>>                              struct radix_tree_iter *iter, unsigned flags);
>>
>>  /**
>> + * radix_tree_iter_retry - retry this chunk of the iteration
>> + * @iter:      iterator state
>> + *
>> + * If we iterate over a tree protected only by the RCU lock, a race
>> + * against deletion or creation may result in seeing a slot for which
>> + * radix_tree_deref_retry() returns true.  If so, call this function
>> + * and continue the iteration.
>> + */
>> +static inline __must_check
>> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
>> +{
>> +       iter->next_index = iter->index;
>> +       return NULL;
>> +}
>> +
>> +/**
>>   * radix_tree_chunk_size - get current chunk size
>>   *
>>   * @iter:      pointer to radix tree iterator
>> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
>> index a25f635dcc56..65422ac17114 100644
>> --- a/lib/radix-tree.c
>> +++ b/lib/radix-tree.c
>> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
>>                 return 0;
>>
>>         radix_tree_for_each_slot(slot, root, &iter, first_index) {
>> -               results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
>> +               results[ret] = rcu_dereference_raw(*slot);
>>                 if (!results[ret])
>>                         continue;
>> +               if (radix_tree_is_indirect_ptr(results[ret])) {
>> +                       slot = radix_tree_iter_retry(&iter);
>> +                       continue;
>> +               }
>>                 if (++ret == max_items)
>>                         break;
>>         }
>
> Looks like your fix doesn't work.
>
> After radix_tree_iter_retry: radix_tree_for_each_slot will call
> radix_tree_next_slot which isn't safe to call for NULL slot.
>
> #define radix_tree_for_each_slot(slot, root, iter, start) \
> for (slot = radix_tree_iter_init(iter, start) ; \
>     slot || (slot = radix_tree_next_chunk(root, iter, 0)) ; \
>     slot = radix_tree_next_slot(slot, iter, 0))
>
> tagged iterator works becase restart happens only at root - tags
> filled with single bit.
>
> quick (untested) fix for that
>
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -457,9 +457,9 @@ radix_tree_next_slot(void **slot, struct
> radix_tree_iter *iter, unsigned flags)
>                         return slot + offset + 1;
>                 }
>         } else {
> -               unsigned size = radix_tree_chunk_size(iter) - 1;
> +               int size = radix_tree_chunk_size(iter) - 1;
>
> -               while (size--) {
> +               while (size-- > 0) {
>                         slot++;
>                         iter->index++;
>                         if (likely(*slot))
>
>

Yep. Kernel crashes. Test in attachment.

fix: https://lkml.kernel.org/r/145457528789.31321.4441662473067711123.stgit@zurg

>> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>>                 return 0;
>>
>>         radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
>> -               results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
>> +               results[ret] = rcu_dereference_raw(*slot);
>>                 if (!results[ret])
>>                         continue;
>> +               if (radix_tree_is_indirect_ptr(results[ret])) {
>> +                       slot = radix_tree_iter_retry(&iter);
>> +                       continue;
>> +               }
>>                 if (++ret == max_items)
>>                         break;
>>         }
>> --
>> 2.7.0.rc3
>>
>> --
>> To unsubscribe, send a message with 'unsubscribe linux-mm' in
>> the body to majordomo@kvack.org.  For more info on Linux MM,
>> see: http://www.linux-mm.org/ .
>> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

[-- Attachment #2: radix-tree-test-radix_tree_iter_retry --]
[-- Type: application/octet-stream, Size: 2218 bytes --]

radix-tree: test radix_tree_iter_retry

From: Konstantin Khlebnikov <koct9i@gmail.com>

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 lib/radix-tree.c |   62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 6b79e9026e24..f489334b9cb7 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1491,6 +1491,66 @@ static int radix_tree_callback(struct notifier_block *nfb,
        return NOTIFY_OK;
 }
 
+static void test_iter_retry(void)
+{
+	RADIX_TREE(root, GFP_KERNEL);
+	void *ptr = (void *)4ul;
+	struct radix_tree_iter iter;
+	void **slot;
+	bool first;
+
+	radix_tree_insert(&root, 0, ptr);
+	radix_tree_tag_set(&root, 0, 0);
+
+	first = true;
+	radix_tree_for_each_tagged(slot, &root, &iter, 0, 0) {
+		printk("tagged %ld %p\n", iter.index, *slot);
+		if (first) {
+			radix_tree_insert(&root, 1, ptr);
+			radix_tree_tag_set(&root, 1, 0);
+			first = false;
+		}
+		if (radix_tree_deref_retry(*slot)) {
+			printk("retry %ld\n", iter.index);
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
+	}
+	radix_tree_delete(&root, 1);
+
+	first = true;
+	radix_tree_for_each_slot(slot, &root, &iter, 0) {
+		printk("slot %ld %p\n", iter.index, *slot);
+		if (first) {
+			radix_tree_insert(&root, 1, ptr);
+			first = false;
+		}
+		if (radix_tree_deref_retry(*slot)) {
+			printk("retry %ld\n", iter.index);
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
+	}
+	radix_tree_delete(&root, 1);
+
+	first = true;
+	radix_tree_for_each_contig(slot, &root, &iter, 0) {
+		printk("contig %ld %p\n", iter.index, *slot);
+		if (first) {
+			radix_tree_insert(&root, 1, ptr);
+			first = false;
+		}
+		if (radix_tree_deref_retry(*slot)) {
+			printk("retry %ld\n", iter.index);
+			slot = radix_tree_iter_retry(&iter);
+			continue;
+		}
+	}
+
+	radix_tree_delete(&root, 0);
+	radix_tree_delete(&root, 1);
+}
+
 void __init radix_tree_init(void)
 {
 	radix_tree_node_cachep = kmem_cache_create("radix_tree_node",
@@ -1499,4 +1559,6 @@ void __init radix_tree_init(void)
 			radix_tree_node_ctor);
 	radix_tree_init_maxindex();
 	hotcpu_notifier(radix_tree_callback, 0);
+
+	test_iter_retry();
 }

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next()
  2016-01-27 21:17   ` Matthew Wilcox
@ 2016-02-04  8:50     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-04  8:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm

On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> shmem likes to occasionally drop the lock, schedule, then reacqire
> the lock and continue with the iteration from the last place it
> left off.  This is currently done with a pretty ugly goto.  Introduce
> radix_tree_iter_next() and use it throughout shmem.c.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  include/linux/radix-tree.h | 15 +++++++++++++++
>  mm/shmem.c                 | 12 +++---------
>  2 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index db0ed595749b..dec2c6c77eea 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
>  }
>
>  /**
> + * radix_tree_iter_next - resume iterating when the chunk may be invalid
> + * @iter:      iterator state
> + *
> + * If the iterator needs to release then reacquire a lock, the chunk may
> + * have been invalidated by an insertion or deletion.  Call this function
> + * to continue the iteration from the next index.
> + */
> +static inline __must_check
> +void **radix_tree_iter_next(struct radix_tree_iter *iter)
> +{
> +       iter->next_index = iter->index + 1;
> +       return NULL;
> +}
> +
> +/**

This works for normal iterator but not for tagged.
It must also reset iter->tags to zero.

>   * radix_tree_chunk_size - get current chunk size
>   *
>   * @iter:      pointer to radix tree iterator
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6ec14b70d82d..438ea8004c26 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>
>         rcu_read_lock();
>
> -restart:
>         radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
>                 if (iter.index >= end)
>                         break;
> @@ -398,8 +397,7 @@ restart:
>
>                 if (need_resched()) {
>                         cond_resched_rcu();
> -                       start = iter.index + 1;
> -                       goto restart;
> +                       slot = radix_tree_iter_next(&iter);
>                 }
>         }
>
> @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping)
>         start = 0;
>         rcu_read_lock();
>
> -restart:
>         radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
>                 page = radix_tree_deref_slot(slot);
>                 if (!page || radix_tree_exception(page)) {
> @@ -1967,8 +1964,7 @@ restart:
>
>                 if (need_resched()) {
>                         cond_resched_rcu();
> -                       start = iter.index + 1;
> -                       goto restart;
> +                       slot = radix_tree_iter_next(&iter);
>                 }
>         }
>         rcu_read_unlock();
> @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
>
>                 start = 0;
>                 rcu_read_lock();
> -restart:
>                 radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
>                                            start, SHMEM_TAG_PINNED) {
>
> @@ -2039,8 +2034,7 @@ restart:
>  continue_resched:
>                         if (need_resched()) {
>                                 cond_resched_rcu();
> -                               start = iter.index + 1;
> -                               goto restart;
> +                               slot = radix_tree_iter_next(&iter);
>                         }
>                 }
>                 rcu_read_unlock();
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next()
@ 2016-02-04  8:50     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 32+ messages in thread
From: Konstantin Khlebnikov @ 2016-02-04  8:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Matthew Wilcox,
	Linux Kernel Mailing List, linux-fsdevel, linux-mm

On Thu, Jan 28, 2016 at 12:17 AM, Matthew Wilcox
<matthew.r.wilcox@intel.com> wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> shmem likes to occasionally drop the lock, schedule, then reacqire
> the lock and continue with the iteration from the last place it
> left off.  This is currently done with a pretty ugly goto.  Introduce
> radix_tree_iter_next() and use it throughout shmem.c.
>
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  include/linux/radix-tree.h | 15 +++++++++++++++
>  mm/shmem.c                 | 12 +++---------
>  2 files changed, 18 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index db0ed595749b..dec2c6c77eea 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -403,6 +403,21 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
>  }
>
>  /**
> + * radix_tree_iter_next - resume iterating when the chunk may be invalid
> + * @iter:      iterator state
> + *
> + * If the iterator needs to release then reacquire a lock, the chunk may
> + * have been invalidated by an insertion or deletion.  Call this function
> + * to continue the iteration from the next index.
> + */
> +static inline __must_check
> +void **radix_tree_iter_next(struct radix_tree_iter *iter)
> +{
> +       iter->next_index = iter->index + 1;
> +       return NULL;
> +}
> +
> +/**

This works for normal iterator but not for tagged.
It must also reset iter->tags to zero.

>   * radix_tree_chunk_size - get current chunk size
>   *
>   * @iter:      pointer to radix tree iterator
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 6ec14b70d82d..438ea8004c26 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -376,7 +376,6 @@ unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>
>         rcu_read_lock();
>
> -restart:
>         radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
>                 if (iter.index >= end)
>                         break;
> @@ -398,8 +397,7 @@ restart:
>
>                 if (need_resched()) {
>                         cond_resched_rcu();
> -                       start = iter.index + 1;
> -                       goto restart;
> +                       slot = radix_tree_iter_next(&iter);
>                 }
>         }
>
> @@ -1950,7 +1948,6 @@ static void shmem_tag_pins(struct address_space *mapping)
>         start = 0;
>         rcu_read_lock();
>
> -restart:
>         radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
>                 page = radix_tree_deref_slot(slot);
>                 if (!page || radix_tree_exception(page)) {
> @@ -1967,8 +1964,7 @@ restart:
>
>                 if (need_resched()) {
>                         cond_resched_rcu();
> -                       start = iter.index + 1;
> -                       goto restart;
> +                       slot = radix_tree_iter_next(&iter);
>                 }
>         }
>         rcu_read_unlock();
> @@ -2005,7 +2001,6 @@ static int shmem_wait_for_pins(struct address_space *mapping)
>
>                 start = 0;
>                 rcu_read_lock();
> -restart:
>                 radix_tree_for_each_tagged(slot, &mapping->page_tree, &iter,
>                                            start, SHMEM_TAG_PINNED) {
>
> @@ -2039,8 +2034,7 @@ restart:
>  continue_resched:
>                         if (need_resched()) {
>                                 cond_resched_rcu();
> -                               start = iter.index + 1;
> -                               goto restart;
> +                               slot = radix_tree_iter_next(&iter);
>                         }
>                 }
>                 rcu_read_unlock();
> --
> 2.7.0.rc3
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
  2016-01-27 21:17   ` Matthew Wilcox
@ 2016-02-19 18:02     ` Sasha Levin
  -1 siblings, 0 replies; 32+ messages in thread
From: Sasha Levin @ 2016-02-19 18:02 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

On 01/27/2016 04:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position.  This will make a difference
> when there are more ways to happen across an indirect pointer.  And it
> eliminates some confusing gotos.

Hey Matthew,

I'm seeing the following NULL ptr deref while fuzzing:

[ 3380.120501] general protection fault: 0000 [#1] SMP KASAN
[ 3380.120529] Modules linked in:
[ 3380.120555] CPU: 2 PID: 23271 Comm: syz-executor Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
[ 3380.120569] task: ffff8800a5181000 ti: ffff8801a63b8000 task.ti: ffff8801a63b8000
[ 3380.120681] RIP: shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100)
[ 3380.120692] RSP: 0018:ffff8801a63bfd58  EFLAGS: 00010202
[ 3380.120703] RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 0000000000940000
[ 3380.120714] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8800a5181b3c
[ 3380.120725] RBP: ffff8801a63bfe58 R08: ffff8800a5181b40 R09: 0000000000000001
[ 3380.120736] R10: fffff44e6f425fff R11: ffffffffbdb0a420 R12: 0000000000000008
[ 3380.120745] R13: 0000000000000001 R14: 0000000000000001 R15: ffffea0002ad1660
[ 3380.120759] FS:  00007fbc71e9c700(0000) GS:ffff8801d3c00000(0000) knlGS:0000000000000000
[ 3380.120769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3380.120780] CR2: 0000000020010ff7 CR3: 00000001a0728000 CR4: 00000000000406e0
[ 3380.120794] Stack:
[ 3380.120815]  ffffffffa2738239 00000008a63bfdc8 ffff8800a499e740 1ffff10034c77fba
[ 3380.120834]  ffff8801ac446da0 ffff8800a499e8f0 0000000000000000 1ffff10034c77001
[ 3380.120852]  ffff8801a63b8000 ffff8801a63b8008 ffff8801ac446f90 ffff8801ac446f98
[ 3380.120856] Call Trace:
[ 3380.120929] shmem_fcntl (mm/shmem.c:2135)
[ 3380.120963] SyS_fcntl (fs/fcntl.c:336 fs/fcntl.c:372 fs/fcntl.c:357)
[ 3380.121112] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)
[ 3380.122294] Code: c7 45 a0 00 00 00 00 e9 86 02 00 00 e8 cf a8 ee ff 4d 85 e4 0f 84 b2 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 d4 08 00 00 49 8b 1c 24 e8 12 34 de ff 85 c0
All code
========
   0:   c7 45 a0 00 00 00 00    movl   $0x0,-0x60(%rbp)
   7:   e9 86 02 00 00          jmpq   0x292
   c:   e8 cf a8 ee ff          callq  0xffffffffffeea8e0
  11:   4d 85 e4                test   %r12,%r12
  14:   0f 84 b2 07 00 00       je     0x7cc
  1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  21:   fc ff df
  24:   4c 89 e2                mov    %r12,%rdx
  27:   48 c1 ea 03             shr    $0x3,%rdx
  2b:*  80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)               <-- trapping instruction
  2f:   0f 85 d4 08 00 00       jne    0x909
  35:   49 8b 1c 24             mov    (%r12),%rbx
  39:   e8 12 34 de ff          callq  0xffffffffffde3450
  3e:   85 c0                   test   %eax,%eax
        ...

Code starting with the faulting instruction
===========================================
   0:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
   4:   0f 85 d4 08 00 00       jne    0x8de
   a:   49 8b 1c 24             mov    (%r12),%rbx
   e:   e8 12 34 de ff          callq  0xffffffffffde3425
  13:   85 c0                   test   %eax,%eax
        ...
[ 3380.122312] RIP shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100)
[ 3380.122317]  RSP <ffff8801a63bfd58>


Thanks,
Sasha

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 4/5] mm: Use radix_tree_iter_retry()
@ 2016-02-19 18:02     ` Sasha Levin
  0 siblings, 0 replies; 32+ messages in thread
From: Sasha Levin @ 2016-02-19 18:02 UTC (permalink / raw)
  To: Matthew Wilcox, Andrew Morton, Hugh Dickins
  Cc: Matthew Wilcox, Konstantin Khlebnikov, linux-kernel,
	linux-fsdevel, linux-mm

On 01/27/2016 04:17 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Instead of a 'goto restart', we can now use radix_tree_iter_retry()
> to restart from our current position.  This will make a difference
> when there are more ways to happen across an indirect pointer.  And it
> eliminates some confusing gotos.

Hey Matthew,

I'm seeing the following NULL ptr deref while fuzzing:

[ 3380.120501] general protection fault: 0000 [#1] SMP KASAN
[ 3380.120529] Modules linked in:
[ 3380.120555] CPU: 2 PID: 23271 Comm: syz-executor Not tainted 4.5.0-rc4-next-20160219-sasha-00026-g7978205-dirty #2978
[ 3380.120569] task: ffff8800a5181000 ti: ffff8801a63b8000 task.ti: ffff8801a63b8000
[ 3380.120681] RIP: shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100)
[ 3380.120692] RSP: 0018:ffff8801a63bfd58  EFLAGS: 00010202
[ 3380.120703] RAX: dffffc0000000000 RBX: 0000000000000001 RCX: 0000000000940000
[ 3380.120714] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffff8800a5181b3c
[ 3380.120725] RBP: ffff8801a63bfe58 R08: ffff8800a5181b40 R09: 0000000000000001
[ 3380.120736] R10: fffff44e6f425fff R11: ffffffffbdb0a420 R12: 0000000000000008
[ 3380.120745] R13: 0000000000000001 R14: 0000000000000001 R15: ffffea0002ad1660
[ 3380.120759] FS:  00007fbc71e9c700(0000) GS:ffff8801d3c00000(0000) knlGS:0000000000000000
[ 3380.120769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3380.120780] CR2: 0000000020010ff7 CR3: 00000001a0728000 CR4: 00000000000406e0
[ 3380.120794] Stack:
[ 3380.120815]  ffffffffa2738239 00000008a63bfdc8 ffff8800a499e740 1ffff10034c77fba
[ 3380.120834]  ffff8801ac446da0 ffff8800a499e8f0 0000000000000000 1ffff10034c77001
[ 3380.120852]  ffff8801a63b8000 ffff8801a63b8008 ffff8801ac446f90 ffff8801ac446f98
[ 3380.120856] Call Trace:
[ 3380.120929] shmem_fcntl (mm/shmem.c:2135)
[ 3380.120963] SyS_fcntl (fs/fcntl.c:336 fs/fcntl.c:372 fs/fcntl.c:357)
[ 3380.121112] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:200)
[ 3380.122294] Code: c7 45 a0 00 00 00 00 e9 86 02 00 00 e8 cf a8 ee ff 4d 85 e4 0f 84 b2 07 00 00 48 b8 00 00 00 00 00 fc ff df 4c 89 e2 48 c1 ea 03 <80> 3c 02 00 0f 85 d4 08 00 00 49 8b 1c 24 e8 12 34 de ff 85 c0
All code
========
   0:   c7 45 a0 00 00 00 00    movl   $0x0,-0x60(%rbp)
   7:   e9 86 02 00 00          jmpq   0x292
   c:   e8 cf a8 ee ff          callq  0xffffffffffeea8e0
  11:   4d 85 e4                test   %r12,%r12
  14:   0f 84 b2 07 00 00       je     0x7cc
  1a:   48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  21:   fc ff df
  24:   4c 89 e2                mov    %r12,%rdx
  27:   48 c1 ea 03             shr    $0x3,%rdx
  2b:*  80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)               <-- trapping instruction
  2f:   0f 85 d4 08 00 00       jne    0x909
  35:   49 8b 1c 24             mov    (%r12),%rbx
  39:   e8 12 34 de ff          callq  0xffffffffffde3450
  3e:   85 c0                   test   %eax,%eax
        ...

Code starting with the faulting instruction
===========================================
   0:   80 3c 02 00             cmpb   $0x0,(%rdx,%rax,1)
   4:   0f 85 d4 08 00 00       jne    0x8de
   a:   49 8b 1c 24             mov    (%r12),%rbx
   e:   e8 12 34 de ff          callq  0xffffffffffde3425
  13:   85 c0                   test   %eax,%eax
        ...
[ 3380.122312] RIP shmem_add_seals (include/linux/compiler.h:222 include/linux/radix-tree.h:206 mm/shmem.c:2001 mm/shmem.c:2100)
[ 3380.122317]  RSP <ffff8801a63bfd58>


Thanks,
Sasha

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
  2016-01-27 21:17   ` Matthew Wilcox
@ 2016-03-04 13:21     ` zhong jiang
  -1 siblings, 0 replies; 32+ messages in thread
From: zhong jiang @ 2016-03-04 13:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm,
	stable

On 2016/1/28 5:17, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> If the indirect_ptr bit is set on a slot, that indicates we need to
> redo the lookup.  Introduce a new function radix_tree_iter_retry()
> which forces the loop to retry the lookup by setting 'slot' to NULL and
> turning the iterator back to point at the problematic entry.
>
> This is a pretty rare problem to hit at the moment; the lookup has to
> race with a grow of the radix tree from a height of 0.  The consequences
> of hitting this race are that gang lookup could return a pointer to a
> radix_tree_node instead of a pointer to whatever the user had inserted
> in the tree.
>
> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  include/linux/radix-tree.h | 16 ++++++++++++++++
>  lib/radix-tree.c           | 12 ++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index f9a3da5bf892..db0ed595749b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
>  			     struct radix_tree_iter *iter, unsigned flags);
>  
>  /**
> + * radix_tree_iter_retry - retry this chunk of the iteration
> + * @iter:	iterator state
> + *
> + * If we iterate over a tree protected only by the RCU lock, a race
> + * against deletion or creation may result in seeing a slot for which
> + * radix_tree_deref_retry() returns true.  If so, call this function
> + * and continue the iteration.
> + */
> +static inline __must_check
> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> +{
> +	iter->next_index = iter->index;
> +	return NULL;
> +}
> +
> +/**
>   * radix_tree_chunk_size - get current chunk size
>   *
>   * @iter:	pointer to radix tree iterator
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index a25f635dcc56..65422ac17114 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
>  		return 0;
>  
>  	radix_tree_for_each_slot(slot, root, &iter, first_index) {
> -		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +		results[ret] = rcu_dereference_raw(*slot);
>  		if (!results[ret])
>  			continue;
> +		if (radix_tree_is_indirect_ptr(results[ret])) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  		if (++ret == max_items)
>  			break;
>  	}
according to your patch, after  A race occur,  slot equals to null.  radix_tree_next_slot() will continue
to work. Therefore, it will not return the problematic entry. 
> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>  		return 0;
>  
>  	radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
> -		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +		results[ret] = rcu_dereference_raw(*slot);
>  		if (!results[ret])
>  			continue;
> +		if (radix_tree_is_indirect_ptr(results[ret])) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  		if (++ret == max_items)
>  			break;
>  	}

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/5] radix-tree: Fix race in gang lookup
@ 2016-03-04 13:21     ` zhong jiang
  0 siblings, 0 replies; 32+ messages in thread
From: zhong jiang @ 2016-03-04 13:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Andrew Morton, Hugh Dickins, Ohad Ben-Cohen, Matthew Wilcox,
	Konstantin Khlebnikov, linux-kernel, linux-fsdevel, linux-mm,
	stable

On 2016/1/28 5:17, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
>
> If the indirect_ptr bit is set on a slot, that indicates we need to
> redo the lookup.  Introduce a new function radix_tree_iter_retry()
> which forces the loop to retry the lookup by setting 'slot' to NULL and
> turning the iterator back to point at the problematic entry.
>
> This is a pretty rare problem to hit at the moment; the lookup has to
> race with a grow of the radix tree from a height of 0.  The consequences
> of hitting this race are that gang lookup could return a pointer to a
> radix_tree_node instead of a pointer to whatever the user had inserted
> in the tree.
>
> Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> Cc: stable@vger.kernel.org
> ---
>  include/linux/radix-tree.h | 16 ++++++++++++++++
>  lib/radix-tree.c           | 12 ++++++++++--
>  2 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index f9a3da5bf892..db0ed595749b 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -387,6 +387,22 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
>  			     struct radix_tree_iter *iter, unsigned flags);
>  
>  /**
> + * radix_tree_iter_retry - retry this chunk of the iteration
> + * @iter:	iterator state
> + *
> + * If we iterate over a tree protected only by the RCU lock, a race
> + * against deletion or creation may result in seeing a slot for which
> + * radix_tree_deref_retry() returns true.  If so, call this function
> + * and continue the iteration.
> + */
> +static inline __must_check
> +void **radix_tree_iter_retry(struct radix_tree_iter *iter)
> +{
> +	iter->next_index = iter->index;
> +	return NULL;
> +}
> +
> +/**
>   * radix_tree_chunk_size - get current chunk size
>   *
>   * @iter:	pointer to radix tree iterator
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index a25f635dcc56..65422ac17114 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1105,9 +1105,13 @@ radix_tree_gang_lookup(struct radix_tree_root *root, void **results,
>  		return 0;
>  
>  	radix_tree_for_each_slot(slot, root, &iter, first_index) {
> -		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +		results[ret] = rcu_dereference_raw(*slot);
>  		if (!results[ret])
>  			continue;
> +		if (radix_tree_is_indirect_ptr(results[ret])) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  		if (++ret == max_items)
>  			break;
>  	}
according to your patch, after  A race occur,  slot equals to null.  radix_tree_next_slot() will continue
to work. Therefore, it will not return the problematic entry. 
> @@ -1184,9 +1188,13 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>  		return 0;
>  
>  	radix_tree_for_each_tagged(slot, root, &iter, first_index, tag) {
> -		results[ret] = indirect_to_ptr(rcu_dereference_raw(*slot));
> +		results[ret] = rcu_dereference_raw(*slot);
>  		if (!results[ret])
>  			continue;
> +		if (radix_tree_is_indirect_ptr(results[ret])) {
> +			slot = radix_tree_iter_retry(&iter);
> +			continue;
> +		}
>  		if (++ret == max_items)
>  			break;
>  	}


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-03-04 13:33 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-27 21:17 [PATCH 0/5] Fix races & improve the radix tree iterator patterns Matthew Wilcox
2016-01-27 21:17 ` Matthew Wilcox
2016-01-27 21:17 ` [PATCH 1/5] radix-tree: Fix race in gang lookup Matthew Wilcox
2016-01-27 21:17   ` Matthew Wilcox
2016-02-03 21:37   ` Konstantin Khlebnikov
2016-02-03 21:37     ` Konstantin Khlebnikov
2016-02-04  8:44     ` Konstantin Khlebnikov
2016-03-04 13:21   ` zhong jiang
2016-03-04 13:21     ` zhong jiang
2016-01-27 21:17 ` [PATCH 2/5] hwspinlock: Fix race between radix tree insertion and lookup Matthew Wilcox
2016-01-27 21:17   ` Matthew Wilcox
2016-01-27 21:17 ` [PATCH 3/5] btrfs: Use radix_tree_iter_retry() Matthew Wilcox
2016-01-27 21:17   ` Matthew Wilcox
2016-02-01 14:34   ` David Sterba
2016-02-01 14:34     ` David Sterba
2016-01-27 21:17 ` [PATCH 4/5] mm: " Matthew Wilcox
2016-01-27 21:17   ` Matthew Wilcox
2016-01-29 14:45   ` Vlastimil Babka
2016-01-29 14:45     ` Vlastimil Babka
2016-01-29 14:45     ` Vlastimil Babka
2016-01-29 14:50     ` Matthew Wilcox
2016-01-29 14:50       ` Matthew Wilcox
2016-02-19 18:02   ` Sasha Levin
2016-02-19 18:02     ` Sasha Levin
2016-01-27 21:17 ` [PATCH 5/5] radix-tree,shmem: Introduce radix_tree_iter_next() Matthew Wilcox
2016-01-27 21:17   ` Matthew Wilcox
2016-02-04  8:50   ` Konstantin Khlebnikov
2016-02-04  8:50     ` Konstantin Khlebnikov
2016-01-28  7:17 ` [PATCH 0/5] Fix races & improve the radix tree iterator patterns Konstantin Khlebnikov
2016-01-28  7:17   ` Konstantin Khlebnikov
2016-02-03  6:27   ` Konstantin Khlebnikov
2016-02-03  6:27     ` Konstantin Khlebnikov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.