All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] radix-tree: fix writeback livelock avoidance code
@ 2010-08-20  5:22 Dave Chinner
  2010-08-20  5:22 ` [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Dave Chinner @ 2010-08-20  5:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, jack, npiggin

The following two patches fix bugs in the new radix tree functionality used to
implement the writeback livelock avoidance code. Both bugs manifest themselves
as stray PAGECACHE_TAG_TOWRITE tags in the mapping->page_tree radix tree
resulting in livelocks during tag lookups. More subtly, they also appear to
result in writeback tree walks occasionally terminating early and so not
actually writing all the pages they are supposed to.

Please review and test - these are pretty serious problems for the writeback code.

I've been reproducing the problems with xfstests test 013 using 2.6.36-rc1 and
a bunch of new XFS patches that worked just fine on 2.6.35. The XFS tree that
demonstrates the problem is available here:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git for-oss

and the two patches following this mail make the problem go away. They are also
available in this branch here:

  git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git radix-tree

Dave Chinner (2):
      radix-tree: clear all tags in radix_tree_node_rcu_free
      radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags

 lib/radix-tree.c |   63 +++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 48 insertions(+), 15 deletions(-)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free
  2010-08-20  5:22 [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Dave Chinner
@ 2010-08-20  5:22 ` Dave Chinner
  2010-08-20  8:00   ` Nick Piggin
  2010-08-20 13:00   ` Jan Kara
  2010-08-20  5:22 ` [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2010-08-20  5:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, jack, npiggin

From: Dave Chinner <dchinner@redhat.com>

Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
writeback livelock avoidance using page tagging") introduced a new
radix tree tag, increasing the number of tags in each node from 2 to
3. It did not, however, fix up the code in
radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
and hence could leave stray tags set in the new tag array.

The result is that the livelock avoidance code added in the the
above commit would hit stale tags when doing tag based lookups,
resulting in livelocks when trying to traverse the tree.

Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
again in the future by using a loop to walk all the tags up to
RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
leaves behind.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 lib/radix-tree.c |    6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index e907858..1014171 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -174,14 +174,16 @@ static void radix_tree_node_rcu_free(struct rcu_head *head)
 {
 	struct radix_tree_node *node =
 			container_of(head, struct radix_tree_node, rcu_head);
+	int i;
 
 	/*
 	 * must only free zeroed nodes into the slab. radix_tree_shrink
 	 * can leave us with a non-NULL entry in the first slot, so clear
 	 * that here to make sure.
 	 */
-	tag_clear(node, 0, 0);
-	tag_clear(node, 1, 0);
+	for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
+		tag_clear(node, i, 0);
+
 	node->slots[0] = NULL;
 	node->count = 0;
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
  2010-08-20  5:22 [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Dave Chinner
  2010-08-20  5:22 ` [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free Dave Chinner
@ 2010-08-20  5:22 ` Dave Chinner
  2010-08-20  8:02   ` Nick Piggin
  2010-08-20 13:39   ` Jan Kara
  2010-08-20 13:51 ` [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Jan Kara
  2010-08-25 20:11 ` Jan Kara
  3 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2010-08-20  5:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, jack, npiggin

From: Dave Chinner <dchinner@redhat.com>

Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
omplement function radix_tree_range_tag_if_tagged") does not safely
set tags on on intermediate tree nodes. The code walks down the tree
setting tags before it has fully resolved the path to the leaf under
the assumption there will be a leaf slot with the tag set in the
range it is searching.

Unfortunately, this is not a valid assumption - we can abort after
setting a tag on an intermediate node if we overrun the number of
tags we are allowed to set in a batch, or stop scanning because we
we have passed the last scan index before we reach a leaf slot with
the tag we are searching for set.

As a result, we can leave the function with tags set on intemediate
nodes which can be tripped over later by tag-based lookups. The
result of these stale tags is that lookup may end prematurely or
livelock because the lookup cannot make progress.

The fix for the problem involves reocrding the traversal path we
take to the leaf nodes, and only propagating the tags back up the
tree once the tag is set in the leaf node slot. We are already
recording the path for efficient traversal, so there is no
additional overhead to do the intermediately node tag setting in
this manner.

This fixes a radix tree lookup livelock triggered by the new
writeback sync livelock avoidance code introduced in commit
f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
livelock avoidance using page tagging").

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 lib/radix-tree.c |   57 +++++++++++++++++++++++++++++++++++++++++------------
 1 files changed, 44 insertions(+), 13 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1014171..e0ee8cb 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -625,6 +625,13 @@ EXPORT_SYMBOL(radix_tree_tag_get);
  * also settag. The function stops either after tagging nr_to_tag items or
  * after reaching last_index.
  *
+ * The tags must be set from the leaf level only and propagated back up the
+ * path to the root. We must do this so that we resolve the full path before
+ * setting any tags on intermediate nodes. If we set tags as we descend, then
+ * we can get to the leaf node and find that the index that has the iftag
+ * set is outside the range we are scanning. This reults in dangling tags and
+ * can lead to problems with later tag operations (e.g. livelocks on lookups).
+ *
  * The function returns number of leaves where the tag was set and sets
  * *first_indexp to the first unscanned index.
  */
@@ -633,9 +640,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 		unsigned long nr_to_tag,
 		unsigned int iftag, unsigned int settag)
 {
-	unsigned int height = root->height, shift;
-	unsigned long tagged = 0, index = *first_indexp;
-	struct radix_tree_node *open_slots[height], *slot;
+	unsigned int height = root->height;
+	struct radix_tree_path path[height];
+	struct radix_tree_path *pathp = path;
+	struct radix_tree_node *slot;
+	unsigned int shift;
+	unsigned long tagged = 0;
+	unsigned long index = *first_indexp;
 
 	last_index = min(last_index, radix_tree_maxindex(height));
 	if (index > last_index)
@@ -655,6 +666,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
 	slot = radix_tree_indirect_to_ptr(root->rnode);
 
+	/*
+	 * we fill the path from (root->height - 2) to 0, leaving the index at
+	 * (root->height - 1) as a terminator. Zero the node in the terminator
+	 * so that we can use this to end walk loops back up the path.
+	 */
+	path[height - 1].node = NULL;
+
 	for (;;) {
 		int offset;
 
@@ -663,17 +681,30 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 			goto next;
 		if (!tag_get(slot, iftag, offset))
 			goto next;
+		if (height > 1) {
+			/* Go down one level */
+			height--;
+			shift -= RADIX_TREE_MAP_SHIFT;
+			path[height - 1].node = slot;
+			path[height - 1].offset = offset;
+			slot = slot->slots[offset];
+			continue;
+		}
+
+		/* tag the leaf */
+		tagged++;
 		tag_set(slot, settag, offset);
-		if (height == 1) {
-			tagged++;
-			goto next;
+
+		/* walk back up the path tagging interior nodes */
+		pathp = &path[0];
+		while (pathp->node) {
+			/* stop if we find a node with the tag already set */
+			if (tag_get(pathp->node, settag, pathp->offset))
+				break;
+			tag_set(pathp->node, settag, pathp->offset);
+			pathp++;
 		}
-		/* Go down one level */
-		height--;
-		shift -= RADIX_TREE_MAP_SHIFT;
-		open_slots[height] = slot;
-		slot = slot->slots[offset];
-		continue;
+
 next:
 		/* Go to next item at level determined by 'shift' */
 		index = ((index >> shift) + 1) << shift;
@@ -687,7 +718,7 @@ next:
 			 * last_index is guaranteed to be in the tree, what
 			 * we do below cannot wander astray.
 			 */
-			slot = open_slots[height];
+			slot = path[height - 1].node;
 			height++;
 			shift += RADIX_TREE_MAP_SHIFT;
 		}
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free
  2010-08-20  5:22 ` [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free Dave Chinner
@ 2010-08-20  8:00   ` Nick Piggin
  2010-08-20 13:00   ` Jan Kara
  1 sibling, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2010-08-20  8:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, jack, npiggin

On Fri, Aug 20, 2010 at 03:22:06PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
> writeback livelock avoidance using page tagging") introduced a new
> radix tree tag, increasing the number of tags in each node from 2 to
> 3. It did not, however, fix up the code in
> radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
> and hence could leave stray tags set in the new tag array.
> 
> The result is that the livelock avoidance code added in the the
> above commit would hit stale tags when doing tag based lookups,
> resulting in livelocks when trying to traverse the tree.
> 
> Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
> again in the future by using a loop to walk all the tags up to
> RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
> leaves behind.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Acked-by: Nick Piggin <npiggin@kernel.dk>

> ---
>  lib/radix-tree.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index e907858..1014171 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -174,14 +174,16 @@ static void radix_tree_node_rcu_free(struct rcu_head *head)
>  {
>  	struct radix_tree_node *node =
>  			container_of(head, struct radix_tree_node, rcu_head);
> +	int i;
>  
>  	/*
>  	 * must only free zeroed nodes into the slab. radix_tree_shrink
>  	 * can leave us with a non-NULL entry in the first slot, so clear
>  	 * that here to make sure.
>  	 */
> -	tag_clear(node, 0, 0);
> -	tag_clear(node, 1, 0);
> +	for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
> +		tag_clear(node, i, 0);
> +
>  	node->slots[0] = NULL;
>  	node->count = 0;
>  
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
  2010-08-20  5:22 ` [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags Dave Chinner
@ 2010-08-20  8:02   ` Nick Piggin
  2010-08-20 13:39   ` Jan Kara
  1 sibling, 0 replies; 10+ messages in thread
From: Nick Piggin @ 2010-08-20  8:02 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, jack, npiggin

On Fri, Aug 20, 2010 at 03:22:07PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
> omplement function radix_tree_range_tag_if_tagged") does not safely
> set tags on on intermediate tree nodes. The code walks down the tree
> setting tags before it has fully resolved the path to the leaf under
> the assumption there will be a leaf slot with the tag set in the
> range it is searching.
> 
> Unfortunately, this is not a valid assumption - we can abort after
> setting a tag on an intermediate node if we overrun the number of
> tags we are allowed to set in a batch, or stop scanning because we
> we have passed the last scan index before we reach a leaf slot with
> the tag we are searching for set.
> 
> As a result, we can leave the function with tags set on intemediate
> nodes which can be tripped over later by tag-based lookups. The
> result of these stale tags is that lookup may end prematurely or
> livelock because the lookup cannot make progress.
> 
> The fix for the problem involves reocrding the traversal path we
> take to the leaf nodes, and only propagating the tags back up the
> tree once the tag is set in the leaf node slot. We are already
> recording the path for efficient traversal, so there is no
> additional overhead to do the intermediately node tag setting in
> this manner.
> 
> This fixes a radix tree lookup livelock triggered by the new
> writeback sync livelock avoidance code introduced in commit
> f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
> livelock avoidance using page tagging").

It seems OK to me, thanks.

> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  lib/radix-tree.c |   57 +++++++++++++++++++++++++++++++++++++++++------------
>  1 files changed, 44 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 1014171..e0ee8cb 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -625,6 +625,13 @@ EXPORT_SYMBOL(radix_tree_tag_get);
>   * also settag. The function stops either after tagging nr_to_tag items or
>   * after reaching last_index.
>   *
> + * The tags must be set from the leaf level only and propagated back up the
> + * path to the root. We must do this so that we resolve the full path before
> + * setting any tags on intermediate nodes. If we set tags as we descend, then
> + * we can get to the leaf node and find that the index that has the iftag
> + * set is outside the range we are scanning. This reults in dangling tags and
> + * can lead to problems with later tag operations (e.g. livelocks on lookups).
> + *
>   * The function returns number of leaves where the tag was set and sets
>   * *first_indexp to the first unscanned index.
>   */
> @@ -633,9 +640,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
>  		unsigned long nr_to_tag,
>  		unsigned int iftag, unsigned int settag)
>  {
> -	unsigned int height = root->height, shift;
> -	unsigned long tagged = 0, index = *first_indexp;
> -	struct radix_tree_node *open_slots[height], *slot;
> +	unsigned int height = root->height;
> +	struct radix_tree_path path[height];
> +	struct radix_tree_path *pathp = path;
> +	struct radix_tree_node *slot;
> +	unsigned int shift;
> +	unsigned long tagged = 0;
> +	unsigned long index = *first_indexp;
>  
>  	last_index = min(last_index, radix_tree_maxindex(height));
>  	if (index > last_index)
> @@ -655,6 +666,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
>  	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
>  	slot = radix_tree_indirect_to_ptr(root->rnode);
>  
> +	/*
> +	 * we fill the path from (root->height - 2) to 0, leaving the index at
> +	 * (root->height - 1) as a terminator. Zero the node in the terminator
> +	 * so that we can use this to end walk loops back up the path.
> +	 */
> +	path[height - 1].node = NULL;
> +
>  	for (;;) {
>  		int offset;
>  
> @@ -663,17 +681,30 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
>  			goto next;
>  		if (!tag_get(slot, iftag, offset))
>  			goto next;
> +		if (height > 1) {
> +			/* Go down one level */
> +			height--;
> +			shift -= RADIX_TREE_MAP_SHIFT;
> +			path[height - 1].node = slot;
> +			path[height - 1].offset = offset;
> +			slot = slot->slots[offset];
> +			continue;
> +		}
> +
> +		/* tag the leaf */
> +		tagged++;
>  		tag_set(slot, settag, offset);
> -		if (height == 1) {
> -			tagged++;
> -			goto next;
> +
> +		/* walk back up the path tagging interior nodes */
> +		pathp = &path[0];
> +		while (pathp->node) {
> +			/* stop if we find a node with the tag already set */
> +			if (tag_get(pathp->node, settag, pathp->offset))
> +				break;
> +			tag_set(pathp->node, settag, pathp->offset);
> +			pathp++;
>  		}
> -		/* Go down one level */
> -		height--;
> -		shift -= RADIX_TREE_MAP_SHIFT;
> -		open_slots[height] = slot;
> -		slot = slot->slots[offset];
> -		continue;
> +
>  next:
>  		/* Go to next item at level determined by 'shift' */
>  		index = ((index >> shift) + 1) << shift;
> @@ -687,7 +718,7 @@ next:
>  			 * last_index is guaranteed to be in the tree, what
>  			 * we do below cannot wander astray.
>  			 */
> -			slot = open_slots[height];
> +			slot = path[height - 1].node;
>  			height++;
>  			shift += RADIX_TREE_MAP_SHIFT;
>  		}
> -- 
> 1.7.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free
  2010-08-20  5:22 ` [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free Dave Chinner
  2010-08-20  8:00   ` Nick Piggin
@ 2010-08-20 13:00   ` Jan Kara
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Kara @ 2010-08-20 13:00 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, jack, npiggin

On Fri 20-08-10 15:22:06, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement
> writeback livelock avoidance using page tagging") introduced a new
> radix tree tag, increasing the number of tags in each node from 2 to
> 3. It did not, however, fix up the code in
> radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
> and hence could leave stray tags set in the new tag array.
> 
> The result is that the livelock avoidance code added in the the
> above commit would hit stale tags when doing tag based lookups,
> resulting in livelocks when trying to traverse the tree.
> 
> Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
> again in the future by using a loop to walk all the tags up to
> RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
> leaves behind.
  The patch looks good. Thanks!
Acked-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  lib/radix-tree.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index e907858..1014171 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -174,14 +174,16 @@ static void radix_tree_node_rcu_free(struct rcu_head *head)
>  {
>  	struct radix_tree_node *node =
>  			container_of(head, struct radix_tree_node, rcu_head);
> +	int i;
>  
>  	/*
>  	 * must only free zeroed nodes into the slab. radix_tree_shrink
>  	 * can leave us with a non-NULL entry in the first slot, so clear
>  	 * that here to make sure.
>  	 */
> -	tag_clear(node, 0, 0);
> -	tag_clear(node, 1, 0);
> +	for (i = 0; i < RADIX_TREE_MAX_TAGS; i++)
> +		tag_clear(node, i, 0);
> +
>  	node->slots[0] = NULL;
>  	node->count = 0;
>  
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags
  2010-08-20  5:22 ` [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags Dave Chinner
  2010-08-20  8:02   ` Nick Piggin
@ 2010-08-20 13:39   ` Jan Kara
  1 sibling, 0 replies; 10+ messages in thread
From: Jan Kara @ 2010-08-20 13:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, jack, npiggin

  Hi,

On Fri 20-08-10 15:22:07, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Commit ebf8aa44beed48cd17893a83d92a4403e5f9d9e2 ("radix-tree:
> omplement function radix_tree_range_tag_if_tagged") does not safely
> set tags on on intermediate tree nodes. The code walks down the tree
> setting tags before it has fully resolved the path to the leaf under
> the assumption there will be a leaf slot with the tag set in the
> range it is searching.
> 
> Unfortunately, this is not a valid assumption - we can abort after
> setting a tag on an intermediate node if we overrun the number of
> tags we are allowed to set in a batch, or stop scanning because we
> we have passed the last scan index before we reach a leaf slot with
> the tag we are searching for set.
> 
> As a result, we can leave the function with tags set on intemediate
> nodes which can be tripped over later by tag-based lookups. The
> result of these stale tags is that lookup may end prematurely or
> livelock because the lookup cannot make progress.
> 
> The fix for the problem involves reocrding the traversal path we
> take to the leaf nodes, and only propagating the tags back up the
> tree once the tag is set in the leaf node slot. We are already
> recording the path for efficient traversal, so there is no
> additional overhead to do the intermediately node tag setting in
> this manner.
> 
> This fixes a radix tree lookup livelock triggered by the new
> writeback sync livelock avoidance code introduced in commit
> f446daaea9d4a420d16c606f755f3689dcb2d0ce ("mm: implement writeback
> livelock avoidance using page tagging").
  I've reviewed the patch and it looks good. I'll put the patches to
testing on my machine and also uptade Andrew's radix-tree test suite to
catch these types of bugs and run it just to be sure.
  Anyway, for now:
Acked-by: Jan Kara <jack@suse.cz>

								Honza
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  lib/radix-tree.c |   57 +++++++++++++++++++++++++++++++++++++++++------------
>  1 files changed, 44 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 1014171..e0ee8cb 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -625,6 +625,13 @@ EXPORT_SYMBOL(radix_tree_tag_get);
>   * also settag. The function stops either after tagging nr_to_tag items or
>   * after reaching last_index.
>   *
> + * The tags must be set from the leaf level only and propagated back up the
> + * path to the root. We must do this so that we resolve the full path before
> + * setting any tags on intermediate nodes. If we set tags as we descend, then
> + * we can get to the leaf node and find that the index that has the iftag
> + * set is outside the range we are scanning. This reults in dangling tags and
> + * can lead to problems with later tag operations (e.g. livelocks on lookups).
> + *
>   * The function returns number of leaves where the tag was set and sets
>   * *first_indexp to the first unscanned index.
>   */
> @@ -633,9 +640,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
>  		unsigned long nr_to_tag,
>  		unsigned int iftag, unsigned int settag)
>  {
> -	unsigned int height = root->height, shift;
> -	unsigned long tagged = 0, index = *first_indexp;
> -	struct radix_tree_node *open_slots[height], *slot;
> +	unsigned int height = root->height;
> +	struct radix_tree_path path[height];
> +	struct radix_tree_path *pathp = path;
> +	struct radix_tree_node *slot;
> +	unsigned int shift;
> +	unsigned long tagged = 0;
> +	unsigned long index = *first_indexp;
>  
>  	last_index = min(last_index, radix_tree_maxindex(height));
>  	if (index > last_index)
> @@ -655,6 +666,13 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
>  	shift = (height - 1) * RADIX_TREE_MAP_SHIFT;
>  	slot = radix_tree_indirect_to_ptr(root->rnode);
>  
> +	/*
> +	 * we fill the path from (root->height - 2) to 0, leaving the index at
> +	 * (root->height - 1) as a terminator. Zero the node in the terminator
> +	 * so that we can use this to end walk loops back up the path.
> +	 */
> +	path[height - 1].node = NULL;
> +
>  	for (;;) {
>  		int offset;
>  
> @@ -663,17 +681,30 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
>  			goto next;
>  		if (!tag_get(slot, iftag, offset))
>  			goto next;
> +		if (height > 1) {
> +			/* Go down one level */
> +			height--;
> +			shift -= RADIX_TREE_MAP_SHIFT;
> +			path[height - 1].node = slot;
> +			path[height - 1].offset = offset;
> +			slot = slot->slots[offset];
> +			continue;
> +		}
> +
> +		/* tag the leaf */
> +		tagged++;
>  		tag_set(slot, settag, offset);
> -		if (height == 1) {
> -			tagged++;
> -			goto next;
> +
> +		/* walk back up the path tagging interior nodes */
> +		pathp = &path[0];
> +		while (pathp->node) {
> +			/* stop if we find a node with the tag already set */
> +			if (tag_get(pathp->node, settag, pathp->offset))
> +				break;
> +			tag_set(pathp->node, settag, pathp->offset);
> +			pathp++;
>  		}
> -		/* Go down one level */
> -		height--;
> -		shift -= RADIX_TREE_MAP_SHIFT;
> -		open_slots[height] = slot;
> -		slot = slot->slots[offset];
> -		continue;
> +
>  next:
>  		/* Go to next item at level determined by 'shift' */
>  		index = ((index >> shift) + 1) << shift;
> @@ -687,7 +718,7 @@ next:
>  			 * last_index is guaranteed to be in the tree, what
>  			 * we do below cannot wander astray.
>  			 */
> -			slot = open_slots[height];
> +			slot = path[height - 1].node;
>  			height++;
>  			shift += RADIX_TREE_MAP_SHIFT;
>  		}
> -- 
> 1.7.1
> 
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] radix-tree: fix writeback livelock avoidance code
  2010-08-20  5:22 [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Dave Chinner
  2010-08-20  5:22 ` [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free Dave Chinner
  2010-08-20  5:22 ` [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags Dave Chinner
@ 2010-08-20 13:51 ` Jan Kara
  2010-08-20 14:29   ` Dave Chinner
  2010-08-25 20:11 ` Jan Kara
  3 siblings, 1 reply; 10+ messages in thread
From: Jan Kara @ 2010-08-20 13:51 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, jack, npiggin

On Fri 20-08-10 15:22:05, Dave Chinner wrote:
> The following two patches fix bugs in the new radix tree functionality used to
> implement the writeback livelock avoidance code. Both bugs manifest themselves
> as stray PAGECACHE_TAG_TOWRITE tags in the mapping->page_tree radix tree
> resulting in livelocks during tag lookups. More subtly, they also appear to
> result in writeback tree walks occasionally terminating early and so not
> actually writing all the pages they are supposed to.
  Really, how that early termination could happen? I'm just wondering
because I don't see that.. The code just mindlessly copies tags regardless
of how target flags are set so that's why I'd think that any stale copied
flags just don't matter...

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] radix-tree: fix writeback livelock avoidance code
  2010-08-20 13:51 ` [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Jan Kara
@ 2010-08-20 14:29   ` Dave Chinner
  0 siblings, 0 replies; 10+ messages in thread
From: Dave Chinner @ 2010-08-20 14:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, npiggin

On Fri, Aug 20, 2010 at 03:51:41PM +0200, Jan Kara wrote:
> On Fri 20-08-10 15:22:05, Dave Chinner wrote:
> > The following two patches fix bugs in the new radix tree functionality used to
> > implement the writeback livelock avoidance code. Both bugs manifest themselves
> > as stray PAGECACHE_TAG_TOWRITE tags in the mapping->page_tree radix tree
> > resulting in livelocks during tag lookups. More subtly, they also appear to
> > result in writeback tree walks occasionally terminating early and so not
> > actually writing all the pages they are supposed to.
>   Really, how that early termination could happen? I'm just wondering
> because I don't see that.. The code just mindlessly copies tags regardless
> of how target flags are set so that's why I'd think that any stale copied
> flags just don't matter...

With the debug I had in pace, I saw a couple of find_get_pages_tag
loops stop (nr_found == 0) rather than livelock when they
encountered a stray tag, which appears to result in writeback not
writing all the pages. I also saw invalidation removing pages from
the page cache that had the PAGECACHE_TAG_TOWRITE tag set, which
indicated that sometimes they weren't getting written back as they
should have been. These were quite rare - they maybe occurred once
for every 1000 livelock occurrences I saw....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] radix-tree: fix writeback livelock avoidance code
  2010-08-20  5:22 [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Dave Chinner
                   ` (2 preceding siblings ...)
  2010-08-20 13:51 ` [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Jan Kara
@ 2010-08-25 20:11 ` Jan Kara
  3 siblings, 0 replies; 10+ messages in thread
From: Jan Kara @ 2010-08-25 20:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, jack, npiggin

On Fri 20-08-10 15:22:05, Dave Chinner wrote:
> The following two patches fix bugs in the new radix tree functionality used to
> implement the writeback livelock avoidance code. Both bugs manifest themselves
> as stray PAGECACHE_TAG_TOWRITE tags in the mapping->page_tree radix tree
> resulting in livelocks during tag lookups. More subtly, they also appear to
> result in writeback tree walks occasionally terminating early and so not
> actually writing all the pages they are supposed to.
> 
> Please review and test - these are pretty serious problems for the writeback code.
  OK, I've updated Andrew's radix tree test suite to use the latest
incarnation of radix-tree.c and added check to verify consistency of tags
in a radix tree. Without your patches, the new error check triggers almost
immediately for radix_tree_range_tag_if_tagged, with them the test passes.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-08-25 20:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-20  5:22 [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Dave Chinner
2010-08-20  5:22 ` [PATCH 1/2] radix-tree: clear all tags in radix_tree_node_rcu_free Dave Chinner
2010-08-20  8:00   ` Nick Piggin
2010-08-20 13:00   ` Jan Kara
2010-08-20  5:22 ` [PATCH 2/2] radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags Dave Chinner
2010-08-20  8:02   ` Nick Piggin
2010-08-20 13:39   ` Jan Kara
2010-08-20 13:51 ` [PATCH 0/2] radix-tree: fix writeback livelock avoidance code Jan Kara
2010-08-20 14:29   ` Dave Chinner
2010-08-25 20:11 ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.