* [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
@ 2010-08-18 13:56 Dave Chinner
  2010-08-18 17:37 ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2010-08-18 13:56 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-fsdevel, npiggin, a.p.zijlstra, jack

Folks,

I'm seeing a livelock with the new writeback sync livelock avoidance
code. The problem is that the radix tree lookup via
pagevec_lookup_tag()->find_get_pages_tag() is getting stuck in
radix_tree_gang_lookup_tag_slot() and never exiting.

The reproducer I'm running is xfstests 013 on 2.6.35-rc1 with some
pending XFS changes available here:

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git for-oss

It's 100% reproducible, and a regression against 2.6.35 patched with exactly
the same extra XFS commits as the above branch.

I tried applying Nick's recent indirect pointer fixup patch for the
radix tree, but that didn't fix the problem. I applied the patch
below on top of that to detect when __lookup_tag is not making
progress and the livelock has gone away. Someone who knows how
the radix tree code is supposed to work might be able to pinpoint
the problem exactly from this.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

---
 lib/radix-tree.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 9eeb9f3..5d2872c 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -1077,6 +1077,11 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
 			break;
 		slots_found = __lookup_tag(node, (void ***)results + ret,
 				cur_index, max_items - ret, &next_index, tag);
+
+		/* livelock avoidance */
+		if (slots_found == 0 && cur_index == next_index)
+			break;
+
 		nr_found = 0;
 		for (i = 0; i < slots_found; i++) {
 			struct radix_tree_node *slot;
@@ -1147,6 +1152,9 @@ radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root, void ***results,
 			break;
 		slots_found = __lookup_tag(node, results + ret,
 				cur_index, max_items - ret, &next_index, tag);
+		/* livelock avoidance */
+		if (slots_found == 0 && cur_index == next_index)
+			break;
 		ret += slots_found;
 		if (next_index == 0)
 			break;

^ permalink raw reply related	[flat|nested] 8+ messages in thread
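The guard added by the patch can be tried outside the kernel. The following is only a userspace sketch of the loop shape, not the code in lib/radix-tree.c: `guarded_gang_lookup()`, `lookup_step_fn`, `stuck_step()` and `good_step()` are hypothetical names invented here, and the step callback stands in for `__lookup_tag()` only in that it returns a count and a resume index.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for __lookup_tag(): returns how many items were
 * found starting at 'index' and stores where the caller should resume.
 */
typedef unsigned int (*lookup_step_fn)(unsigned long index,
				       unsigned int max_items,
				       unsigned long *next_index);

/*
 * Gang-lookup style loop with the no-progress guard from the patch.
 * Returns the number of items found; *stalled is set when the guard fired.
 */
static unsigned int guarded_gang_lookup(lookup_step_fn step,
					unsigned long first_index,
					unsigned long max_index,
					unsigned int max_items, int *stalled)
{
	unsigned long cur_index = first_index;
	unsigned int ret = 0;

	*stalled = 0;
	while (ret < max_items) {
		unsigned long next_index;
		unsigned int slots_found;

		if (cur_index > max_index)
			break;
		slots_found = step(cur_index, max_items - ret, &next_index);

		/* livelock avoidance: nothing found and the index did not move */
		if (slots_found == 0 && cur_index == next_index) {
			*stalled = 1;
			break;
		}
		ret += slots_found;
		if (next_index == 0)
			break;
		cur_index = next_index;
	}
	return ret;
}

/* Models the stuck case: finds nothing and never advances the index. */
static unsigned int stuck_step(unsigned long index, unsigned int max_items,
			       unsigned long *next_index)
{
	(void)max_items;
	*next_index = index;
	return 0;
}

/* Well-behaved helper: exactly one item per index. */
static unsigned int good_step(unsigned long index, unsigned int max_items,
			      unsigned long *next_index)
{
	(void)max_items;
	*next_index = index + 1;
	return 1;
}
```

Calling `guarded_gang_lookup(stuck_step, 502, 1023, 14, &stalled)` returns 0 with `stalled` set, where the unguarded loop would spin. As the message itself says, the guard only detects the lack of progress; it does not explain why the step stops advancing.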
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-18 13:56 [bug] radix_tree_gang_lookup_tag_slot() looping endlessly Dave Chinner
@ 2010-08-18 17:37 ` Jan Kara
  2010-08-18 23:29   ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2010-08-18 17:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra, jack

  Hi,

On Wed 18-08-10 23:56:51, Dave Chinner wrote:
> I'm seeing a livelock with the new writeback sync livelock avoidance
> code. The problem is that the radix tree lookup via
> pagevec_lookup_tag()->find_get_pages_tag() is getting stuck in
> radix_tree_gang_lookup_tag_slot() and never exiting.
  Is this pagevec_lookup_tag() from write_cache_pages() which was called
for fsync() or so?

> The reproducer I'm running is xfstests 013 on 2.6.35-rc1 with some
> pending XFS changes available here:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git for-oss
>
> It's 100% reproducible, and a regression against 2.6.35 patched with exactly
> the same extra XFS commits as the above branch.
  Hmm, what HW config do you have? I didn't hit the livelock and I've been
running xfstests several times with the livelock avoidance patch. Hmm,
looking at the code maybe what you describe could happen if we remove the
page from page cache but leave a dangling tag in the radix tree... But
remove_from_page_cache() is called with tree_lock held and it removes all
tags from the index we just removed so it shouldn't really happen. Could
you dump more info about the inode this happens on? Like the i_size, the
index we stall at... Thanks.

> I tried applying Nick's recent indirect pointer fixup patch for the
> radix tree, but that didn't fix the problem. I applied the patch
> below on top of that to detect when __lookup_tag is not making
> progress and the livelock has gone away. Someone who knows how
> the radix tree code is supposed to work might be able to pinpoint
> the problem exactly from this.
								Honza
> ---
>  lib/radix-tree.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 9eeb9f3..5d2872c 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -1077,6 +1077,11 @@ radix_tree_gang_lookup_tag(struct radix_tree_root *root, void **results,
>  			break;
>  		slots_found = __lookup_tag(node, (void ***)results + ret,
>  				cur_index, max_items - ret, &next_index, tag);
> +
> +		/* livelock avoidance */
> +		if (slots_found == 0 && cur_index == next_index)
> +			break;
> +
>  		nr_found = 0;
>  		for (i = 0; i < slots_found; i++) {
>  			struct radix_tree_node *slot;
> @@ -1147,6 +1152,9 @@ radix_tree_gang_lookup_tag_slot(struct radix_tree_root *root, void ***results,
>  			break;
>  		slots_found = __lookup_tag(node, results + ret,
>  				cur_index, max_items - ret, &next_index, tag);
> +		/* livelock avoidance */
> +		if (slots_found == 0 && cur_index == next_index)
> +			break;
>  		ret += slots_found;
>  		if (next_index == 0)
>  			break;
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-18 17:37 ` Jan Kara
@ 2010-08-18 23:29   ` Dave Chinner
  2010-08-19  7:25     ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2010-08-18 23:29 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra

On Wed, Aug 18, 2010 at 07:37:09PM +0200, Jan Kara wrote:
>   Hi,
>
> On Wed 18-08-10 23:56:51, Dave Chinner wrote:
> > I'm seeing a livelock with the new writeback sync livelock avoidance
> > code. The problem is that the radix tree lookup via
> > pagevec_lookup_tag()->find_get_pages_tag() is getting stuck in
> > radix_tree_gang_lookup_tag_slot() and never exiting.
>   Is this pagevec_lookup_tag() from write_cache_pages() which was called
> for fsync() or so?

Called from a direct IO doing a cache flush-invalidate call across the
range the direct IO spans.

fsstress R running task 0 2514 2513 0x00000008
 ffff88007da5fa98 ffffffff8110c0d5 ffff88007da5fc28 ffff880078f0c418
 ffff88007da5fbc8 ffffffff8110ae7b ffff88007da5fb08 0000000000000297
 ffffffffffffffff 0000000100000000 ffff88007da5fb20 00000002810d79ae
Call Trace:
 [<ffffffff8110c0d5>] ? pagevec_lookup_tag+0x25/0x40
 [<ffffffff8110ae7b>] write_cache_pages+0x10b/0x490
 [<ffffffff81109d30>] ? __writepage+0x0/0x50
 [<ffffffff813fc1fe>] ? do_raw_spin_unlock+0x5e/0xb0
 [<ffffffff8110c7dc>] ? release_pages+0x20c/0x270
 [<ffffffff813fc2a4>] ? do_raw_spin_lock+0x54/0x160
 [<ffffffff813f0ca2>] ? radix_tree_gang_lookup_slot+0x72/0xb0
 [<ffffffff8110b227>] generic_writepages+0x27/0x30
 [<ffffffff8130fc5d>] xfs_vm_writepages+0x5d/0x80
 [<ffffffff8110b254>] do_writepages+0x24/0x40
 [<ffffffff8110237b>] __filemap_fdatawrite_range+0x5b/0x60
 [<ffffffff811023da>] filemap_write_and_wait_range+0x5a/0x80
 [<ffffffff81103117>] generic_file_aio_read+0x417/0x6d0
 [<ffffffff81315f7c>] xfs_file_aio_read+0x15c/0x310
 [<ffffffff811456da>] do_sync_read+0xda/0x120
 [<ffffffff813c36ff>] ? security_file_permission+0x6f/0x80
 [<ffffffff81145a25>] vfs_read+0xc5/0x180
 [<ffffffff81146151>] sys_read+0x51/0x80
 [<ffffffff81036032>] system_call_fastpath+0x16/0x1b

From the writeback tracing, it is stuck with this writeback control:

fsstress-2514 [001] 950360.214327: wbc_writepage: bdi 253:0: towrt=9223372036854775807 skip=0 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x79000 end=0x7fffffffffffffff
fsstress-2514 [001] 950360.214348: wbc_writepage: bdi 253:0: towrt=9223372036854775806 skip=0 mode=1 kupd=0 bgrd=0 reclm=0 cyclic=0 more=0 older=0x0 start=0x79000 end=0x7fffffffffffffff

> > The reproducer I'm running is xfstests 013 on 2.6.35-rc1 with some
> > pending XFS changes available here:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev.git for-oss
> >
> > It's 100% reproducible, and a regression against 2.6.35 patched with exactly
> > the same extra XFS commits as the above branch.
>   Hmm, what HW config do you have?

It's a VM started with:

$ cat /vm-images/vm-2/run-vm-2.sh
#!/bin/sh

sudo /usr/bin/kvm \
	-kvm-shadow-memory 16 \
	-no-fd-bootchk \
	-localtime \
	-boot c \
	-serial pty \
	-nographic \
	-alt-grab \
	-smp 2 -m 2048 \
	-hda /vm-images/vm-2/root.img \
	-drive file=/vm-images/vm-2/vm-2-test.img,if=virtio,cache=none \
	-drive file=/vm-images/vm-2/vm-2-scratch.img,if=virtio,cache=none \
	-net nic,vlan=0,macaddr=00:e4:b6:63:63:6e,model=virtio \
	-net tap,vlan=0,script=/vm-images/qemu-ifup,downscript=no \
	-kernel /vm-images/vm-2/vmlinuz \
	-append "console=ttyS0,115200 root=/dev/sda1"

> I didn't hit the livelock and I've been
> running xfstests several times with the livelock avoidance patch.

Christoph hasn't seen it either.

> Hmm,
> looking at the code maybe what you describe could happen if we remove the
> page from page cache but leave a dangling tag in the radix tree... But
> remove_from_page_cache() is called with tree_lock held and it removes all
> tags from the index we just removed so it shouldn't really happen.

This might be a stupid question, but here goes anyway. I know the
slot contents are protected on lookup by rcu_read_lock() and
rcu_dereference_raw(), but what protects the tags on read? AFAICT,
they are being looked up without any locking, memory barriers, etc
w.r.t. deletion. i.e. I cannot see how a tag lookup is prevented
from racing with the propagation of a tag removal back up the tree
(which is done under the tree lock). What am I missing?

> Could
> you dump more info about the inode this happens on? Like the i_size, the
> index we stall at... Thanks.

From the writeback tracing I know that the index is different for
every stall, and given that it is fsstress producing the hang I'd
guess the inode is different every time, too. I'll try to get more
data on this later today.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-18 23:29   ` Dave Chinner
@ 2010-08-19  7:25     ` Dave Chinner
  2010-08-19 13:25       ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2010-08-19 7:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra

On Thu, Aug 19, 2010 at 09:29:17AM +1000, Dave Chinner wrote:
> On Wed, Aug 18, 2010 at 07:37:09PM +0200, Jan Kara wrote:
> >   Hi,
> >
> > On Wed 18-08-10 23:56:51, Dave Chinner wrote:
> > > I'm seeing a livelock with the new writeback sync livelock avoidance
> > > code. The problem is that the radix tree lookup via
> > > pagevec_lookup_tag()->find_get_pages_tag() is getting stuck in
> > > radix_tree_gang_lookup_tag_slot() and never exiting.

[snip]

> > Hmm,
> > looking at the code maybe what you describe could happen if we remove the
> > page from page cache but leave a dangling tag in the radix tree... But
> > remove_from_page_cache() is called with tree_lock held and it removes all
> > tags from the index we just removed so it shouldn't really happen.
>
> This might be a stupid question, but here goes anyway. I know the
> slot contents are protected on lookup by rcu_read_lock() and
> rcu_dereference_raw(), but what protects the tags on read? AFAICT,
> they are being looked up without any locking, memory barriers, etc
> w.r.t. deletion. i.e. I cannot see how a tag lookup is prevented
> from racing with the propagation of a tag removal back up the tree
> (which is done under the tree lock). What am I missing?

Definitely looks like corrupted tags:

[ 97.301618] lookup ino 9283137, size 2106992, mapping pages 146, root 0xffff880073d83e20, index 497, nr_pages 14, tag 1
[ 97.301711] lookup ino 9283137, size 2106992, mapping pages 9, root 0xffff880073d83e20, index 75, nr_pages 14, tag 2
[ 97.301713] livelock @ root 0xffff880073d83e20, index 256, first 75
[ 97.301715] height 2
[ 97.301716] shift 6
[ 97.301717] tag_get 0xffff8800769f5b40, 4
[ 97.301718] height 1
[ 97.301719] shift 0
[ 97.301720] no more slots 4
[ 97.301721] livelock @ root 0xffff880073d83e20, index 256, first 75

The slot (#4) has the tag set, but the actual slot is empty and so
the lookup aborts without changing the index, and as such we have an
endless loop. In this case, it appears to have occurred directly
after the mapping was almost entirely invalidated....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-19  7:25     ` Dave Chinner
@ 2010-08-19 13:25       ` Dave Chinner
  2010-08-19 15:58         ` Jan Kara
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2010-08-19 13:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra

On Thu, Aug 19, 2010 at 05:25:20PM +1000, Dave Chinner wrote:
> On Thu, Aug 19, 2010 at 09:29:17AM +1000, Dave Chinner wrote:
> > On Wed, Aug 18, 2010 at 07:37:09PM +0200, Jan Kara wrote:
> > >   Hi,
> > >
> > > On Wed 18-08-10 23:56:51, Dave Chinner wrote:
> > > > I'm seeing a livelock with the new writeback sync livelock avoidance
> > > > code. The problem is that the radix tree lookup via
> > > > pagevec_lookup_tag()->find_get_pages_tag() is getting stuck in
> > > > radix_tree_gang_lookup_tag_slot() and never exiting.
>
> [snip]
>
> > > Hmm,
> > > looking at the code maybe what you describe could happen if we remove the
> > > page from page cache but leave a dangling tag in the radix tree... But
> > > remove_from_page_cache() is called with tree_lock held and it removes all
> > > tags from the index we just removed so it shouldn't really happen.
> >
> > This might be a stupid question, but here goes anyway. I know the
> > slot contents are protected on lookup by rcu_read_lock() and
> > rcu_dereference_raw(), but what protects the tags on read? AFAICT,
> > they are being looked up without any locking, memory barriers, etc
> > w.r.t. deletion. i.e. I cannot see how a tag lookup is prevented
> > from racing with the propagation of a tag removal back up the tree
> > (which is done under the tree lock). What am I missing?
>
> Definitely looks like corrupted tags:
>
> [ 97.301618] lookup ino 9283137, size 2106992, mapping pages 146, root 0xffff880073d83e20, index 497, nr_pages 14, tag 1
> [ 97.301711] lookup ino 9283137, size 2106992, mapping pages 9, root 0xffff880073d83e20, index 75, nr_pages 14, tag 2
> [ 97.301713] livelock @ root 0xffff880073d83e20, index 256, first 75
> [ 97.301715] height 2
> [ 97.301716] shift 6
> [ 97.301717] tag_get 0xffff8800769f5b40, 4
> [ 97.301718] height 1
> [ 97.301719] shift 0
> [ 97.301720] no more slots 4
> [ 97.301721] livelock @ root 0xffff880073d83e20, index 256, first 75
>
> The slot (#4) has the tag set, but the actual slot is empty and so
> the lookup aborts without changing the index, and as such we have an
> endless loop. In this case, it appears to have occurred directly
> after the mapping was almost entirely invalidated....

And it looks like the corrupted tags are coming through
radix_tree_set_tag_if_tagged:

[ 29.533595] tag @ root 0xffff880078088d60, pages 261 (466 -> 472), nr 0
[ 29.534410] settag root @ 0xffff880078088d60, index 466, offset 7, height 2, shift 6
[ 29.535331] slot[settag] 0x80 iftag 0x88
[ 29.535805] leveldown root @ 0xffff880078088d60, index 466, offset 7, height 1, shift 0
[ 29.536842] tag @ root 0xffff880078088d60, pages 261 (473 -> 472), nr 0
                                                       ^^^^^^^^^^   ^^^^

Here we've tried to set the tags on the index range 466 -> 472, but we
have scanned to index 472 and not set any tags on pages. *However*,
because radix_tree_set_tag_if_tagged() does a top-down traversal it
has set the tag on the parent node before checking if any of the
child nodes can have the tag set.
Hence when radix_tree_gang_lookup_tag_slot() comes along:

[ 29.543718] lookup ino 4452983, size 1453202, mapping pages 256, root 0xffff880078088d60, index 502, nr_pages 14, tag 2
[ 29.545015] livelock @ root 0xffff880078088d60, index 502, first 502
[ 29.545785] height 2
[ 29.546117] shift 6
[ 29.546381] tag_get 0xffff880078040d70, 7
                                         ^
The parent node has the tag set for slot 7

[ 29.546862] slot[tag] 0x80
[ 29.547192] height 1
[ 29.547461] shift 0
[ 29.547721] no more slots 7

but slot 7 has no children. Because the children didn't have the
TO_WRITE tag set, the tags on the parent node never got removed.
Hence we get a livelock because whenever this bad tag is encountered
we abort without increasing the start index, so we re-enter and
traverse exactly the same path again....

[ 29.548090] livelock @ root 0xffff880078088d60, index 502, first 502

It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
broken. All the tag set/clear code stores the tree path in a cursor
and uses that to propagate the tags if and only if the full path
from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
tags on intermediate nodes before it has resolved the full path and
hence can set tags when it should not. The "should not" cases occur
when we have to tag sub-ranges or the scan aborts because it's
reached the number to tag in a batch.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread
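The failure mode in the trace above can be reproduced on a toy tree. This is a simplified userspace model written for illustration, not the kernel's `__lookup_tag()`: `struct toy_node`, `toy_lookup_tag()`, `dangling_lookup()` and `healthy_lookup()` are hypothetical names, the tree has a fixed height of 2 with 64 slots per node (shift 6, as in the log), and only one tag is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SHIFT 6                      /* 64 slots per node, as in the log */
#define MAP_SIZE  (1UL << MAP_SHIFT)
#define MAP_MASK  (MAP_SIZE - 1)

/* Toy node with a single tag bitmap; real nodes carry one bitmap per tag. */
struct toy_node {
	unsigned int height;             /* only read from the root */
	uint64_t tags;                   /* bit i set: slot i is tagged */
	void *slots[MAP_SIZE];           /* children (root) or items (leaf) */
};

static int tag_get(const struct toy_node *node, unsigned long offset)
{
	return (int)((node->tags >> offset) & 1);
}

/* Simplified model of a tagged descent; not the kernel function. */
static unsigned int toy_lookup_tag(struct toy_node *slot, void **results,
				   unsigned long index, unsigned int max_items,
				   unsigned long *next_index)
{
	unsigned int nr_found = 0;
	unsigned int height = slot->height;
	unsigned int shift = (height - 1) * MAP_SHIFT;

	while (height > 0) {
		unsigned long i = (index >> shift) & MAP_MASK;

		/* Skip untagged slots, moving index past each one. */
		while (!tag_get(slot, i)) {
			index &= ~((1UL << shift) - 1);
			index += 1UL << shift;
			if (++i == MAP_SIZE)
				goto out;
		}
		if (--height == 0) {
			/* Bottom level: collect tagged, populated slots. */
			for (; i < MAP_SIZE; i++) {
				index++;
				if (tag_get(slot, i) && slot->slots[i]) {
					results[nr_found++] = slot->slots[i];
					if (nr_found == max_items)
						goto out;
				}
			}
			break;
		}
		shift -= MAP_SHIFT;
		slot = slot->slots[i];
		if (slot == NULL)
			break;   /* tagged slot, no child: index unchanged */
	}
out:
	*next_index = index;
	return nr_found;
}

/* Failing case from the log: root slot 7 is tagged but has no child. */
static unsigned int dangling_lookup(unsigned long start, unsigned long *next)
{
	struct toy_node root = { .height = 2, .tags = 1ULL << 7 };
	void *results[14];

	return toy_lookup_tag(&root, results, start, 14, next);
}

/* Consistent case: the tagged child exists and holds one tagged item. */
static unsigned int healthy_lookup(unsigned long start, unsigned long *next)
{
	static int page;                 /* stand-in for a struct page */
	struct toy_node leaf = { .height = 1, .tags = 1ULL << 54 };
	struct toy_node root = { .height = 2, .tags = 1ULL << 7 };
	void *results[14];

	leaf.slots[54] = &page;          /* index 7 * 64 + 54 == 502 */
	root.slots[7] = &leaf;
	return toy_lookup_tag(&root, results, start, 14, next);
}
```

In this model `dangling_lookup(502, &next)` finds nothing and leaves `next` at 502, so a caller that simply retries from `next` repeats the same walk forever, while `healthy_lookup(502, &next)` returns one item and advances `next` past the leaf.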
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-19 13:25       ` Dave Chinner
@ 2010-08-19 15:58         ` Jan Kara
  2010-08-19 22:25           ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Kara @ 2010-08-19 15:58 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Jan Kara, linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra

  Hi Dave,

On Thu 19-08-10 23:25:52, Dave Chinner wrote:
> It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
> broken. All the tag set/clear code stores the tree path in a cursor
> and uses that to propagate the tags if and only if the full path
> from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
> tags on intermediate nodes before it has resolved the full path and
> hence can set tags when it should not. The "should not" cases occur
> when we have to tag sub-ranges or the scan aborts because it's
> reached the number to tag in a batch.
  Thanks for debugging this! You are right that the code can leave a dangling
tag when we end the scan at the end of the given range but the first tagged
leaf is after the end of the given range (there shouldn't be a problem with
the batches because there we can exit only just after we tag a leaf so that
should be OK).
  There are two possibilities how to fix the bug:
a) Always tag bottom up - i.e., when we see a leaf that should be tagged, go
   up and tag the parent as well if it is not already tagged.
b) When we exit the search and we did not set any leaf tag since the last
   time we went down, we walk up the tree and do an equivalent of
   radix_tree_clear_tag().
  I'll probably go for a) since it looks more robust but b) would be
probably faster.

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 8+ messages in thread
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-19 15:58         ` Jan Kara
@ 2010-08-19 22:25           ` Dave Chinner
  2010-08-20  2:04             ` Dave Chinner
  0 siblings, 1 reply; 8+ messages in thread
From: Dave Chinner @ 2010-08-19 22:25 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra

On Thu, Aug 19, 2010 at 05:58:39PM +0200, Jan Kara wrote:
>   Hi Dave,
>
> On Thu 19-08-10 23:25:52, Dave Chinner wrote:
> > It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
> > broken. All the tag set/clear code stores the tree path in a cursor
> > and uses that to propagate the tags if and only if the full path
> > from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
> > tags on intermediate nodes before it has resolved the full path and
> > hence can set tags when it should not. The "should not" cases occur
> > when we have to tag sub-ranges or the scan aborts because it's
> > reached the number to tag in a batch.
>   Thanks for debugging this! You are right that the code can leave a dangling
> tag when we end the scan at the end of the given range but the first tagged
> leaf is after the end of the given range (there shouldn't be a problem with
> the batches because there we can exit only just after we tag a leaf so that
> should be OK).
>   There are two possibilities how to fix the bug:
> a) Always tag bottom up - i.e., when we see a leaf that should be tagged, go
>    up and tag the parent as well if it is not already tagged.
> b) When we exit the search and we did not set any leaf tag since the last
>    time we went down, we walk up the tree and do an equivalent of
>    radix_tree_clear_tag().
>   I'll probably go for a) since it looks more robust but b) would be
> probably faster.

I think that when it comes to data integrity, more robust should
win over speed every time. I think it can be done quite easily,
though, having slept on it - we have the current path in the
open_slots[] array, so we could just walk that when we set a leaf
tag. That should be easy to optimise as well - just keep track of
how high up the path we have set the tag and only walk that far
when setting the tags. That way we don't continually set the tag on
the root and higher level slots. That shouldn't be any slower than
the current code...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread
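The difference between the top-down behaviour and option a) can be sketched on a two-level toy tree. Everything below is an illustrative assumption rather than the eventual kernel patch: `struct toy_node`, `toy_tag_if_tagged()` and `run_case()` are hypothetical names, and with only two levels the "only walk as far up as needed" optimisation reduces to checking whether the parent slot is already tagged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SHIFT 6
#define MAP_SIZE  (1UL << MAP_SHIFT)

enum { TAG_DIRTY, TAG_TOWRITE, NR_TAGS };

/* Toy two-level tree: the root holds children, leaves hold only tag bits. */
struct toy_node {
	uint64_t tags[NR_TAGS];
	struct toy_node *child[MAP_SIZE];    /* used by the root only */
};

static int tag_get(const struct toy_node *n, int tag, unsigned long off)
{
	return (int)((n->tags[tag] >> off) & 1);
}

static void tag_set(struct toy_node *n, int tag, unsigned long off)
{
	n->tags[tag] |= (uint64_t)1 << off;
}

/*
 * Copy TAG_DIRTY to TAG_TOWRITE for indices in [first, last].
 * top_down != 0 models the reported behaviour: the parent is tagged
 * before any leaf is resolved. Otherwise the parent is tagged only
 * after a leaf in range was tagged (option a, bottom up).
 * Returns the number of leaves tagged.
 */
static unsigned long toy_tag_if_tagged(struct toy_node *root, int top_down,
				       unsigned long first, unsigned long last)
{
	unsigned long tagged = 0;
	unsigned long i, j;

	for (i = first >> MAP_SHIFT;
	     i <= (last >> MAP_SHIFT) && i < MAP_SIZE; i++) {
		struct toy_node *leaf = root->child[i];

		if (leaf == NULL || !tag_get(root, TAG_DIRTY, i))
			continue;
		if (top_down)
			tag_set(root, TAG_TOWRITE, i);
		for (j = 0; j < MAP_SIZE; j++) {
			unsigned long index = (i << MAP_SHIFT) | j;

			if (index < first || index > last)
				continue;
			if (!tag_get(leaf, TAG_DIRTY, j))
				continue;
			tag_set(leaf, TAG_TOWRITE, j);
			tagged++;
			/* Propagate upwards, skipping work already done. */
			if (!tag_get(root, TAG_TOWRITE, i))
				tag_set(root, TAG_TOWRITE, i);
		}
	}
	return tagged;
}

/*
 * One dirty page at index 500 (root slot 7). Tag [first, last] and report
 * whether the root ended up with TAG_TOWRITE on slot 7.
 */
static unsigned long run_case(int top_down, unsigned long first,
			      unsigned long last, int *root_tagged)
{
	struct toy_node leaf = { { 0 } };
	struct toy_node root = { { 0 } };
	unsigned long n;

	tag_set(&leaf, TAG_DIRTY, 500 & (MAP_SIZE - 1));
	tag_set(&root, TAG_DIRTY, 500 >> MAP_SHIFT);
	root.child[500 >> MAP_SHIFT] = &leaf;

	n = toy_tag_if_tagged(&root, top_down, first, last);
	*root_tagged = tag_get(&root, TAG_TOWRITE, 500 >> MAP_SHIFT);
	return n;
}
```

With the range 466 -> 472 from the earlier trace, the top-down variant tags no leaf yet leaves the root slot tagged, which is the dangling tag; the bottom-up variant leaves the root untouched unless a leaf inside the range was actually tagged.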
* Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly
  2010-08-19 22:25           ` Dave Chinner
@ 2010-08-20  2:04             ` Dave Chinner
  0 siblings, 0 replies; 8+ messages in thread
From: Dave Chinner @ 2010-08-20 2:04 UTC (permalink / raw)
  To: Jan Kara; +Cc: linux-kernel, linux-fsdevel, npiggin, a.p.zijlstra

On Fri, Aug 20, 2010 at 08:25:59AM +1000, Dave Chinner wrote:
> On Thu, Aug 19, 2010 at 05:58:39PM +0200, Jan Kara wrote:
> >   Hi Dave,
> >
> > On Thu 19-08-10 23:25:52, Dave Chinner wrote:
> > > It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
> > > broken. All the tag set/clear code stores the tree path in a cursor
> > > and uses that to propagate the tags if and only if the full path
> > > from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
> > > tags on intermediate nodes before it has resolved the full path and
> > > hence can set tags when it should not. The "should not" cases occur
> > > when we have to tag sub-ranges or the scan aborts because it's
> > > reached the number to tag in a batch.
> >   Thanks for debugging this! You are right that the code can leave a dangling
> > tag when we end the scan at the end of the given range but the first tagged
> > leaf is after the end of the given range (there shouldn't be a problem with
> > the batches because there we can exit only just after we tag a leaf so that
> > should be OK).
> >   There are two possibilities how to fix the bug:
> > a) Always tag bottom up - i.e., when we see a leaf that should be tagged, go
> >    up and tag the parent as well if it is not already tagged.
> > b) When we exit the search and we did not set any leaf tag since the last
> >    time we went down, we walk up the tree and do an equivalent of
> >    radix_tree_clear_tag().
> >   I'll probably go for a) since it looks more robust but b) would be
> > probably faster.
>
> I think that when it comes to data integrity, more robust should
> win over speed every time. I think it can be done quite easily,
> though, having slept on it - we have the current path in the
> open_slots[] array, so we could just walk that when we set a leaf
> tag. That should be easy to optimise as well - just keep track of
> how high up the path we have set the tag and only walk that far
> when setting the tags. That way we don't continually set the tag on
> the root and higher level slots. That shouldn't be any slower than
> the current code...

Fixing this indicates that there is a second bug also corrupting the
PAGECACHE_TAG_TOWRITE tags - it takes quite a bit longer to hit, but
when it fails it is generally because the bit at slot offset zero in
a high-up intermediate node is incorrectly set. It appears that none
of the code is actually setting it, so it's been quite difficult to
track down.

Eventually I noticed through code inspection that
radix_tree_node_rcu_free() clears the tags at offset zero because
the radix_tree_shrink() implementation can leave the first slot
non-null. The addition of the third tag did not add the matching
clear of that tag in the zero slot. Adding this:

 	 */
 	tag_clear(node, 0, 0);
 	tag_clear(node, 1, 0);
+	tag_clear(node, 2, 0);
 	node->slots[0] = NULL;
 	node->count = 0;

to radix_tree_node_rcu_free() appears to fix the problem.

Whoever failed to comment the definition of the number of tags the
radix tree supports left a really nasty landmine that Jan stepped
on. Cleaning up the mess hasn't been pretty, either.

So, after a couple of days of debugging I finally have test 013
passing. Now to clean up the mess I have and test some proper
patches....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 8+ messages in thread
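One way to keep such a cleanup from falling out of step with the number of tags is to loop over the tag count instead of listing each tag by hand. The sketch below is a userspace illustration under that assumption only: `struct toy_node`, `TOY_MAX_TAGS`, `toy_tag_clear()` and `toy_node_reset()` are hypothetical names, not the kernel's definitions, and the thread does not say what form the final patch took.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAP_SIZE 64
#define TOY_MAX_TAGS 3       /* hypothetical tag count for this toy */

struct toy_node {
	unsigned int count;
	void *slots[TOY_MAP_SIZE];
	unsigned long long tags[TOY_MAX_TAGS];   /* one 64-bit bitmap per tag */
};

static void toy_tag_clear(struct toy_node *node, unsigned int tag, int offset)
{
	node->tags[tag] &= ~(1ULL << offset);
}

/*
 * Return a node to its pristine state before it is freed and reused.
 * Looping over TOY_MAX_TAGS means a newly added tag is cleared without
 * anyone having to remember to add another line here.
 */
static void toy_node_reset(struct toy_node *node)
{
	unsigned int tag;

	for (tag = 0; tag < TOY_MAX_TAGS; tag++)
		toy_tag_clear(node, tag, 0);
	node->slots[0] = NULL;
	node->count = 0;
}

/* Dirty slot 0 with one tag, reset the node, report if anything survived. */
static int state_survives_reset(unsigned int tag)
{
	static int item;
	struct toy_node node = { 0 };

	node.count = 1;
	node.slots[0] = &item;
	node.tags[tag] |= 1ULL;
	toy_node_reset(&node);
	return (node.tags[tag] & 1ULL) != 0 ||
	       node.slots[0] != NULL || node.count != 0;
}
```

The design point is the one raised in the message: the cleanup and the tag count are defined in different places, so tying the loop bound to the count removes one thing a future change has to remember.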
end of thread, other threads:[~2010-08-20  2:04 UTC | newest]

Thread overview: 8+ messages
2010-08-18 13:56 [bug] radix_tree_gang_lookup_tag_slot() looping endlessly Dave Chinner
2010-08-18 17:37 ` Jan Kara
2010-08-18 23:29   ` Dave Chinner
2010-08-19  7:25     ` Dave Chinner
2010-08-19 13:25       ` Dave Chinner
2010-08-19 15:58         ` Jan Kara
2010-08-19 22:25           ` Dave Chinner
2010-08-20  2:04             ` Dave Chinner