* [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
@ 2011-04-18 12:43 Ajeet Yadav
  2011-04-20 11:26 ` Ajeet Yadav
  2011-04-22  6:51 ` Christoph Hellwig
  0 siblings, 2 replies; 7+ messages in thread
From: Ajeet Yadav @ 2011-04-18 12:43 UTC (permalink / raw)
  To: xfs

xfsprogs: fixes a regression hang in xfs_repair phase 4

xfs_repair can hang in phase 4. The hang is not easy to reproduce; it
occurs when the btree that xfs_repair uses internally gets corrupted.
Scenario: the for loop in phase4.c:phase4() (line 232) never completes.
In a very rare case the btree gets corrupted such that the key of the
current node is greater than the key of the next node.

Example: current key = 2894, next key = 2880. Evaluate the for loop at j=2894:
for (j = ag_hdr_block; j < ag_end; j += blen) {
        bstate = get_bmap_ext(i, j, ag_end, &blen);
}

get_bmap_ext() with j=2894 returns blen=-14
j += blen -> j=2880
get_bmap_ext() with j=2880 returns blen=14
j += blen -> j=2894
so j toggles endlessly between 2894 and 2880.
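
The toggling can be reproduced in isolation with a bounded walk. This is
only a sketch: toy_blen() is a hypothetical stand-in for get_bmap_ext()
that hard-codes the two lengths reported above, and toy_walk() adds an
iteration cap that the real phase 4 loop does not have.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for get_bmap_ext(): it only reproduces the two
 * extent lengths reported in this mail for the corrupted btree.
 */
long toy_blen(long j)
{
	if (j == 2894)
		return -14;	/* next key (2880) is smaller than this one */
	if (j == 2880)
		return 14;	/* from 2880 we step forward to 2894 again */
	return 1;		/* any other block: advance by one block */
}

/*
 * Walk j from start towards end the way the phase 4 loop does, but give
 * up after max_steps iterations. Returns the number of iterations used,
 * or -1 if the walk did not finish within max_steps.
 */
long toy_walk(long start, long end, long max_steps)
{
	long j, steps = 0;

	for (j = start; j < end; j += toy_blen(j)) {
		if (++steps > max_steps)
			return -1;
	}
	return steps;
}
```

Calling toy_walk(2894, 3000, 1000) hits the cap because j never gets past
2894, while a start block outside the bad pair reaches the end normally.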

Root cause: for performance, the btree caches the last accessed node at
each level in struct btree_cursor during btree_search(). It re-searches
the btree for a new key only if the following condition fails:

if (root->keys_valid && key <= root->cur_key && (!root->prev_value ||
key > root->prev_key))

Now consider a btree with the keys: 2684 3552 3554
A> The cache holds cur_key=3552 and prev_key=2684.
B> Key 3552 is updated to 2880 with btree_update_key(), but the cache is
   not invalidated, so cur_key is still 3552.
C> A new key, 2894, is inserted with btree_insert().
   btree_insert() first calls btree_search() to find the correct node
   for the new key 2894. Because the condition above is still true,
   btree_search() does not re-search the btree, and the new key is
   inserted as if the keys were 2684 2894 3552 3554. In reality the node
   cached as cur_key=3552 now holds key=2880, which is less than 2894,
   so the btree is corrupted to 2684 2894 2880 3554.
D> One solution is to invalidate the cache after updating the old
   key=3552 to the new key=2880, so that btree_search() re-searches.
   In that case 2894 is inserted after 2880,
   i.e. 2684 2880 2894 3554.
   or
E> Update the cache with cur_key = new key. This is better in terms of
   performance, as it avoids re-searching the btree during the next
   btree_search(). This patch takes this approach.
F> The btree was corrupted in phase 3, but the hang showed up in phase 4.
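
The sequence can be modelled with a sorted array and a one-entry search
cache. This is a minimal sketch, not the repair/btree.c implementation:
struct toy_tree, the toy_* functions and the fix_cache flag are all
hypothetical names, and only the cache-hit test mirrors the condition
quoted above.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_KEYS 16

/* Toy model (hypothetical layout): sorted keys plus a search cache. */
struct toy_tree {
	unsigned long keys[TOY_MAX_KEYS];
	int nkeys;
	int cursor;		/* index of the cached "current" key */
	int keys_valid;
	unsigned long cur_key;	/* cached copy of keys[cursor] */
	int has_prev;
	unsigned long prev_key;	/* cached copy of keys[cursor - 1] */
};

/* Put the cursor on the first key >= key; reuse the cache if it covers key. */
void toy_search(struct toy_tree *t, unsigned long key)
{
	int i;

	if (t->keys_valid && key <= t->cur_key &&
	    (!t->has_prev || key > t->prev_key))
		return;		/* cache hit: no re-search */

	for (i = 0; i < t->nkeys && t->keys[i] < key; i++)
		;
	t->cursor = i;
	t->keys_valid = (i < t->nkeys);
	if (t->keys_valid)
		t->cur_key = t->keys[i];
	t->has_prev = (i > 0);
	if (t->has_prev)
		t->prev_key = t->keys[i - 1];
}

/* Rename old_key to new_key; fix_cache != 0 also refreshes cur_key (option E). */
int toy_update_key(struct toy_tree *t, unsigned long old_key,
		   unsigned long new_key, int fix_cache)
{
	toy_search(t, old_key);
	if (t->cursor >= t->nkeys || t->keys[t->cursor] != old_key)
		return -1;
	t->keys[t->cursor] = new_key;
	if (fix_cache)
		t->cur_key = new_key;
	return 0;
}

/* Insert key in front of whatever position toy_search() reports. */
void toy_insert(struct toy_tree *t, unsigned long key)
{
	assert(t->nkeys < TOY_MAX_KEYS);
	toy_search(t, key);
	memmove(&t->keys[t->cursor + 1], &t->keys[t->cursor],
		(size_t)(t->nkeys - t->cursor) * sizeof(t->keys[0]));
	t->keys[t->cursor] = key;
	t->nkeys++;
	t->keys_valid = 0;	/* toy simplification: drop the cache */
}

/* Returns 1 if the keys are strictly ascending, 0 otherwise. */
int toy_is_sorted(const struct toy_tree *t)
{
	int i;

	for (i = 1; i < t->nkeys; i++)
		if (t->keys[i - 1] >= t->keys[i])
			return 0;
	return 1;
}
```

Starting from the keys 2684 3552 3554, updating 3552 to 2880 with
fix_cache=0 and then inserting 2894 leaves the model with the keys out of
order, as in step C. With fix_cache=1 the same two calls keep the keys in
ascending order.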

Signed-off-by: Ajeet Yadav <ajeet.yadav.77@gmail.com>

diff -Nurp xfsprogs-3.1.5/repair/btree.c xfsprogs-3.1.5-dirty/repair/btree.c
--- xfsprogs-3.1.5/repair/btree.c       2011-03-31 12:11:25.000000000 +0900
+++ xfsprogs-3.1.5-dirty/repair/btree.c 2011-04-17 16:04:14.000000000 +0900
@@ -520,6 +520,7 @@ btree_update_key(
                return EINVAL;

        btree_update_node_key(root, root->cursor, 0, new_key);
+       root->cur_key = new_key;

        return 0;
 }

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
  2011-04-18 12:43 [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4 Ajeet Yadav
@ 2011-04-20 11:26 ` Ajeet Yadav
  2011-04-21 19:23   ` Eric Sandeen
  2011-04-22  6:51 ` Christoph Hellwig
  1 sibling, 1 reply; 7+ messages in thread
From: Ajeet Yadav @ 2011-04-20 11:26 UTC (permalink / raw)
  To: xfs, Dave Chinner

I guess the patch has not been reviewed because of its subject, but the
problem is in the btree code.
Could anyone please take the time to look at the problem?


* Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
  2011-04-20 11:26 ` Ajeet Yadav
@ 2011-04-21 19:23   ` Eric Sandeen
  0 siblings, 0 replies; 7+ messages in thread
From: Eric Sandeen @ 2011-04-21 19:23 UTC (permalink / raw)
  To: Ajeet Yadav; +Cc: xfs

On 4/20/11 6:26 AM, Ajeet Yadav wrote:
> I guess the patch is not reviewed because of patch subject, but the
> problem is related to btree code.
> Can anyone please take out time to understand the problem

Do you happen to have an xfs_metadump of the corrupted filesystem which produced the hang?

-Eric


* Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
  2011-04-18 12:43 [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4 Ajeet Yadav
  2011-04-20 11:26 ` Ajeet Yadav
@ 2011-04-22  6:51 ` Christoph Hellwig
       [not found]   ` <BANLkTi=3V0TohK6c6MYOvPQc3ADd_Pya3A@mail.gmail.com>
  1 sibling, 1 reply; 7+ messages in thread
From: Christoph Hellwig @ 2011-04-22  6:51 UTC (permalink / raw)
  To: Ajeet Yadav; +Cc: xfs

The patch looks good to me.  But I'm a bit worried about the lack of
test coverage.  As Eric said if you're able to get a metadump of
a filesystem that shows this issue it would come in useful for
regression testing.


* Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
       [not found]   ` <BANLkTi=3V0TohK6c6MYOvPQc3ADd_Pya3A@mail.gmail.com>
@ 2011-04-26 12:29     ` Ajeet Yadav
  2011-05-02  5:39       ` Ajeet Yadav
  0 siblings, 1 reply; 7+ messages in thread
From: Ajeet Yadav @ 2011-04-26 12:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

[-- Attachment #1: Type: text/plain, Size: 550 bytes --]

Sorry for the delay. Please find attached the xfs_metadump of the XFS file system.

On Tue, Apr 26, 2011 at 12:47 PM, Ajeet Yadav <ajeet.yadav.77@gmail.com> wrote:
> Sorry for delay, please find the metadump of file system.
>
> On Fri, Apr 22, 2011 at 12:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>> The patch looks good to me.  But I'm a bit worried about the lack of
>> test coverage.  As Eric said if you're able to get a metadump of
>> a filesystem that shows this issue it would come in useful for
>> regression testing.
>>
>>
>

[-- Attachment #2: xfs_dump.tgz --]
[-- Type: application/x-gzip, Size: 85308 bytes --]


* Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
  2011-04-26 12:29     ` Ajeet Yadav
@ 2011-05-02  5:39       ` Ajeet Yadav
  2011-05-03 17:33         ` Christoph Hellwig
  0 siblings, 1 reply; 7+ messages in thread
From: Ajeet Yadav @ 2011-05-02  5:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

Please confirm whether you received the xfs_metadump file I sent in my
last mail.
I am sure it will help you find the problem in the repair btree. Please
let me know if anything is still missing from my side.

On Tue, Apr 26, 2011 at 5:59 PM, Ajeet Yadav <ajeet.yadav.77@gmail.com> wrote:
> Sorry for delay, please find attached the xfs_metadump of xfs file system
>
> On Tue, Apr 26, 2011 at 12:47 PM, Ajeet Yadav <ajeet.yadav.77@gmail.com> wrote:
>> Sorry for delay, please find the metadump of file system.
>>
>> On Fri, Apr 22, 2011 at 12:21 PM, Christoph Hellwig <hch@infradead.org> wrote:
>>> The patch looks good to me.  But I'm a bit worried about the lack of
>>> test coverage.  As Eric said if you're able to get a metadump of
>>> a filesystem that shows this issue it would come in useful for
>>> regression testing.
>>>
>>>
>>
>


* Re: [patch] xfsprogs: fixes a regression hang in xfs_repair phase 4
  2011-05-02  5:39       ` Ajeet Yadav
@ 2011-05-03 17:33         ` Christoph Hellwig
  0 siblings, 0 replies; 7+ messages in thread
From: Christoph Hellwig @ 2011-05-03 17:33 UTC (permalink / raw)
  To: Ajeet Yadav; +Cc: xfs

On Mon, May 02, 2011 at 11:09:32AM +0530, Ajeet Yadav wrote:
> It will be fine for me, if you have received the xfs_metadump file I
> sent in last mail.
> I am sure it will help you find problem in repair btree, please
> correct me if I left you anything from my side.

Yes, got it.  I'll apply your patch shortly.

Thanks a lot!

